ASymbolicExecutionFrameworkforJavaScriptPrateekSaxenaDevdattaAkhaweSteveHannaFengMaoStephenMcCamantDawnSongElectricalEngineeringandComputerSciencesUniversityofCaliforniaatBerkeleyTechnicalReportNo.
UCB/EECS-2010-26http://www.
eecs.
berkeley.
edu/Pubs/TechRpts/2010/EECS-2010-26.
htmlMarch8,2010Copyright2010,bytheauthor(s).
Allrightsreserved.
Permissiontomakedigitalorhardcopiesofallorpartofthisworkforpersonalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesarenotmadeordistributedforprofitorcommercialadvantageandthatcopiesbearthisnoticeandthefullcitationonthefirstpage.
Tocopyotherwise,torepublish,topostonserversortoredistributetolists,requirespriorspecificpermission.
ASymbolicExecutionFrameworkforJavaScriptPrateekSaxena,DevdattaAkhawe,SteveHanna,FengMao,StephenMcCamant,andDawnSongComputerScienceDivision,EECSDepartmentUniversityofCalifornia,Berkeley{prateeks,devdatta,sch,fmao,smcc,dawnsong}@cs.
berkeley.
eduAbstract—AsAJAXapplicationsgainpopularity,client-sideJavaScriptcodeisbecomingincreasinglycomplex.
However,fewautomatedvulnerabilityanalysistoolsforJavaScriptexist.
Inthispaper,wedescribetherstsystemforexploringtheexecutionspaceofJavaScriptcodeusingsymbolicexecution.
TohandleJavaScriptcode'scomplexuseofstringoperations,wedesignanewlanguageofstringconstraintsandimplementasolverforit.
Webuildanautomaticend-to-endtool,Kudzu,andapplyittotheproblemofndingclient-sidecodeinjectionvulnerabilities.
Inexperimentson18livewebapplications,Kudzuautomaticallydiscovers2previouslyunknownvulner-abilitiesand9morethatwerepreviouslyfoundonlywithamanually-constructedtestsuite.
Keywords-websecurity;symbolicexecution;stringdecisionproceduresI.
INTRODUCTIONRichwebapplicationshaveasignicantfractionoftheircodewritteninclient-sidescriptinglanguages,suchasJavaScript.
Asanincreasingfractionofcodeisfoundontheclient,client-sidesecurityvulnerabilities(suchasclient-sidecodeinjection[19],[25]–[27])arebecomingapromi-nentthreat.
However,amajorityoftheresearchonwebvulnerabilitiessofarhasfocusedonserver-sideapplicationcodewritteninPHPandJava.
Thereisagrowingneedforpowerfulanalysistoolsfortheclient-sidecomponentsofwebapplications.
Thispaperpresentsthersttechniquesandsystemforautomaticallyexploringtheexecutionspaceofclient-sideJavaScriptcode.
Toexplorethisexecutionspace,ourtechniquesgeneratenewinputstocoverapro-gram'svaluespaceusingdynamicsymbolicexecutionofJavaScript,andtocoveritseventspacebyautomaticGUIexploration.
DynamicsymbolicexecutionforJavaScripthasnumerousapplicationsinwebsecurity.
Inthispaperwefocusononeoftheseapplications:automaticallyndingclient-sidecodeinjectionvulnerabilities.
Aclient-sidecodeinjectionattackoccurswhenclient-sidecodepassesuntrustedinputtoadynamiccodeevaluationconstruct,withoutpropervalidationorsanitization,allowinganattackertoinjectJavaScriptcodethatrunswiththeprivilegesofawebapplication.
JavaScriptexecutionspaceexplorationischallengingformanyreasons.
Inparticular,JavaScriptapplicationsacceptmanykindsofinput,andthoseinputsarestructuredjustasstrings.
Forinstance,atypicalapplicationmighttakeuserinputfromformelds,messagesfromitsserverviaXMLHttpRequest,anddatafromcoderunningconcur-rentlyinotherbrowserwindows.
Eachkindofinputstringhasitsownformat,sodevelopersuseacombinationofcus-tomroutinesandthird-partylibrariestoparseandvalidatetheinputstheyreceive.
Toeffectivelyexploreaprogram'sexecutionspace,atoolmustbeabletosupplyvaluesforallofthesedifferentkindsofinputsandreasonabouthowtheyareparsedandvalidated.
Approach.
Inthispaper,wedeveloptherstcom-pletesymbolic-executionbasedframeworkforclient-sideJavaScriptcodeanalysis.
Webuildanautomated,stand-alonetoolthat,givenaURLforawebapplication,automat-icallygenerateshigh-coveragetestcasestosystematicallyexploreitsexecutionspace.
AutomaticallyreasoningabouttheoperationsweseeinrealJavaScriptapplicationsrequiresapowerfulconstraintsolver,especiallyforthetheoryofstrings.
However,thepowerneededtoexpressthesemanticsofJavaScriptoperationsisbeyondwhatexistingstringconstraintsolvers[14],[17]offer.
Asacentralcontributionofthiswork,weovercomethisdifcultybyproposingaconstraintlanguageandbuildingapracticalsolverthatsupportsthespecicationofboolean,machineinteger(bit-vector),andstringconstraints,includingregularexpressions,overmultiplevariable-lengthstringinputs.
Thislanguage'srichsupportforstringoperationsiscrucialforreasoningabouttheparsingandvalidationchecksthatJavaScriptapplicationsperform.
Toshowthepracticalityofourconstraintlanguage,wedetailatranslationfromthemostcommonlyusedJavaScriptstringoperationstoourconstraints.
Thistranslationalsoharnessesconcreteinformationfromadynamicexecutionoftheprograminawaythatallowstheanalysistoscale.
Weanalyzethetheoreticalexpressivenessofthetheoryofstringssupportedbyourlanguage(includingincomparisontoexistingconstraintsolvers),andbounditscomputationalcomplexity.
Wethengiveasoundandcompletedecisionprocedureforthebounded-lengthversionoftheconstraintlanguage.
Wedevelopanend-to-endsystem,calledKudzu,thatperformssymbolicexecutionwiththisconstraintsolveratitscore.
End-to-endsystem.
Weidentifyfurtherchallengesinbuild-inganend-to-endautomatedtoolforrichwebapplications.
Forinstance,becauseJavaScriptcodeinteractscloselywithauserinterface,itsinputspacecanbedividedintotwoclasses,theeventsspaceandthevaluespace.
Theformerincludesthestate(checkboxes,listselections)andsequenceofactionsofuser-interfaceelements,whilethelatterincludesthecontentsofexternalinputs.
Thesekindsofinputjointlydeterminethecode'sbehavior,buttheyaresuitedtodiffer-entexplorationtechniques.
KudzuusesGUIexplorationtoexploretheeventspace,andsymbolicexecutiontoexplorethevaluespace.
WeevaluateKudzu'send-to-endeffectivenessbyapplyingittoacollectionof18JavaScriptapplications.
TheresultsshowthatKudzuiseffectiveatgettinggoodcoveragebydiscoveringnewexecutionpaths,anditautomaticallydiscovers2previously-unknownvulnerabilities,aswellas9client-sidecodeinjectionvulnerabilitiesthatwerepreviouslyfoundonlywithamanually-createdtestsuite.
Contributions.
Insummary,thispapermakesthefollowingmaincontributions:Weidentifythelimitationsofpreviousstringconstraintlanguagesthatmaketheminsufcientforparsing-heavyJavaScriptcode,anddesignanewconstraintlanguagetoresolvethoselimitations.
(SectionIV)Wedesignandimplementapracticaldecisionproce-dureforthisconstraintlanguage.
(SectionV)WebuildtherstsymbolicexecutionengineforJavaScript,usingourconstraintsolver.
(SectionsIIIandVI)CombiningsymbolicexecutionofJavaScriptwithau-tomaticGUIexplorationandotherneededcomponents,webuildtherstend-to-endautomatedsystemforexplorationofclient-sideJavaScript.
(SectionIII)Wedemonstratethepracticaluseofourimplementationbyapplyingittoautomaticallydiscovering11client-sidecodeinjectionvulnerabilities,includingtwothatwerepreviouslyunknown.
(SectionVII)II.
PROBLEMSTATEMENTANDOVERVIEWInthissectionwestatetheproblemwefocuson,exploringtheexecutionspaceofJavaScriptapplications;describeoneofitsapplications,ndingclient-sidecodeinjectionvulnerabilities;andgiveanoverviewofourapproach.
Problemstatement.
Wedeveloptechniquestosystemati-callyexploretheexecutionspaceofJavaScriptapplicationcode.
JavaScriptapplicationsoftentakemanykindsofinput.
WeviewtheinputspaceofaJavaScriptprogramassplitintotwocategories:theeventspaceandthevaluespace.
Eventspace.
RichwebapplicationstypicallydenetenstohundredsofJavaScripteventhandlers,whichmayexecuteinanyorderasaresultofuseractionssuchasclickingbuttonsorsubmittingforms.
EventhandlercodemaycheckthestateofGUIelements(suchascheck-boxesorselectionlists).
TheorderingofeventsandthestateoftheGUIelementstogetheraffectsthebehavioroftheapplicationcode.
Valuespace.
Thevaluesofinputssuppliedtoaprogramalsodetermineitsbehavior.
JavaScripthasnumerousinterfacesthroughwhichinputisreceived:–Userdata.
Formelds,textareas,andsoon.
–URLandcross-windowcommunicationabstrac-tions.
WebprincipalshostedinotherwindowsorframescancommunicatewithJavaScriptcodeviainter-framecommunicationabstractionssuchasURLfragmentidentiersandHTML5'sproposedpostMessage,orviaURLparameters.
–HTTPchannels.
Client-sideJavaScriptcodecanexchangedatawithitsoriginatingwebserverusingXMLHttpRequest,HTTPcookies,oradditionalHTTPGETorPOSTrequests.
Thispaperprimarilyfocusesontechniquestosystemat-icallyexplorethevaluespaceusingsymbolicexecutionofJavaScript,withthegoalofgeneratinginputsthatexercisenewprogrampaths.
However,automaticallyexploringtheeventspaceisalsorequiredtoachievegoodcoverage.
Todemonstratetheefcacyofourtechniquesinanend-to-endsystem,wecombinesymbolicexecutionofJavaScriptforthevaluespacewithaGUIexplorationtechniquefortheeventspace.
Thisfullsystemisabletoautomaticallyexplorethecombinedinputspaceofclient-sidewebapplicationcode.
Application:ndingclient-sidecodeinjectionvulnerabil-ities.
Exploringaprogram'sexecutionspacehasanumberofapplicationsinthesecurityofclient-sidewebapplications.
Inthispaper,wefocusspecicallyononesecurityapplica-tion,ndingclient-sidecodeinjectionvulnerabilities.
Client-sidecodeinjectionattacks,whicharesometimesreferredtoasDOM-basedXSS,occurwhenclient-sidecodeusesuntrustedinputdataindynamiccodeevaluationconstructswithoutsufcientvalidation.
LikereectedorstoredXSSattacks,client-sidecodeinjectionvulnerabilitiescanbeusedtoinjectscriptcodechosenbyanattacker,givingtheattackerthefullprivilegesofthewebapplication.
Wecalltheprograminputthatsuppliesthedataforanattacktheuntrustedsource,andthepotentiallyvulnerablecodeevaluationconstructthecriticalsink.
Examplesofcriticalsinksincludeeval,andHTMLcreationinterfaceslikedocument.
writeand.
innerHTML.
Inourthreatmodel,wetreatallURLsandcross-windowcommunicationabstractionsasuntrustedsources,assuchin-putsmaybecontrolledbyanuntrustedwebprincipal.
Inad-dition,wealsotreatuserdataasanuntrustedsourcebecauseweaimtondcaseswhereuserdatamaybeinterpretedascode.
Theseverityofattacksfromuser-dataonclient-sideisoftenlessseverethanaremoteXSSattack,butdeveloperstendtoxtheseandKudzutakesaconservativeapproachofreportingthem.
HTTPchannelssuchasXMLHttpRequestarecurrentlyrestrictedtocommunicatingwithawebserverfromthesamedomainastheclientapplication,sowedonottreatthemasuntrustedsources.
DevelopersmaywishtotreatHTTPchannelsasuntrustedinthefuturewhendeterminingsusceptibilitytocross-channelscriptingattacks[5],orwhenenhancedabstractions(suchastheproposedcross-originXMLHttpRequest[29])allowcross-domainHTTPcommunicationdirectlyfromJavaScript.
ToeffectivelyndXSSvulnerabilities,werequiretwocapabilities:(a)generatingdirectedtestcasesthatexploretheexecutionspaceoftheprogram,and(b)checking,onagivenexecutionpath,whethertheprogramvalidatesalluntrusteddatasufcientlybeforeusingitinacriticalsink.
CustomvalidationchecksandparsingroutinesarethenormratherthantheexceptioninJavaScriptapplications,soourtoolmustcheckthebehaviorofvalidationratherthansimplyconrmingthatitisperformed.
Inpreviouswork,wedevelopedatoolcalledFLAXwhichemploystaint-guidedfuzzingforndingclient-sidecodeinjectionattacks[26].
However,FLAXreliesonanexternal,manuallydevelopedtestharnesstoexplorethepathspace.
Kudzu,incontrast,automaticallygeneratesatestsuitethatexplorestheexecutionspacesystematically.
Kudzualsousessymbolicreasoning(withitsconstraintsolver)tocheckifthevalidationlogicemployedbytheapplicationissufcienttoblockmaliciousinputs—thisisaone-stepmechanismfordirectedexploitgenerationasopposedtomultipleroundsofundirectedfuzzingemployedinFLAX.
StaticanalysistechniqueshavealsobeenemployedforJavaScript[12]toreasonaboutmultiplepaths,butcansufferfromfalsepositivesanddonotproducetestinputsorattackinstances.
Symbolicanalysesandmodel-checkinghavebeenusedforserver-sidecode[2],[20];however,thecomplexityofpathconditionsweobserverequiresmoreexpressivesymbolicreasoningthansupportedbytoolsforserver-sidecode.
ApproachOverview.
Thevaluespaceandeventspaceofawebapplicationaretwodifferentcomponentsofitsinputspace:codereachablebyexploringonepartoftheinputspacemaynotbereachablebyexploringtheothercom-ponentalone.
Forinstance,exploringtheGUIeventspaceresultsindiscoveringnewviewsofthewebapplication,butthisdoesnotdirectlyaffectthecoveragethatcanbeachievedbysystematicallyexploringallthepathsinthecodeimple-mentingeachview.
Conversely,maximizingpathcoverageisunlikelytodiscoverfunctionalityoftheapplicationthatonlyhappenswhentheuserexploresadifferentapplicationview.
Therefore,Kudzuemploysdifferenttechniquestoexploreeachpartoftheinputspaceindependently.
Valuespaceexploration.
Tosystematicallyexplorediffer-entexecutionpaths,wedevelopacomponentthatperformsdynamicsymbolicexecutionofJavaScriptcode,andanewconstraintsolverthatoffersthedesiredexpressivenessforautomaticsymbolicreasoning.
Indynamicsymbolicexecution,certaininputsaretreatedassymbolicvariables.
Dynamicsymbolicexecutiondiffersfromnormalexecutioninthatwhilemanyvariablehavetheirusual(concrete)values,like5foranintegervariable,thevaluesofothervariableswhichdependonsymbolicinputsarerepresentedbysymbolicformulasoverthesymbolicinputs,likeinput1+5.
WheneveranyoftheoperandsofaJavaScriptoperationissymbolic,theoperationissimulatedbycreatingaformulafortheresultoftheoperationintermsoftheformulasfortheoperands.
Whenasymbolicvaluepropagatestotheconditionofabranch,Kudzucanuseitsconstraintsolvertosearchforaninputtotheprogramthatwouldcausethebranchtomaketheoppositechoice.
Eventspaceexploration.
AsacomponentofKudzuwedevelopaGUIexplorerthatsearchesthespaceofalleventsequencesusingarandomexplorationstrategy.
Kudzu'sGUIexplorercomponentrandomlyselectsanorderingamongtheusereventsregisteredbythewebpage,andautomaticallyrestheseeventsusinganinstrumentedversionofthewebbrowser.
Kudzualsohasaninput-feedbackcomponentthatcanreplaythesequenceofGUIeventsexploredinanygivenrun,alongwithfeedingnewvaluesgeneratedbytheconstraintsolvertotheapplication'sdatainputs.
Testingforclient-sidecodeinjectionvulnerabilities.
Foreachinputexplored,Kudzudetermineswhetherthereisaowofdatafromanuntrusteddatasourcetoacriticalsink.
Ifitndsone,itseekstodeterminewhethertheprogramsanitizesand/orvalidatestheinputcorrectlytopreventattackersfrominjectingdangerouselementsintothecriticalsink.
Specically,itattemptstoprovethatthevalidationisinsufcientbyconstructinganattackinput.
AswewilldescribeinmoredetailinSectionIII-B,itcombinestheresultsofsymbolicexecutionwithaspecicationforattackstocreateaconstraintsolverquery.
Iftheconstraintsolverndsasolutiontothequery,itrepresentsanattackthatcanreachthecriticalsinkandexploitaclient-sidecodeinjectionvulnerability.
III.
END-TO-ENDSYSTEMDESIGNThissectiondescribesthevariouscomponentsthatworktogethertomakeacompleteKudzu-basedvulnerability-discoverysystemwork.
ThefullexplanationoftheconstraintsolverisinSectionsIVthroughVI.
Forreference,therelationshipsbetweenthecomponentsaresummarizedinFigure1.
A.
SystemComponentsFirst,wediscussthecorecomponentsthatwouldbeusedinanyapplicationofKudzu:theGUIexplorerthatgeneratesinputeventstoexploretheeventspace,thedynamicsymbolicinterpreterthatperformssymbolicexecutionofJavaScript,thepathconstraintextractorthatbuildsqueriesbasedontheresultsofsymbolicexecution,theconstraintsolverthatndssatisfyingassignmentstothosequeries,andtheinputFigure1:ArchitecturediagramforKudzu.
Thecomponentsdrawninthedashedboxperformfunctionsspecictoourapplicationofndingclient-sidecodeinjection.
Theremainingcomponentsareapplication-agnostic.
Componentsshadedinlightgrayarethecorecontributionofthispaper.
feedbackcomponentthatusestheresultsfromtheconstraintsolverasnewprograminputs.
TheGUIexplorer.
TherststepinautomatingJavaScriptapplicationanalysisisexploringtheeventspaceofuserinteractions.
Eacheventcorrespondstoauserinteractionsuchasclickingacheck-boxorabutton,settingfocusonaeld,addingdatatodataelds,clickingalink,andsoon.
Kudzucurrentlyexploresthespaceofallse-quencesofeventsusingarandomexplorationstrategy.
OneofthechallengesistocomprehensivelydetectalleventsthatcouldresultinJavaScriptcodeexecution.
Toaddressthis,KudzuinstrumentsthebrowserfunctionsthatprocessHTMLelementsonthecurrentwebpagetorecordwhenaneventhandleriscreatedordestroyed.
Kudzu'sGUIexplorercomponentrandomlyselectsanorderingamongtheusereventsregisteredbythewebpageandexecutesthem1.
Therandomseedcanbecontrolledtoreplaythesameorderingofevents.
Whileinvokinghandlers,theGUIcomponentalsogenerates(benign)randomteststringstolltextelds.
(Later,symbolicexecutionwillgeneratenewinputvaluesfortheseeldstoexploretheinputspacefurther.
)Linksthatnavigatethepageawayfromtheapplication'sdomainarecancelled,therebyconstrainingthetestingtoasingleapplicationdomainatatime.
Inthefuture,weplantoinvestigatealternativestrategiestoprioritizetheexecutionofeventsdiscoveredaswell.
Dynamicsymbolicinterpreter.
Kudzuperformsdynamicsymbolicexecutionbyrstrecordinganexecutionoftheprogramwithconcreteinputs,andthensymbolicallyin-terpretingtherecordedexecutioninadynamicsymbolic1Invokinganeventhandlermayinvalidateanotherhandler(forinstance,whenthepagenavigatesasaresult).
Inthatcase,theinvalidatedhandlersareignoredandifnewhandlersarecreatedbytheeventthatcausesinvalidation,theseeventsareexploredsubsequently.
interpreter.
Forrecordinganexecutiontrace,Kudzuemploysanexistinginstrumentationcomponent[26]implementedinthewebbrowser'sJavaScriptinterpreter.
ForeachJavaScriptbytecodeinstructionexecuted,itrecordsthesemanticsoftheoperation,itsoperandsandoperandvaluesinasim-pliedintermediatelanguagecalledJASIL[26].
ThesetofJavaScriptoperationscapturedincludesalloperationsonintegers,booleans,strings,arrays,aswellascontrol-owdecisions,objecttypes,andcallstobrowser-nativemethods.
Forthesecondstep,dynamicsymbolicexecution,wehavedevelopedfromscratchasymbolicinterpreterfortherecordedJASILinstructions.
SymbolicinputsforKudzuarecongurabletomatchtheneedsofanapplication.
Forinstance,intheapplicationweconsider,detectingclient-sidecodeinjection,allURLdata,datareceivedovercross-windowcommunicationabstrac-tions,anduserdataeldsaremarkedsymbolic.
Symbolicinputsmaybestrings,integers,orbooleans.
Symbolicexecu-tionproceedsontheJASILinstructionsintheordertheyarerecordedintheexecutiontrace.
Atanypointduringdynamicsymbolicexecution,agivenoperandiseithersymbolicorconcrete.
Iftheoperandissymbolic,itisassociatedwithasymbolicvalue;otherwise,itsvalueispurelyconcreteandisstoredinthedynamicexecutiontrace.
WheninterpretingaJASILoperationinthedynamicsymbolicinterpreter,theoperationissymbolicallyexecutedifoneormoreofitsinputoperandsissymbolic.
OtherwisetheoperationofthesymbolicinterpreteronconcretevalueswouldbeexactlythesameastherealJavaScriptinterpreter,sowesimplyreusetheconcreteresultsalreadystoredintheexecutiontrace.
Thesymbolicvalueofanoperandisaformulathatrepresentsitscomputationfromthesymbolicinputs.
Forinstance,fortheJASILassignmentoperationy:=x,ifxissymbolic(say,withthevalueinput1+5),thensymbolicexecutionoftheoperationcopiesthisvaluetoy,givingythesamesymbolicvalue.
Foranarithmeticoperation,sayy:=x1+x2wherex1issymbolic(saywithvalueinput2+3)andx2isnot(saywiththeconcretevalue7),thesymbolicvalueforyistheformularepresentingthesum(input2+10).
Operationsoverstringsandbooleansaretreatedinthesameway,generatingformulasthatinvolvestringoperationslikematchandbooleanoperationslikeand.
Atthispoint,stringoperationsaretreatedsimplyasuninterpretedfunctions.
Duringthesymbolicexecution,wheneverthesymbolicinterpreterencountersanoperationoutsidethesupportedformulagrammar,itforcesthedesti-nationoperandtobeconcrete.
Forinstance,iftheoperationisx=parseFloat(s)forasymbolicstrings,xandscanbereplacedwiththeirconcretevaluesfromthetrace(say,4.
3and"4.
3").
Thisallowssymboliccomputationtocontinueforothervaluesintheexecution.
Pathconstraintextractor.
Theexecutiontracerecordseachcontrol-owbranch(e.
g.
,ifstatement)encounteredduringexecution,alongwiththeconcretevalue(trueorfalse)representingwhetherthebranchwastaken.
Duringsymbolicexecution,thecorrespondingbranchconditionisrecordedbythepathconstraintextractorifitissymbolic.
Asexecutioncontinues,theformulaformedbyconjoiningthesymbolicbranchconditions(negatingtheconditionsofbranchesthatwerenottaken)iscalledthepathconstraint.
Ifaninputvaluesatisesthepathconstraint,thenanexecutionoftheprogramonthatinputwillfollowthesameexecutionpath.
Toexploreadifferentexecutionpath,Kudzuselectsabranchontheexecutionpathandbuildsamodiedpathconstraintthatisthesameuptothatbranch,butthathasthenegationofthatbranchcondition(laterconditionsfromtheoriginalbranchareomitted).
Aninputthatsatisesthisconditionwillexecutealongthesamepathuptotheselectedbranch,andthenexploretheoppositealternative.
Thereareseveralstrategiesforpickingtheorderinwhichbranchconditionscanbenegated—Kudzucurrentlyusesagenerationalsearchstrategy[11].
Constraintsolver.
Mostsymbolicexecutiontoolsinthepasthavereliedonanexistingconstraintsolver.
However,Kudzugeneratesarichsetofconstraintsoverstring,integerandbooleanvariablesforwhichexistingoff-the-shelfsolversarenotpowerfulenough.
Therefore,wehavebuiltanewsolverforourconstraints(wepresentthealgorithmanddesigndetailsinSectionV).
Indesigningthiscomponent,weexaminedthesymbolicconstraintsKudzugeneratesinpractice.
Fromthestringconstraintsarisinginthese,wedistilledasetofprimitiveoperationsrequiredinacoreconstraintlanguage.
(ThiscorelanguageisdetailedinSectionIV,whilethesolver'sfullinterfaceisgiveninSectionVI.
)WejustifyourintuitionthatsolvingthecoreconstraintsissufcienttomodelJavaScriptstringoperationsinSectionVI,whereweshowapracticaltranslationofJavaScriptstringoperationsintoourconstraintlanguage.
Inputfeedback.
Solvingthepathconstraintformulausingthesolvercreatesanewinputthatexploresanewprogrampath.
ThesenewlygeneratedinputsmustbefedbacktotheJavaScriptprogram:forinstancesimulateduserinputsmustgointheirtextelds,andGUIeventsshouldbereplayedinthesamesequenceasontheoriginalrun.
Theinputfeedbackcomponentisresponsibleforthistask.
AsaparticularHTMLelement(e.
gatexteld)inadocumentislikelyallocatedadifferentmemoryaddressoneveryexecution,theinputfeedbackcomponentusesXPath[31]andDOMidentierstouniquelyidentifyHTMLelementsacrossexecutionsandfeedappropriatevaluesintothem.
IfaninputcomesfromanattributeforaDOMobject,theinputfeedbackcomponentsetsthatattributewhentheobjectiscreated.
Iftheinputcomesviaapropertyofaneventthatisgeneratedbythebrowserwhenhandlingcross-windowcommunication,suchastheoriginanddatapropertiesofapostMessageevent,thecomponentupdatesthatpropertywhentheJavaScriptengineaccessesit.
Kudzuinstrumentsthewebbrowsertodeterminethecontextofaccesses,todistinguishbetweenaccessescomingfromtheJavaScriptengineandaccessescomingfromthebrowsercoreorinstrumentationcode.
B.
Application-speciccomponentsNext,wediscussthreecomponentsthatarespecializedforthetaskofndingclient-sidecodeinjectionvulnerabilities:asink-sourceidenticationcomponentthatdetermineswhichcriticalsinksmightreceiveuntrustedinput,avulnerabilityconditionextractorthatcapturesdomainknowledgeaboutclient-sidecodeinjectionattacks,andtheattackvericationcomponentthatcheckswhetherinputsgeneratedbythetoolinfactrepresentexploits.
Sink-sourceidentication.
Toidentifyifexternalinputsareusedincriticalsinkoperationssuchasevalordocument.
write,weperformadynamicdataowanal-ysisontheexecutiontrace.
Asoutlinedearlier,wetreatallURLdata,datareceivedovercross-windowcommunicationabstractions(suchaspostMessage),anddatalledintouserdataeldsaspotentiallyuntrusted.
Thedataowanalysisissimilartoadynamictaintanalysis.
Anyexecutiontracethatrevealsaowofdatatoacriticalsinkissubjecttofurthersymbolicanalysisforexploitgeneration.
Weuseanexistingframework,FLAX,forthisinstrumentationandtaint-tracking[26]inamannerthatisfaithfultotheimplementationofJavaScriptintheWebKitinterpreter.
Vulnerabilityconditionextractor.
Aninputrepresentsanattackagainstaprogramifitpassestheprogram'svalidationchecks,butnonethelessimplementstheattacker'sgoals(i.
e.
,causesaclient-sidecodeinjectionattack)whenitreachesacriticalsink.
Thevulnerabilityconditionextractorcollectsfromthesymbolicinterpreteraformularepresentingthe(possiblytransformed)valueusedatacriticalsink,andcombinesitwiththepathconstrainttocreateaformuladescribingtheprogram'svalidationoftheinput.
2.
Tode-terminewhetherthisvalueconstitutesanattack,thevulner-abilityconditionextractorappliesasink-specicvulnera-bilityconditionspecication,whichisa(regular)grammarencodingasetofstringsthatwouldconstituteanattackagainstaparticularsink.
Thisspecicationisconjoinedwiththeformularepresentingthetransformedinputtocreateaformularepresentingvaluesthatarestilldangerousafterthetransformation.
Forinstance,inthecaseoftheevalsink,thevulnera-bilityspecicationassertsthatavalidattackshouldbezeroormorestatementseachterminatedbya';',followedbythepayload.
Suchgrammarscanbeconstructedbyusingpubliclyavailableattackpatterns[13].
Thetool'sattackgrammarsarecurrentlysimpleandcanbeextendedeasilyforcomprehensivenessandtoincorporatenewattacks.
Tosearchonlyforrealisticattacks,thespecicationalsoincorporatesdomainknowledgeaboutthepossiblevaluesofcertaininputs.
Forinstance,whenastringvariablecor-respondstothewebURLfortheapplication,weassertthatthestringstartswiththesamedomainastheapplication.
Attackverication.
Kudzuautomaticallyteststheexploitinstancebyfeedingtheinputbacktotheapplication,andcheckingiftheattackpayload(suchasascriptwithanalertmessage)isexecuted.
Ifthisvericationfails,Kudzudoesnotreportanalarm.
IV.
CORECONSTRAINTLANGUAGEInordertosupportarichlanguageofinputconstraintswithasimplesolvingbackend,wehavedesignedanintermediateformwecallthecoreconstraintlanguage.
ThislanguageisrichenoughtoexpressconstraintsfromJavaScriptprograms,butsimpleenoughtomakesolvingtheconstraintsefcient.
Inthissectionwedenetheconstraintlanguage,analyzeitsexpressivenessandthetheoreticalcomplexityofdecidingit,andcompareitsexpressivenesstothecorelanguagesofprevioussolvers.
A.
LanguageDenitionTheabstractsyntaxforourcoreconstraintlanguageisshowninFigure2.
Aformulainthelanguageisanar-bitrarybooleancombinationofconstraints.
Variableswhichrepresentstringsmayappearinvetypesofconstraints.
Therstthreeconstrainttypesindicatethatastringisamemberofthelanguagedenedbyaregularexpression,thattwostringsareequal,oronestringisequaltotheconcatenationoftwootherstrings.
Thetworemainingconstraintsrelate2Validationandsanitizationforcriticalclient-sidesinkoperationsmayhappenontheserverside(whendataissentbackviaXMLHttpRequest).
Ourimplementationdealswiththisbyrecognizingsuchtransformationsusingapproximatetaintingtechniques[26]acrossthedatasentandreceivedoverXMLHttpRequests.
Formula::=Formula|Formula∧Formula|ConstraintConstraint::=Var∈RegExp|Var=Var|Var=VarVar|length(Var)RelNumber|length(Var)Rellength(Var)RegExp::=Character||RegExpRegExp|RegExp|RegExp|RexExp*RelFigure2:Abstractgrammarofthecoreconstraintlanguage.
thelengthofonestringtoaconstantnaturalnumber,ortothelengthofanotherstring,byanyoftheusualequalityororderingoperations.
Regularexpressionsareformedfromcharactersortheemptystring(denotedby)viatheusualoperationsofconcatenation(representedbyjuxtaposition),alternation(|),andrepetitionzeroormoretimes(Kleenestar*).
Theconstraintsallhavetheirusualmeanings.
Anynumberofvariablesmaybeintroduced,andCharactersaredrawnfromanarbitrarynon-emptyalphabet,butNumbersmustbenon-negativeintegers.
Forpresentpurposes,stringsmaybeofunboundedlength,thoughwewillintroduceupperboundsontheirlengthslater.
B.
ExpressivenessandComplexityThoughthecoreconstraintlanguageisintentionallysmall,itisnotminimal:sometypesofconstraintsareincludedfortheconvenienceoftranslatingtoandfromthecorelanguage,butdonotfundamentallyincreaseitsexpressiveness.
Stringequality,comparisonsbetweenlengthsandconstants,andinequalitycomparisonsbetweenlengthscanbeexpressedusingconcatenation,regularexpressions,andequalitybe-tweenstringlengthsrespectively;thedetailsareomittedforspace.
Eachoftheremainingconstrainttypes(regularexpres-sionmembership,concatenation,andlength)makesitsowncontributiontotheexpressivenessofthecoreconstraints.
AppendixAgivesexamplesofthesetsofstringsthateachconstrainttypecanuniquelydene.
Thecoreconstraintlan-guageisexpressiveenoughthatthecomplexityofdecidingitisnotknownprecisely;itisatleastPSPACE-hard.
TheserelationshipsaresummarizedinFigure3.
ThecomplexityofourcoreconstraintlanguagefallstoNP-completewhenthelengthsofstringvariablesarebounded,astheyareinourimplementation.
FurtherdetailsareinAppendixB.
Figure3:Relationsbetweentheunboundedversionsofseveraltheoriesofstrings.
Theorieshigherinthegrapharestrictlymoreexpressivebutarealsoatleastascomplextodecide.
Kudzu'scoreconstraintlanguage(shaded)isstrictlymoreexpressivethaneitherthecorelanguageofHAMPI[17]orthetheoryofwordequationsandanequallengthpredicate(the"purelibrarylanguage"of[4]).
C.
ExpressivenessComparisonOursystem'scoreconstraintlanguageismoreexpressivethantheconstraintsusedinsimilarprevioussystems,andthisexpressivenessiskeyinallowingittohandleamorecomplexclassofapplications.
Bjrneretal.
[4]presenta"purelibrarylanguage"that,likeourcoreconstraintlanguage,includeswordequationsandtheabilitytoassertthattwostringshavethesamelength,solikeourlanguageitsdecidabilityisopen.
However,theirlanguagedoesnotincluderegularexpressions.
Regularexpressionsmaybelesscommoninthe.
NETapplicationsBjrneretal.
study,buttheyareusedubiquitouslyinJavaScript,soregularexpressionsupportismandatoryinourdomain.
SimilarlytheworkofCaballeroetal.
[3],[7]dealswithprogramscompiledfromC-familylanguages,whosestringoperationsaremuchmorelimited.
TheDPRLEtool[14]focusesonaclassofconstraintsthatcombineconcatenationandregularexpressionmatching,butinamorelimitedwaythanourtoolsupports.
DPRLEad-dressesadifferentproblemdomain,sinceitgivessolutionsforconstraintsoverlanguages(potentiallyinnitesetsofstrings)ratherthansinglestrings,butthismakesthetasksignicantlymoredifcult.
WewereunabletoexpresstheconstraintsfromourapplicationinDPRLE'sinputformatoranystraightforwardextensionofit.
Forinstance,thereisnowaytoexpresstheconstraintthattwolanguageexpressionsshouldbeequal,notsurprisingsincesuchconstraintsingeneralareundecidable[8].
HAMPI[17]providessupportforregularexpressionconstraints(andinfactwebuildonitsimplementationforthisfeature),butitssupportforotherconstraintsislimited,particularlybythefactthatitsupportsonlyasinglestringvariable.
Thevariablecanbeconcatenatedwithconstantstrings,butthesestringexpressionscannotbecomparedwitheachother,onlywithregularexpressions,soHAMPIlacksthefullgeneralityofwordequations.
Forinstance,HAMPIconstraintscannotdenetheset{uv#u#v:u,v∈{0,1}}.
Itisworthreemphasizingthattheselimitationsarenotjusttheoretical:theymaketheseprevioussystemsunsuitableforourapplications.
Oneofthemostcommonoperationsintheprogramsweexamineistoparseasingleinputstring(suchasaURL)intoseparateinputvariablesusingsplitorrepeatedregularexpressionmatching.
Representingthesemanticsofsuchanoperationrequiresrelatingthecontentsofonestringvariabletothoseofanother,somethingthatneitherDPRLEnorHAMPIsupports.
V.
CORECONSTRAINTSOLVINGAPPROACHInthissection,weexplainouralgorithmforsolvingthecoresetofconstraints.
Weintroduceaboundedversionoftheconstraintswhereweassumeauser-suppliedupperboundkonthelengthofthevariables.
ThisallowsustoemployaSAT-basedsolutionstrategywithoutreducingthepracticalexpressivenessofthelanguage.
Thealgorithmsatisesthreeimportantproperties,whoseproofappearsinAppendixC:1)Soundness.
Wheneverthealgorithmterminateswithanassignmentofvaluestostringvariables,thesolutionisconsistentwiththeinputconstraints.
2)Bound-kcompleteness.
Ifthereexistsasolutionforthestringvariableswhereallstringshavelengthkorless,thenouralgorithmndsonesuchsolution.
3)Termination.
Thealgorithmrequiresonlyanitenum-berofsteps(afunctionofthebound)foranyinput.
Thesolvertranslatesconstraintsonthecontentsofstringsintobit-vectorconstraintsthatcanbesolvedwithaSAT-basedSMTsolver.
Forthispurpose,thesolvertranslateseachinputstringintoasequenceofn-bitintegers(n=8inthecurrentimplementation).
EachstringvariableSalsohasanassociatedintegervariableLSrepresentingitslength.
Asinglestringisconvertedtoabit-vectorbyconcatenatingthebinaryrepresentationsofeachcharacter.
Then,thebit-vectorsrepresentingeachstringarethemselvesconcatenatedintoasinglelongbit-vector.
(Theorderinwhichthestringsareconcatenatedintothelongvectorreectstheconcatenationconstraints,asdetailedinstep1below.
)ThesolverpassestheconstraintsoverthisbitvectortoaSMT(satisabilitymodulotheories)decisionprocedureforthetheoryofbitvectors,STP[10]inourimplementation.
Informally,itisconvenienttorefertothecombinedbitvectorasifitwereanarrayindexedbycharacteroffsets,butwedonotuseSTP'stheoryofarrays,andcharacteroffsetsaremultipliedbyntogivebitoffsetsbeforeproducingthenalconstraints.
Input:C:constraintlistOutput:(IsSat:bool,Solutions:stringlist)G←BuildConcatGraph(C);(C′,StrOrderMap)←DecideOrder(G);C←C∪C′;FailLenDB:lengthassignmentlist←;whiletruedo(X,lengths)←SolveLengths(C,FailLenDB);if(X=UNSAT)thenprint"Unsatisfiable";halt(false,);endFinal:bitvectorconstraints;Final←CreateBVConstraints(StrOrderMap,C,lengths);(Result,BVSolutions)←BVSolver(Final);if(Result=SAT)thenprint"Satisfiable";printSolutions(BVSolutions,lengths,StrOrderMap);halt(true,BVsToStrings(BVSolutions));endelseFailLenDB←FailLenDB∪lengths;endendFigure4:Algorithmforsolvingthecoreconstraints.
OuralgorithmisshowninFigure4.
Atahighlevel,ithasthreesteps.
First,ittranslatesstringconcatenationconstraintsintoalayoutofvariables(withoverlap)inthenalcharacterarraymentionedabove.
Second,itextractsintegerconstraintsonthelengthsofstringsandndsasatisfyinglengthassignmentusingtheSMTsolver.
Finally,givenapositionandlengthforeachstring,thesolvertranslatestheconstraintsonthecontentsofeachstringintobit-vectorconstraintsandchecksiftheyaresatisable.
Ingeneral,becauseoftheinteractionoflengthconstraintsandregularexpressions,thelengthassignmentchoseninstep2mightnotcorrespondtosatisablecontentsconstraints,evenwhenadifferentlengthassignmentwould.
Soifstep3failstondasatisfyingassignment,thealgorithmreturnstostep2togenerateanewlengthassignment(distinctfromanytriedpreviously).
Steps2and3repeatuntilthesolverndsasatisfyingassignment,orithastriedallpossiblelengthassignments(uptothelengthboundk).
Step1:Translatingconcatenationconstraints.
.
Theintu-itionbehindKudzu'shandlingofconcatenationconstraintsisthatforaconstraintS1=S2S3,itwouldbesufcienttoensurethatS2comesimmediatelybeforeS3inthenalcharacterarray,andtolayoutS1asoverlappingwithS2andS3(sothatS1beginsatthesamecharacterasS2andendsatthesamecharacterasS3).
Thisoverlappinglayoutalsohastheadvantageofreducingthetotallengthofbit-vectorsrequired.
Eachconcatenationconstraintsuggestsanorderingrelationamongthestringvariables,butitmightnotbepossibletosatisfyallsuchorderingconstraintssimultaneously.
Tosystematicallychooseanordering,thesolverbuildsS1S2S3S4S5S6S7LLLRRR(1,1)(2,2)(3,3)(4,4)(3,4)(2,3)(1,3)S1=S2.
S3S3=S4.
S5S6=S5.
S7INPUTCONCATCONSTRAINTSFigure5:Asampleconcatgraphforasetofconcatenationconstraints.
Therelativeorderingofthestringsinthenalcharacterarrayisshownasstartandendpositionsinparenthesesalongsideeachnode.
agraphofconcatenationconstraints(aconcatgraphforshort).
Thegraphhasanodeforeachstringvariable,andforeachconstraintS1=S2S3,S2andS3aretheleftandrightchildren(respectively)ofS1.
AnexampleofsuchagraphisshowninFigure5.
Withoutlossofgenerality,wecanassumethatthegraphisacyclic:ifthereisacyclefromS1toS2toS3.
.
.
backtoS1,thenS1=S2S3S1(orsomeotherorder),soallthevariablesotherthanS1mustbetheemptystring,andcanberemovedfromtheconstraints.
(Inourapplicationstheconstraintswillinanycasebeacyclicbyconstruction.
)Giventhisgraph,thealgorithmthenchoosestherelativeorderingofthestringsinthecharacterarraybyassigningstartandendpositionstoeachnodewithapost-ordertraversalofthegraph.
(InFigure5,thesepositionsareshowninparenthesesnexttoeachnode.
)Forthelayoutgeneratedbythealgorithmtobecorrect,theconcatgraphmustbeaDAGinwhicheachinternalnodehasexactlytwochildren,andthosechildrenareadjacentinthelayout.
(Thisimpliesthatthegraphisplanar.
)Thegraphmaynothavethesepropertiesatconstruction;forinstance,Figure6givesasetofexampleconstraintswithcontradictoryorderingrequirements:S2cannotbesimulta-neouslytotheleftandtotherightofS3.
Thealgorithmresolvessuchrequirementsbyduplicatingasubtreeofthegraph(forinstanceasshownintherighthalfofFigure6).
Tomaintainthecorrectsemantics,thealgorithmaddsstringequalityconstraintstoensurethatanyduplicatedstringshavethesamecontentsastheoriginals.
Thealgorithmperformsduplicationstoensurethatthegraphsatisesthecorrectnessinvariant,butourcurrentalgorithmdoesnotattempttoperformtheminimalnumberofcopies(forinstance,inFigure6itwouldsufcetocopyeitheronlyS2oronlyS3),whichinourexperiencehasnothurtthesolver'sperformance.
Step2:Findingasatisablelengthassignment.
EachstringvariableShasanassociatedlengthvariableLS.
Eachcorestringconstraintimpliesacorrespondingconstraintonthelengthsofthestrings,asdetailedinTableI.
Fortheregularexpressioncontainmentconstraint(S1∈R),thesetofpossiblelengthsisanultimatelyperiodicset:S1S2S3S4LLRRS1S2S3S4LLRRS3_COPYS2_COPYCOPYCREATION(1,1)(2,2)(3,3)(4,4)(3,4)(1,2)S2=S2_COPYS3=S3_COPYNEWCONSTRAINTSDUETOCOPYCREATIONS1=S2.
S3S4=S3.
S2INPUTCONCATCONSTRAINTSFigure6:Asetofconcatconstraintswithcontradictoryorderingrequirements.
Nodesareduplicatedtoresolvethecontradiction.
whetheralengthispossibledependsonlyonitsremainderwhendividedbyaxedperiod,exceptforanitesetofexceptions.
(Yuetal.
usetheequivalentconceptofasemi-linearsetinaconservativeautomaton-basedapproach[32].
)Thedetailsofcomputingthissetarecoveredinthelit-erature[22];wenotethatsuchsetscanbeconvenientlyrepresentedwithourSMTsolversinceitsupportsamodulusoperation.
Ateachiterationofstep2,thesolverconjoinsthelengthconstraintscorrespondingtoalloftheoriginalstringconstraints,alongwithaconstrainttoruleouteachlengthassignmentthathadpreviouslybeentried,andpassesthisformulatotheSMTsolver.
Ifitreturnsasatisfyingassignment,itrepresentsanewlengthassignmenttotry;iftheconstraintisunsatisable,thensoweretheoriginalstringconstraints.
CoreConstraintImplicationonlengthsS1=S2S3LS1=LS2+LS3S1∈RLS1∈LengthSet(R)S1=S2LS1=LS2length(S1)iLS1ilength(S1)length(S2)LS1LS2TableI:Lengthconstraintsimpliedbycorestringcon-straints,whereLSisthelengthofastringS,andrangesovertheoperatorsItisnotnecessaryforcorrectnessthatthelengthabstrac-tionperformedbythesolverbeprecise,butdeterminingpreciselengthboundsimprovesperformancebyavoidingwastediterations.
Inthecompletesystem,theintegercon-straintsoverlengthsaresolvedtogetherwithintegercon-straintsarisingdirectlyfromtheoriginalprogram,discussedinSectionVI.
Inourexperienceitisimportantforgoodperformancetosolvethesetwosetsofintegerconstraintstogether.
Thetwosetsofconstraintsmaybeinterrelated,andsolvingthemtogetherpreventsthesolverfromwastingtimeexploringsolutionsthatsatisfyonesetbutnottheother.
Step3:Encodingasbitvectors.
Giventhearraylayoutandlengthscomputedinsteps1and2,theremainingconstraintsoverthecontentsofstringscanbeexpressedasconstraintsoverxed-sizebit-vectors.
Stringequalitytranslatesdirectlyintobit-vectorequality.
Fortheencodingofregularexpres-sionconstraints,wereusepartoftheimplementationoftheHAMPItool[17].
Atahigh-level,thetranslationrstunrollsusesoftheKleenestarinaregularexpressionintoanitenumberofrepetitions(nevermorethanthestringlength).
Next,wheretheregularexpressionhascon-catenation,HAMPIdeterminesallpossiblecombinationsoflengthsthatsumtothetotallength,andinstantiateseachasaconjunctionofconstraints.
Alongwiththealternationsthatappearedintheoriginalregularexpression,eachoftheseconjunctionsalsorepresentsanalternativewayinwhichtheregularexpressioncouldmatchthestring.
Tocompletethetranslation,thechoicebetweenallofthesealternativesisrepresentedwithadisjunction.
(See[17]foramoredetailedexplanationandsomeoptimizations.
)HAMPIsupportsonlyasingle,xed-lengthinput,soweinvokeitrepeatedlytotranslateeachconstraintbetweenaregularexpressionandastringintoanSTPformula.
Wethencombineeachofthesetranslationswithourtranslationsofotherstringcontentsconstraints(e.
g.
,stringequality),andconjoinalloftheseconstraintssothattheyapplytothesamesinglelongcharacterarray.
ItisthissinglecombinedformulathatwepasstotheSMTsolver(STP)tondasatisfyingassignment.
VI.
REDUCINGJAVASCRIPTTOSTRINGCONSTRAINTSInthissectionwedescribeourtool'stranslationfromJavaScripttothelanguageofourconstraintsolver,focusingonthetreatmentofstringoperations.
Westartbygivingthefullconstraintlanguagethesolversupports,thendescribeourgeneralapproachtomodelingstringoperations,ouruseofconcretevaluesfromthedynamictrace,andtheprocessoftranslatingrealregularexpressionsintotextbook-styleones.
Fullconstraintlanguage.
ThecoreconstraintlanguagepresentedinSectionIVcapturestheessenceofoursolvingapproach,butitexcludesseveralfeaturesforsimplicity,mostnotablyintegerconstraints.
Thefullconstraintlanguagesup-portedbyoursolversupportsvaluesofstring,integer,andbooleantypes,anditsgrammarisgiveninFigure8,alongwithitstypesysteminFigure7.
Theadditionalconstraintsaresolvedatstep2ofthestringsolutionprocedure,togetherwiththeintegerconstraintsonthelengthsofstrings.
TomatchcommonJavaScriptimplementations(whichreserveabitasatypetag),wemodelintegersas31-bitsignedbit-vectorsinourSMTsolver,whichsupportsalltheintegeroperationsthatJavaScriptdoes.
ThesolverreplaceseachtoStringconstraintwiththeappropriatestringonceavalueforitsargumentisselected:forinstance,ifiisgiventhevalue12,toString(i)isreplacedwith"12".
JavaScriptstringoperations.
JavaScripthasalargelibraryofstringoperations,andwedonotaimtosupporteveryoperation,orthefullgeneralityoftheirbehavior.
Beyondtheengineeringchallengeofbuildingsuchacompletetransla-tion,havingverycomplexsymbolictranslationsforcommonoperatorswouldlikelycausethesystemtobogdown,andτ::=string|int|boolConstRegex::=Regex|CapturedBrack(R,i)|BetweenCapBrack(R,i,j)Figure7:TypesystemforthefullconstraintlanguageS1:string=(S2:string)(S3:string)I1:int=length(S:string)S1:string∈R:ConstRegexS1:string/∈R:ConstRegexI1:int=(I2:int)I3:int)B1:bool=(A1:τ)A2:τ)B1:bool=(I1:int)I2:int)B1:bool=(B2:bool)B1:bool=(B1:bool)B2:bool)S1:string=toString(I1:int)Figure8:Grammarandtypesforthefullconstraintlanguageincludingoperationsonstrings,integers,andbooleans.
functionvalidate(input){//input='{"action":"","val":""}';mustMatchre1bfnrt]|u[0-9a-fA-F]{4})/g;re2n\r]*"|true|false|null|-\d+(:\.
\d*)(:[eE][+\-]\d+)/g;re3s*\[)+/g;rep1=str.
replace(re1,"@");rep2=rep1.
replace(re2,"]");rep3=rep2.
replace(re3,"");if(rep3==mustMatch){eval(input);returntrue;}returnfalse;}Figure9:Exampleofaregular-expression-basedvalidationcheck,adaptedfromareal-worldJavaScriptapplication.
Thisillustratesthecomplexityofrealregularexpressionsyntax.
thegeneralitywouldusuallybewasted.
Instead,ourchoicehasbeentomodelthestringoperationsthatoccurcommonlyinwebapplications,andthecoreaspectsoftheirbehavior.
Forotheroperationsandbehavioraspectsourtoolusesvaluesfromtheoriginalexecutiontrace(describedfurtherbelow),sothattheyareaccuratewithrespecttotheoriginalexecutionevenifthetoolcannotreasonsymbolicallyabouthowtheymightchangeonmodiedexecutions.
Thedetailedtranslationfromseveralcommonoperators(asubsetofthosesupportedbyourimplementation)toourconstraintlanguageisshowninTableII.
Usingdynamicinformation.
Oneofthebenetsofdy-namicsymbolicexecutionisthatitprovidestheexibil-itytochoosebetweensymbolicvalues(whichintroducegenerality)andconcretevalues(whicharelessgeneral,butguaranteedtobeprecise)tocontrolthescopeofthesearchprocess.
Ourtool'shandlingofstringconstraintstakesadvantageofconcretevaluesfromthedynamictracesinseveralways.
Anexampleisstringreplace,whichisoftenusedinsanitizationtotransformunsafecharactersintosafeones.
Ourtranslationusesaconcretevalueforthenumberofoccurrencesofthesearched-forpattern:ifapatternwasreplacedsixtimesintheoriginalrun,thetoolwillsearchforotherinputsinwhichthepatternoccurssixtimes.
Thissacricessomegenerality(forinstance,ifacertainattackisonlypossiblewhenthestringappearsseventimes).
However,webelievethisisabenecialtrade-offsinceitallowsourtooltoanalyzeandndbugsinmanyusesofreplace.
Forcomparison,mostpreviousstringcon-straintsolversdonotsupportreplaceatall,andaddingareplacethatappliedtoanynumberofoccurrencesofastring(evenlimitedtosingle-characterstrings)wouldmakeourcoreconstraintlanguageundecidableintheunboundedcase[6].
Regularexpressionsinpractice.
The"regularexpressions"supportedbylanguageslikeJavaScripthavemanymorefeaturesthanthetypicaldenitiongiveninacomputabilitytextbook(orFigure2).
Figure9showsanexample(adaptedfromarealwebsite)ofoneofmanyregularexpressionsKudzumustdealwith.
KudzuhandlesamajorityofthesyntaxforregularexpressionsinJavaScript,whichincludessupportfor(possiblynegated)characterclasses,escapedsequences,repetitionoperators({n}andsub-matchextractionusingcapturingparentheses.
Kudzukeepstrackofthenestingofcapturingparentheses,sothatitcanexpresstherelationbetweentheinputstringandthepartsofitthatmatchthecapturedgroups(asshowninTableII).
Kudzudoesnotcurrentlysupportback-references(theyarestrictlymoreexpressivethantrueregularexpres-sions),thoughifweseeaneedinthefuture,manyusesofback-referencescouldbetranslatedusing(non-regular)concatenationconstraints.
VII.
EXPERIMENTALEVALUATIONWehavebuiltafull-implementationofKudzuusingtheWebKitbrowser,with650,7430and2200linesofcodeinthepathconstraintextractioncomponent,constraintsolver,andGUIexplorercomponent,respectively.
ThesystemiswritteninamixtureofC++,Ruby,andOCamllanguages.
WeevaluateKudzuwiththreeobjectives.
OneobjectiveistodeterminewhetherKudzuispracticallyeffectiveinexploringtheexecutionspaceofreal-worldapplicationsanduncoveringnewcode.
ThesecondobjectiveistodeterminetheeffectivenessofKudzuasastand-alonevulnerabilitydis-coverytool—whetherKudzucanautomaticallyndclient-sidecodeinjectionvulnerabilitiesandpruneawayfalsereports.
Finally,wemeasuretheefciencyoftheconstraintsolver.
Ourevaluationresultsarepromising,showingthatKudzuisapowerfulsystemthatndspreviouslyunknownvulnerabilitiesinreal-worldapplicationsfullyautomatically.
A.
ExperimentSetupWeselect18subjectapplicationsconsistingofpopulariGooglegadgetsandAJAXapplications,asthesewerestudiedbyourprevioustoolFLAX[26].
FLAXassumesavailabilityofanexternal(manuallydeveloped)testsuitetoseeditstesting;incontrast,KudzuautomaticallygeneratesamuchmorecomprehensivetestsuiteandndsthepointsJavaScriptoperationReductiontoourconstraintlanguageS1:string=charAt(S:string,I:int)(((LS=length(S))∧(I≥0)∧(ItoString(I1)))∨(((I≥LS)∨(ItoString(I1))∧(S1="NaN")))S:string=concat(S1:string,S2:string,.
.
.
,Sk:string)(S=S1S2···Sk)I:int=indexOf(S:string,s:string,startpos:int)(((I≥0)∧((startpostoString(I1))∧(S1="NaN")))I:int=lastIndexOf(S:string,s:string,startpos:int)(((I≥0)∧((startpostoString(I1))∧(S1="NaN")))[S1,S2,.
.
.
,Sk]:stringlist=match(S:string,r:ConstRegex)(((k>0)∧(S∈r)∧((S1∈CapturedBrack(r,1))∨(S1=""))(non-greedy)∧(S=T0S1T1···SkTk)∧(Vki=0Ti∈BetweenCapBrack(r,i,i+1)))∧···∧((Sk∈CapturedBrack(r,k))∨(Sk="")))∨((k≤0)∧(S/∈Regex(.
*r.
*))))[S1,S2,.
.
.
,Sn]:stringlist=match(S:string,r:ConstRegex,n:int)(((S=T1M1T2M2···TnMnTn+1)∧(T1,T2,.
.
.
,Tn+1/∈Regex(.
*r.
*))(greedymatch)∧(M1,M2,.
.
.
,Mn∈Regex(r)))nistheconcretenumberofoccurrencesofstringsmatchingr.
S1:string=replace(S:string,r:ConstRegex,s:string,n:int)((S=T1M1T2M2···TnMnTn+1)∧(T1,T2,.
.
.
,Tn+1/∈Regex(.
*r.
*))∧(M1,M2,.
.
.
,Mn∈Regex(r))))∧(S1=T1sT2s···Tn+1)nistheconcretenumberofoccurrencesofstringsmatchingrinS.
[S1,S2,.
.
.
,Sk]:stringlist=split(S:string,s:string,n:int)(((S=S1sS2s···SnsSn+1)∧(S1,S2,.
.
.
,Sn+1/∈Regex(.
*s.
*))))nistheconcretenumberofoccurrencesofstringsmatchingr.
I1:int=search(S:string,r:ConstRegex)(((I1toString(I))∧((S="NaN")∨(S=Regex([0-9]+)))TableII:OurreductionfromcommonJavaScriptoperationstoourfullconstraintlanguage.
Capitalizedvariablesmaybeconcreteorsymbolic,whilelowercasevariables(includingregularexpressions)musttakeaconcretevalue.
Ourimplementationsupportsalargersetofoperations,whosetranslationsaresimilar.
Substr/Substring/CharAt/CharCodeAt5%IndexOf/LastIndexOf/Strlen78%Replace/EncodeURI/DecodeURI8%Match/Test/Split1%Concat8%Figure10:Distributionofstringoperationsinoursubjectapplications.
ofvulnerabilitywithoutrequiringanyexternaltestharnessapriori.
Further,inourexperimentsKudzudiscovers2newvulnerabilitieswithinafewhoursoftestingwhichweremissedbytheFLAXbecauseofitslackofcoverage.
Inaddition,asweshowlaterinthissection,manyofthegeneratedconstraintsarehighlycomplexandnotsuitableformanualinspectionorfuzzing,whereasKudzueitherassertsthesafetyofthevalidationchecksorndsexploitsforvulnerabilitiesinoneiterationasopposedtomanyroundsofrandomtesting.
Totesteachsubjectapplication,weseedthesystemwiththeURLoftheapplication.
Forthegadgets,theURLsarethesameasthoseusedbyiGooglepagetoembedthegadget.
WecongureKudzutogiveapre-preparedusernameandloginpasswordforapplicationsthatrequiredauthentication.
WereporttheresultsforrunningeachapplicationunderKudzu,cappingthetestingtimetoamaximumof6hoursforeachapplication.
AlltestsranonaUbuntu9.
10Linuxworkstationwith2.
2GHzInteldual-coreprocessorsand2GBofRAM.
B.
ResultsTableIIIpresentsthenalresultsoftestingthesubjectapplications.
ThesummaryofourevaluationhighlightsthreefeaturesofKudzu:(a)itautomaticallydiscoversnewprogrampathsinrealapplications,signicantlyenhancingcodecoverage;(b)itnds2client-sidecodeinjectioninthewildandseveralinapplicationsthatwereknowntocontainvulnerabilities;and(c)Kudzusignicantlyprunesawayfalsepositives,successfullydiscardingcasesthatdoemploysufcientvalidationchecks.
Characteristicsofstringoperationsinourapplications.
Constraintsarisingfromourapplicationshaveanaverageof63JavaScriptstringoperations,whiletheremainingareboolean,logicalandarithmeticconstraints.
Figure10groupstheobservedstringoperationsbysimilarity.
TheApplication#ofnewInitial/FinalBuginputsCodeCoveragefoundAcademia2030.
27/76.
47%AJAXIm1549.
58/77.
67%FaceBookChat5426.
85/76.
84%-ParseUri1353.
90/86.
10%Plaxo315.
72/76.
43%AskAWord1029.
30/67.
95%BirthdayReminder2759.
47/73.
94%-BlockNotes45765.
06/71.
50%CalorieWatcher1664.
54/73.
53%-ExpensesManager13361.
09/76.
56%-Listy1965.
31/79.
73%NotesLP2546.
62/76.
67%-ProgressBar1263.
60/75.
09%-SimpleCalculator146.
96/80.
52%TodoList1572.
51/86.
41%TVGuide630.
39/75.
13%WordMonkey2014.
84/75.
36%ZipCodeGas1159.
05/74.
28%-Average5146.
95/76.
68%11TableIII:Thetop5applicationsareAJAXapplications,whiletherestareGoogle/IGgadgetapplications.
Column2reportsthenumberofdistinctnewinputsgenerated,andcolumn3reportstheincreaseincodecoveragefromtheinitialruntoandthenalrun.
largestfractionareoperationslikeindexOfthattakestringinputsandreturnaninteger,whichmotivatetheneedforasolverthatreasonsaboutintegersandstringssimulta-neously.
Asignicantfractionoftheoperations,includingsubtring,splitandreplace,implicitlygiverisetonewstringsfromtheoriginalone,therebygivingrisetoconstraintsinvolvingmultiplestringvariables.
Ofthematch,splitandreplaceoperations,31%areregularexpressionbased.
Over33%oftheregularexpressionshaveoneormorecapturingparentheses.
Capturingparenthesesinregularexpressionbasedmatchoperationsleadtoconstraintsinvolvingmultiplestringvariables,similartooperationssuchassplit.
Thesecharacteristicsshowthatasignicantfractionofthestringconstraintsarisinginourtargetapplicationsrequireasolverthatcanreasonaboutmultiplestringvariables.
Weempiricallyseeexamplesofcomplexregularexpressionsaswellasconcatenationoperations,whichstressestheneedforoursolverthathandlesbothwordequationsandregularexpressionconstraints.
Priortothiswork,off-the-shelfsolversdidnotsupportwordequationsandregularexpressionssimultaneously.
VulnerabilityDiscovery.
Kudzuisabletondclient-sidecodeinjectionvulnerabilitiesin11oftheapplicationstested.
2ofthesewerenotknownpriortotheseexperimentsandweremissedbyFLAX.
Oneofthemisonasocial-networkingapplication(http://plaxo.
com)thatwasmissedbyourFLAXtoolbecausethevulnerabilityexistsonapagelinkedseveralclicksawayfromtheinitialpost-0123456x104askawordbirthdayblocknotescaloriewatcherlistyexpensemanagenoteslabprogressbarcalculatortodolisttvguidewordmonkeyzipcodegasacademiaajaximfacebookchatparseUriplaxoNumberofinstructionsExecutedCompiledFigure11:Kudzucodecoverageimprovementsoverthetestingperiod.
Foreachexperiment,therightbarshowstheincreaseintheexecutedcodefromtheinitialruntototalcodeexecuted.
Theleftbarshowstheincreaseinthecodecompiledfrominitialruntothetotalcodecompiledintheentiretestperiod.
authenticationpage.
Thevulnerablecodeisexecutedonlyaspartofafeatureinwhichausersetsfocusonatextboxandusesittoupdatehisorherprole.
Thisisoneofthemanydifferentwaystoupdatetheprolethattheapplicationprovides.
Kudzufoundthatonlyoneofthesewaysresultedinaclient-sidecodeinjectionvulnerability,whiletherestweresafe.
Inthisparticularfunctionality,theapplicationfailstoproperlyvalidateastringfromapostMessageeventbeforeusingitinanevaloperation.
Theapplicationimplicitlyexpectstoreceivethismessagefromawindowhostedatasub-domainoffacebook.
com;however,KudzuautomaticallydeterminesthatanywebprincipalcouldinjectanydatastringmatchingtheformatFB_msg:Thissubsequentlyresultsincodeinjec-tionbecausethevulnerableapplicationfailstovalidatetheoriginofthesenderandthestructureofJSONstringbeforeitsuseineval.
ThesecondnewvulnerabilitywasfoundinaToDoGoogle/IGgadget.
Similartothepreviouscase,thevul-nerabilitybecomesreachableonlywhenaspecicvalueisselectedfromadropdownbox.
ThisinteractionisamongmanythatthegadgetprovidesandwebelievethatKudzu'sautomaticexplorationisthekeytodiscoveringthisusecase.
Inseveralothercases,suchasAjaxIM,thevulnerablecodeisexecutedonlyafterseveraleventsareexecutedafterinitialsign-inpage—Kudzuautomaticallyreachesthemduringitsexploration.
KudzudidnotndvulnerabilitiesinonlyonecasethatFLAXreportedabug.
Thisisbecausethevulnerabilitywaspatchedinthetimeperiodbetweenourexperimental0123456x104askawordbirthdayblocknotescaloriewatcherlistyexpensemanagenoteslabprogressbarcalculatortodolisttvguidewordmonkeyzipcodegasacademiaajaximfacebookchatparseUriplaxoNumberofinstructionsExecutedCompiledFigure12:Benetsfromsymbolicexecutionalone(darkbars)vs.
completeKudzu(lightbars).
Foreachexperiment,therightbarshowstheincreaseinthetotalexecutedcodewhentheevent-spaceexplorationisalsoturnedon.
Theleftbarshowstheobservedincreaseinthecodecompiledwhentheevent-spaceexplorationisturnedon.
evaluationofFLAXandKudzu.
CodeandEvent-spaceCoverage.
TableIIIshowsthecodecoveragebyexecutingtheinitialURL,andthenalcoverageafterthetestperiod.
Measuringcodecoverageinadynamicallycompiledlanguageischallengingbecausealltheapplicationcodeisnotknownpriortotheexperiments.
Inourexperiments,wemeasuredthetotalcodecompiledduringourexperimentsandthetotalcodeexecuted3.
TableIIIshowsanaverageimprovementofover29%incodecoverage.
Thecoveragevariessignicantlydependingontheapplication.
Figure11providesmoredetail.
Onseverallargeapplications,suchasFacebookChat,AjaxIM,andPlaxo,Kudzudiscoversalotofnewcodeduringtesting.
Kudzuisabletoconcretelyexecuteseveralcodepaths,asshownbytheincreaseintheright-sidebarsinFigure11.
Ontheotherlesscomplexgadgetapplications,mostoftherelevantcodeisobservedduringcompilationintheinitialrunitself,leavingarelativelysmalleramountofnewcodeforKudzutodiscover.
Wealsomanuallyanalyzedthesourcecodeoftheseapplicationsandfoundthatalargefractionoftheircodebrancheswerenotdependentondatawetreatasuntrusted.
Tomeasurethebenetsofsymbolicexecutionalone,werepeatedtheexperimentswiththeevent-spaceexplorationturnedoffduringthetestperiodandreportthecomparisontofull-featuredKudzuinFigure12.
Weconsistentlyobserve3OneunitofcodeinourexperimentsisaJavaScriptbytecodecompiledbytheinterpreter.
Toavoidcountingthesamebytecodeacrossseveralruns,weadoptedaconservativecountingscheme.
Weassignedauniqueidentiertoeachbytecodebasedonthesourcelename,sourcelinenumber,lineoffsetandahashofthecodeblock(typicallyonefunctionbody)compiled.
Application#ofinitial#oftotalTotaleventseventsredeventsreddiscoveredAcademia2078310AJAXIm72481988FaceBookChat159891354ParseUri51617Plaxo88381688AskAWord2811BirthdayReminder122020BlockNotes785319CalorieWatcher141822ExpensesManager101071473Listy15470638NotesLP105921034ProgressBar82436SimpleCalculator173467TodoList82661TVGuide179461517WordMonkey31022ZipCodeGas121212Average18.
61238.
72477.
17TableIV:EventspaceCoverage:Column2and3showthenumberofeventsredintherstrunandintotal.
Thelastcolumnshowsthetotaleventsdiscoveredduringthetesting.
thatsymbolicexecutionalonediscoversandexecutesasignicantfractionoftheapplicationbyitself.
Theevent-explorationcombinedwithsymbolicexecutiondoesperformstrictlybetterthansymbolicexecutioninallbut3cases.
Inamajorityofthecases,turningontheevent-spaceexplorationsignicantlycomplementssymbolicexecution,especiallyfortheAJAXapplicationswhichhaveasignicantGUIcomponent.
Inthe3caseswhereimprovementsarenotsignicant,wefoundthattheeventexplorationgenerallyeitherledtooff-sitenavigationsorthecodeexecutedcouldbeexploredbysymbolicexecutionalone.
Forexample,intheparseUricase,samecodeisexecutedbytext-boxinputaswellasbyclickingabuttonontheGUI.
TableIVshowstheincreaseinnumberofeventsexecutedbyKudzufromtheinitialruntothetotalattheendoftestperiod.
Theseeventsincludeallkeyboardandmouseeventswhichresultinexecutionofeventhandlers,navigation,formsubmissionsandsoon.
Wendthatneweventsaredynamicallygeneratedduringoneparticularexecutionaswellaswhennewcodeisdiscovered.
Asaresult,Kudzugraduallydiscoversneweventsandwasabletoexecuteapproximately50%oftheeventsitobservesduringtheperiodoftesting.
Effectiveness.
Kudzuautomaticallygeneratesatestsuiteofover50newdistinctinputsonaverageforanapplicationinthetestperiod(shownincolumn2oftableIII).
Intheexploitablecasesweobserved,Kudzuwasabletoshowtheexistenceofavulnerabilitywithanattackstringonceitreachedthepointofvulnerability.
Thatis,itsconstraintsolvercorrectlydeterminesthesufciencyorinsufciencyofvalidationchecksinasinglequerywithoutmanualinterventionorundirectediteration.
Thiseliminatesfalsepositivessignicantlyinpractice.
Forinstance,KudzufoundthattheFacebookwebapplicationhasseveralusesofpostMessagedatainevalconstructs,butalluseswerecorrectlyprecededbychecksthatassertthattheoriginofthemessageisadomainendingin.
facebook.
com.
Incon-trast,thevulnerabilityinPlaxofailstocheckthisandKudzuidentiesthevulnerabilitythersttimeitreachesthatpoint.
SomeofthevalidationchecksKudzudealswitharequitecomplex—Figure9showsanexamplewhichissimpliedfromarealapplication.
Theseexamplesareillustrativeoftheneedforautomatedreasoningtools,becausecheckingthesufciencyofsuchvalidationcheckswouldbeonerousbyhandandimpracticalbyrandomfuzzing.
Lastly,wepointoutthatlikemostothervulnerabilitydiscoverytools,Kudzucanhavefalsenegativesbecauseitmayfailtocovercode,orbecauseofoverlystrictattackgrammars.
ConstraintSolverEvaluation.
Figure13showstherunningtimesforsolvingqueriesofvariousinputconstraintsizes.
EachconstraintisaeitheraJavaScriptstring,arithmetic,logical,orbooleanoperation.
Thesizesoftheequationsvariedfrom1toupto250constraints.
Thesolverdecidessatisabilityoftheconstraintstypicallyunderasecondforsatisablecases.
Asexpected,toassertunsatisability,thesolveroftentakestimevaryingfromnearlyasecondto50seconds.
Thevariationislargebecauseinmanycasesthesolverassertsunsatisablebyassertingtheunsatisabilityoflengthconstraints,whichisinexpensivebecausethestepofbit-vectorencodingisavoided.
Inothercases,theunsatisabilityresultsonlywhenthesolverdeterminestheunsatisabilityofbit-vectorsolutions.
Oursolverrequiresonlyanupperboundonthelengthsofinputvariables,andisabletoinfersatisablelengthsofvariablesinternally.
Intheseexperiments,weincreasetheupperboundoftheinputvariablesfrom10to100charactersinstepsof20each.
Ifthesolverassertsunsatisabilityuptoanupperboundlengthof100,wetreattheconstraintsasunsatisable.
VIII.
RELATEDWORKKudzuistherstapplicationofdynamicsymbolicexecu-tiontoclient-sideJavaScript.
Here,wediscusssomerelatedprojectsthathaveappliedsimilartechniquestoserver-sidewebapplications,orhaveuseddifferenttechniquestosearchforJavaScriptbugs.
Finally,wesummarizewhyweneededtobuildanewstringconstraintsolver.
Server-sideanalysis.
JavaScriptapplicationcodeissimilarinsomewaystoserver-sidecode(especiallyPHP);forinstance,bothtendtomakeheavyuseofstringoperations.
Severalprevioustoolshavedemonstratedtheuseofsym-bolicexecutionforndingSQLinjectionandreectedorstoredcross-sitescriptingattackstocodewritteninPHPandJava[1],[18],[30].
However,JavaScriptcodeusually0501001502002500.
050.
5550SolveTime(SATcases)SolveTime(UNSATcases)Figure13:Theconstraintsolver'srunningtime(inseconds)asafunctionofthesizeoftheinputconstraints(intermsofthenumberofsymbolicJavaScriptoperations)parsesitsowninput,soJavaScriptsymbolicexecutionrequiresmoreexpressiveconstraints,specicallytorelatedifferentstringsthatwerepreviouslypartofasinglestring.
AdditionalchallengesuniquetoJavaScriptarisebecauseJavaScriptprogramstakemanydifferentkindsofinput,someofwhichcomeviauserinterfaceevents.
LikeKudzu,theSaner[2]toolforPHPaimstocheckwhethersanitizationroutinesaresufcient,notjustthattheyarepresent.
Howevertheirtechniquesarequitedifferent:theyselectpathsandmodeltransformationsstatically,thenperformtestingtoverifysomevulnerabilities.
Theirdef-initionofsanitizationcoversonlystringtransformations,notvalidationchecksinvolvingbranches,whichoccurfre-quentlyinourapplications.
AnalysisframeworksforJavaScript.
SeveralworkshaverecentlyappliedstaticanalysistodetectbugsinJavaScriptapplications(e.
g.
,[9],[12]).
Staticanalysisiscomplemen-tarytosymbolicexecution:ifastaticanalysisissound,anabsenceofbugreportsimpliestheabsenceofbugs,butstaticanalysiswarningsmaynotbeenoughtoletadeveloperreproduceafailure,andinfactmaybefalsepositives.
FLAXusestaint-enhancedblackboxfuzzingtodetectiftheJavaScriptapplicationemployssufcientvalidationornot[26];likeKudzu,itsearchesforinputstotriggerafailure.
However,FLAXrequiresanexternaltestsuitetobeabletoreachthevulnerablecode,whereasKudzugen-eratesahigh-coveragetestsuiteautomatically.
Also,FLAXperformsonlyblack-boxfuzztestingtondvulnerabilities,whileKudzu'suseofaconstraintsolverallowsittoreasonaboutpossiblevulnerabilitiesbasedontheanalyzedcode.
Crawljaxisarecentlydevelopedtoolforevent-spaceexplorationofAJAXapplications[23].
Specically,Crawl-jaxbuildsastaticrepresentationofaWeb2.
0applicationbyclickingelementsonthepageandbuildingastategraphfromtheresultingtransitions.
Kudzu'svaluespaceexplorationcomplementssuchGUIexplorationtechniquesandenablesamorecompleteanalysisoftheapplicationusingcombinedsymbolicexecutionandGUIexploration.
Stringconstraintsolvers.
Stringconstraintsolvershaverecentlyseensignicantdevelopment,andpracticaltoolsarebeginningtobecomeavailable,butasdetailedinSec-tionIV-C,noprevioussolverswouldbesufcientforJavaScript,sincetheylacksupportforregularexpres-sions[3],[4],[7],stringequality[14],ormultiplevari-ables[17],whichareneededincombinationtoreasonaboutJavaScriptinputparsing.
Inconcurrentwork,Veanesetal.
giveanapproachbasedonautomataandquantiedaxiomstoreduceregularexpressionstotheZ3decisionprocedure[28].
Combinedwith[4],thiswouldprovidesimilarexpressivenesstoKudzu.
IX.
CONCLUSIONWiththerapidgrowthofAJAXapplications,JavaScriptcodeisbecomingincreasinglycomplex.
Inthisregard,securityvulnerabilities(suchasclient-sidecodeinjection)andanalysisofJavaScriptcodeisanimportantareaofresearch.
Inthispaper,wepresentthedesignanddevelop-mentoftherstcompletesymbolicexecutionbasedsystemforexploringtheexecutionspaceofJavaScriptprograms.
InmakingthesystempracticalweaddressedchallengesrangingfromdesigningamoreexpressivelanguageforstringconstraintstoimplementingexplorationandreplayofGUIeventsinawebbrowser.
WehaveimplementedourideasinatoolcalledKudzu.
GivenaURLforawebapplication,Kudzuautomaticallygeneratesahigh-coveragetestsuite.
WehavealsoappliedKudzutondclient-sidecodeinjectionvulnerabilities.
Inacollectionoflivewebapplications,itnds11vulnerabilities(2ofwhichwerepreviouslyunknown)withoutproducingfalsepositives.
X.
ACKNOWLEDGMENTSWethankDavidWagner,AdamBarth,DomagojBabic,AdrianMettler,JuanCaballeroandPongsinPoosankamforhelpfulfeedbackonthepaperatvariousstages.
Wearealsothankfultoouranonymousreviewersforsuggestingimprovementstoourwork.
ThismaterialisbaseduponworkpartiallysupportedbytheNationalScienceFoundationunderGrantsNo.
0311808,No.
0448452,No.
0627511,andCCF-0424422,bytheAirForceOfceofScienticResearchunderGrantNo.
22178970-4170,andbytheArmyResearchOfceunderGrantNo.
DAAD19-02-1-0389.
Anyopinions,ndings,andconclusionsorrecommendationsexpressedinthismaterialarethoseoftheauthorsanddonotnecessarilyreecttheviewsoftheNationalScienceFoundation,theAirForceOfceofScienticResearch,ortheArmyResearchOfce.
REFERENCES[1]S.
Artzi,A.
Kie˙zun,J.
Dolby,F.
Tip,D.
Dig,A.
Paradkar,andM.
D.
Ernst.
Findingbugsindynamicwebapplications.
InISSTA2008,InternationalSymposiumonSoftwareTestingandAnalysis,2008.
[2]D.
Balzarotti,M.
Cova,V.
Felmetsger,N.
Jovanovic,E.
Kirda,C.
Kruegel,andG.
Vigna.
Saner:ComposingStaticandDynamicAnalysistoValidateSanitizationinWebApplica-tions.
InProceedingsoftheIEEESymposiumonSecurityandPrivacy,Oakland,CA,May2008.
[3]A.
Barth,J.
Caballero,andD.
Song.
Securecontentsnifngforwebbrowsersorhowtostoppapersfromreviewingthemselves.
InProceedingsofthe30thIEEESymposiumonSecurityandPrivacy,Oakland,CA,May2009.
[4]N.
Bjrner,N.
Tillmann,andA.
Voronkov.
Pathfeasibilityanalysisforstring-manipulatingprograms.
InTACAS'09:Proceedingsofthe15thInternationalConferenceonToolsandAlgorithmsfortheConstructionandAnalysisofSystems,2009.
[5]H.
Bojinov,E.
Bursztein,andD.
Boneh.
XCS:Crosschannelscriptinganditsimpactonwebapplications.
InCCS,2009.
[6]J.
R.
B¨uchiandS.
Senger.
Denabilityintheexistentialtheoryofconcatenationandundecidableextensionsofthistheory.
MathematicalLogicQuarterly,34(4):337–342,1988.
[7]J.
Caballero,S.
McCamant,A.
Barth,andD.
Song.
Ex-tractingmodelsofsecurity-sensitiveoperationsusingstring-enhancedwhite-boxexplorationonbinaries.
TechnicalRe-portUCB/EECS-2009-36,EECSDepartment,UniversityofCalifornia,Berkeley,March2009.
[8]A.
Chandra,J.
Halpern,A.
Meyer,andR.
Parikh.
Equationsbetweenregulartermsandanapplicationtoprocesslogic.
InProceedingsofthe13thannualACMSymposiumonTheoryofComputing(STOC),pages384–390,May1981.
[9]R.
Chugh,J.
A.
Meister,R.
Jhala,andS.
Lerner.
StagedinformationowforJavaScript.
InPLDI,2009.
[10]V.
GaneshandD.
L.
Dill.
Adecisionprocedureforbit-vectorsandarrays.
InComputerAidedVerication,19thInternationalConference(CAV),pages519–531,July2007.
[11]P.
Godefroid,M.
Y.
Levin,andD.
Molnar.
Automatedwhiteboxfuzztesting.
InNetworkandDistributedSystemSecurity,Feb.
2008.
[12]S.
GuarnieriandB.
Livshits.
Gatekeeper:mostlystaticenforcementofsecurityandreliabilitypoliciesforJavaScriptcode.
InUsenixSecurity,2009.
[13]R.
Hansen.
XSScheatsheet.
http://ha.
ckers.
org/xss.
html.
[14]P.
HooimeijerandW.
Weimer.
Adecisionprocedureforsubsetconstraintsoverregularlanguages.
InACMSIGPLANConferenceonProgrammingLanguageDesignandImple-mentation(PLDI),pages188–198,June2009.
[15]H.
B.
HuntIII.
Theequivalenceproblemforregularexpres-sionswithintersectionisnotpolynomialintape.
TechnicalReport73-161,CornellUniversityDepartmentofComputerScience,Mar.
1973.
[16]J.
Karhum¨aki,W.
Plandowski,andF.
Mignosi.
Theexpress-ibilityoflanguagesandrelationsbywordequations.
InAutomata,LanguagesandProgramming,24thInternationalColloquium,(ICALP),pages98–109,1997.
[17]A.
Kie˙zun,V.
Ganesh,P.
J.
Guo,P.
Hooimeijer,andM.
D.
Ernst.
HAMPI:Asolverforstringconstraints.
InInterna-tionalSymposiumonSoftwareTestingandAnalysis(ISSTA),2009.
[18]A.
Kie˙zun,P.
J.
Guo,K.
Jayaraman,andM.
D.
Ernst.
AutomaticcreationofSQLinjectionandcross-sitescript-ingattacks.
In30thInternationalConferenceonSoftwareEngineering(ICSE),May2009.
[19]A.
Klein.
DOMbasedcrosssitescriptingorXSSofthethirdkind.
Technicalreport,WebApplicationSecurityConsortium,2005.
[20]M.
MartinandM.
S.
Lam.
AutomaticgenerationofXSSandSQLinjectionattackswithgoal-directedmodelchecking.
In17thUSENIXSecuritySymposium,2008.
[21]Y.
Matiyasevich.
Wordequations,Fibonaccinumbers,andHilbert'stenthproblem(extendedabstract).
InWorkshoponFibonacciWords,Turku,Finland,Sept.
2006.
[22]A.
B.
Matos.
Periodicsetsofintegers.
TheoreticalComputerScience,127(2):287–312,May1994.
[23]A.
Mesbah,E.
Bozdag,andA.
vanDeursen.
Crawlingajaxbyinferringuserinterfacestatechanges.
InProceedingsofthe8thInternationalConferenceonWebEngineering(ICWE'08),2008.
[24]W.
Plandowski.
Satisabilityofwordequationswithcon-stantsisinPSPACE.
In40thAnnualSymposiumonFoun-dationsofComputerScience(FOCS),pages495–500,Oct.
1999.
[25]Samy.
I'mpopular.
DescriptionoftheMySpacewormbytheauthor,includingatechnicalexplanation.
,Oct2005.
[26]P.
Saxena,S.
Hanna,P.
Poosankam,andD.
Song.
FLAX:Systematicdiscoveryofclient-sidevalidationvulnerabilitiesinrichwebapplications.
In17thAnnualNetwork&Dis-tributedSystemSecuritySymposium,(NDSS),2010.
[27]TwitPwn.
DOMbasedXSSinTwitterfall.
2009.
[28]M.
Veanes,P.
deHalleux,andN.
Tillmann.
Rex:Symbolicregularexpressionexplorer.
InInternationalConferenceonSoftwareTesting,VericationandValidation(ICST),Apr.
2010.
[29]W3C.
HTML5specication.
http://www.
w3.
org/TR/html5/.
[30]G.
Wassermann,D.
Yu,A.
Chander,D.
Dhurjati,H.
Inamura,andZ.
Su.
Dynamictestinputgenerationforwebapplica-tions.
InISSTA'08:Proceedingsofthe2008internationalsymposiumonSoftwaretestingandanalysis,2008.
[31]XMLpathlanguage2.
0.
http://www.
w3.
org/TR/xpath20/.
[32]F.
Yu,T.
Bultan,andO.
H.
Ibarra.
Symbolicstringverica-tion:Combiningstringanalysisandsizeanalysis.
InToolsandAlgorithmsfortheConstructionandAnalysisofSystems(TACAS),pages322–336,Mar.
2009.
APPENDIXA.
ExpressivenessEachofthefollowingconstrainttypesallowsnewsetsofstringstobedened:Thetheoryofjustconcatenationconstraintsv1=v2v3(plusconstantstrings)isequivalenttothetheoryofwordequations,whichisknowntobeneitherasubsetnorasupersetofthetheoryofregularexpressionsintermsofexpressiveness[16].
Forinstance,thesetofstringsthatconsistofasinglestringrepeated{ww:w∈Σ}canbeeasilyexpressedwithawordequation,butisnotregular.
Conversely,theregularlanguagesexpressedbyregularexpressionsincludesetsofstringsthatcannotbeex-pressedsolelywithwordequations.
Anexamplegivenin[16]isthesetofstringsconsistingofaandboverathree-letteralphabet{a,b,c},whichisrepresentedbytheregularexpression(a|b)*.
Theconstraintlength(v1)=length(v2)alsoaddsexpressiveness,sincethelanguage{u$v:|u|=|v|}isnotregular,andtherelationoftwostringshavingequallengthalsocannotbeexpressedinwordequationsforanynon-unaryalphabet[16].
B.
ComplexityTheproblemofdecidingKudzu'scoreconstraintlanguageisatleastPSPACE-hard,whichcanbeseenbyadaptingtheconstructiongivenbyHunt[15]toprovethesamehardnessfortheemptinessproblemforregularexpressionswithanintersectionoperator.
Theconstructionisareductionfromtheacceptanceproblemforalinearly-boundedautomaton;sinceitusestheintersectionoperatoronlyatthetopleveloftheconstructedexpression,itcanalsobeexpressedasaconjunctionofregularexpressionconstraintsinourlanguage.
(ThebestknownalgorithmsforthewordequationproblemarealsoinPSPACE[24],butthebestknownlowerboundforthatfragmentisonlyNP-hardness.
)Ontheupperboundside,thecomplexityofourdecidingourconstraintsisstillanopenproblem.
Thefragmentconsistingofwordequationsandlengthequalitiesisknowntobestrictlymoreexpressivethatwordequationsalone[6],[16],butitsdecidabilityhasbeenmentionedasopenbyseveralauthors[4],[6],[21].
Ourcoreconstraintlanguagedoesnotcontainanyfeaturesthatobviouslycauseittobeundecidable,butitisstrictlymoreexpressivethanfragmentofwordequationsandlengthequalities,soitsdecidabilityisalsoanopenproblem.
Thepotentialdifcultyofdecidingtheunboundedversionofourcoreconstraintlanguage(evenPSPACEproblemsareusuallyinfeasibleatscale)motivatesourdecisioninthepracticalsystemtosearchjustforsolutionsusingstringsoflengthsboundedbyauser-speciedparameter.
TheboundedversionoftheconstraintdecisionproblemisNP-complete(wherethesizeoftheinputismeasuredbythebound,for-mallybyspecifyingtheboundinunary).
ThereductionfromSATisimmediate,sincetheconstraintlanguageincludesarbitrarybooleancombinations,andtheboundedlanguageisinNPbecausesolutions(ofsizeproportionaltothebound)canbecheckedinpolynomialtimeusingstandardregularexpressionmatchingalgorithms.
C.
CorrectnessWediscussthecorrectnessofouralgorithmintermsofsoundness,completeness(uptothebound),andtermination.
Thesepropertiesbuildonthecorrespondingpropertiesfortheunderlyingbit-vectorsolverandthetranslationfromregularexpressionstobit-vectors,whichhavealreadybeenestablished.
Concatenationconstraints.
ThekeyinvariantestablishedbythetranslationfromconcatenationconstraintstoanadjacencyorderingisthatforeachoriginalconstraintS1=S2S3,therstlength(S2)charactersofS1shouldbeconstrainedtobeequaltothecharactersofS2,andsimilarlybetweentheremaininglength(S3)charactersofS1andS3.
Whenthenodesareadjacentintheconcatenationgraph,thisholdsbecausethecharactersareinfactthesamechar-acters;whennodesarecopiedtoresolveorderingconicts,theinvariantholdsbecausethecopiesareconstrainedtobeequaltotheoriginals.
Lengthabstraction.
Thekeypropertyofthelengthcon-straintssynthesizedinstep2isthattheybeasoundabstractionofthepossiblelengthsofstringsthatcouldsatisfytheconstraints:foranystringsthatsatisfyastringconstraint,theirlengthsmustsatisfythecorrespondinglengthconstraint.
Itisnotnecessaryforcorrectnessthattheabstractionbeprecise(thateverylengthcorrespondtoapossiblestringsolution),thoughprecisionimprovesthesolver'sperformancebypruningthesearchspaceoflengths.
Fortheconcatenation,stringequality,andlength-relatedconstraints,thecorrectnessofthelengthabstractionsisimmediate.
ForadetaileddiscussionoftheoperationLengthSet(R)wereferreadersto[22].
Soundness.
Forsoundness,supposethatthealgorithmpro-ducesasolutionthatassignsstringsiwithlengthlitoeachstringvariableSithatappearsinaconstraintset.
Bythesoundnessofthebit-vectorsolver,thesolutioncorrespondstoanassignmentoftruthvaluestoeachstringconstraintintheformulathatgivestheformulathebooleanresulttrue.
LettheconstraintliteralsbeasetwhichcontainsCforeachconstraintCassignedtrue,andCforeachconstraintCassignedfalse.
Bythesoundnessofthebit-vectorsolver,thelengthsliareacorrectsolutiontothelengthconstraints,sothelength-relatedconstraintliteralswillbesatised.
Bythesoundnessofthetranslationofregularexpressionconstraintsandthebit-vectorsolver,si∈L(Ri)foreachconstraintSi∈Ri,soeachregularexpressionconstraintliteralissatised.
Finally,forconcatenationandstringequalityconstraints,step1guaranteedthatthecorrespondingcharacterswereeitheridentiedorconstrainedtobeequal,sobyconstructionorbythesoundnessofthebit-vectorsolver(respectively),theirconstraintliteralswillbesatised.
Thustheassignmentproducedbythealgorithmgivesanappropriatevaluetoeachconstraintliteralsuchthattheconstraintformulaissatised.
Boundedcompleteness.
Forboundedcompleteness,sup-posetothecontrarythatthereexistsasetofsolu-tionstringssiwithlengthsliforasetofconstraints,butthatthealgorithmwhenrunonthoseconstraintsprintsUnsatisfiable.
Considertwocases,accordingtowhetherthelengthsliwereeverreturnedbytheprocedureSolveLengthsduringexecution.
Ifthelengthswerereturnedonsomeiterationj,thenbytheboundedcompletenessoftheregularexpressiontranslation,andthecorrectnessofthetranslationofstringequalityandconcatenationconstraints,thebit-vectorconstraintsforthoselengthsmustbesatisablebythestringvaluessi.
However,thealgorithmcouldonlyhaveprintedUnsatisfiableifthebit-vectorsolverfoundthebit-vectorconstraintsunsatisable,contradictingitscompleteness.
Ontheotherhand,supposethatthelengthsliwereneverreturnedbySolveLengths.
BecausetheprogramprintedUnsatisfiable,thenalcalltoSolveLengthsmusthavereturnedUNSATwhensuppliedwithasetoflengthconstraintsL,andsomenumberoffailedlengthassignmentsnotincludingthelengthassignmentli.
Bythesoundnessofthelengthabstraction,thelengthslisatisfyL,butsincetheliassignmentsatisesLandwasnotexcluded,thecompletenessofthebit-vectorsolverimpliesthatitshouldhavebeenreturnedbySolveLengths,contradictingourassumptionthatitwasneverreturnedbySolveLengths.
Thusneithercaseispossible,sowerejecttheassumptionandconcludethatifasolutionexists,thealgorithmwillnotprintUnsatisfiable.
Termination.
Fortermination,onlythemainwhileloopmustbeconsidered,sinceallofthesubroutinesareguaran-teedtoterminate.
Toseethattheloopterminates,weobservethatthesetFailLenDBoffailedlengthassignmentsgrowsoneachiterationofthemainloop,sincewhenSolveLengthsgivesSAT,italwaysreturnsanewlengthassignmentdistinctfromthosepreviouslyinFailLenDB.
Ontheotherhand,thelengthassignmentsreturnedbySolveLengthsalwayssatisfythelengthboundsinC.
Butbecausethelengthofeachstringisanon-negativeintegernogreaterthanthecorrespondinglengthbound,thereareonlynitelymanypossiblelengthassignmentsthatsatisfythebounds.
ThusthecorrectnessofSolveLengthsandthefactthatFailLenDBgrowsmonotonicallytogetherguaranteethatSolveLengthswilleventuallyreturnUNSAT,causingthealgorithmtoterminate.
profitserver正在对德国vps(法兰克福)、西班牙vps(马德里)、荷兰vps(杜廷赫姆)这3处数据中心内的VPS进行5折优惠促销。所有VPS基于KVM虚拟,纯SSD阵列,自带一个IPv4,不限制流量,在后台支持自定义ISO文件,方便大家折腾!此外还有以下数据中心:俄罗斯(多机房)、捷克、保加利亚、立陶宛、新加坡、美国(洛杉矶、锡考克斯、迈阿密)、瑞士、波兰、乌克兰,VPS和前面的一样性...
最近看到群里的不少网友在搭建大数据内容网站,内容量有百万篇幅,包括图片可能有超过50GB,如果一台服务器有需要多个站点的话,那肯定默认的服务器50GB存储空间是不够用的。如果单独在购买数据盘会成本提高不少。这里我们看到腾讯云促销活动中有2款带大数据盘的套餐还是比较实惠的,一台是400GB数据盘,一台是800GB数据盘,适合他们的大数据网站。 直达链接 - 腾讯云 大数据盘套餐服务器这里我们看到当前...
wordpress外贸集团企业主题,wordpress通用跨屏外贸企业响应式布局设计,内置更完善的外贸企业网站优化推广功能,完善的企业产品营销展示 + 高效后台自定义设置。wordpress高级推广外贸主题,采用标准的HTML5+CSS3语言开发,兼容当下的各种主流浏览器,根据用户行为以及设备环境(系统平台、屏幕尺寸、屏幕定向等)进行自适应显示; 完美实现一套主题程序支持全部终端设备,保证网站在各...
www.topit.me为你推荐
申请sns企业邮局系统企业邮件系统用什么软件好?outlookexpress家里电脑老是弹出“outlook express”这个东西,怎么除去啊?sqlserver2000挂起安装sqlserver2000时总提示有挂起操作!文档下载请问手机版wps如何把云文档下载到手机上的本地文档?瑞东集团请问富源集团到底是一个怎么样的集团?三五互联科技股份有限公司厦门三五互联科技股份有限公司 怎么样?中国保健养猪网中央7台致富经养猪站点管理有关站点的知识介绍?discuz论坛discuz论坛怎么做
org域名 免备案空间 idc评测网 512av 英文简历模板word 日志分析软件 私有云存储 网通代理服务器 免空 165邮箱 怎么测试下载速度 网站加速 cdn服务 nnt japanese50m咸熟 ftp是什么东西 iptables 免费论坛空间 腾讯qq空间登录首页 文件服务器硬件配置 更多