Baseuserinit
userinit 时间:2021-04-04 阅读:(
)
xv6asimple,Unix-liketeachingoperatingsystemRussCoxFransKaashoekRobertMorrisxv6-book@pdos.
csail.
mit.
eduDraftasofSeptember4,2018Contents0Operatingsysteminterfaces71Operatingsystemorganization172Pagetables293Traps,interrupts,anddrivers394Locking515Scheduling616Filesystem757Summary93APChardware95BThebootloader99Index105DRAFTasofSeptember4,20183https://pdos.
csail.
mit.
edu/6.
828/xv6ForewordandacknowledgementsThisisadrafttextintendedforaclassonoperatingsystems.
Itexplainsthemaincon-ceptsofoperatingsystemsbystudyinganexamplekernel,namedxv6.
xv6isare-im-plementationofDennisRitchie'sandKenThompson'sUnixVersion6(v6).
xv6loose-lyfollowsthestructureandstyleofv6,butisimplementedinANSICforanx86-basedmultiprocessor.
Thetextshouldbereadalongwiththesourcecodeforxv6.
ThisapproachisinspiredbyJohnLions'sCommentaryonUNIX6thEdition(PeertoPeerCommunications;IS-BN:1-57398-013-7;1stedition(June14,2000)).
Seehttps://pdos.
csail.
mit.
edu/6.
828forpointerstoon-lineresourcesforv6andxv6,includingseveralhands-onhomeworkas-signmentsusingxv6.
Wehaveusedthistextin6.
828,theoperatingsystemsclassatMIT.
Wethankthefac-ulty,teachingassistants,andstudentsof6.
828whohavealldirectlyorindirectlycon-tributedtoxv6.
Inparticular,wewouldliketothankAustinClementsandNickolaiZeldovich.
Finally,wewouldliketothankpeoplewhoemailedusbugsinthetextorsuggestionsforprovements:AbutalibAghayev,SebastianBoehm,AntonBurtsev,RaphaelCarvalho,RasitEskicioglu,ColorFuzzy,Giuseppe,TaoGuo,RobertHilder-man,WolfgangKeller,AustinLiew,PavanMaddamsetti,JacekMasiulaniec,MichaelMcConville,miguelgvieira,MarkMorrissey,HarryPan,AskarSan,SalmanShah,Rus-lanSavchenko,PawelSzczurko,WarrenToomey,tyfkda,andZouChangWei.
Ifyouspoterrorsorhavesuggestionsforimprovement,pleasesendemailtoFransKaashoekandRobertMorris(kaashoek,rtm@csail.
mit.
edu).
DRAFTasofSeptember4,20185https://pdos.
csail.
mit.
edu/6.
828/xv6Chapter0OperatingsysteminterfacesThejobofanoperatingsystemistoshareacomputeramongmultipleprogramsandtoprovideamoreusefulsetofservicesthanthehardwarealonesupports.
Theoperatingsystemmanagesandabstractsthelow-levelhardware,sothat,forexample,awordprocessorneednotconcernitselfwithwhichtypeofdiskhardwareisbeingused.
Italsosharesthehardwareamongmultipleprogramssothattheyrun(orap-peartorun)atthesametime.
Finally,operatingsystemsprovidecontrolledwaysforprogramstointeract,sothattheycansharedataorworktogether.
Anoperatingsystemprovidesservicestouserprogramsthroughaninterface.
Designingagoodinterfaceturnsouttobedicult.
Ontheonehand,wewouldliketheinterfacetobesimpleandnarrowbecausethatmakesiteasiertogettheimple-mentationright.
Ontheotherhand,wemaybetemptedtooermanysophisticatedfeaturestoapplications.
Thetrickinresolvingthistensionistodesigninterfacesthatrelyonafewmechanismsthatcanbecombinedtoprovidemuchgenerality.
Thisbookusesasingleoperatingsystemasaconcreteexampletoillustrateoper-atingsystemconcepts.
Thatoperatingsystem,xv6,providesthebasicinterfacesintro-ducedbyKenThompsonandDennisRitchie'sUnixoperatingsystem,aswellasmim-ickingUnix'sinternaldesign.
Unixprovidesanarrowinterfacewhosemechanismscombinewell,oeringasurprisingdegreeofgenerality.
Thisinterfacehasbeensosuccessfulthatmodernoperatingsystems—BSD,Linux,MacOSX,Solaris,andeven,toalesserextent,MicrosoftWindows—haveUnix-likeinterfaces.
Understandingxv6isagoodstarttowardunderstandinganyofthesesystemsandmanyothers.
AsshowninFigure0-1,xv6takesthetraditionalformofakernel,aspecialpro-gramthatprovidesservicestorunningprograms.
Eachrunningprogram,calledapro-cess,hasmemorycontaininginstructions,data,andastack.
Theinstructionsimple-menttheprogram'scomputation.
Thedataarethevariablesonwhichthecomputa-tionacts.
Thestackorganizestheprogram'sprocedurecalls.
Whenaprocessneedstoinvokeakernelservice,itinvokesaprocedurecallintheoperatingsysteminterface.
Suchaprocedureiscalledasystemcall.
Thesystemcallentersthekernel;thekernelperformstheserviceandreturns.
Thusaprocessal-ternatesbetweenexecutinginuserspaceandkernelspace.
ThekernelusestheCPU'shardwareprotectionmechanismstoensurethateachprocessexecutinginuserspacecanaccessonlyitsownmemory.
Thekernelexecuteswiththehardwareprivilegesrequiredtoimplementtheseprotections;userprogramsexecutewithoutthoseprivileges.
Whenauserprograminvokesasystemcall,thehardwareraisestheprivilegelevelandstartsexecutingapre-arrangedfunctioninthekernel.
Thecollectionofsystemcallsthatakernelprovidesistheinterfacethatuserpro-gramssee.
Thexv6kernelprovidesasubsetoftheservicesandsystemcallsthatUnixkernelstraditionallyoer.
Figure0-2listsallofxv6'ssystemcalls.
DRAFTasofSeptember4,20187https://pdos.
csail.
mit.
edu/6.
828/xv6interfacedesignkernelprocesssystemcalluserspacekernelspaceKernelshellcatuserspacekernelspacesystemcallFigure0-1.
Akernelandtwouserprocesses.
processTherestofthischapteroutlinesxv6'sservices—processes,memory,ledescrip-tors,pipes,andlesystem—andillustratesthemwithcodesnippetsanddiscussionsofhowtheshell,whichistheprimaryuserinterfacetotraditionalUnix-likesystems,usesthem.
Theshell'suseofsystemcallsillustrateshowcarefullytheyhavebeendesigned.
Theshellisanordinaryprogramthatreadscommandsfromtheuserandexe-cutesthem.
Thefactthattheshellisauserprogram,notpartofthekernel,illustratesthepowerofthesystemcallinterface:thereisnothingspecialabouttheshell.
Italsomeansthattheshelliseasytoreplace;asaresult,modernUnixsystemshaveavarietyofshellstochoosefrom,eachwithitsownuserinterfaceandscriptingfeatures.
Thexv6shellisasimpleimplementationoftheessenceoftheUnixBourneshell.
Itsim-plementationcanbefoundatline(8550).
ProcessesandmemoryAnxv6processconsistsofuser-spacememory(instructions,data,andstack)andper-processstateprivatetothekernel.
Xv6cantime-shareprocesses:ittransparentlyswitchestheavailableCPUsamongthesetofprocesseswaitingtoexecute.
Whenaprocessisnotexecuting,xv6savesitsCPUregisters,restoringthemwhenitnextrunstheprocess.
Thekernelassociatesaprocessidentier,orpid,witheachprocess.
Aprocessmaycreateanewprocessusingtheforksystemcall.
Forkcreatesanewprocess,calledthechildprocess,withexactlythesamememorycontentsasthecallingprocess,calledtheparentprocess.
Forkreturnsinboththeparentandthechild.
Intheparent,forkreturnsthechild'spid;inthechild,itreturnszero.
Forexample,considerthefollowingprogramfragment:intpid=fork();if(pid>0){printf("parent:child=%d\n",pid);pid=wait();printf("child%disdone\n",pid);}elseif(pid==0){printf("child:exiting\n");exit();}else{printf("forkerror\n");}Theexitsystemcallcausesthecallingprocesstostopexecutingandtoreleasere-sourcessuchasmemoryandopenles.
ThewaitsystemcallreturnsthepidofanDRAFTasofSeptember4,20188https://pdos.
csail.
mit.
edu/6.
828/xv6shelltime-sharepid+codefork+codechildprocessparentprocessfork+codeexit+codewait+codeSystemcallDescriptionfork()Createaprocessexit()Terminatethecurrentprocesswait()Waitforachildprocesstoexitkill(pid)Terminateprocesspidgetpid()Returnthecurrentprocess'spidsleep(n)Sleepfornclockticksexec(lename,*argv)Loadaleandexecuteitsbrk(n)Growprocess'smemorybynbytesopen(lename,ags)Openale;theagsindicateread/writeread(fd,buf,n)Readnbytesfromanopenleintobufwrite(fd,buf,n)Writenbytestoanopenleclose(fd)Releaseopenlefddup(fd)Duplicatefdpipe(p)Createapipeandreturnfd'sinpchdir(dirname)Changethecurrentdirectorymkdir(dirname)Createanewdirectorymknod(name,major,minor)Createadevicelefstat(fd)Returninfoaboutanopenlelink(f1,f2)Createanothername(f2)forthelef1unlink(lename)RemovealeFigure0-2.
Xv6systemcallsexitedchildofthecurrentprocess;ifnoneofthecaller'schildrenhasexited,waitwaitsforonetodoso.
Intheexample,theoutputlinesparent:child=1234child:exitingmightcomeoutineitherorder,dependingonwhethertheparentorchildgetstoitsprintfcallrst.
Afterthechildexitstheparent'swaitreturns,causingtheparenttoprintparent:child1234isdoneAlthoughthechildhasthesamememorycontentsastheparentinitially,theparentandchildareexecutingwithdierentmemoryanddierentregisters:changingavari-ableinonedoesnotaecttheother.
Forexample,whenthereturnvalueofwaitisstoredintopidintheparentprocess,itdoesn'tchangethevariablepidinthechild.
Thevalueofpidinthechildwillstillbezero.
Theexecsystemcallreplacesthecallingprocess'smemorywithanewmemoryimageloadedfromalestoredinthelesystem.
Thelemusthaveaparticularfor-mat,whichspecieswhichpartoftheleholdsinstructions,whichpartisdata,atwhichinstructiontostart,etc.
xv6usestheELFformat,whichChapter2discussesinmoredetail.
Whenexecsucceeds,itdoesnotreturntothecallingprogram;instead,theinstructionsloadedfromthelestartexecutingattheentrypointdeclaredintheELFheader.
Exectakestwoarguments:thenameofthelecontainingtheexecutableandanarrayofstringarguments.
Forexample:DRAFTasofSeptember4,20189https://pdos.
csail.
mit.
edu/6.
828/xv6wait+codeprintf+codewait+codeexec+codeexec+codechar*argv[3];argv[0]="echo";argv[1]="hello";argv[2]=0;exec("/bin/echo",argv);printf("execerror\n");Thisfragmentreplacesthecallingprogramwithaninstanceoftheprogram/bin/echorunningwiththeargumentlistechohello.
Mostprogramsignoretherstargument,whichisconventionallythenameoftheprogram.
Thexv6shellusestheabovecallstorunprogramsonbehalfofusers.
Themainstructureoftheshellissimple;seemain(8701).
Themainloopreadsalineofinputfromtheuserwithgetcmd.
Thenitcallsfork,whichcreatesacopyoftheshellpro-cess.
Theparentcallswait,whilethechildrunsthecommand.
Forexample,iftheuserhadtyped''echohello''totheshell,runcmdwouldhavebeencalledwith''echohello''astheargument.
runcmd(8606)runstheactualcommand.
For''echohello'',itwouldcallexec(8626).
Ifexecsucceedsthenthechildwillexecuteinstructionsfromechoinsteadofruncmd.
Atsomepointechowillcallexit,whichwillcausethepar-enttoreturnfromwaitinmain(8701).
Youmightwonderwhyforkandexecarenotcombinedinasinglecall;wewillseelaterthatseparatecallsforcreatingaprocessandloadingaprogramisacleverdesign.
Xv6allocatesmostuser-spacememoryimplicitly:forkallocatesthememoryre-quiredforthechild'scopyoftheparent'smemory,andexecallocatesenoughmemorytoholdtheexecutablele.
Aprocessthatneedsmorememoryatrun-time(perhapsformalloc)cancallsbrk(n)togrowitsdatamemorybynbytes;sbrkreturnsthelocationofthenewmemory.
Xv6doesnotprovideanotionofusersorofprotectingoneuserfromanother;inUnixterms,allxv6processesrunasroot.
I/OandFiledescriptorsAledescriptorisasmallintegerrepresentingakernel-managedobjectthataprocessmayreadfromorwriteto.
Aprocessmayobtainaledescriptorbyopeningale,directory,ordevice,orbycreatingapipe,orbyduplicatinganexistingdescrip-tor.
Forsimplicitywe'lloftenrefertotheobjectaledescriptorreferstoasa''le'';theledescriptorinterfaceabstractsawaythedierencesbetweenles,pipes,andde-vices,makingthemalllooklikestreamsofbytes.
Internally,thexv6kernelusestheledescriptorasanindexintoaper-processta-ble,sothateveryprocesshasaprivatespaceofledescriptorsstartingatzero.
Byconvention,aprocessreadsfromledescriptor0(standardinput),writesoutputtoledescriptor1(standardoutput),andwriteserrormessagestoledescriptor2(standarderror).
Aswewillsee,theshellexploitstheconventiontoimplementI/Oredirectionandpipelines.
Theshellensuresthatitalwayshasthreeledescriptorsopen(8707),whicharebydefaultledescriptorsfortheconsole.
Thereadandwritesystemcallsreadbytesfromandwritebytestoopenlesnamedbyledescriptors.
Thecallread(fd,buf,n)readsatmostnbytesfromtheDRAFTasofSeptember4,201810https://pdos.
csail.
mit.
edu/6.
828/xv6getcmd+codefork+codeexec+codefork+codeexec+codemalloc+codesbrk+codeledescriptorledescriptorfd,copiesthemintobuf,andreturnsthenumberofbytesread.
Eachledescriptorthatreferstoalehasanosetassociatedwithit.
Readreadsdatafromthecurrentleosetandthenadvancesthatosetbythenumberofbytesread:asubsequentreadwillreturnthebytesfollowingtheonesreturnedbytherstread.
Whentherearenomorebytestoread,readreturnszerotosignaltheendofthele.
Thecallwrite(fd,buf,n)writesnbytesfrombuftotheledescriptorfdandreturnsthenumberofbyteswritten.
Fewerthannbytesarewrittenonlywhenaner-roroccurs.
Likeread,writewritesdataatthecurrentleosetandthenadvancesthatosetbythenumberofbyteswritten:eachwritepicksupwherethepreviousonelefto.
Thefollowingprogramfragment(whichformstheessenceofcat)copiesdatafromitsstandardinputtoitsstandardoutput.
Ifanerroroccurs,itwritesamessagetothestandarderror.
charbuf[512];intn;for(;;){n=read(0,buf,sizeofbuf);if(n==0)break;if(noutput.
txt.
Thedupsystemcallduplicatesanexistingledescriptor,returninganewonethatreferstothesameunderlyingI/Oobject.
Bothledescriptorsshareanoset,justastheledescriptorsduplicatedbyforkdo.
Thisisanotherwaytowritehelloworldintoale:fd=dup(1);write(1,"hello",6);write(fd,"world\n",6);Twoledescriptorsshareanosetiftheywerederivedfromthesameoriginalledescriptorbyasequenceofforkanddupcalls.
Otherwiseledescriptorsdonotshareosets,eveniftheyresultedfromopencallsforthesamele.
DupallowsshellsDRAFTasofSeptember4,201812https://pdos.
csail.
mit.
edu/6.
828/xv6toimplementcommandslikethis:lsexisting-filenon-existing-file>tmp12>&1.
The2>&1tellstheshelltogivethecommandaledescriptor2thatisadupli-cateofdescriptor1.
Boththenameoftheexistingleandtheerrormessageforthenon-existinglewillshowupintheletmp1.
Thexv6shelldoesn'tsupportI/Oredi-rectionfortheerrorledescriptor,butnowyouknowhowtoimplementit.
Filedescriptorsareapowerfulabstraction,becausetheyhidethedetailsofwhattheyareconnectedto:aprocesswritingtoledescriptor1maybewritingtoale,toadeviceliketheconsole,ortoapipe.
PipesApipeisasmallkernelbuerexposedtoprocessesasapairofledescriptors,oneforreadingandoneforwriting.
Writingdatatooneendofthepipemakesthatdataavailableforreadingfromtheotherendofthepipe.
Pipesprovideawayforprocessestocommunicate.
Thefollowingexamplecoderunstheprogramwcwithstandardinputconnectedtothereadendofapipe.
intp[2];char*argv[2];argv[0]="wc";argv[1]=0;pipe(p);if(fork()==0){close(0);dup(p[0]);close(p[0]);close(p[1]);exec("/bin/wc",argv);}else{close(p[0]);write(p[1],"helloworld\n",12);close(p[1]);}Theprogramcallspipe,whichcreatesanewpipeandrecordsthereadandwriteledescriptorsinthearrayp.
Afterfork,bothparentandchildhaveledescriptorsrefer-ringtothepipe.
Thechilddupsthereadendontoledescriptor0,closesthelede-scriptorsinp,andexecswc.
Whenwcreadsfromitsstandardinput,itreadsfromthepipe.
Theparentclosesthereadsideofthepipe,writestothepipe,andthenclosesthewriteside.
Ifnodataisavailable,areadonapipewaitsforeitherdatatobewrittenorallledescriptorsreferringtothewriteendtobeclosed;inthelattercase,readwillre-turn0,justasiftheendofadatalehadbeenreached.
Thefactthatreadblocksuntilitisimpossiblefornewdatatoarriveisonereasonthatit'simportantforthechildtoclosethewriteendofthepipebeforeexecutingwcabove:ifoneofwc'sledescriptorsreferredtothewriteendofthepipe,wcwouldneverseeend-of-le.
DRAFTasofSeptember4,201813https://pdos.
csail.
mit.
edu/6.
828/xv6pipeThexv6shellimplementspipelinessuchasgrepforksh.
c|wc-linaman-nersimilartotheabovecode(8650).
Thechildprocesscreatesapipetoconnecttheleftendofthepipelinewiththerightend.
Thenitcallsforkandruncmdfortheleftendofthepipelineandforkandruncmdfortherightend,andwaitsforbothton-ish.
Therightendofthepipelinemaybeacommandthatitselfincludesapipe(e.
g.
,a|b|c),whichitselfforkstwonewchildprocesses(oneforbandoneforc).
Thus,theshellmaycreateatreeofprocesses.
Theleavesofthistreearecommandsandtheinteriornodesareprocessesthatwaituntiltheleftandrightchildrencomplete.
Inprinciple,youcouldhavetheinteriornodesruntheleftendofapipeline,butdoingsocorrectlywouldcomplicatetheimplementation.
Pipesmayseemnomorepowerfulthantemporaryles:thepipelineechohelloworld|wccouldbeimplementedwithoutpipesasechohelloworld>/tmp/xyz;wcxxxtorefertoelementsoftheprocstructure.
Eachprocesshasathreadofexecution(orthreadforshort)thatexecutesthepro-cess'sinstructions.
Athreadcanbesuspendedandlaterresumed.
Toswitchtranspar-entlybetweenprocesses,thekernelsuspendsthecurrentlyrunningthreadandresumesanotherprocess'sthread.
Muchofthestateofathread(localvariables,functioncallreturnaddresses)isstoredonthethread'sstacks.
Eachprocesshastwostacks:auserstackandakernelstack(p->kstack).
Whentheprocessisexecutinguserinstructions,onlyitsuserstackisinuse,anditskernelstackisempty.
Whentheprocessentersthekernel(forasystemcallorinterrupt),thekernelcodeexecutesontheprocess'skernelstack;whileaprocessisinthekernel,itsuserstackstillcontainssaveddata,butisn'tactivelyused.
Aprocess'sthreadalternatesbetweenactivelyusingitsuserstackanditskernelstack.
Thekernelstackisseparate(andprotectedfromusercode)sothatthekernelcanexecuteevenifaprocesshaswreckeditsuserstack.
Whenaprocessmakesasystemcall,theprocessorswitchestothekernelstack,raisesthehardwareprivilegelevel,andstartsexecutingthekernelinstructionsthatim-plementthesystemcall.
Whenthesystemcallcompletes,thekernelreturnstouserspace:thehardwarelowersitsprivilegelevel,switchesbacktotheuserstack,andre-sumesexecutinguserinstructionsjustafterthesystemcallinstruction.
Aprocess'sthreadcan''block''inthekerneltowaitforI/O,andresumewhereitleftowhentheDRAFTasofSeptember4,201821https://pdos.
csail.
mit.
edu/6.
828/xv6structproc+codep->xxx+codethreadp->kstack+code00x800000000xFFFFFFFF0x80100000textanddataBIOSVirtualaddressspacePhysicalmemoryTopphysicalmemorykerneltextanddata4Mbyte0BIOStextanddataFigure1-3.
LayoutofavirtualaddressspaceI/Ohasnished.
p->stateindicateswhethertheprocessisallocated,readytorun,running,wait-ingforI/O,orexiting.
p->pgdirholdstheprocess'spagetable,intheformatthatthex86hardwareex-pects.
xv6causesthepaginghardwaretouseaprocess'sp->pgdirwhenexecutingthatprocess.
Aprocess'spagetablealsoservesastherecordoftheaddressesofthephysicalpagesallocatedtostoretheprocess'smemory.
Code:therstaddressspaceTomakethexv6organizationmoreconcrete,we'lllookhowthekernelcreatestherstaddressspace(foritself),howthekernelcreatesandstartstherstprocess,andhowthatprocessperformstherstsystemcall.
Bytracingtheseoperationsweseeindetailhowxv6providesstrongisolationforprocesses.
Therststepinprovidingstrongiso-lationissettingupthekerneltoruninitsownaddressspace.
WhenaPCpowerson,itinitializesitselfandthenloadsabootloaderfromdiskintomemoryandexecutesit.
AppendixBexplainsthedetails.
Xv6'sbootloaderloadsthexv6kernelfromdiskandexecutesitstartingatentry(1044).
Thex86paginghard-wareisnotenabledwhenthekernelstarts;virtualaddressesmapdirectlytophysicaladdresses.
Thebootloaderloadsthexv6kernelintomemoryatphysicaladdress0x100000.
Thereasonitdoesn'tloadthekernelat0x80100000,wherethekernelexpectstonditsinstructionsanddata,isthattheremaynotbeanyphysicalmemoryatsuchahighaddressonasmallmachine.
Thereasonitplacesthekernelat0x100000ratherthan0x0isbecausetheaddressrange0xa0000:0x100000containsI/Odevices.
Toallowtherestofthekerneltorun,entrysetsupapagetablethatmapsvirtu-DRAFTasofSeptember4,201822https://pdos.
csail.
mit.
edu/6.
828/xv6p->state+codep->pgdir+codebootloaderentry+codealaddressesstartingat0x80000000(calledKERNBASE(0207))tophysicaladdressesstart-ingat0x0(seeFigure1-2).
Settinguptworangesofvirtualaddressesthatmaptothesamephysicalmemoryrangeisacommonuseofpagetables,andwewillseemoreexampleslikethisone.
Theentrypagetableisdenedinmain.
c(1306).
Welookatthedetailsofpageta-blesinChapter2,buttheshortstoryisthatentry0mapsvirtualaddresses0:0x400000tophysicaladdresses0:0x400000.
Thismappingisrequiredaslongasentryisexecutingatlowaddresses,butwilleventuallyberemoved.
Entry512mapsvirtualaddressesKERNBASE:KERNBASE+0x400000tophysicalad-dresses0:0x400000.
Thisentrywillbeusedbythekernelafterentryhasnished;itmapsthehighvirtualaddressesatwhichthekernelexpectstonditsinstructionsanddatatothelowphysicaladdresseswherethebootloaderloadedthem.
Thismappingrestrictsthekernelinstructionsanddatato4Mbytes.
Returningtoentry,itloadsthephysicaladdressofentrypgdirintocontrolreg-ister%cr3.
Thevaluein%cr3mustbeaphysicaladdress.
Itwouldn'tmakesensefor%cr3toholdthevirtualaddressofentrypgdir,becausethepaginghardwaredoesn'tknowhowtotranslatevirtualaddressesyet;itdoesn'thaveapagetableyet.
Thesym-bolentrypgdirreferstoanaddressinhighmemory,andthemacroV2P_WO(0213)subtractsKERNBASEinordertondthephysicaladdress.
Toenablethepaginghard-ware,xv6setstheagCR0_PGinthecontrolregister%cr0.
Theprocessorisstillexecutinginstructionsatlowaddressesafterpagingisen-abled,whichworkssinceentrypgdirmapslowaddresses.
Ifxv6hadomittedentry0fromentrypgdir,thecomputerwouldhavecrashedwhentryingtoexecutethein-structionaftertheonethatenabledpaging.
Nowentryneedstotransfertothekernel'sCcode,andrunitinhighmemory.
Firstitmakesthestackpointer,%esp,pointtomemorytobeusedasastack(1058).
Allsymbolshavehighaddresses,includingstack,sothestackwillstillbevalidevenwhenthelowmappingsareremoved.
Finallyentryjumpstomain,whichisalsoahighaddress.
TheindirectjumpisneededbecausetheassemblerwouldotherwisegenerateaPC-relativedirectjump,whichwouldexecutethelow-memoryversionofmain.
Maincannotreturn,sincethethere'snoreturnPConthestack.
Nowthekernelisrunninginhighaddressesinthefunctionmain(1217).
Code:creatingtherstprocessNowwe'lllookathowthekernelcreatesuser-levelprocessesandensuresthattheyarestronglyisolated.
Aftermain(1217)initializesseveraldevicesandsubsystems,itcreatestherstpro-cessbycallinguserinit(2520).
Userinit'srstactionistocallallocproc.
Thejobofallocproc(2473)istoallocateaslot(astructproc)intheprocesstableandtoinitializethepartsoftheprocess'sstaterequiredforitskernelthreadtoexecute.
Al-locprociscalledforeachnewprocess,whileuserinitiscalledonlyfortheveryrstprocess.
AllocprocscanstheproctableforaslotwithstateUNUSED(2480-2482).
Whenitndsanunusedslot,allocprocsetsthestatetoEMBRYOtomarkitasusedandgivestheprocessauniquepid(2469-2489).
Next,ittriestoallocateakernelstackforDRAFTasofSeptember4,201823https://pdos.
csail.
mit.
edu/6.
828/xv6KERNBASE+codeentry+codeentrypgdir+codeV2P_WO+codeCR0_PG+codemain+codemain+codemain+codeallocproc+codeEMBRYO+codepid+codep->kstackp->contextp->tfespeipeditrapreteip.
.
.
edi(empty).
.
.
.
.
.
topofnewstackaddressforkretwillreturntoFigure1-4.
Anewkernelstack.
theprocess'skernelthread.
Ifthememoryallocationfails,allocprocchangesthestatebacktoUNUSEDandreturnszerotosignalfailure.
Nowallocprocmustsetupthenewprocess'skernelstack.
allocprociswrittensothatitcanbeusedbyforkaswellaswhencreatingtherstprocess.
allocprocsetsupthenewprocesswithaspeciallypreparedkernelstackandsetofkernelregis-tersthatcauseitto''return''touserspacewhenitrstruns.
Thelayoutofthepre-paredkernelstackwillbeasshowninFigure1-4.
allocprocdoespartofthisworkbysettingupreturnprogramcountervaluesthatwillcausethenewprocess'skernelthreadtorstexecuteinforkretandthenintrapret(2507-2512).
Thekernelthreadwillstartexecutingwithregistercontentscopiedfromp->context.
Thussettingp->context->eiptoforkretwillcausethekernelthreadtoexecuteatthestartofforkret(2853).
Thisfunctionwillreturntowhateveraddressisatthebottomofthestack.
Thecontextswitchcode(3059)setsthestackpointertopointjustbeyondtheendofp->context.
allocprocplacesp->contextonthestack,andputsapointertotrapretjustaboveit;thatiswhereforkretwillreturn.
trapretrestoresuserregis-tersfromvaluesstoredatthetopofthekernelstackandjumpsintotheprocess(3324).
Thissetupisthesameforordinaryforkandforcreatingtherstprocess,thoughinthelattercasetheprocesswillstartexecutingatuser-spacelocationzeroratherthanatareturnfromfork.
AswewillseeinChapter3,thewaythatcontroltransfersfromusersoftwaretothekernelisviaaninterruptmechanism,whichisusedbysystemcalls,interrupts,andexceptions.
Whenevercontroltransfersintothekernelwhileaprocessisrunning,thehardwareandxv6trapentrycodesaveuserregistersontheprocess'skernelstack.
userinitwritesvaluesatthetopofthenewstackthatlookjustlikethosethatwouldDRAFTasofSeptember4,201824https://pdos.
csail.
mit.
edu/6.
828/xv6forkret+codetrapret+codep->context+codeforkret+codetrapret+codeforkret+codetrapret+codeuserinit+codebethereiftheprocesshadenteredthekernelviaaninterrupt(2533-2539),sothattheor-dinarycodeforreturningfromthekernelbacktotheprocess'susercodewillwork.
Thesevaluesareastructtrapframewhichstorestheuserregisters.
Nowthenewprocess'skernelstackiscompletelypreparedasshowninFigure1-4.
Therstprocessisgoingtoexecuteasmallprogram(initcode.
S;(8400)).
Theprocessneedsphysicalmemoryinwhichtostorethisprogram,theprogramneedstobecopiedtothatmemory,andtheprocessneedsapagetablethatmapsuser-spaceaddressestothatmemory.
userinitcallssetupkvm(1818)tocreateapagetablefortheprocesswith(atrst)mappingsonlyformemorythatthekerneluses.
WewillstudythisfunctionindetailinChapter2,butatahighlevelsetupkvmanduserinitcreateanaddressspaceasshowninFigure1-2.
Theinitialcontentsoftherstprocess'suser-spacememoryarethecompiledformofinitcode.
S;aspartofthekernelbuildprocess,thelinkerembedsthatbinaryinthekernelanddenestwospecialsymbols,_binary_initcode_startand_bina-ry_initcode_size,indicatingthelocationandsizeofthebinary.
Userinitcopiesthatbinaryintothenewprocess'smemorybycallinginituvm,whichallocatesonepageofphysicalmemory,mapsvirtualaddresszerotothatmemory,andcopiesthebi-narytothatpage(1886).
Thenuserinitsetsupthetrapframe(0602)withtheinitialusermodestate:the%csregistercontainsasegmentselectorfortheSEG_UCODEsegmentrunningatprivi-legelevelDPL_USER(i.
e.
,usermoderatherthankernelmode),andsimilarly%ds,%es,and%ssuseSEG_UDATAwithprivilegeDPL_USER.
The%eflagsFL_IFbitissettoal-lowhardwareinterrupts;wewillreexaminethisinChapter3.
Thestackpointer%espissettotheprocess'slargestvalidvirtualaddress,p->sz.
Theinstructionpointerissettotheentrypointfortheinitcode,address0.
Thefunctionuserinitsetsp->nametoinitcodemainlyfordebugging.
Settingp->cwdsetstheprocess'scurrentworkingdirectory;wewillexaminenameiindetailinChapter6.
Oncetheprocessisinitialized,userinitmarksitavailableforschedulingbyset-tingp->statetoRUNNABLE.
Code:RunningtherstprocessNowthattherstprocess'sstateisprepared,itistimetorunit.
Aftermaincallsuserinit,mpmaincallsschedulertostartrunningprocesses(1257).
Scheduler(2758)looksforaprocesswithp->statesettoRUNNABLE,andthere'sonlyone:initproc.
Itsetstheper-cpuvariableproctotheprocessitfoundandcallsswitchuvmtotellthehardwaretostartusingthetargetprocess'spagetable(1879).
Changingpagetableswhileexecutinginthekernelworksbecausesetupkvmcausesallprocesses'pagetablestohaveidenticalmappingsforkernelcodeanddata.
switchuvmalsosetsupataskstatesegmentSEG_TSSthatinstructsthehardwaretoexecutesystemcallsandinter-ruptsontheprocess'skernelstack.
Wewillre-examinethetaskstatesegmentinChapter3.
schedulernowsetsp->statetoRUNNINGandcallsswtch(3059)toperformaDRAFTasofSeptember4,201825https://pdos.
csail.
mit.
edu/6.
828/xv6structtrapframe+codeinitcode.
S+codeuserinit+codesetupkvm+codeinitcode.
S+code_binary_initcode_start+_binary_initcode_size+inituvm+codeSEG_UCODE+codeDPL_USER+codeSEG_UDATA+codeDPL_USER+codeFL_IF+codeuserinit+codep->name+codep->cwd+codenamei+codeuserinit+codeRUNNABLE+codempmain+codescheduler+codeswitchuvm+codesetupkvm+codeSEG_TSS+codescheduler+codeswtch+codecontextswitchtothetargetprocess'skernelthread.
swtchrstsavesthecurrentregis-ters.
Thecurrentcontextisnotaprocessbutratheraspecialper-cpuschedulercon-text,soschedulertellsswtchtosavethecurrenthardwareregistersinper-cpustor-age(cpu->scheduler)ratherthaninanyprocess'skernelthreadcontext.
swtchthenloadsthesavedregistersofthetargetkernelthread(p->context)intothex86hard-wareregisters,includingthestackpointerandinstructionpointer.
We'llexamineswtchinmoredetailinChapter5.
Thenalretinstruction(3078)popsthetargetprocess's%eipfromthestack,nishingthecontextswitch.
Nowtheprocessorisrun-ningonthekernelstackofprocessp.
Allocprochadpreviouslysetinitproc'sp->context->eiptoforkret,sotheretstartsexecutingforkret.
Ontherstinvocation(thatisthisone),forkret(2853)runsinitializationfunctionsthatcannotberunfrommainbecausetheymustberuninthecontextofaregularprocesswithitsownkernelstack.
Then,forkretreturns.
Allocprocarrangedthatthetopwordonthestackafterp->contextispoppedowouldbetrapret,sonowtrapretbeginsexecuting,with%espsettop->tf.
Trapret(3324)usespopinstructionstorestoreregistersfromthetrapframe(0602)justasswtchdidwiththekernelcontext:popalrestoresthegeneralregisters,thenthepoplinstructionsrestore%gs,%fs,%es,and%ds.
Theaddlskipsoverthetwoeldstrapnoanderrcode.
Finally,theiretinstructionpops%cs,%eip,%flags,%esp,and%ssfromthestack.
ThecontentsofthetrapframehavebeentransferredtotheCPUstate,sotheprocessorcontinuesatthe%eipspeciedinthetrapframe.
Forinit-proc,thatmeansvirtualaddresszero,therstinstructionofinitcode.
S.
Atthispoint,%eipholdszeroand%espholds4096.
Thesearevirtualaddressesintheprocess'saddressspace.
Theprocessor'spaginghardwaretranslatesthemintophysicaladdresses.
allocuvmhassetuptheprocess'spagetablesothatvirtualaddresszeroreferstothephysicalmemoryallocatedforthisprocess,andsetaag(PTE_U)thattellsthepaginghardwaretoallowusercodetoaccessthatmemory.
Thefactthatuserinit(2533)setupthelowbitsof%cstoruntheprocess'susercodeatCPL=3meansthattheusercodecanonlyusepageswithPTE_Uset,andcannotmodifysensi-tivehardwareregisterssuchas%cr3.
Sotheprocessisconstrainedtousingonlyitsownmemory.
Therstsystemcall:execNowthatwehaveseenhowthekernelprovidesstrongisolationforprocesses,let'slookathowauser-levelprocessre-entersthekerneltoaskforservicesthatitcannotperformitself.
Therstactionofinitcode.
Sistoinvoketheexecsystemcall.
AswesawinChapter0,execreplacesthememoryandregistersofthecurrentprocesswithanewprogram,butitleavestheledescriptors,processid,andparentprocessunchanged.
Initcode.
S(8409)beginsbypushingthreevaluesonthestack—$argv,$init,and$0—andthensets%eaxtoSYS_execandexecutesintT_SYSCALL:itisaskingthekerneltoruntheexecsystemcall.
Ifallgoeswell,execneverreturns:itstartsrun-ningtheprogramnamedby$init,whichisapointertotheNUL-terminatedstring/init(8422-8424).
Theotherargumentistheargvarrayofcommand-linearguments;DRAFTasofSeptember4,201826https://pdos.
csail.
mit.
edu/6.
828/xv6cpu->scheduler+codeswtch+coderet+codeforkret+coderet+codeforkret+codeforkret+codemain+codep->context+codetrapret+codeswtch+codepopal+codepopl+codeaddl+codeiret+codeinitproc+codeinitcode.
S+codeallocuvm+codePTE_U+codeuserinit+codeexec+codeSYS_exec+codeT_SYSCALL+codeexec+codethezeroattheendofthearraymarksitsend.
Iftheexecfailsanddoesreturn,init-codeloopscallingtheexitsystemcall,whichdenitelyshouldnotreturn(8416-8420).
Thiscodemanuallycraftstherstsystemcalltolooklikeanordinarysystemcall,whichwewillseeinChapter3.
Asbefore,thissetupavoidsspecial-casingtherstprocess(inthiscase,itsrstsystemcall),andinsteadreusescodethatxv6mustprovideforstandardoperation.
Chapter2willcovertheimplementationofexecindetail,butatahighlevelitreplacesinitcodewiththe/initbinary,loadedoutofthelesystem.
Nowinit-code(8400)isdone,andtheprocesswillrun/initinstead.
Init(8510)createsanewconsoledeviceleifneededandthenopensitasledescriptors0,1,and2.
Thenitloops,startingaconsoleshell,handlesorphanedzombiesuntiltheshellexits,andre-peats.
Thesystemisup.
RealworldIntherealworld,onecanndbothmonolithickernelsandmicrokernels.
ManyUnixkernelsaremonolithic.
Forexample,Linuxhasamonolithickernel,althoughsomeOSfunctionsrunasuser-levelservers(e.
g.
,thewindowingsystem).
KernelssuchasL4,Minix,QNXareorganizedasamicrokernelwithservers,andhaveseenwidedeploymentinembeddedsettings.
Mostoperatingsystemshaveadoptedtheprocessconcept,andmostprocesseslooksimilartoxv6's.
Arealoperatingsystemwouldndfreeprocstructureswithanexplicitfreelistinconstanttimeinsteadofthelinear-timesearchinallocproc;xv6usesthelinearscan(therstofmany)forsimplicity.
Exercises1.
Setabreakpointatswtch.
Singlestepwithgdb'sstepithroughtherettoforkret,thenusegdb'sfinishtoproceedtotrapret,thenstepiuntilyougettoinitcodeatvirtualaddresszero.
2.
KERNBASElimitstheamountofmemoryasingleprocesscanuse,whichmightbeirritatingonamachinewithafull4GBofRAM.
WouldraisingKERNBASEallowaprocesstousemorememoryDRAFTasofSeptember4,201827https://pdos.
csail.
mit.
edu/6.
828/xv6exit+code/init+codeinitcode+code/init+codeChapter2PagetablesPagetablesarethemechanismthroughwhichtheoperatingsystemcontrolswhatmemoryaddressesmean.
Theyallowxv6tomultiplextheaddressspacesofdierentprocessesontoasinglephysicalmemory,andtoprotectthememoriesofdierentpro-cesses.
Thelevelofindirectionprovidedbypagetablesallowsmanyneattricks.
xv6usespagetablesprimarilytomultiplexaddressspacesandtoprotectmemory.
Italsousesafewsimplepage-tabletricks:mappingthesamememory(thekernel)inseveraladdressspaces,mappingthesamememorymorethanonceinoneaddressspace(eachuserpageisalsomappedintothekernel'sphysicalviewofmemory),andguardingauserstackwithanunmappedpage.
Therestofthischapterexplainsthepagetablesthatthex86hardwareprovidesandhowxv6usesthem.
Comparedtoareal-worldoperatingsystem,xv6'sdesignisrestrictive,butitdoesillustratethekeyideas.
PaginghardwareAsareminder,x86instructions(bothuserandkernel)manipulatevirtualaddresses.
Themachine'sRAM,orphysicalmemory,isindexedwithphysicaladdresses.
Thex86pagetablehardwareconnectsthesetwokindsofaddresses,bymappingeachvirtualaddresstoaphysicaladdress.
Anx86pagetableislogicallyanarrayof2^20(1,048,576)pagetableentries(PTEs).
EachPTEcontainsa20-bitphysicalpagenumber(PPN)andsomeags.
Thepaginghardwaretranslatesavirtualaddressbyusingitstop20bitstoindexintothepagetabletondaPTE,andreplacingtheaddress'stop20bitswiththePPNinthePTE.
Thepaginghardwarecopiesthelow12bitsunchangedfromthevirtualtothetranslatedphysicaladdress.
Thusapagetablegivestheoperatingsystemcontrolovervirtual-to-physicaladdresstranslationsatthegranularityofalignedchunksof4096(2^12)bytes.
Suchachunkiscalledapage.
AsshowninFigure2-1,theactualtranslationhappensintwosteps.
Apagetableisstoredinphysicalmemoryasatwo-leveltree.
Therootofthetreeisa4096-bytepagedirectorythatcontains1024PTE-likereferencestopagetablepages.
Eachpageta-blepageisanarrayof102432-bitPTEs.
Thepaginghardwareusesthetop10bitsofavirtualaddresstoselectapagedirectoryentry.
Ifthepagedirectoryentryispresent,thepaginghardwareusesthenext10bitsofthevirtualaddresstoselectaPTEfromthepagetablepagethatthepagedirectoryentryrefersto.
IfeitherthepagedirectoryentryorthePTEisnotpresent,thepaginghardwareraisesafault.
Thistwo-levelstructureallowsapagetabletoomitentirepagetablepagesinthecommoncaseinwhichlargerangesofvirtualaddresseshavenomappings.
EachPTEcontainsagbitsthattellthepaginghardwarehowtheassociatedvir-tualaddressisallowedtobeused.
PTE_PindicateswhetherthePTEispresent:ifitisDRAFTasofSeptember4,201829https://pdos.
csail.
mit.
edu/6.
828/xv6pagetableentries(PTEs)pagepagedirectorypagetablepagesPTE_P+codePhysicalPageNumberAVLD6A5CD4WT3U2W1P078910111231PWUWTCDADAVL-Present-Writable-User-1=Write-through,0=Write-back-CacheDisabled-Accessed-Dirty(0inpagedirectory)-Availableforsystemuse20VirtualaddressPhysicalAddress1210DirTableOffset10011023PPNFlags201212PPNOffsetPPNFlags0110232012PageDirectoryPageTableCR3PagetableandpagedirectoryentriesareidenticalexceptfortheDbit.
Figure2-1.
x86pagetablehardware.
notset,areferencetothepagecausesafault(i.
e.
isnotallowed).
PTE_Wcontrolswhetherinstructionsareallowedtoissuewritestothepage;ifnotset,onlyreadsandinstructionfetchesareallowed.
PTE_Ucontrolswhetheruserprogramsareallowedtousethepage;ifclear,onlythekernelisallowedtousethepage.
Figure2-1showshowitallworks.
Theagsandallotherpagehardwarerelatedstructuresaredenedinmmu.
h(0700).
Afewnotesaboutterms.
PhysicalmemoryreferstostoragecellsinDRAM.
Abyteofphysicalmemoryhasanaddress,calledaphysicaladdress.
Instructionsuseonlyvirtualaddresses,whichthepaginghardwaretranslatestophysicaladdresses,andthensendstotheDRAMhardwaretoreadorwritestorage.
Atthislevelofdiscussionthereisnosuchthingasvirtualmemory,onlyvirtualaddresses.
ProcessaddressspaceThepagetablecreatedbyentryhasenoughmappingstoallowthekernel'sCcodetostartrunning.
However,mainimmediatelychangestoanewpagetablebycallingkvmalloc(1840),becausekernelhasamoreelaborateplanfordescribingpro-cessaddressspaces.
Eachprocesshasaseparatepagetable,andxv6tellsthepagetablehardwaretoswitchpagetableswhenxv6switchesbetweenprocesses.
AsshowninFigure2-2,aprocess'susermemorystartsatvirtualaddresszeroandcangrowuptoKERNBASE,al-lowingaprocesstoaddressupto2gigabytesofmemory.
Thelememlayout.
h(0200)DRAFTasofSeptember4,201830https://pdos.
csail.
mit.
edu/6.
828/xv6PTE_W+codePTE_U+codekvmalloc+code0UserdataUsertextUserstackProgramdata&heap+0x100000KerneltextendKERNBASEKerneldata4Gbyte0RW--RW-RWUDevicememory0xFE000000FreememoryRW-R--Virtual0x100000PHYSTOPUnusedifcomputerhasmorethan2GbyteofmemoryExtendedmemory640KbyteI/OspaceBasememoryPhysical4GbyteRWURWUPAGESIZERW-UnusedifPHYSTOPislessthan2Gbyte0xFE000000Figure2-2.
Layoutofthevirtualaddressspaceofaprocessandthelayoutofthephysicaladdressspace.
Notethatifamachinehasmorethan2Gbyteofphysicalmemory,xv6canuseonlythememorythattsbetweenKERNBASEand0xFE00000.
declarestheconstantsforxv6'smemorylayout,andmacrostoconvertvirtualtophysi-caladdresses.
Whenaprocessasksxv6formorememory,xv6rstndsfreephysicalpagestoprovidethestorage,andthenaddsPTEstotheprocess'spagetablethatpointtothenewphysicalpages.
xv6setsthePTE_U,PTE_W,andPTE_PagsinthesePTEs.
Mostprocessesdonotusetheentireuseraddressspace;xv6leavesPTE_PclearinunusedPTEs.
Dierentprocesses'pagetablestranslateuseraddressestodierentpagesofphysicalmemory,sothateachprocesshasprivateusermemory.
Xv6includesallmappingsneededforthekerneltorunineveryprocess'spageta-ble;thesemappingsallappearaboveKERNBASE.
ItmapsvirtualaddressesKERN-BASE:KERNBASE+PHYSTOPto0:PHYSTOP.
Onereasonforthismappingissothatthekernelcanuseitsowninstructionsanddata.
Anotherreasonisthatthekernelsome-timesneedstobeabletowriteagivenpageofphysicalmemory,forexamplewhencreatingpagetablepages;havingeveryphysicalpageappearatapredictablevirtualaddressmakesthisconvenient.
Adefectofthisarrangementisthatxv6cannotmakeuseofmorethan2gigabytesofphysicalmemory,becausethekernelpartofthead-dressspaceis2gigabytes.
Thus,xv6requiresthatPHYSTOPbesmallerthan2giga-bytes,evenifthecomputerhasmorethan2gigabytesofphysicalmemory.
DRAFTasofSeptember4,201831https://pdos.
csail.
mit.
edu/6.
828/xv6Somedevicesthatusememory-mappedI/Oappearatphysicaladdressesstartingat0xFE000000,soxv6pagetablesincludingadirectmappingforthem.
Thus,PHYSTOPmustbesmallerthantwogigabytes-16megabytes(forthedevicememory).
Xv6doesnotsetthePTE_UaginthePTEsaboveKERNBASE,soonlythekernelcanusethem.
Havingeveryprocess'spagetablecontainmappingsforbothusermemoryandtheentirekernelisconvenientwhenswitchingfromusercodetokernelcodeduringsystemcallsandinterrupts:suchswitchesdonotrequirepagetableswitches.
Forthemostpartthekerneldoesnothaveitsownpagetable;itisalmostalwaysborrowingsomeprocess'spagetable.
Toreview,xv6ensuresthateachprocesscanuseonlyitsownmemory.
And,eachprocessseesitsmemoryashavingcontiguousvirtualaddressesstartingatzero,whiletheprocess'sphysicalmemorycanbenon-contiguous.
xv6implementstherstbyset-tingthePTE_UbitonlyonPTEsofvirtualaddressesthatrefertotheprocess'sownmemory.
Itimplementsthesecondusingtheabilityofpagetablestotranslatesucces-sivevirtualaddressestowhateverphysicalpageshappentobeallocatedtotheprocess.
Code:creatinganaddressspacemaincallskvmalloc(1840)tocreateandswitchtoapagetablewiththemappingsaboveKERNBASErequiredforthekerneltorun.
Mostoftheworkhappensinsetup-kvm(1818).
Itrstallocatesapageofmemorytoholdthepagedirectory.
Thenitcallsmappagestoinstallthetranslationsthatthekernelneeds,whicharedescribedinthekmap(1809)array.
Thetranslationsincludethekernel'sinstructionsanddata,physicalmemoryuptoPHYSTOP,andmemoryrangeswhichareactuallyI/Odevices.
setup-kvmdoesnotinstallanymappingsfortheusermemory;thiswillhappenlater.
mappages(1760)installsmappingsintoapagetableforarangeofvirtualaddressestoacorrespondingrangeofphysicaladdresses.
Itdoesthisseparatelyforeachvirtualaddressintherange,atpageintervals.
Foreachvirtualaddresstobemapped,map-pagescallswalkpgdirtondtheaddressofthePTEforthataddress.
Ittheninitial-izesthePTEtoholdtherelevantphysicalpagenumber,thedesiredpermissions(PTE_Wand/orPTE_U),andPTE_PtomarkthePTEasvalid(1772).
walkpgdir(1735)mimicstheactionsofthex86paginghardwareasitlooksupthePTEforavirtualaddress(seeFigure2-1).
walkpgdirusestheupper10bitsofthevirtualaddresstondthepagedirectoryentry(1740).
Ifthepagedirectoryentryisn'tpresent,thentherequiredpagetablepagehasn'tyetbeenallocated;iftheallocargumentisset,walkpgdirallocatesitandputsitsphysicaladdressinthepagedirec-tory.
Finallyitusesthenext10bitsofthevirtualaddresstondtheaddressofthePTEinthepagetablepage(1753).
PhysicalmemoryallocationThekernelmustallocateandfreephysicalmemoryatrun-timeforpagetables,processusermemory,kernelstacks,andpipebuers.
xv6usesthephysicalmemorybetweentheendofthekernelandPHYSTOPforDRAFTasofSeptember4,201832https://pdos.
csail.
mit.
edu/6.
828/xv6PTE_U+codePTE_U+codemain+codekvmalloc+codesetupkvm+codemappages+codekmap+codePHYSTOP+codemappages+codewalkpgdir+codewalkpgdir+codePHYSTOP+coderun-timeallocation.
Itallocatesandfreeswhole4096-bytepagesatatime.
Itkeepstrackofwhichpagesarefreebythreadingalinkedlistthroughthepagesthemselves.
Allocationconsistsofremovingapagefromthelinkedlist;freeingconsistsofaddingthefreedpagetothelist.
Thereisabootstrapproblem:allofphysicalmemorymustbemappedinorderfortheallocatortoinitializethefreelist,butcreatingapagetablewiththosemappingsinvolvesallocatingpage-tablepages.
xv6solvesthisproblembyusingaseparatepageallocatorduringentry,whichallocatesmemoryjustaftertheendofthekernel'sdatasegment.
Thisallocatordoesnotsupportfreeingandislimitedbythe4MBmappingintheentrypgdir,butthatissucienttoallocatetherstkernelpagetable.
Code:PhysicalmemoryallocatorTheallocator'sdatastructureisafreelistofphysicalmemorypagesthatareavail-ableforallocation.
Eachfreepage'slistelementisastructrun(3115).
WheredoestheallocatorgetthememorytoholdthatdatastructureItstoreeachfreepage'srunstructureinthefreepageitself,sincethere'snothingelsestoredthere.
Thefreelistisprotectedbyaspinlock(3119-3123).
Thelistandthelockarewrappedinastructtomakeclearthatthelockprotectstheeldsinthestruct.
Fornow,ignorethelockandthecallstoacquireandrelease;Chapter4willexaminelockingindetail.
Thefunctionmaincallskinit1andkinit2toinitializetheallocator(3131).
Thereasonforhavingtwocallsisthatformuchofmainonecannotuselocksormemoryabove4megabytes.
Thecalltokinit1setsupforlock-lessallocationintherst4megabytes,andthecalltokinit2enableslockingandarrangesformorememorytobeallocatable.
mainoughttodeterminehowmuchphysicalmemoryisavailable,butthisturnsouttobedicultonthex86.
Insteaditassumesthatthemachinehas224megabytes(PHYSTOP)ofphysicalmemory,andusesallthememorybetweentheendofthekernelandPHYSTOPastheinitialpooloffreememory.
kinit1andkinit2callfreerangetoaddmemorytothefreelistviaper-pagecallstokfree.
APTEcanonlyrefertoaphysicaladdressthatisalignedona4096-byteboundary(isamultipleof4096),sofreerangeusesPGROUNDUPtoensurethatitfreesonlyalignedphysicalad-dresses.
Theallocatorstartswithnomemory;thesecallstokfreegiveitsometomanage.
Theallocatorreferstophysicalpagesbytheirvirtualaddressesasmappedinhighmemory,notbytheirphysicaladdresses,whichiswhykinitusesP2V(PHYSTOP)totranslatePHYSTOP(aphysicaladdress)toavirtualaddress.
Theallocatorsometimestreatsaddressesasintegersinordertoperformarithmeticonthem(e.
g.
,traversingallpagesinkinit),andsometimesusesaddressesaspointerstoreadandwritememory(e.
g.
,manipulatingtherunstructurestoredineachpage);thisdualuseofaddressesisthemainreasonthattheallocatorcodeisfullofCtypecasts.
Theotherreasonisthatfreeingandallocationinherentlychangethetypeofthememory.
Thefunctionkfree(3164)beginsbysettingeverybyteinthememorybeingfreedtothevalue1.
Thiswillcausecodethatusesmemoryafterfreeingit(uses''danglingreferences'')toreadgarbageinsteadoftheoldvalidcontents;hopefullythatwillcausesuchcodetobreakfaster.
Thenkfreecastsvtoapointertostructrun,recordstheDRAFTasofSeptember4,201833https://pdos.
csail.
mit.
edu/6.
828/xv6structrun+codemain+codekinit1+codekinit2+codePHYSTOP+codefreerange+codekfree+codePGROUNDUP+codetypecast0KERNBASEtextdatastackheapPAGESIZEargument0argumentN0addressofargument0addressofargumentNaddressofaddressofargument00xFFFFFFF(empty)argc.
.
.
.
.
.
nul-terminatedstringargv[argc]argv[0]argvargumentofmainargcargumentofmainreturnPCformainguardpageFigure2-3.
Memorylayoutofauserprocesswithitsinitialstack.
oldstartofthefreelistinr->next,andsetsthefreelistequaltor.
kallocremovesandreturnstherstelementinthefreelist.
UserpartofanaddressspaceFigure2-3showsthelayoutoftheusermemoryofanexecutingprocessinxv6.
Eachuserprocessstartsataddress0.
Thebottomoftheaddressspacecontainsthetextfortheuserprogram,itsdata,anditsstack.
Theheapisabovethestacksothattheheapcanexpandwhentheprocesscallssbrk.
Notethatthetext,data,andstacksectionsarelayedoutcontiguouslyintheprocess'saddressspacebutxv6isfreetousenon-contiguousphysicalpagesforthosesections.
Forexample,whenxv6expandsaprocess'sheap,itcanuseanyfreephysicalpageforthenewvirtualpageandthenpro-gramthepagetablehardwaretomapthevirtualpagetotheallocatedphysicalpage.
Thisexibilityisamajoradvantageofusingpaginghardware.
Thestackisasinglepage,andisshownwiththeinitialcontentsascreatedbyex-ec.
Stringscontainingthecommand-linearguments,aswellasanarrayofpointerstothem,areattheverytopofthestack.
Justunderthatarevaluesthatallowaprogramtostartatmainasifthefunctioncallmain(argc,argv)hadjuststarted.
Toguardastackgrowingothestackpage,xv6placesaguardpagerightbelowthestack.
Theguardpageisnotmappedandsoifthestackrunsothestackpage,thehardwarewillgenerateanexceptionbecauseitcannottranslatethefaultingaddress.
Areal-worldoperatingsystemmightallocatemorespaceforthestacksothatitcangrowbe-yondonepage.
DRAFTasofSeptember4,201834https://pdos.
csail.
mit.
edu/6.
828/xv6kalloc+codesbrk+codeCode:sbrkSbrkisthesystemcallforaprocesstoshrinkorgrowitsmemory.
Thesystemcallisimplementedbythefunctiongrowproc(2558).
Ifnispostive,growprocallocatesoneormorephysicalpagesandmapsthematthetopoftheprocess'saddressspace.
Ifnisnegative,growprocunmapsoneormorepagesfromtheprocess'saddressspaceandfreesthecorrespondingphysicalpages.
Tomakethesechanges,xv6modiesthepro-cess'spagetable.
Theprocess'spagetableisstoredinmemory,andsothekernelcanupdatethetablewithordinaryassignmentstatements,whichiswhatallocuvmanddeallocuvmdo.
Thex86hardwarecachespagetableentriesinaTranslationLook-asideBuer(TLB),andwhenxv6changesthepagetables,itmustinvalidatethecachedentries.
Ifitdidn'tinvalidatethecachedentries,thenatsomepointlatertheTLBmightuseanoldmapping,pointingtoaphysicalpagethatinthemeantimehasbeenallocatedtoanotherprocess,andasaresult,aprocessmightbeabletoscribbleonsomeotherprocess'smemory.
Xv6invalidatesstalecachedentries,byreloadingcr3,theregisterthatholdstheaddressofthecurrentpagetable.
Code:execExecisthesystemcallthatcreatestheuserpartofanaddressspace.
Itinitializestheuserpartofanaddressspacefromalestoredinthelesystem.
Exec(6610)opensthenamedbinarypathusingnamei(6623),whichisexplainedinChapter6.
Then,itreadstheELFheader.
Xv6applicationsaredescribedinthewidely-usedELFformat,denedinelf.
h.
AnELFbinaryconsistsofanELFheader,structelfhdr(0905),fol-lowedbyasequenceofprogramsectionheaders,structproghdr(0924).
Eachprogh-drdescribesasectionoftheapplicationthatmustbeloadedintomemory;xv6pro-gramshaveonlyoneprogramsectionheader,butothersystemsmighthaveseparatesectionsforinstructionsanddata.
TherststepisaquickcheckthattheleprobablycontainsanELFbinary.
AnELFbinarystartswiththefour-byte''magicnumber''0x7F,'E','L','F',orELF_MAGIC(0902).
IftheELFheaderhastherightmagicnumber,execassumesthatthebinaryiswell-formed.
Execallocatesanewpagetablewithnousermappingswithsetupkvm(6637),allo-catesmemoryforeachELFsegmentwithallocuvm(6651),andloadseachsegmentintomemorywithloaduvm(6655).
allocuvmchecksthatthevirtualaddressesrequestedisbelowKERNBASE.
loaduvm(1903)useswalkpgdirtondthephysicaladdressoftheal-locatedmemoryatwhichtowriteeachpageoftheELFsegment,andreaditoreadfromthele.
Theprogramsectionheaderfor/init,therstuserprogramcreatedwithexec,lookslikethis:#objdump-p_init_init:fileformatelf32-i386ProgramHeader:LOADoff0x00000054vaddr0x00000000paddr0x00000000align2**2DRAFTasofSeptember4,201835https://pdos.
csail.
mit.
edu/6.
828/xv6TranslationLook-asideBuer(TLB)namei+codeELFformatstructelfhdr+codeELF_MAGIC+codesetupkvm+codeallocuvm+codeloaduvm+codeloaduvm+codewalkpgdir+codereadi+code/init+codefilesz0x000008c0memsz0x000008ccflagsrwxTheprogramsectionheader'sfileszmaybelessthanthememsz,indicatingthatthegapbetweenthemshouldbelledwithzeroes(forCglobalvariables)ratherthanreadfromthele.
For/init,fileszis2240bytesandmemszis2252bytes,andthusallocuvmallocatesenoughphysicalmemorytohold2252bytes,butreadsonly2240bytesfromthele/init.
Nowexecallocatesandinitializestheuserstack.
Itallocatesjustonestackpage.
Execcopiestheargumentstringstothetopofthestackoneatatime,recordingthepointerstotheminustack.
Itplacesanullpointerattheendofwhatwillbetheargvlistpassedtomain.
TherstthreeentriesinustackarethefakereturnPC,argc,andargvpointer.
Execplacesaninaccessiblepagejustbelowthestackpage,sothatprogramsthattrytousemorethanonepagewillfault.
Thisinaccessiblepagealsoallowsexectodealwithargumentsthataretoolarge;inthatsituation,thecopyout(2118)functionthatexecusestocopyargumentstothestackwillnoticethatthedestinationpageisnotaccessible,andwillreturn–1.
Duringthepreparationofthenewmemoryimage,ifexecdetectsanerrorlikeaninvalidprogramsegment,itjumpstothelabelbad,freesthenewimage,andre-turns–1.
Execmustwaittofreetheoldimageuntilitissurethatthesystemcallwillsucceed:iftheoldimageisgone,thesystemcallcannotreturn–1toit.
Theonlyer-rorcasesinexechappenduringthecreationoftheimage.
Oncetheimageiscom-plete,execcaninstallthenewimage(6701)andfreetheoldone(6702).
Finally,execreturns0.
ExecloadsbytesfromtheELFleintomemoryataddressesspeciedbytheELFle.
UsersorprocessescanplacewhateveraddressestheywantintoanELFle.
Thusexecisrisky,becausetheaddressesintheELFlemayrefertothekernel,accidentallyoronpurpose.
Theconsequencesforanunwarykernelcouldrangefromacrashtoamalicioussubversionofthekernel'sisolationmechanisms(i.
e.
,asecurityexploit).
xv6performsanumberofcheckstoavoidtheserisks.
Tounderstandtheimportanceofthesechecks,considerwhatcouldhappenifxv6didn'tcheckif(ph.
vaddr+ph.
memsz=KERNBASE)inallocuvm.
Thesubsequentcalltoloaduvmpassesph.
vaddrbyitself,withoutaddingph.
memszandwithoutcheckingph.
vaddragainstKERNBASE,andwouldthuscopydatafromtheELFbinaryintothekernel.
Thiscouldbeexploitedbyauserprogramtorunarbi-traryusercodewithkernelprivileges.
Asthisexampleillustrates,argumentcheckingmustbedonewithgreatcare.
Itiseasyforakerneldevelopertoomitacrucialcheck,andreal-worldkernelshavealonghistoryofmissingcheckswhoseabsencecanbeex-ploitedbyuserprogramstoobtainkernelprivileges.
Itislikelythatxv6doesn'tdoacompletejobofvalidatinguser-leveldatasuppliedtothekernel,whichamalicioususerprogrammightbeabletoexploittocircumventxv6'sisolation.
DRAFTasofSeptember4,201836https://pdos.
csail.
mit.
edu/6.
828/xv6allocuvm+codeexec+codeustack+codeargv+codeargc+codecopyout+codeRealworldLikemostoperatingsystems,xv6usesthepaginghardwareformemoryprotec-tionandmapping.
Mostoperatingsystemsusex86's64-bitpaginghardware(whichhas3levelsoftranslation).
64-bitaddressspacesallowforalessrestrictivememorylayoutthanxv6's;forexample,itwouldbeeasytoremovexv6'slimitof2gigabytesforphysicalmemory.
Mostoperatingsystemsmakefarmoresophisticateduseofpagingthanxv6;forexample,xv6lacksdemandpagingfromdisk,copy-on-writefork,sharedmemory,lazily-allocatedpages,andautomaticallyextendingstacks.
Thex86supportsaddresstranslationusingsegmentation(seeAppendixB),butxv6usessegmentsonlyforthecommontrickofimplementingper-cpuvariablessuchasprocthatareataxedaddressbuthavedierentvaluesondierentCPUs(seeseginit).
Implementa-tionsofper-CPU(orper-thread)storageonnon-segmentarchitectureswoulddedicatearegistertoholdingapointertotheper-CPUdataarea,butthex86hassofewgener-alregistersthattheextraeortrequiredtousesegmentationisworthwhile.
Xv6mapsthekernelintheaddressspaceofeachuserprocessbutsetsitupsothatthekernelpartoftheaddressspaceisinaccessiblewhentheprocessorisinusermode.
Thissetupisconvenientbecauseafteraprocessswitchesfromuserspacetokernelspace,thekernelcaneasilyaccessusermemorybyreadingmemorylocationsdirectly.
Itisprobablybetterforsecurity,however,tohaveaseparatepagetableforthekernelandswitchtothatpagetablewhenenteringthekernelfromusermode,sothatthekernelanduserprocessesaremoreseparatedfromeachother.
Thisdesign,forexample,wouldhelpmitigatingside-channelsthatareexposedbytheMeltdownvulnerabilityandthatallowauserprocesstoreadarbitrarykernelmemory.
Onmachineswithlotsofmemoryitmightmakesensetousethex86's4-megabytes''superpages.
''Smallpagesmakesensewhenphysicalmemoryissmall,toallowallocationandpage-outtodiskwithnegranularity.
Forexample,ifaprogramusesonly8kilobytesofmemory,givingita4megabytesphysicalpageiswasteful.
LargerpagesmakesenseonmachineswithlotsofRAM,andmayreduceoverheadforpage-tablemanipulation.
Xv6usessuperpagesinoneplace:theinitialpagetable(1306).
Thearrayinitializationsetstwoofthe1024PDEs,atindiceszeroand512(KERNBASE>>PDXSHIFT),leavingtheotherPDEszero.
Xv6setsthePTE_PSbitinthesetwoPDEstomarkthemassuperpages.
ThekernelalsotellsthepaginghardwaretoallowsuperpagesbysettingtheCR_PSEbit(PageSizeExtension)in%cr4.
Xv6shoulddeterminetheactualRAMconguration,insteadofassuming224MB.
Onthex86,thereareatleastthreecommonalgorithms:therstistoprobethephysicaladdressspacelookingforregionsthatbehavelikememory,preservingtheval-ueswrittentothem;thesecondistoreadthenumberofkilobytesofmemoryoutofaknown16-bitlocationinthePC'snon-volatileRAM;andthethirdistolookinBIOSmemoryforamemorylayouttableleftaspartofthemultiprocessortables.
Readingthememorylayouttableiscomplicated.
Memoryallocationwasahottopicalongtimeago,thebasicproblemsbeinge-cientuseoflimitedmemoryandpreparingforunknownfuturerequests;seeKnuth.
Todaypeoplecaremoreaboutspeedthanspace-eciency.
Inaddition,amoreelabo-ratekernelwouldlikelyallocatemanydierentsizesofsmallblocks,ratherthan(asinxv6)just4096-byteblocks;arealkernelallocatorwouldneedtohandlesmallalloca-DRAFTasofSeptember4,201837https://pdos.
csail.
mit.
edu/6.
828/xv6seginit+codeCR_PSE+codetionsaswellaslargeones.
Exercises1.
Lookatrealoperatingsystemstoseehowtheysizememory.
2.
Ifxv6hadnotusedsuperpages,whatwouldbetherightdeclarationforen-trypgdir3.
Writeauserprogramthatgrowsitsaddressspacewith1bytebycallingsbrk(1).
Runtheprogramandinvestigatethepagetablefortheprogrambeforethecalltosbrkandafterthecalltosbrk.
HowmuchspacehasthekernelallocatedWhatdoesthepteforthenewmemorycontain4.
Modifyxv6sothatthepagesforthekernelaresharedamongprocesses,whichreducesmemoryconsumption.
5.
Modifyxv6sothatwhenauserprogramdereferencesanullpointer,itwillre-ceiveafault.
Thatis,modifyxv6sothatvirtualaddress0isn'tmappedforuserpro-grams.
6.
Uniximplementationsofexectraditionallyincludespecialhandlingforshellscripts.
Iftheletoexecutebeginswiththetext#!
,thentherstlineistakentobeaprogramtoruntointerpretthele.
Forexample,ifexeciscalledtorunmyprogarg1andmyprog'srstlineis#!
/interp,thenexecruns/interpwithcommandline/interpmyprogarg1.
Implementsupportforthisconventioninxv6.
7.
Deletethecheckif(ph.
vaddr+ph.
memszkstackcpu->ts.
esp0ssespeflagscseipesponlypresentonprivilegechangetrapnodsesfsgseaxecxedxebxoespebpesiedi(empty)Figure3-2.
Thetrapframeonthekernelstack%gs,andthegeneral-purposeregisters(3305-3310).
Theresultofthiseortisthatthekernelstacknowcontainsastructtrapframe(0602)containingtheprocessorregis-tersatthetimeofthetrap(seeFigure3-2).
Theprocessorpushes%ss,%esp,%eflags,%cs,and%eip.
Theprocessororthetrapvectorpushesanerrornumber,andalltrapspushestherest.
Thetrapframecontainsalltheinformationnecessarytorestoretheusermodeprocessorregisterswhenthekernelreturnstothecurrentprocess,sothattheprocessorcancontinueexactlyasitwaswhenthetrapstarted.
RecallfromChapter2,thatuserinitbuiltatrapframebyhandtoachievethisgoal(seeFigure1-4).
Inthecaseoftherstsystemcall,thesaved%eipistheaddressoftheinstructionrightaftertheintinstruction.
%csistheusercodesegmentselector.
%eflagsisthecontentofthe%eflagsregisteratthepointofexecutingtheintinstruction.
Aspartofsavingthegeneral-purposeregisters,alltrapsalsosaves%eax,whichcontainsthesystemcallnumberforthekerneltoinspectlater.
Nowthattheusermodeprocessorregistersaresaved,alltrapscannishingset-tinguptheprocessortorunkernelCcode.
Theprocessorsettheselectors%csand%ssbeforeenteringthehandler;alltrapssets%dsand%es(3313-3315).
Oncethesegmentsaresetproperly,alltrapscancalltheCtraphandlertrap.
Itpushes%esp,whichpointsatthetrapframeitjustconstructed,ontothestackasanargumenttotrap(3318).
Thenitcallstrap(3319).
Aftertrapreturns,alltrapspopstheargumentothestackbyaddingtothestackpointer(3320)andthenstartsexecut-DRAFTasofSeptember4,201843https://pdos.
csail.
mit.
edu/6.
828/xv6alltraps+codealltraps+codealltraps+codetrap+codealltraps+codeingthecodeatlabeltrapret.
WetracedthroughthiscodeinChapter2whentherstuserprocessranittoexittouserspace.
Thesamesequencehappenshere:pop-pingthroughthetrapframerestorestheusermoderegistersandtheniretjumpsbackintouserspace.
Thediscussionsofarhastalkedabouttrapsoccurringinusermode,buttrapscanalsohappenwhilethekernelisexecuting.
Inthatcasethehardwaredoesnotswitchstacksorsavethestackpointerorstacksegmentselector;otherwisethesamestepsoccurasintrapsfromusermode,andthesamexv6traphandlingcodeexecutes.
Wheniretlaterrestoresakernelmode%cs,theprocessorcontinuesexecutinginkernelmode.
Code:CtraphandlerWesawinthelastsectionthateachhandlersetsupatrapframeandthencallstheCfunctiontrap.
Trap(3401)looksatthehardwaretrapnumbertf->trapnotodecidewhyithasbeencalledandwhatneedstobedone.
IfthetrapisT_SYSCALL,trapcallsthesystemcallhandlersyscall.
We'llrevisittheproc->killedchecksinChapter5.
Aftercheckingforasystemcall,traplooksforhardwareinterrupts(whichwedis-cussbelow).
Inadditiontotheexpectedhardwaredevices,atrapcanbecausedbyaspuriousinterrupt,anunwantedhardwareinterrupt.
Ifthetrapisnotasystemcallandnotahardwaredevicelookingforattention,trapassumesitwascausedbyincorrectbehavior(e.
g.
,dividebyzero)aspartofthecodethatwasexecutingbeforethetrap.
Ifthecodethatcausedthetrapwasauserprogram,xv6printsdetailsandthensetsproc->killedtoremembertocleanuptheuserprocess.
Wewilllookathowxv6doesthiscleanupinChapter5.
Ifitwasthekernelrunning,theremustbeakernelbug:trapprintsdetailsaboutthesurpriseandthencallspanic.
Code:SystemcallsForsystemcalls,trapinvokessyscall(3701).
Syscallloadsthesystemcallnumberfromthetrapframe,whichcontainsthesaved%eax,andindexesintothesystemcalltables.
Fortherstsystemcall,%eaxcontainsthevalueSYS_exec(3507),andsyscallwillinvoketheSYS_exec'thentryofthesystemcalltable,whichcorre-spondstoinvokingsys_exec.
Syscallrecordsthereturnvalueofthesystemcallfunctionin%eax.
Whenthetrapreturnstouserspace,itwillloadthevaluesfromcp->tfintothemachineregis-ters.
Thus,whenexecreturns,itwillreturnthevaluethatthesystemcallhandlerre-turned(3708).
Systemcallsconventionallyreturnnegativenumberstoindicateerrors,positivenumbersforsuccess.
Ifthesystemcallnumberisinvalid,syscallprintsanerrorandreturns–1.
Laterchapterswillexaminetheimplementationofparticularsystemcalls.
Thischapterisconcernedwiththemechanismsforsystemcalls.
Thereisonebitofmecha-nismleft:ndingthesystemcallarguments.
Thehelperfunctionsargint,argptr,argstr,DRAFTasofSeptember4,201844https://pdos.
csail.
mit.
edu/6.
828/xv6trapret+codeiret+codetrap+codetf->trapno+codeT_SYSCALL+codesyscall+codeproc->killed+codetrap+codepanic+codetrap+codesyscall+codeSYS_exec+codecp->tf+codesyscall+codeandargfdretrievethen'thsystemcallargument,aseitheraninteger,pointer,astring,oraledescriptor.
argintusestheuser-space%espregistertolocatethen'thargu-ment:%esppointsatthereturnaddressforthesystemcallstub.
Theargumentsarerightaboveit,at%esp+4.
Thenthenthargumentisat%esp+4+4*n.
argintcallsfetchinttoreadthevalueatthataddressfromusermemoryandwriteitto*ip.
fetchintcansimplycasttheaddresstoapointer,becausetheuserandthekernelsharethesamepagetable,butthekernelmustverifythatthepointerlieswithintheuserpartoftheaddressspace.
Thekernelhassetupthepage-tablehardwaretomakesurethattheprocesscannotaccessmemoryoutsideitslocalprivatememory:ifauserprogramtriestoreadorwritememoryatanaddressofp->szorabove,theprocessorwillcauseasegmentationtrap,andtrapwillkilltheprocess,aswesawabove.
Thekernel,however,canderefenceanyaddressthattheusermighthavepassed,soitmustcheckexplicitlythattheaddressisbelowp->sz.
argptrfetchesthenthsystemcallargumentandchecksthatthisargumentisavaliduser-spacepointer.
Notethattwochecksoccurduringacalltoargptr.
First,theuserstackpointerischeckedduringthefetchingoftheargument.
Thentheargument,itselfauserpointer,ischecked.
argstrinterpretsthenthargumentasapointer.
ItensuresthatthepointerpointsataNUL-terminatedstringandthatthecompletestringislocatedbelowtheendoftheuserpartoftheaddressspace.
Finally,argfd(6071)usesarginttoretrievealedescriptornumber,checksifitisvalidledescriptor,andreturnsthecorrespondingstructfile.
Thesystemcallimplementations(forexample,sysproc.
candsysle.
c)aretypicallywrappers:theydecodetheargumentsusingargint,argptr,andargstrandthencalltherealimplementations.
Inchapter2,sys_execusesthesefunctionstogetatitsar-guments.
Code:InterruptsDevicesonthemotherboardcangenerateinterrupts,andxv6mustsetupthehardwaretohandletheseinterrupts.
Devicesusuallyinterruptinordertotelltheker-nelthatsomehardwareeventhasoccured,suchasI/Ocompletion.
Interruptsareusuallyoptionalinthesensethatthekernelcouldinsteadperiodicallycheck(or"poll")thedevicehardwaretocheckfornewevents.
Interruptsarepreferabletopollingiftheeventsarerelativelyrare,sothatpollingwouldwasteCPUtime.
Interrupthandlingsharessomeofthecodealreadyneededforsystemcallsandexceptions.
Interruptsaresimilartosystemcalls,exceptdevicesgeneratethematanytime.
ThereishardwareonthemotherboardtosignaltheCPUwhenadeviceneedsatten-tion(e.
g.
,theuserhastypedacharacteronthekeyboard).
Wemustprogramthede-vicetogenerateaninterrupt,andarrangethataCPUreceivestheinterrupt.
Let'slookatthetimerdeviceandtimerinterrupts.
Wewouldlikethetimerhard-waretogenerateaninterrupt,say,100timespersecondsothatthekernelcantrackthepassageoftimeandsothekernelcantime-sliceamongmultiplerunningprocess-es.
Thechoiceof100timespersecondallowsfordecentinteractiveperformancewhilenotswampingtheprocessorwithhandlinginterrupts.
DRAFTasofSeptember4,201845https://pdos.
csail.
mit.
edu/6.
828/xv6argint+codefetchint+codep->sz+codeargptr+codeargstr+codeargfd+codeLikethex86processoritself,PCmotherboardshaveevolved,andthewayinter-ruptsareprovidedhasevolvedtoo.
Theearlyboardshadasimpleprogrammablein-terruptcontroler(calledthePIC).
WiththeadventofmultiprocessorPCboards,anewwayofhandlinginterruptswasneeded,becauseeachCPUneedsaninterruptcontrollertohandleinterruptssenttoit,andtheremustbeamethodforroutingin-terruptstoprocessors.
Thiswayconsistsoftwoparts:apartthatisintheI/Osystem(theIOAPIC,ioapic.
c),andapartthatisattachedtoeachprocessor(thelocalAPIC,lapic.
c).
Xv6isdesignedforaboardwithmultipleprocessors:itignoresin-terruptsfromthePIC,andcongurestheIOAPICandlocalAPIC.
TheIOAPIChasatableandtheprocessorcanprogramentriesinthetablethroughmemory-mappedI/O.
Duringinitialization,xv6programstomapinterrupt0toIRQ0,andsoon,butdisablesthemall.
Specicdevicesenableparticularinterruptsandsaytowhichprocessortheinterruptshouldberouted.
Forexample,xv6routeskeyboardinterruptstoprocessor0(8274).
Xv6routesdiskinterruptstothehighestnumberedprocessoronthesystem,aswewillseebelow.
ThetimerchipisinsidetheLAPIC,sothateachprocessorcanreceivetimerin-terruptsindependently.
Xv6setsitupinlapicinit(7408).
Thekeylineistheonethatprogramsthetimer(7421).
ThislinetellstheLAPICtoperiodicallygenerateaninter-ruptatIRQ_TIMER,whichisIRQ0.
Line(7451)enablesinterruptsonaCPU'sLAPIC,whichwillcauseittodeliverinterruptstothelocalprocessor.
AprocessorcancontrolifitwantstoreceiveinterruptsthroughtheIFaginthe%eflagsregister.
TheinstructionclidisablesinterruptsontheprocessorbyclearingIF,andstienablesinterruptsonaprocessor.
Xv6disablesinterruptsduringbootingofthemaincpu(9112)andtheotherprocessors(1124).
Thescheduleroneachprocessorenablesinterrupts(2766).
Tocontrolthatcertaincodefragmentsarenotinterrupted,xv6disablesinterruptsduringthesecodefragments(e.
g.
,seeswitchuvm(1860)).
Thetimerinterruptsthroughvector32(whichxv6chosetohandleIRQ0),whichxv6setupinidtinit(1255).
Theonlydierencebetweenvector32andvector64(theoneforsystemcalls)isthatvector32isaninterruptgateinsteadofatrapgate.
Inter-ruptgatesclearIF,sothattheinterruptedprocessordoesn'treceiveinterruptswhileitishandlingthecurrentinterrupt.
Fromhereonuntiltrap,interruptsfollowthesamecodepathassystemcallsandexceptions,buildingupatrapframe.
Trapforatimerinterruptdoesjusttwothings:incrementtheticksvariable(3417),andcallwakeup.
Thelatter,aswewillseeinChapter5,maycausetheinterrupttoreturninadierentprocess.
DriversAdriveristhecodeinanoperatingsystemthatmanagesaparticulardevice:ittellsthedevicehardwaretoperformoperations,conguresthedevicetogenerateinterruptswhendone,andhandlestheresultinginterrupts.
Drivercodecanbetrickytowritebecauseadriverexecutesconcurrentlywiththedevicethatitmanages.
Inaddition,thedrivermustunderstandthedevice'sinterface(e.
g.
,whichI/Oportsdowhat),andthatinterfacecanbecomplexandpoorlydocumented.
Thediskdriverprovidesagoodexample.
ThediskdrivercopiesdatafromandDRAFTasofSeptember4,201846https://pdos.
csail.
mit.
edu/6.
828/xv6lapicinit+codeIRQ_TIMER,+codeIF+codecli+codesti+codeswitchuvm+codeidtinit+codetrap+codewakeup+codedriverbacktothedisk.
Diskhardwaretraditionallypresentsthedataonthediskasanum-beredsequenceof512-byteblocks(alsocalledsectors):sector0istherst512bytes,sector1isthenext,andsoon.
Theblocksizethatanoperatingsystemusesforitslesystemmaybedierentthanthesectorsizethatadiskuses,buttypicallytheblocksizeisamultipleofthesectorsize.
Xv6'sblocksizeisidenticaltothedisk'ssectorsize.
Torepresentablockxv6hasastructurestructbuf(3850).
Thedatastoredinthisstruc-tureisoftenoutofsyncwiththedisk:itmighthavenotyetbeenreadinfromdisk(thediskisworkingonitbuthasn'treturnedthesector'scontentyet),oritmighthavebeenupdatedbutnotyetwrittenout.
Thedrivermustensurethattherestofxv6doesn'tgetconfusedwhenthestructureisoutofsyncwiththedisk.
Code:DiskdriverTheIDEdeviceprovidesaccesstodisksconnectedtothePCstandardIDEcon-troller.
IDEisnowfallingoutoffashioninfavorofSCSIandSATA,buttheinterfaceissimpleandletsusconcentrateontheoverallstructureofadriverinsteadofthede-tailsofaparticularpieceofhardware.
Xv6representlesystemblocksusingstructbuf(3850).
BSIZE(4055)isidenticaltotheIDE'ssectorsizeandthuseachbuerrepresentsthecontentsofonesectoronaparticulardiskdevice.
Thedevandsectoreldsgivethedeviceandsectornumberandthedataeldisanin-memorycopyofthedisksector.
Althoughthexv6lesys-temchoosesBSIZEtobeidenticaltotheIDE'ssectorsize,thedrivercanhandleaBSIZEthatisamultipleofthesectorsize.
Operatingsystemsoftenusebiggerblocksthan512bytestoobtainhigherdiskthroughput.
Theflagstracktherelationshipbetweenmemoryanddisk:theB_VALIDagmeansthatdatahasbeenreadin,andtheB_DIRTYagmeansthatdataneedstobewrittenout.
Thekernelinitializesthediskdriveratboottimebycallingideinit(4251)frommain(1232).
IdeinitcallsioapicenabletoenabletheIDE_IRQinterrupt(4256).
ThecalltoioapicenableenablestheinterruptonlyonthelastCPU(ncpu-1):onatwo-processorsystem,CPU1handlesdiskinterrupts.
Next,ideinitprobesthediskhardware.
Itbeginsbycallingidewait(4257)towaitforthedisktobeabletoacceptcommands.
APCmotherboardpresentsthesta-tusbitsofthediskhardwareonI/Oport0x1f7.
Idewait(4238)pollsthestatusbitsuntilthebusybit(IDE_BSY)isclearandthereadybit(IDE_DRDY)isset.
Nowthatthediskcontrollerisready,ideinitcancheckhowmanydisksarepresent.
Itassumesthatdisk0ispresent,becausethebootloaderandthekernelwerebothloadedfromdisk0,butitmustcheckfordisk1.
ItwritestoI/Oport0x1f6toselectdisk1andthenwaitsawhileforthestatusbittoshowthatthediskisready(4259-4266).
Ifnot,ideinitassumesthediskisabsent.
Afterideinit,thediskisnotusedagainuntilthebuercachecallsiderw,whichupdatesalockedbuerasindicatedbytheags.
IfB_DIRTYisset,iderwwritesthebuertothedisk;ifB_VALIDisnotset,iderwreadsthebuerfromthedisk.
Diskaccessestypicallytakemilliseconds,alongtimeforaprocessor.
ThebootDRAFTasofSeptember4,201847https://pdos.
csail.
mit.
edu/6.
828/xv6blocksectorstructbuf+codeB_VALID+codeB_DIRTY+codeideinit+codemain+codeioapicenable+codeIDE_IRQ+codeideinit+codeidewait+codeIDE_BSY+codeIDE_DRDY+codeiderw+codeB_DIRTY+codeB_VALID+codeiderw+codeloaderissuesdiskreadcommandsandreadsthestatusbitsrepeatedlyuntilthedataisready(seeAppendixB).
Thispollingorbusywaitingisneinabootloader,whichhasnothingbettertodo.
Inanoperatingsystem,however,itismoreecienttoletanotherprocessrunontheCPUandarrangetoreceiveaninterruptwhenthediskoperationhascompleted.
Iderwtakesthislatterapproach,keepingthelistofpendingdiskrequestsinaqueueandusinginterruptstondoutwheneachrequesthasn-ished.
Althoughiderwmaintainsaqueueofrequests,thesimpleIDEdiskcontrollercanonlyhandleoneoperationatatime.
Thediskdrivermaintainstheinvariantthatithassentthebueratthefrontofthequeuetothediskhardware;theothersaresimplywaitingtheirturn.
Iderw(4354)addsthebuerbtotheendofthequeue(4367-4371).
Ifthebuerisatthefrontofthequeue,iderwmustsendittothediskhardwarebycallingidestart(4326-4328);otherwisethebuerwillbestartedoncethebuersaheadofitaretakencareof.
Idestart(4274)issueseitherareadorawriteforthebuer'sdeviceandsector,accordingtotheags.
Iftheoperationisawrite,idestartmustsupplythedatanow(4296).
idestartmovesthedatatoabuerinthediskcontrollerusingtheoutslin-struction;usingCPUinstructionstomovedatato/fromdevicehardwareiscalledpro-grammedI/O.
Eventuallythediskhardwarewillraiseaninterrupttosignalthatthedatahasbeenwrittentodisk.
Iftheoperationisaread,theinterruptwillsignalthatthedataisready,andthehandlerwillreadit.
Notethatidestarthasdetailedknowl-edgeabouttheIDEdevice,andwritestherightvaluesattherightports.
Ifanyoftheseoutbstatementsiswrong,theIDEwilldosomethingdierentlythanwhatwewant.
Gettingthesedetailsrightisonereasonwhywritingdevicedriversischalleng-ing.
Havingaddedtherequesttothequeueandstarteditifnecessary,iderwmustwaitfortheresult.
Asdiscussedabove,pollingdoesnotmakeecientuseoftheCPU.
Instead,iderwyieldstheCPUforotherprocessesbysleeping,waitingfortheinterrupthandlertorecordinthebuer'sagsthattheoperationisdone(4378-4379).
Whilethisprocessissleeping,xv6willscheduleotherprocessestokeeptheCPUbusy.
Eventually,thediskwillnishitsoperationandtriggeraninterrupt.
trapwillcallideintrtohandleit(3424).
Ideintr(4304)consultstherstbuerinthequeuetondoutwhichoperationwashappening.
Ifthebuerwasbeingreadandthediskcontrollerhasdatawaiting,ideintrreadsthedatafromabuerinthediskcontrollerintomemorywithinsl(4317-4319).
Nowthebuerisready:ideintrsetsB_VALID,clearsB_DIRTY,andwakesupanyprocesssleepingonthebuer(4321-4324).
Finally,ideintrmustpassthenextwaitingbuertothedisk(4326-4328).
RealworldSupportingallthedevicesonaPCmotherboardinitsfullgloryismuchwork,be-causetherearemanydevices,thedeviceshavemanyfeatures,andtheprotocolbe-tweendeviceanddrivercanbecomplex.
Inmanyoperatingsystems,thedriversto-getheraccountformorecodeintheoperatingsystemthanthecorekernel.
Actualdevicedriversarefarmorecomplexthanthediskdriverinthischapter,DRAFTasofSeptember4,201848https://pdos.
csail.
mit.
edu/6.
828/xv6pollingbusywaitingiderw+codeidestart+codeidestart+codeiderw+codetrap+codeideintr+codeinsl+codeB_VALID+codeB_DIRTY+codebutthebasicideasarethesame:typicallydevicesareslowerthanCPU,sothehard-wareusesinterruptstonotifytheoperatingsystemofstatuschanges.
Moderndiskcontrollerstypicallyacceptabatchofdiskrequestsatatimeandevenreorderthemtomakemostecientuseofthediskarm.
Whendisksweresimpler,operatingsystemsoftenreorderedtherequestqueuethemselves.
Manyoperatingsystemshavedriversforsolid-statedisksbecausetheyprovidemuchfasteraccesstodata.
But,althoughasolid-statediskworksverydierentlyfromatraditionalmechanicaldisk,bothdevicesprovideblock-basedinterfacesandread-ing/writingblocksonasolid-statediskisstillmoreexpensivethanreading/writingRAM.
Otherhardwareissurprisinglysimilartodisks:networkdevicebuersholdpack-ets,audiodevicebuersholdsoundsamples,graphicscardbuersholdvideodataandcommandsequences.
High-bandwidthdevices—disks,graphicscards,andnetworkcards—oftenusedirectmemoryaccess(DMA)insteadofprogrammedI/O(insl,outsl).
DMAallowsthedevicedirectaccesstophysicalmemory.
Thedrivergivesthedevicethephysicaladdressofthebuer'sdataandthedevicecopiesdirectlytoorfrommainmemory,interruptingoncethecopyiscomplete.
DMAisfasterandmoreecientthanprogrammedI/OandislesstaxingfortheCPU'smemorycaches.
Somedriversdynamicallyswitchbetweenpollingandinterrupts,becauseusinginterruptscanbeexpensive,butusingpollingcanintroducedelayuntilthedriverpro-cessesanevent.
Forexample,anetworkdriverthatreceivesaburstofpacketsmayswitchfrominterruptstopollingsinceitknowsthatmorepacketsmustbeprocessedanditislessexpensivetoprocessthemusingpolling.
Oncenomorepacketsneedtobeprocessed,thedrivermayswitchbacktointerrupts,sothatitwillbealertedimme-diatelywhenanewpacketarrives.
TheIDEdriverroutesinterruptsstaticallytoaparticularprocessor.
SomedriversconguretheIOAPICtorouteinterruptstomultipleprocessorstospreadouttheworkofprocessingpackets.
Forexample,anetworkdrivermightarrangetodeliverinterruptsforpacketsofonenetworkconnectiontotheprocessorthatismanagingthatconnection,whileinterruptsforpacketsofanotherconnectionaredeliveredtoan-otherprocessor.
Thisroutingcangetquitesophisticated;forexample,ifsomenetworkconnectionsareshortlivedwhileothersarelonglivedandtheoperatingsystemwantstokeepallprocessorsbusytoachievehighthroughput.
Ifaprogramreadsale,thedataforthatleiscopiedtwice.
First,itiscopiedfromthedisktokernelmemorybythedriver,andthenlateritiscopiedfromkernelspacetouserspacebythereadsystemcall.
Iftheprogramthensendsthedataoverthenetwork,thedataiscopiedtwicemore:fromuserspacetokernelspaceandfromkernelspacetothenetworkdevice.
Tosupportapplicationsforwhicheciencyisim-portant(e.
g.
,servingpopularimagesontheWeb),operatingsystemsusespecialcodepathstoavoidcopies.
Asoneexample,inreal-worldoperatingsystems,buerstypical-lymatchthehardwarepagesize,sothatread-onlycopiescanbemappedintoapro-cess'saddressspaceusingthepaginghardware,withoutanycopying.
ExercisesDRAFTasofSeptember4,201849https://pdos.
csail.
mit.
edu/6.
828/xv6batch1.
Setabreakpointattherstinstructionofsyscalltocatchtheveryrstsys-temcall(e.
g.
,brsyscall).
WhatvaluesareonthestackatthispointExplaintheout-putofx/37x$espatthatbreakpointwitheachvaluelabeledastowhatitis(e.
g.
,saved%ebpfortrap,trapframe.
eip,scratchspace,etc.
).
2.
AddanewsystemcalltogetthecurrentUTCtimeandreturnittotheuserprogram.
Youmaywanttousethehelperfunction,cmostime(7552),toreadtherealtimeclock.
Theledate.
hcontainsthedenitionofthestructrtcdate(0950),whichyouwillprovideasanargumenttocmostimeasapointer.
3.
WriteadriverforadiskthatsupportstheSATAstandard(searchforSATAontheWeb).
UnlikeIDE,SATAisn'tobsolete.
UseSATA'staggedcommandqueuingtoissuemanycommandstothedisksothatthediskinternallycanreordercommandstoobtainhighperformance.
4.
AddsimpledriverforanEthernetcard.
DRAFTasofSeptember4,201850https://pdos.
csail.
mit.
edu/6.
828/xv6Chapter4LockingXv6runsonmultiprocessors:computerswithmultipleCPUsexecutingindepen-dently.
ThesemultipleCPUssharephysicalRAM,andxv6exploitsthesharingtomaintaindatastructuresthatallCPUsreadandwrite.
Thissharingraisesthepossibil-ityofoneCPUreadingadatastructurewhileanotherCPUismid-waythroughup-datingit,orevenmultipleCPUsupdatingthesamedatasimultaneously;withoutcare-fuldesignsuchparallelaccessislikelytoyieldincorrectresultsorabrokendatastruc-ture.
Evenonauniprocessor,aninterruptroutinethatusesthesamedataassomein-terruptiblecodecoulddamagethedataiftheinterruptoccursatjustthewrongtime.
Anycodethataccessesshareddataconcurrentlymusthaveastrategyformain-tainingcorrectnessdespiteconcurrency.
Theconcurrencymayarisefromaccessesbymultiplecores,orbymultiplethreads,orbyinterruptcode.
xv6usesahandfulofsim-pleconcurrencycontrolstrategies;muchmoresophisticationispossible.
Thischapterfocusesononeofthestrategiesusedextensivelyinxv6andmanyothersystems:thelock.
Alockprovidesmutualexclusion,ensuringthatonlyoneCPUatatimecanholdthelock.
Ifalockisassociatedwitheachshareddataitem,andthecodealwaysholdstheassociatedlockwhenusingagivenitem,thenwecanbesurethattheitemisusedfromonlyoneCPUatatime.
Inthissituation,wesaythatthelockprotectsthedataitem.
Therestofthischapterexplainswhyxv6needslocks,howxv6implementsthem,andhowitusesthem.
Akeyobservationwillbethatifyoulookatsomecodeinxv6,youmustaskyourselfifanotherprocessor(orinterrupt)couldchangetheintendedbehaviorofthecodebymodifyingdata(orhardwareresources)itdependson.
YoumustkeepinmindthatasingleCstatementcanbeseveralmachineinstructionsandthusanotherprocessororaninterruptmaymuckaroundinthemiddleofaCstate-ment.
Youcannotassumethatlinesofcodeonthepageareexecutedatomically.
Concurrencymakesreasoningaboutcorrectnessmuchmoredicult.
RaceconditionsAsanexampleofwhyweneedlocks,considerseveralprocessorssharingasingledisk,suchastheIDEdiskinxv6.
Thediskdrivermaintainsalinkedlistoftheout-standingdiskrequests(4226)andprocessorsmayaddnewrequeststothelistconcur-rently(4354).
Iftherewerenoconcurrentrequests,youmightimplementthelinkedlistasfollows:DRAFTasofSeptember4,201851https://pdos.
csail.
mit.
edu/6.
828/xv6lockMemoryCPU1CPU215l->next16list1516listl->nextTimeFigure4-1.
Examplerace1structlist{2intdata;3structlist*next;4};56structlist*list=0;78void9insert(intdata)10{11structlist*l;1213l=malloc(sizeof*l);14l->data=data;15l->next=list;16list=l;17}Thisimplementationiscorrectifexecutedinisolation.
However,thecodeisnotcor-rectifmorethanonecopyexecutesconcurrently.
IftwoCPUsexecuteinsertatthesametime,itcouldhappenthatbothexecuteline15beforeeitherexecutes16(seeFigure4-1).
Ifthishappens,therewillnowbetwolistnodeswithnextsettothefor-mervalueoflist.
Whenthetwoassignmentstolisthappenatline16,thesecondonewilloverwritetherst;thenodeinvolvedintherstassignmentwillbelost.
Thelostupdateatline16isanexampleofaracecondition.
Araceconditionisasituationinwhichamemorylocationisaccessedconcurrently,andatleastoneaccessisawrite.
Araceisoftenasignofabug,eitheralostupdate(iftheaccessesarewrites)orareadofanincompletely-updateddatastructure.
TheoutcomeofaracedependsontheexacttimingofthetwoCPUsinvolvedandhowtheirmemoryopera-tionsareorderedbythememorysystem,whichcanmakerace-inducederrorsdiculttoreproduceanddebug.
Forexample,addingprintstatementswhiledebuggingin-DRAFTasofSeptember4,201852https://pdos.
csail.
mit.
edu/6.
828/xv6raceconditionsertmightchangethetimingoftheexecutionenoughtomaketheracedisappear.
Theusualwaytoavoidracesistousealock.
Locksensuremutualexclusion,sothatonlyoneCPUcanexecuteinsertatatime;thismakesthescenarioaboveim-possible.
Thecorrectlylockedversionoftheabovecodeaddsjustafewlines(notnumbered):6structlist*list=0;structlocklistlock;78void9insert(intdata)10{11structlist*l;12l=malloc(sizeof*l);13l->data=data;14acquire(&listlock);15l->next=list;16list=l;release(&listlock);17}Thesequenceofinstructionsbetweenacquireandreleaseisoftencalledacriticalsection,andthelockprotectslist.
Whenwesaythatalockprotectsdata,wereallymeanthatthelockprotectssomecollectionofinvariantsthatapplytothedata.
Invariantsarepropertiesofdatastruc-turesthataremaintainedacrossoperations.
Typically,anoperation'scorrectbehaviordependsontheinvariantsbeingtruewhentheoperationbegins.
Theoperationmaytemporarilyviolatetheinvariantsbutmustreestablishthembeforenishing.
Forex-ample,inthelinkedlistcase,theinvariantisthatlistpointsattherstnodeinthelistandthateachnode'snexteldpointsatthenextnode.
Theimplementationofinsertviolatesthisinvarianttemporarily:inline15,lpointstothenextlistelement,butlistdoesnotpointatlyet(reestablishedatline16).
Theraceconditionweex-aminedabovehappenedbecauseasecondCPUexecutedcodethatdependedonthelistinvariantswhiletheywere(temporarily)violated.
ProperuseofalockensuresthatonlyoneCPUatatimecanoperateonthedatastructureinthecriticalsection,sothatnoCPUwillexecuteadatastructureoperationwhenthedatastructure'sinvari-antsdonothold.
Youcanthinkoflocksasserializingconcurrentcriticalsectionssothattheyrunoneatatime,andthuspreserveinvariants(assumingtheyarecorrectinisolation).
Youcanalsothinkofcriticalsectionsasbeingatomicwithrespecttoeachother,sothatacriticalsectionthatobtainsthelocklaterseesonlythecompletesetofchangesfromearliercriticalsections,andneverseespartially-completedupdates.
Notethatitwouldalsobecorrecttomoveupacquiretoearlierininsert.
Forexample,itisnetomovethecalltoacquireuptobeforeline12.
Thismayreduceparalellismbecausethenthecallstomallocarealsoserialized.
Thesection"Usinglocks"belowprovidessomeguidelinesforwheretoinsertacquireandreleaseinvo-cations.
DRAFTasofSeptember4,201853https://pdos.
csail.
mit.
edu/6.
828/xv6mutualexclusioncriticalsectionserializingCode:LocksXv6hastwotypesoflocks:spin-locksandsleep-locks.
We'llstartwithspin-locks.
Xv6representsaspin-lockasastructspinlock(1501).
Theimportanteldinthestructureislocked,awordthatiszerowhenthelockisavailableandnon-zerowhenitisheld.
Logically,xv6shouldacquirealockbyexecutingcodelike21void22acquire(structspinlock*lk)23{24for(;;){25if(!
lk->locked){26lk->locked=1;27break;28}29}30}Unfortunately,thisimplementationdoesnotguaranteemutualexclusiononamulti-processor.
ItcouldhappenthattwoCPUssimultaneouslyreachline25,seethatlk->lockediszero,andthenbothgrabthelockbyexecutingline26.
Atthispoint,twodierentCPUsholdthelock,whichviolatesthemutualexclusionproperty.
Ratherthanhelpingusavoidraceconditions,thisimplementationofacquirehasitsownracecondition.
Theproblemhereisthatlines25and26executedasseparateactions.
Inorderfortheroutineabovetobecorrect,lines25and26mustexecuteinoneatomic(i.
e.
,indivisible)step.
Toexecutethosetwolinesatomically,xv6reliesonaspecialx86instruction,xchg(0569).
Inoneatomicoperation,xchgswapsawordinmemorywiththecontentsofaregister.
Thefunctionacquire(1574)repeatsthisxchginstructioninaloop;eachiter-ationatomicallyreadslk->lockedandsetsitto1(1581).
Ifthelockisalreadyheld,lk->lockedwillalreadybe1,sothexchgreturns1andtheloopcontinues.
Ifthexchgreturns0,however,acquirehassuccessfullyacquiredthelock—lockedwas0andisnow1—sotheloopcanstop.
Oncethelockisacquired,acquirerecords,fordebugging,theCPUandstacktracethatacquiredthelock.
Ifaprocessforgetstore-leasealock,thisinformationcanhelptoidentifytheculprit.
Thesedebuggingeldsareprotectedbythelockandmustonlybeeditedwhileholdingthelock.
Thefunctionrelease(1602)istheoppositeofacquire:itclearsthedebuggingeldsandthenreleasesthelock.
Thefunctionusesanassemblyinstructiontoclearlocked,becauseclearingthiseldshouldbeatomicsothatthexchginstructionwon'tseeasubsetofthe4bytesthatholdlockedupdated.
Thex86guaranteesthata32-bitmovlupdatesall4bytesatomically.
Xv6cannotusearegularCassignment,becausetheClanguagespecicationdoesnotspecifythatasingleassignmentisatomic.
Xv6'simplementationofspin-locksisx86-specic,andxv6isthusnotdirectlyportabletootherprocessors.
Toallowforportableimplementationsofspin-locks,theClanguagesupportsalibraryofatomicinstructions;aportableoperatingsystemwouldusethoseinstructions.
Code:UsinglocksDRAFTasofSeptember4,201854https://pdos.
csail.
mit.
edu/6.
828/xv6structspinlock+codeacquire+codeatomicxchg+codeacquire+coderelease+codeXv6useslocksinmanyplacestoavoidraceconditions.
AsimpleexampleisintheIDEdriver(4200).
Asmentionedinthebeginningofthechapter,iderw(4354)hasaqueueofdiskrequestsandprocessorsmayaddnewrequeststothelistconcurrently(4369).
Toprotectthislistandotherinvariantsinthedriver,iderwacquirestheide-lock(4365)andreleasesitattheendofthefunction.
Exercise1exploreshowtotriggertheIDEdriverraceconditionthatwesawatthebeginningofthechapterbymovingtheacquiretoafterthequeuemanipulation.
Itisworthwhiletotrytheexercisebecauseitwillmakeclearthatitisnotthateasytotriggertherace,suggestingthatitisdiculttondrace-conditionsbugs.
Itisnotun-likelythatxv6hassomeraces.
Ahardpartaboutusinglocksisdecidinghowmanylockstouseandwhichdataandinvariantseachlockprotects.
Thereareafewbasicprinciples.
First,anytimeavariablecanbewrittenbyoneCPUatthesametimethatanotherCPUcanreadorwriteit,alockshouldbeintroducedtokeepthetwooperationsfromoverlapping.
Second,rememberthatlocksprotectinvariants:ifaninvariantinvolvesmultiplememorylocations,typicallyallofthemneedtobeprotectedbyasinglelocktoensuretheinvariantismaintained.
Therulesabovesaywhenlocksarenecessarybutsaynothingaboutwhenlocksareunnecessary,anditisimportantforeciencynottolocktoomuch,becauselocksreduceparallelism.
Ifparallelismisn'timportant,thenonecouldarrangetohaveonlyasinglethreadandnotworryaboutlocks.
Asimplekernelcandothisonamultipro-cessorbyhavingasinglelockthatmustbeacquiredonenteringthekernelandre-leasedonexitingthekernel(thoughsystemcallssuchaspipereadsorwaitwouldposeaproblem).
Manyuniprocessoroperatingsystemshavebeenconvertedtorunonmultiprocessorsusingthisapproach,sometimescalleda''giantkernellock,''buttheap-proachsacricesparallelism:onlyoneCPUcanexecuteinthekernelatatime.
Ifthekerneldoesanyheavycomputation,itwouldbemoreecienttousealargersetofmorene-grainedlocks,sothatthekernelcouldexecuteonmultipleCPUssimultane-ously.
Ultimately,thechoiceoflockgranularityisanexerciseinparallelprogramming.
Xv6usesafewcoarsedata-structurespeciclocks(seeFigure4-2).
Forexample,xv6hasalockthatprotectsthewholeprocesstableanditsinvariants,whicharedescribedinChapter5.
Amorene-grainedapproachwouldbetohavealockperentryintheprocesstablesothatthreadsworkingondierententriesintheprocesstablecanpro-ceedinparallel.
However,itcomplicatesoperationsthathaveinvariantsoverthewholeprocesstable,sincetheymighthavetoacquireseverallocks.
Subsequentchap-terswilldiscusshoweachpartofxv6dealswithconcurrency,illustratinghowtouselocks.
DeadlockandlockorderingIfacodepaththroughthekernelmustholdseverallocksatthesametime,itisim-portantthatallcodepathsacquirethelocksinthesameorder.
Iftheydon't,thereisariskofdeadlock.
Let'ssaytwocodepathsinxv6needlocksAandB,butcodepath1acquireslocksintheorderAthenB,andtheotherpathacquiresthemintheorderBDRAFTasofSeptember4,201855https://pdos.
csail.
mit.
edu/6.
828/xv6iderw+codeidelock+codeLockDescriptionbcache.
lockProtectsallocationofblockbuercacheentriescons.
lockSerializesaccesstoconsolehardware,avoidsintermixedoutputftable.
lockSerializesallocationofastructleinletableicache.
lockProtectsallocationofinodecacheentriesidelockSerializesaccesstodiskhardwareanddiskqueuekmem.
lockSerializesallocationofmemorylog.
lockSerializesoperationsonthetransactionlogpipe'sp->lockSerializesoperationsoneachpipeptable.
lockSerializescontextswitching,andoperationsonproc->stateandproctabletickslockSerializesoperationsonthetickscounterinode'sip->lockSerializesoperationsoneachinodeanditscontentbuf'sb->lockSerializesoperationsoneachblockbuerFigure4-2.
Locksinxv6thenA.
Thissituationcanresultinadeadlockiftwothreadsexecutethecodepathsconcurrently.
SupposethreadT1executescodepath1andacquireslockA,andthreadT2executescodepath2andacquireslockB.
NextT1willtrytoacquirelockB,andT2willtrytoacquirelockA.
Bothacquireswillblockindenitely,becauseinbothcasestheotherthreadholdstheneededlock,andwon'treleaseituntilitsacquirere-turns.
Toavoidsuchdeadlocks,allcodepathsmustacquirelocksinthesameorder.
Theneedforagloballockacquisitionordermeansthatlocksareeectivelypartofeachfunction'sspecication:callersmustinvokefunctionsinawaythatcauseslockstobeacquiredintheagreed-onorder.
Xv6hasmanylock-orderchainsoflengthtwoinvolvingtheptable.
lock,duetothewaythatsleepworksasdiscussedinChapter5.
Forexample,ideintrholdstheidelockwhilecallingwakeup,whichacquirestheptablelock.
Thelesystemcodecontainsxv6'slongestlockchains.
Forexample,creatingalerequiressimultaneouslyholdingalockonthedirectory,alockonthenewle'sinode,alockonadiskblockbuer,idelock,andptable.
lock.
Toavoiddeadlock,lesystemcodealwaysacquireslocksintheordermentionedintheprevioussentence.
InterrupthandlersXv6usesspin-locksinmanysituationstoprotectdatathatisusedbybothinterrupthandlersandthreads.
Forexample,atimerinterruptmight(3414)incrementticksataboutthesametimethatakernelthreadreadsticksinsys_sleep(3823).
Thelocktickslockserializesthetwoaccesses.
Interruptscancauseconcurrencyevenonasingleprocessor:ifinterruptsareen-abled,kernelcodecanbestoppedatanymomenttorunaninterrupthandlerinstead.
Supposeiderwheldtheidelockandthengotinterruptedtorunideintr.
Ideintrwouldtrytolockidelock,seeitwasheld,andwaitforittobereleased.
Inthissitu-ation,idelockwillneverbereleased—onlyiderwcanreleaseit,andiderwwillnotcontinuerunninguntilideintrreturns—sotheprocessor,andeventuallythewholesystem,willdeadlock.
DRAFTasofSeptember4,201856https://pdos.
csail.
mit.
edu/6.
828/xv6ideintr+codewakeup+codeptable+codeticks+codesys_sleep+codetickslock+codeiderw+codeidelock+codeideintr+codeToavoidthissituation,ifaspin-lockisusedbyaninterrupthandler,aprocessormustneverholdthatlockwithinterruptsenabled.
Xv6ismoreconservative:whenaprocessorentersaspin-lockcriticalsection,xv6alwaysensuresinterruptsaredisabledonthatprocessor.
Interruptsmaystilloccuronotherprocessors,soaninterrupt'sac-quirecanwaitforathreadtoreleaseaspin-lock;justnotonthesameprocessor.
xv6re-enablesinterruptswhenaprocessorholdsnospin-locks;itmustdoalittlebook-keepingtocopewithnestedcriticalsections.
acquirecallspushcli(1667)andreleasecallspopcli(1679)totrackthenestingleveloflocksonthecurrentprocessor.
Whenthatcountreacheszero,popclirestorestheinterruptenablestatethatexistedatthestartoftheoutermostcriticalsection.
Thecliandstifunctionsexecutethex86interruptdisableandenableinstructions,respectively.
Itisimportantthatacquirecallpushclibeforethexchgthatmightacquirethelock(1581).
Ifthetwowerereversed,therewouldbeafewinstructioncycleswhenthelockwasheldwithinterruptsenabled,andanunfortunatelytimedinterruptwoulddeadlockthesystem.
Similarly,itisimportantthatreleasecallpopclionlyafterthexchgthatreleasesthelock(1581).
InstructionandmemoryorderingThischapterhasassumedthatcodeexecutesintheorderinwhichthecodeap-pearsintheprogram.
Manycompilersandprocessors,however,executecodeoutofordertoachievehigherperformance.
Ifaninstructiontakesmanycyclestocomplete,aprocessormaywanttoissuetheinstructionearlysothatitcanoverlapwithotherinstructionsandavoidprocessorstalls.
Forexample,aprocessormaynoticethatinaserialsequenceofinstructionsAandBarenotdependentoneachotherandstartin-structionBbeforeAsothatitwillbecompletedwhentheprocessorcompletesA.
Acompilermayperformasimilarre-orderingbyemittinginstructionBbeforeinstruc-tionAintheexecutablele.
Concurrency,however,mayexposethisreorderingtosoftware,whichcanleadtoincorrectbehavior.
Forexample,inthiscodeforinsert,itwouldbeadisasterifthecompilerorprocessorcausedtheeectsofline4(or2or5)tobevisibletoothercoresaftertheeectsofline6:1l=malloc(sizeof*l);2l->data=data;3acquire(&listlock);4l->next=list;5list=l;6release(&listlock);Ifthehardwareorcompilerwouldre-order,forexample,theeectsofline4tobevis-ibleafterline6,thenanotherprocessorcanacquirelistlockandobservethatlistpointstol,butitwon'tobservethatl->nextissettotheremainderofthelistandwon'tbeabletoreadtherestofthelist.
Totellthehardwareandcompilernottoperformsuchre-orderings,xv6uses__sync_synchronize(),inbothacquireandrelease.
_sync_synchronize()isamemorybarrier:ittellsthecompilerandCPUtonotreorderloadsorstoresacrossDRAFTasofSeptember4,201857https://pdos.
csail.
mit.
edu/6.
828/xv6pushcli+codepopcli+codeacquire+codexchg+coderelease+codepopcli+codexchg+codethebarrier.
Xv6worriesaboutorderingonlyinacquireandrelease,becausecon-currentaccesstodatastructuresotherthanthelockstructureisperformedbetweenacquireandrelease.
SleeplocksSometimesxv6codeneedstoholdalockforalongtime.
Forexample,thelesystem(Chapter6)keepsalelockedwhilereadingandwritingitscontentonthedisk,andthesediskoperationscantaketensofmilliseconds.
Eciencydemandsthattheprocessorbeyieldedwhilewaitingsothatotherthreadscanmakeprogress,andthisinturnmeansthatxv6needslocksthatworkwellwhenheldacrosscontextswitches.
Xv6providessuchlocksintheformofsleep-locks.
Xv6sleep-lockssupportyieldingtheprocessorduringtheircriticalsections.
Thispropertyposesadesignchallenge:ifthreadT1holdslockL1andhasyieldedthepro-cessor,andthreadT2wishestoacquireL1,wehavetoensurethatT1canexecutewhileT2iswaitingsothatT1canreleaseL1.
T2can'tusethespin-lockacquirefunc-tionhere:itspinswithinterruptsturnedo,andthatwouldpreventT1fromrunning.
Toavoidthisdeadlock,thesleep-lockacquireroutine(calledacquiresleep)yieldstheprocessorwhilewaiting,anddoesnotdisableinterrupts.
acquiresleep(4622)usestechniquesthatwillbeexplainedinChapter5.
Atahighlevel,asleep-lockhasalockedeldthatisprotectedbyaspinlock,andac-quiresleep'scalltosleepatomicallyyieldstheCPUandreleasesthespin-lock.
Theresultisthatotherthreadscanexecutewhileacquiresleepwaits.
Becausesleep-locksleaveinterruptsenabled,theycannotbeusedininterrupthandlers.
Becauseacquiresleepmayyieldtheprocessor,sleep-lockscannotbeusedinsidespin-lockcriticalsections(thoughspin-lockscanbeusedinsidesleep-lockcriti-calsections).
Xv6usesspin-locksinmostsituations,sincetheyhavelowoverhead.
Itusessleep-locksonlyinthelesystem,whereitisconvenienttobeabletoholdlocksacrosslengthydiskoperations.
LimitationsoflocksLocksoftensolveconcurrencyproblemscleanly,buttherearetimeswhentheyareawkward.
Subsequentchapterswillpointoutsuchsituationsinxv6;thissectionout-linessomeoftheproblemsthatcomeup.
Sometimesafunctionusesdatawhichmustbeguardedbyalock,butthefunc-tioniscalledbothfromcodethatalreadyholdsthelockandfromcodethatwouldn'totherwiseneedthelock.
Onewaytodealwiththisistohavetwovariantsofthefunction,onethatacquiresthelock,andtheotherthatexpectsthecallertoalreadyholdthelock;seewakeup1foranexample(2953).
Anotherapproachisforthefunctiontorequirecallerstoholdthelockwhetherthecallerneedsitornot,aswithsched(2758).
Kerneldevelopersneedtobeawareofsuchrequirements.
Itmightseemthatonecouldsimplifysituationswherebothcallerandcalleeneedalockbyallowingrecursivelocks,sothatifafunctionholdsalock,anyfunctionitDRAFTasofSeptember4,201858https://pdos.
csail.
mit.
edu/6.
828/xv6sleep-locksrecursivelockscallsisallowedtore-acquirethelock.
However,theprogrammerwouldthenneedtoreasonaboutallcombinationsofcallerandcallee,becauseitwillnolongerbethecasethatthedatastructure'sinvariantsalwaysholdafteranacquire.
Whetherrecursivelocksarebetterthanxv6'suseofconventionsaboutfunctionsthatrequirealocktobeheldisnotclear.
Thelargerlessonisthat(aswithgloballockorderingtoavoiddead-lock)lockrequirementssometimescan'tbeprivate,butintrudethemselvesonthein-terfacesoffunctionsandmodules.
Asituationinwhichlocksareinsucientiswhenonethreadneedstowaitforanotherthread'supdatetoadatastructure,forexamplewhenapipe'sreaderwaitsforsomeotherthreadtowritethepipe.
Thewaitingthreadcannotholdthelockonthedata,sincethatwouldpreventtheupdateitiswaitingfor.
Instead,xv6providesasepa-ratemechanismthatjointlymanagesthelockandeventwait;seethedescriptionofsleepandwakeupinChapter5.
RealworldConcurrencyprimitivesandparallelprogrammingareactiveareasofresearch,becauseprogrammingwithlocksisstillchallenging.
Itisbesttouselocksasthebaseforhigher-levelconstructslikesynchronizedqueues,althoughxv6doesnotdothis.
Ifyouprogramwithlocks,itiswisetouseatoolthatattemptstoidentifyraceconditions,becauseitiseasytomissaninvariantthatrequiresalock.
MostoperatingsystemssupportPOSIXthreads(Pthreads),whichallowauserprocesstohaveseveralthreadsrunningconcurrentlyondierentprocessors.
Pthreadshassupportforuser-levellocks,barriers,etc.
SupportingPthreadsrequiressupportfromtheoperatingsystem.
Forexample,itshouldbethecasethatifonepthreadblocksinasystemcall,anotherpthreadofthesameprocessshouldbeabletorunonthatprocessor.
Asanotherexample,ifapthreadchangesitsprocess'saddressspace(e.
g.
,groworshrinkit),thekernelmustarrangethatotherprocessorsthatrunthreadsofthesameprocessupdatetheirhardwarepagetablestoreectthechangeinthead-dressspace.
Onthex86,thisinvolvesshootingdowntheTranslationLook-asideBuer(TLB)ofotherprocessorsusinginter-processorinterrupts(IPIs).
Itispossibletoimplementlockswithoutatomicinstructions,butitisexpensive,andmostoperatingsystemsuseatomicinstructions.
Lockscanbeexpensiveifmanyprocessorstrytoacquirethesamelockatthesametime.
Ifoneprocessorhasalockcachedinitslocalcache,andanotherproces-sormustacquirethelock,thentheatomicinstructiontoupdatethecachelinethatholdsthelockmustmovethelinefromtheoneprocessor'scachetotheotherproces-sor'scache,andperhapsinvalidateanyothercopiesofthecacheline.
Fetchingacachelinefromanotherprocessor'scachecanbeordersofmagnitudemoreexpensivethanfetchingalinefromalocalcache.
Toavoidtheexpensesassociatedwithlocks,manyoperatingsystemsuselock-freedatastructuresandalgorithms.
Forexample,itispossibletoimplementalinkedlistliketheoneinthebeginningofthechapterthatrequiresnolocksduringlistsearches,andoneatomicinstructiontoinsertaniteminalist.
Lock-freeprogrammingismorecomplicated,however,thanprogramminglocks;forexample,onemustworryaboutin-DRAFTasofSeptember4,201859https://pdos.
csail.
mit.
edu/6.
828/xv6TranslationLook-asideBuer(TLB)structionandmemoryreordering.
Programmingwithlocksisalreadyhard,soxv6avoidstheadditionalcomplexityoflock-freeprogramming.
Exercises1.
Movetheacquireiniderwtobeforesleep.
IstherearaceWhydon'tyouobserveitwhenbootingxv6andrunstressfsIncreasecriticalsectionwithadummyloop;whatdoyouseenowexplain.
2.
Removethexchginacquire.
Explainwhathappenswhenyourunxv63.
WriteaparallelprogramusingPOSIXthreads,whichissupportedonmostop-eratingsystems.
Forexample,implementaparallelhashtableandmeasureifthenum-berofputs/getsscaleswithincreasingnumberofcores.
4.
ImplementasubsetofPthreadsinxv6.
Thatis,implementauser-levelthreadlibrarysothatauserprocesscanhavemorethan1threadandarrangethatthesethreadscanruninparallelondierentprocessors.
Comeupwithadesignthatcor-rectlyhandlesathreadmakingablockingsystemcallandchangingitssharedaddressspace.
DRAFTasofSeptember4,201860https://pdos.
csail.
mit.
edu/6.
828/xv6Chapter5SchedulingAnyoperatingsystemislikelytorunwithmoreprocessesthanthecomputerhasprocessors,soaplanisneededtotime-sharetheprocessorsamongtheprocesses.
Ide-allythesharingwouldbetransparenttouserprocesses.
Acommonapproachistoprovideeachprocesswiththeillusionthatithasitsownvirtualprocessorbymulti-plexingtheprocessesontothehardwareprocessors.
Thischapterexplainshowxv6achievesthismultiplexing.
MultiplexingXv6multiplexesbyswitchingeachprocessorfromoneprocesstoanotherintwosituations.
First,xv6'ssleepandwakeupmechanismswitcheswhenaprocesswaitsfordeviceorpipeI/Otocomplete,orwaitsforachildtoexit,orwaitsinthesleepsys-temcall.
Second,xv6periodicallyforcesaswitchwhenaprocessisexecutinguserin-structions.
ThismultiplexingcreatestheillusionthateachprocesshasitsownCPU,justasxv6usesthememoryallocatorandhardwarepagetablestocreatetheillusionthateachprocesshasitsownmemory.
Implementingmultiplexingposesafewchallenges.
First,howtoswitchfromoneprocesstoanotherAlthoughtheideaofcontextswitchingissimple,theimplementa-tionissomeofthemostopaquecodeinxv6.
Second,howtoswitchtransparentlytouserprocessesXv6usesthestandardtechniqueofdrivingcontextswitcheswithtimerinterrupts.
Third,manyCPUsmaybeswitchingamongprocessesconcurrently,andalockingplanisnecessarytoavoidraces.
Fourth,aprocess'smemoryandotherresourcesmustbefreedwhentheprocessexits,butitcannotdoallofthisitselfbe-cause(forexample)itcan'tfreeitsownkernelstackwhilestillusingit.
Finally,eachcoreofamulti-coremachinemustrememberwhichprocessitisexecutingsothatsys-temcallsaectthecorrectprocess'skernelstate.
Xv6triestosolvetheseproblemsassimplyaspossible,butneverthelesstheresultingcodeistricky.
xv6mustprovidewaysforprocessestocoordinateamongthemselves.
Forexam-ple,aparentprocessmayneedtowaitforoneofitschildrentoexit,oraprocessreadingapipemayneedtowaitforsomeotherprocesstowritethepipe.
RatherthanmakethewaitingprocesswasteCPUbyrepeatedlycheckingwhetherthedesiredeventhashappened,xv6allowsaprocesstogiveuptheCPUandsleepwaitingforanevent,andallowsanotherprocesstowaketherstprocessup.
Careisneededtoavoidracesthatresultinthelossofeventnotications.
Asanexampleoftheseproblemsandtheirsolution,thischapterexaminestheimplementationofpipes.
Code:ContextswitchingDRAFTasofSeptember4,201861https://pdos.
csail.
mit.
edu/6.
828/xv6multiplexingKernelshellcatuserspacekernelspacekstackshellkstackcatkstackschedulersaverestoreswtchswtchFigure5-1.
Switchingfromoneuserprocesstoanother.
Inthisexample,xv6runswithoneCPU(andthusoneschedulerthread).
Figure5-1outlinesthestepsinvolvedinswitchingfromoneuserprocesstoan-other:auser-kerneltransition(systemcallorinterrupt)totheoldprocess'skernelthread,acontextswitchtothecurrentCPU'sschedulerthread,acontextswitchtoanewprocess'skernelthread,andatrapreturntotheuser-levelprocess.
Thexv6schedulerhasitsownthread(savedregistersandstack)becauseitissometimesnotsafeforitexecuteonanyprocess'skernelstack;we'llseeanexampleinexit.
Inthissectionwe'llexaminethemechanicsofswitchingbetweenakernelthreadandasched-ulerthread.
Switchingfromonethreadtoanotherinvolvessavingtheoldthread'sCPUregis-ters,andrestoringthepreviously-savedregistersofthenewthread;thefactthat%espand%eiparesavedandrestoredmeansthattheCPUwillswitchstacksandswitchwhatcodeitisexecuting.
Thefunctionswtchperformsthesavesandrestoresforathreadswitch.
swtchdoesn'tdirectlyknowaboutthreads;itjustsavesandrestoresregistersets,calledcon-texts.
WhenitistimeforaprocesstogiveuptheCPU,theprocess'skernelthreadcallsswtchtosaveitsowncontextandreturntotheschedulercontext.
Eachcontextisrepresentedbyastructcontext*,apointertoastructurestoredonthekernelstackinvolved.
Swtchtakestwoarguments:structcontext**oldandstructcontext*new.
Itpushesthecurrentregistersontothestackandsavesthestackpoint-erin*old.
Thenswtchcopiesnewto%esp,popspreviouslysavedregisters,andre-turns.
Let'sfollowauserprocessthroughswtchintothescheduler.
WesawinChapter3thatonepossibilityattheendofeachinterruptisthattrapcallsyield.
Yieldinturncallssched,whichcallsswtchtosavethecurrentcontextinproc->contextandswitchtotheschedulercontextpreviouslysavedincpu->scheduler(2822).
Swtch(3052)startsbycopyingitsargumentsfromthestacktothecaller-savedreg-isters%eaxand%edx(3060-3061);swtchmustdothisbeforeitchangesthestackpointerandcannolongeraccesstheargumentsvia%esp.
Thenswtchpushestheregisterstate,creatingacontextstructureonthecurrentstack.
Onlythecallee-savedregistersneedtobesaved;theconventiononthex86isthattheseare%ebp,%ebx,%esi,DRAFTasofSeptember4,201862https://pdos.
csail.
mit.
edu/6.
828/xv6swtch+codecontextsstructcontext+codetrap+codeyield+codesched+codeswtch+codecpu->scheduler+codeswtch+code%edi,and%esp.
Swtchpushestherstfourexplicitly(3064-3067);itsavesthelastim-plicitlyasthestructcontext*writtento*old(3070).
Thereisonemoreimportantregister:theprogramcounter%eip.
Ithasalreadybeensavedonthestackbythecallinstructionthatinvokedswtch.
Havingsavedtheoldcontext,swtchisreadytorestorethenewone.
Itmovesthepointertothenewcontextintothestackpointer(3071).
Thenewstackhasthesameformastheoldonethatswtchjustleft—thenewstackwastheoldoneinapreviouscalltoswtch—soswtchcaninvertthesequencetorestorethenewcontext.
Itpopsthevaluesfor%edi,%esi,%ebx,and%ebpandthenreturns(3074-3078).
Becauseswtchhaschangedthestackpointer,thevaluesre-storedandtheinstructionaddressreturnedtoaretheonesfromthenewcontext.
Inourexample,schedcalledswtchtoswitchtocpu->scheduler,theper-CPUschedulercontext.
Thatcontexthadbeensavedbyscheduler'scalltoswtch(2781).
Whentheswtchwehavebeentracingreturns,itreturnsnottoschedbuttosched-uler,anditsstackpointerpointsatthecurrentCPU'sschedulerstack.
Code:SchedulingThelastsectionlookedatthelow-leveldetailsofswtch;nowlet'stakeswtchasagivenandexamineswitchingfromaprocessthroughtheschedulertoanotherprocess.
AprocessthatwantstogiveuptheCPUmustacquiretheprocesstablelockpt-able.
lock,releaseanyotherlocksitisholding,updateitsownstate(proc->state),andthencallsched.
Yield(2828)followsthisconvention,asdosleepandexit,whichwewillexaminelater.
Scheddouble-checksthoseconditions(2813-2818)andthenanimplicationofthoseconditions:sincealockisheld,theCPUshouldberunningwithinterruptsdisabled.
Finally,schedcallsswtchtosavethecurrentcontextinproc->contextandswitchtotheschedulercontextincpu->scheduler.
Swtchre-turnsonthescheduler'sstackasthoughscheduler'sswtchhadreturned(2781).
Theschedulercontinuestheforloop,ndsaprocesstorun,switchestoit,andthecyclerepeats.
Wejustsawthatxv6holdsptable.
lockacrosscallstoswtch:thecallerofswtchmustalreadyholdthelock,andcontrolofthelockpassestotheswitched-tocode.
Thisconventionisunusualwithlocks;usuallythethreadthatacquiresalockisalsoresponsibleforreleasingthelock,whichmakesiteasiertoreasonaboutcorrect-ness.
Forcontextswitchingitisnecessarytobreakthisconventionbecausept-able.
lockprotectsinvariantsontheprocess'sstateandcontexteldsthatarenottruewhileexecutinginswtch.
Oneexampleofaproblemthatcouldariseifpt-able.
lockwerenotheldduringswtch:adierentCPUmightdecidetorunthepro-cessafteryieldhadsetitsstatetoRUNNABLE,butbeforeswtchcausedittostopusingitsownkernelstack.
TheresultwouldbetwoCPUsrunningonthesamestack,whichcannotberight.
Akernelthreadalwaysgivesupitsprocessorinschedandalwaysswitchestothesamelocationinthescheduler,which(almost)alwaysswitchestosomekernelthreadthatpreviouslycalledsched.
Thus,ifoneweretoprintoutthelinenumberswherexv6switchesthreads,onewouldobservethefollowingsimplepattern:(2781),(2822),(2781),(2822),andsoon.
TheproceduresinwhichthisstylizedswitchingbetweentwoDRAFTasofSeptember4,201863https://pdos.
csail.
mit.
edu/6.
828/xv6swtch+codesched+codeswtch+codecpu->scheduler+codeswtch+codescheduler+codeswtch+codeptable.
lock+codesched+codesleep+codeexit+codesched+codeswtch+codecpu->scheduler+codescheduler+codeptable.
lock+codeswtch+codeptable.
lock+codeswtch+codeyield+codethreadshappensaresometimesreferredtoascoroutines;inthisexample,schedandschedulerareco-routinesofeachother.
Thereisonecasewhenthescheduler'scalltoswtchdoesnotendupinsched.
WesawthiscaseinChapter2:whenanewprocessisrstscheduled,itbeginsatforkret(2853).
Forkretexiststoreleasetheptable.
lock;otherwise,thenewprocesscouldstartattrapret.
Scheduler(2758)runsasimpleloop:ndaprocesstorun,runituntilityields,repeat.
schedulerholdsptable.
lockformostofitsactions,butreleasesthelock(andexplicitlyenablesinterrupts)onceineachiterationofitsouterloop.
Thisisim-portantforthespecialcaseinwhichthisCPUisidle(canndnoRUNNABLEprocess).
Ifanidlingschedulerloopedwiththelockcontinuouslyheld,nootherCPUthatwasrunningaprocesscouldeverperformacontextswitchoranyprocess-relatedsystemcall,andinparticularcouldnevermarkaprocessasRUNNABLEsoastobreaktheidlingCPUoutofitsschedulingloop.
ThereasontoenableinterruptsperiodicallyonanidlingCPUisthattheremightbenoRUNNABLEprocessbecauseprocesses(e.
g.
,theshell)arewaitingforI/O;iftheschedulerleftinterruptsdisabledallthetime,theI/Owouldneverarrive.
Theschedulerloopsovertheprocesstablelookingforarunnableprocess,onethathasp->state==RUNNABLE.
Onceitndsaprocess,itsetstheper-CPUcurrentprocessvariableproc,switchestotheprocess'spagetablewithswitchuvm,markstheprocessasRUNNING,andthencallsswtchtostartrunningit(2774-2781).
Onewaytothinkaboutthestructureoftheschedulingcodeisthatitarrangestoenforceasetofinvariantsabouteachprocess,andholdsptable.
lockwheneverthoseinvariantsarenottrue.
OneinvariantisthatifaprocessisRUNNING,atimerinter-rupt'syieldmustbeabletoswitchawayfromtheprocess;thismeansthattheCPUregistersmustholdtheprocess'sregistervalues(i.
e.
theyaren'tactuallyinacontext),%cr3mustrefertotheprocess'spagetable,%espmustrefertotheprocess'skernelstacksothatswtchcanpushregisterscorrectly,andprocmustrefertotheprocess'sproc[]slot.
AnotherinvariantisthatifaprocessisRUNNABLE,anidleCPU'sschedulermustbeabletorunit;thismeansthatp->contextmustholdtheprocess'skernelthreadvariables,thatnoCPUisexecutingontheprocess'skernelstack,thatnoCPU's%cr3referstotheprocess'spagetable,andthatnoCPU'sprocreferstotheprocess.
Maintainingtheaboveinvariantsisthereasonwhyxv6acquiresptable.
lockinonethread(ofteninyield)andreleasesthelockinadierentthread(theschedulerthreadoranothernextkernelthread).
Oncethecodehasstartedtomodifyarunningprocess'sstatetomakeitRUNNABLE,itmustholdthelockuntilithasnishedrestoringtheinvariants:theearliestcorrectreleasepointisafterschedulerstopsusingthepro-cess'spagetableandclearsproc.
Similarly,onceschedulerstartstoconvertarunnableprocesstoRUNNING,thelockcannotbereleaseduntilthekernelthreadiscompletelyrunning(aftertheswtch,e.
g.
inyield).
ptable.
lockprotectsotherthingsaswell:allocationofprocessIDsandfreeprocesstableslots,theinterplaybetweenexitandwait,themachinerytoavoidlostwakeups(seenextsection),andprobablyotherthingstoo.
Itmightbeworththinkingaboutwhetherthedierentfunctionsofptable.
lockcouldbesplitup,certainlyforclarityandperhapsforperformance.
DRAFTasofSeptember4,201864https://pdos.
csail.
mit.
edu/6.
828/xv6coroutinessched+codescheduler+codeswtch+codesched+codeforkret+codeptable.
lock+codescheduler+codeptable.
lock+codeRUNNABLE+codeswitchuvm+codeswtch+codeptable.
lock+codeyield+codeRUNNABLE+codescheduler+codep->context+codeptable.
lock+codeptable.
lock+codeexit+codewait+codeCode:mycpuandmyprocxv6maintainsastructcpuforeachprocessor,whichrecordstheprocesscur-rentlyrunningontheprocessor(ifany),theprocessor'suniquehardwareidentier(apicid),andsomeotherinformation.
Thefunctionmycpu(2437)returnsthecurrentprocessor'sstructcpu.
mycpudoesthisbyreadingtheprocessoridentierfromthelocalAPIChardwareandlookingthroughthearrayofstructcpuforanentrywiththatidentier.
Thereturnvalueofmycpuisfragile:ifthetimerweretointerruptandcausethethreadtobemovedtoadierentprocessor,thereturnvaluewouldnolongerbecorrect.
Toavoidthisproblem,xv6requiresthatcallersofmycpudisablein-terrupts,andonlyenablethemaftertheynishusingthereturnedstructcpu.
Thefunctionmyproc(2457)returnsthestructprocpointerfortheprocessthatisrunningonthecurrentprocessor.
myprocdisablesinterrupts,invokesmycpu,fetchesthecurrentprocesspointer(c->proc)outofthestructcpu,andthenenablesinter-rupts.
Ifthereisnoprocessrunning,becausethethecallerisexecutinginscheduler,myprocreturnszero.
Thereturnvalueofmyprocissafetouseevenifinterruptsareenabled:ifatimerinterruptmovesthecallingprocesstoadierentprocessor,itsstructprocpointerwillstaythesame.
SleepandwakeupSchedulingandlockshelpconcealtheexistenceofoneprocessfromanother,butsofarwehavenoabstractionsthathelpprocessesintentionallyinteract.
Sleepandwakeupllthatvoid,allowingoneprocesstosleepwaitingforaneventandanotherprocesstowakeituponcetheeventhashappened.
Sleepandwakeupareoftencalledsequencecoordinationorconditionalsynchronizationmechanisms,andtherearemanyothersimilarmechanismsintheoperatingsystemsliterature.
Toillustratewhatwemean,let'sconsiderasimpleproducer/consumerqueue.
ThisqueueissimilartotheonethatfeedscommandsfromprocessestotheIDEdriv-er(seeChapter3),butabstractsawayallIDE-speciccode.
Thequeueallowsoneprocesstosendanonzeropointertoanotherprocess.
Iftherewereonlyonesenderandonereceiver,andtheyexecutedondierentCPUs,andthecompilerdidn'topti-mizetooagressively,thisimplementationwouldbecorrect:100structq{101void*ptr;102};103104void*105send(structq*q,void*p)106{107while(q->ptr!
=0)108;109q->ptr=p;110}111DRAFTasofSeptember4,201865https://pdos.
csail.
mit.
edu/6.
828/xv6structcpu+codemycpu+codemyproc+codesequencecoordinationconditionalsynchronization112void*113recv(structq*q)114{115void*p;116117while((p=q->ptr)==0)118;119q->ptr=0;120returnp;121}Sendloopsuntilthequeueisempty(ptr==0)andthenputsthepointerpinthequeue.
Recvloopsuntilthequeueisnon-emptyandtakesthepointerout.
Whenrunindierentprocesses,sendandrecvbothmodifyq->ptr,butsendonlywritesthepointerwhenitiszeroandrecvonlywritesthepointerwhenitisnonzero,sonoup-datesarelost.
Theimplementationaboveisexpensive.
Ifthesendersendsrarely,thereceiverwillspendmostofitstimespinninginthewhileloophopingforapointer.
There-ceiver'sCPUcouldndmoreproductiveworkthanbusywaitingbyrepeatedlypollingq->ptr.
AvoidingbusywaitingrequiresawayforthereceivertoyieldtheCPUandresumeonlywhensenddeliversapointer.
Let'simagineapairofcalls,sleepandwakeup,thatworkasfollows.
Sleep(chan)sleepsonthearbitraryvaluechan,calledthewaitchannel.
Sleepputsthecallingprocesstosleep,releasingtheCPUforotherwork.
Wakeup(chan)wakesallprocessessleepingonchan(ifany),causingtheirsleepcallstoreturn.
Ifnopro-cessesarewaitingonchan,wakeupdoesnothing.
Wecanchangethequeueimple-mentationtousesleepandwakeup:201void*202send(structq*q,void*p)203{204while(q->ptr!
=0)205;206q->ptr=p;207wakeup(q);/*wakerecv*/208}209210void*211recv(structq*q)212{213void*p;214215while((p=q->ptr)==0)216sleep(q);217q->ptr=0;218returnp;219}RecvnowgivesuptheCPUinsteadofspinning,whichisnice.
However,itturnsoutnottobestraightforwardtodesignsleepandwakeupwiththisinterfacewithoutsueringfromwhatisknownasthe''lostwake-up''problem(seeFigure5-2).
Supposethatrecvndsthatq->ptr==0online215.
Whilerecvisbetweenlines215andDRAFTasofSeptember4,201866https://pdos.
csail.
mit.
edu/6.
828/xv6busywaitingpollingsleep+codewakeup+codechan+codewaitchannelrecvsend206207216Timesleepwakeupwaitforwakeupforever215teststorep204test205spinforeverFigure5-2.
Examplelostwakeupproblem216,sendrunsonanotherCPU:itchangesq->ptrtobenonzeroandcallswakeup,whichndsnoprocessessleepingandthusdoesnothing.
Nowrecvcontinuesexecut-ingatline216:itcallssleepandgoestosleep.
Thiscausesaproblem:recvisasleepwaitingforapointerthathasalreadyarrived.
Thenextsendwillwaitforrecvtoconsumethepointerinthequeue,atwhichpointthesystemwillbedeadlocked.
Therootofthisproblemisthattheinvariantthatrecvonlysleepswhenq->ptr==0isviolatedbysendrunningatjustthewrongmoment.
Oneincorrectwayofprotectingtheinvariantwouldbetomodifythecodeforrecvasfollows:300structq{301structspinlocklock;302void*ptr;303};304305void*306send(structq*q,void*p)307{308acquire(&q->lock);309while(q->ptr!
=0)310;311q->ptr=p;312wakeup(q);313release(&q->lock);314}315316void*317recv(structq*q)318{319void*p;320321acquire(&q->lock);322while((p=q->ptr)==0)323sleep(q);324q->ptr=0;325release(&q->lock);326returnp;327}OnemighthopethatthisversionofrecvwouldavoidthelostwakeupbecausetheDRAFTasofSeptember4,201867https://pdos.
csail.
mit.
edu/6.
828/xv6deadlockedlockpreventssendfromexecutingbetweenlines322and323.
Itdoesthat,butitalsodeadlocks:recvholdsthelockwhileitsleeps,sothesenderwillblockforeverwaitingforthelock.
We'llxtheprecedingschemebychangingsleep'sinterface:thecallermustpassthelocktosleepsoitcanreleasethelockafterthecallingprocessismarkedasasleepandwaitingonthesleepchannel.
Thelockwillforceaconcurrentsendtowaituntilthereceiverhasnishedputtingitselftosleep,sothatthewakeupwillndthesleepingreceiverandwakeitup.
Oncethereceiverisawakeagainsleepreacquiresthelockbeforereturning.
Ournewcorrectschemeisuseableasfollows:400structq{401structspinlocklock;402void*ptr;403};404405void*406send(structq*q,void*p)407{408acquire(&q->lock);409while(q->ptr!
=0)410;411q->ptr=p;412wakeup(q);413release(&q->lock);414}415416void*417recv(structq*q)418{419void*p;420421acquire(&q->lock);422while((p=q->ptr)==0)423sleep(q,&q->lock);424q->ptr=0;425release(&q->lock);426returnp;427}Thefactthatrecvholdsq->lockpreventssendfromtryingtowakeitupbe-tweenrecv'scheckofq->ptranditscalltosleep.
Weneedsleeptoatomicallyre-leaseq->lockandputthereceivingprocesstosleep.
Acompletesender/receiverimplementationwouldalsosleepinsendwhenwait-ingforareceivertoconsumethevaluefromaprevioussend.
Code:SleepandwakeupLet'slookattheimplementationofsleep(2874)andwakeup(2953).
ThebasicideaistohavesleepmarkthecurrentprocessasSLEEPINGandthencallschedtoreleasetheprocessor;wakeuplooksforaprocesssleepingonthegivenwaitchannelandmarksitasRUNNABLE.
CallersofsleepandwakeupcanuseanymutuallyconvenientDRAFTasofSeptember4,201868https://pdos.
csail.
mit.
edu/6.
828/xv6sleep+codesleep+codewakeup+codeSLEEPING+codesched+codeRUNNABLE+codenumberasthechannel.
Xv6oftenusestheaddressofakerneldatastructureinvolvedinthewaiting.
Sleep(2874)beginswithafewsanitychecks:theremustbeacurrentprocess(2878)andsleepmusthavebeenpassedalock(2881-2882).
Thensleepacquirespt-able.
lock(2891).
Nowtheprocessgoingtosleepholdsbothptable.
lockandlk.
Holdinglkwasnecessaryinthecaller(intheexample,recv):itensuredthatnootherprocess(intheexample,onerunningsend)couldstartacalltowakeup(chan).
Nowthatsleepholdsptable.
lock,itissafetoreleaselk:someotherprocessmaystartacalltowakeup(chan),butwakeupwillnotrununtilitcanacquireptable.
lock,soitmustwaituntilsleephasnishedputtingtheprocesstosleep,keepingthewakeupfrommissingthesleep.
Thereisaminorcomplication:iflkisequalto&ptable.
lock,thensleepwoulddeadlocktryingtoacquireitas&ptable.
lockandthenreleaseitaslk.
Inthiscase,sleepconsiderstheacquireandreleasetocanceleachotheroutandskipsthemen-tirely(2890).
Forexample,wait(2964)callssleepwith&ptable.
lock.
Nowthatsleepholdsptable.
lockandnoothers,itcanputtheprocesstosleepbyrecordingthesleepchannel,changingtheprocessstate,andcallingsched(2895-2898).
Atsomepointlater,aprocesswillcallwakeup(chan).
Wakeup(2964)acquirespt-able.
lockandcallswakeup1,whichdoestherealwork.
Itisimportantthatwakeupholdtheptable.
lockbothbecauseitismanipulatingprocessstatesandbecause,aswejustsaw,ptable.
lockmakessurethatsleepandwakeupdonotmisseachother.
Wakeup1isaseparatefunctionbecausesometimestheschedulerneedstoexecuteawakeupwhenitalreadyholdstheptable.
lock;wewillseeanexampleofthislater.
Wakeup1(2953)loopsovertheprocesstable.
WhenitndsaprocessinstateSLEEPINGwithamatchingchan,itchangesthatprocess'sstatetoRUNNABLE.
Thenexttimetheschedulerruns,itwillseethattheprocessisreadytoberun.
Xv6codealwayscallswakeupwhileholdingthelockthatguardsthesleepcondi-tion;intheexampleabovethatlockisq->lock.
Strictlyspeakingitissucientifwakeupalwaysfollowstheacquire(thatis,onecouldcallwakeupaftertherelease).
Whydothelockingrulesforsleepandwakeupensureasleepingprocesswon'tmissawakeupitneedsThesleepingprocessholdseitherthelockontheconditionortheptable.
lockorbothfromapointbeforeitcheckstheconditiontoapointafteritismarkedassleeping.
Ifaconcurrentthreadcausestheconditiontobetrue,thatthreadmusteitherholdthelockontheconditionbeforethesleepingthreadacquiredit,orafterthesleepingthreadreleaseditinsleep.
Ifbefore,thesleepingthreadmusthaveseenthenewconditionvalue,anddecidedtosleepanyway,soitdoesn'tmatterifitmissesthewakeup.
Ifafter,thentheearliestthewakercouldacquirethelockontheconditionisaftersleepacquiresptable.
lock,sothatwakeup'sacquisitionofpt-able.
lockmustwaituntilsleephascompletelynishedputtingthesleepertosleep.
Thenwakeupwillseethesleepingprocessandwakeitup(unlesssomethingelsewakesituprst).
Itissometimesthecasethatmultipleprocessesaresleepingonthesamechannel;forexample,morethanoneprocessreadingfromapipe.
Asinglecalltowakeupwillwakethemallup.
Oneofthemwillrunrstandacquirethelockthatsleepwascalledwith,and(inthecaseofpipes)readwhateverdataiswaitinginthepipe.
TheDRAFTasofSeptember4,201869https://pdos.
csail.
mit.
edu/6.
828/xv6ptable.
lock+codewakeup+codeptable.
lock+codeptable.
lock+codewakeup1+codewakeup+codeSLEEPING+codechan+codeRUNNABLE+codeotherprocesseswillndthat,despitebeingwokenup,thereisnodatatoberead.
Fromtheirpointofviewthewakeupwas''spurious,''andtheymustsleepagain.
Forthisreasonsleepisalwayscalledinsidealoopthatchecksthecondition.
Noharmisdoneiftwousesofsleep/wakeupaccidentallychoosethesamechan-nel:theywillseespuriouswakeups,butloopingasdescribedabovewilltoleratethisproblem.
Muchofthecharmofsleep/wakeupisthatitisbothlightweight(noneedtocreatespecialdatastructurestoactassleepchannels)andprovidesalayerofindirec-tion(callersneednotknowwhichspecicprocesstheyareinteractingwith).
Code:PipesThesimplequeueweusedearlierinthischapterwasatoy,butxv6containstworealqueuesthatusesleepandwakeuptosynchronizereadersandwriters.
OneisintheIDEdriver:aprocessaddsadiskrequesttoaqueueandthencallssleep.
TheIDEinterrupthandleruseswakeuptoalerttheprocessthatitsrequesthascompleted.
Amorecomplexexampleistheimplementationofpipes.
WesawtheinterfaceforpipesinChapter0:byteswrittentooneendofapipearecopiedinanin-kernelbuerandthencanbereadoutoftheotherendofthepipe.
Futurechapterswillex-aminetheledescriptorsupportsurroundingpipes,butlet'slooknowattheimple-mentationsofpipewriteandpiperead.
Eachpipeisrepresentedbyastructpipe,whichcontainsalockandadatabuer.
Theeldsnreadandnwritecountthenumberofbytesreadfromandwrittentothebuer.
Thebuerwrapsaround:thenextbytewrittenafterbuf[PIPESIZE-1]isbuf[0].
Thecountsdonotwrap.
Thisconventionletstheimplementationdistin-guishafullbuer(nwrite==nread+PIPESIZE)fromanemptybuer(nwrite==nread),butitmeansthatindexingintothebuermustusebuf[nread%PIPESIZE]insteadofjustbuf[nread](andsimilarlyfornwrite).
Let'ssupposethatcallstopipereadandpipewritehappensimultaneouslyontwodierentCPUs.
Pipewrite(6830)beginsbyacquiringthepipe'slock,whichprotectsthecounts,thedata,andtheirassociatedinvariants.
Piperead(6851)thentriestoacquirethelocktoo,butcannot.
Itspinsinacquire(1574)waitingforthelock.
Whilepipereadwaits,pipewriteloopsoverthebytesbeingwritten—addr[0],addr[1],.
.
.
,addr[n-1]—addingeachtothepipeinturn(6844).
Duringthisloop,itcouldhappenthatthebuerlls(6836).
Inthiscase,pipewritecallswakeuptoalertanysleepingreaderstothefactthatthereisdatawaitinginthebuerandthensleepson&p->nwritetowaitforareadertotakesomebytesoutofthebuer.
Sleepreleasesp->lockaspartofputtingpipewrite'sprocesstosleep.
Nowthatp->lockisavailable,pipereadmanagestoacquireitandentersitscrit-icalsection:itndsthatp->nread!
=p->nwrite(6856)(pipewritewenttosleepbe-causep->nwrite==p->nread+PIPESIZE(6836))soitfallsthroughtotheforloop,copiesdataoutofthepipe(6863-6867),andincrementsnreadbythenumberofbytescopied.
Thatmanybytesarenowavailableforwriting,sopipereadcallswakeup(6868)towakeanysleepingwritersbeforeitreturnstoitscaller.
Wakeupndsaprocesssleepingon&p->nwrite,theprocessthatwasrunningpipewritebutstoppedwhenthebuerlled.
ItmarksthatprocessasRUNNABLE.
DRAFTasofSeptember4,201870https://pdos.
csail.
mit.
edu/6.
828/xv6pipewrite+codepiperead+codestructpipe+codeRUNNABLE+codeThepipecodeusesseparatesleepchannelsforreaderandwriter(p->nreadandp->nwrite);thismightmakethesystemmoreecientintheunlikelyeventthattherearelotsofreadersandwriterswaitingforthesamepipe.
Thepipecodesleepsinsidealoopcheckingthesleepcondition;iftherearemultiplereadersorwriters,allbuttherstprocesstowakeupwillseetheconditionisstillfalseandsleepagain.
Code:Wait,exit,andkillSleepandwakeupcanbeusedformanykindsofwaiting.
Aninterestingexample,seeninChapter0,isthewaitsystemcallthataparentprocessusestowaitforachildtoexit.
Whenachildexits,itdoesnotdieimmediately.
Instead,itswitchestotheZOMBIEprocessstateuntiltheparentcallswaittolearnoftheexit.
Theparentisthenresponsibleforfreeingthememoryassociatedwiththeprocessandpreparingthestructprocforreuse.
Iftheparentexitsbeforethechild,theinitprocessadoptsthechildandwaitsforit,sothateverychildhasaparenttocleanupafterit.
Animplementationchallengeisthepossibilityofracesbetweenparentandchildwaitandexit,aswellasexitandexit.
Waitbeginsbyacquiringptable.
lock.
Thenitscanstheprocesstablelookingforchildren.
Ifwaitndsthatthecurrentprocesshaschildrenbutthatnonehaveexited,itcallssleeptowaitforoneofthemtoexit(2707)andscansagain.
Here,thelockbeingreleasedinsleepisptable.
lock,thespecialcasewesawabove.
Exitacquiresptable.
lockandthenwakesupanyprocesssleepingonawaitchannelequaltothecurrentprocess'sparentproc(2651);ifthereissuchaprocess,itwillbetheparentinwait.
Thismaylookpremature,sinceexithasnotmarkedthecurrentprocessasaZOMBIEyet,butitissafe:althoughwakeupmaycausetheparenttorun,theloopinwaitcannotrununtilexitreleasesptable.
lockbycallingschedtoenterthescheduler,sowaitcan'tlookattheexitingprocessuntilafterexithassetitsstatetoZOMBIE(2663).
Beforeexityieldstheprocessor,itreparentsalloftheexitingprocess'schildren,passingthemtotheinitproc(2653-2660).
Finally,exitcallsschedtorelinquishtheCPU.
Iftheparentprocesswassleepinginwait,theschedulerwilleventuallyrunit.
Thecalltosleepreturnsholdingptable.
lock;waitrescanstheprocesstableandndstheexitedchildwithstate==ZOMBIE.
(2657).
Itrecordsthechild'spidandthencleansupthestructproc,freeingthememoryassociatedwiththeprocess(2687-2694).
Thechildprocesscouldhavedonemostofthecleanupduringexit,butitisim-portantthattheparentprocessbetheonetofreep->kstackandp->pgdir:whenthechildrunsexit,itsstacksitsinthememoryallocatedasp->kstackanditusesitsownpagetable.
Theycanonlybefreedafterthechildprocesshasnishedrunningforthelasttimebycallingswtch(viasched).
Thisisonereasonthattheschedulerproce-durerunsonitsownstackratherthanonthestackofthethreadthatcalledsched.
Whileexitallowsaprocesstoterminateitself,kill(2975)letsoneprocessre-questthatanotherbeterminated.
Itwouldbetoocomplexforkilltodirectlyde-stroythevictimprocess,sincethevictimmightbeexecutingonanotherCPUorsleep-ingwhilemidwaythroughupdatingkerneldatastructures.
Toaddressthesechal-lenges,killdoesverylittle:itjustsetsthevictim'sp->killedand,ifitissleeping,DRAFTasofSeptember4,201871https://pdos.
csail.
mit.
edu/6.
828/xv6wait+codeZOMBIE+codestructproc+codeexit+codesched+codep->kstack+codep->pgdir+codeswtch+codewakesitup.
Eventuallythevictimwillenterorleavethekernel,atwhichpointcodeintrapwillcallexitifp->killedisset.
Ifthevictimisrunninginuserspace,itwillsoonenterthekernelbymakingasystemcallorbecausethetimer(orsomeotherdevice)interrupts.
Ifthevictimprocessisinsleep,thecalltowakeupwillcausethevictimprocesstoreturnfromsleep.
Thisispotentiallydangerousbecausetheconditionbeingwait-ingformaynotbetrue.
However,xv6callstosleeparealwayswrappedinawhileloopthatre-teststheconditionaftersleepreturns.
Somecallstosleepalsotestp->killedintheloop,andabandonthecurrentactivityifitisset.
Thisisonlydonewhensuchabandonmentwouldbecorrect.
Forexample,thepipereadandwritecode(6837)returnsifthekilledagisset;eventuallythecodewillreturnbacktotrap,whichwillagainchecktheagandexit.
Somexv6sleeploopsdonotcheckp->killedbecausethecodeisinthemiddleofamulti-stepsystemcallthatshouldbeatomic.
TheIDEdriver(4379)isanexample:itdoesnotcheckp->killedbecauseadiskoperationmaybeoneofasetofwritesthatareallneededinorderforthelesystemtobeleftinacorrectstate.
Toavoidthecomplicationofcleaningupafterapartialoperation,xv6delaysthekillingofaprocessthatisintheIDEdriveruntilsomepointlaterwhenitiseasytokillthepro-cess(e.
g.
,whenthecompletelesystemoperationhascompletedandtheprocessisabouttoreturntouserspace).
RealworldThexv6schedulerimplementsasimpleschedulingpolicy,whichrunseachpro-cessinturn.
Thispolicyiscalledroundrobin.
Realoperatingsystemsimplementmoresophisticatedpoliciesthat,forexample,allowprocessestohavepriorities.
Theideaisthatarunnablehigh-priorityprocesswillbepreferredbythescheduleroverarunnablelow-priorityprocess.
Thesepoliciescanbecomecomplexquicklybecausethereareoftencompetinggoals:forexample,theoperatingmightalsowanttoguaran-teefairnessandhighthroughput.
Inaddition,complexpoliciesmayleadtounintend-edinteractionssuchaspriorityinversionandconvoys.
Priorityinversioncanhappenwhenalow-priorityandhigh-priorityprocesssharealock,whichwhenacquiredbythelow-priorityprocesscanpreventthehigh-priorityprocessfrommakingprogress.
Alongconvoycanformwhenmanyhigh-priorityprocessesarewaitingforalow-pri-orityprocessthatacquiresasharedlock;onceaconvoyhasformeditcanpersistforlongtime.
Toavoidthesekindsofproblemsadditionalmechanismsarenecessaryinsophisticatedschedulers.
Sleepandwakeupareasimpleandeectivesynchronizationmethod,buttherearemanyothers.
Therstchallengeinallofthemistoavoidthe''lostwakeups''prob-lemwesawatthebeginningofthechapter.
TheoriginalUnixkernel'ssleepsimplydisabledinterrupts,whichsucedbecauseUnixranonasingle-CPUsystem.
Becausexv6runsonmultiprocessors,itaddsanexplicitlocktosleep.
FreeBSD'smsleeptakesthesameapproach.
Plan9'ssleepusesacallbackfunctionthatrunswiththeschedulinglockheldjustbeforegoingtosleep;thefunctionservesasalastminutecheckofthesleepcondition,toavoidlostwakeups.
TheLinuxkernel'ssleepusesanDRAFTasofSeptember4,201872https://pdos.
csail.
mit.
edu/6.
828/xv6roundrobinpriorityinversionconvoysexplicitprocessqueueinsteadofawaitchannel;thequeuehasitsowninternallock.
Scanningtheentireprocesslistinwakeupforprocesseswithamatchingchanisinecient.
Abettersolutionistoreplacethechaninbothsleepandwakeupwithadatastructurethatholdsalistofprocessessleepingonthatstructure.
Plan9'ssleepandwakeupcallthatstructurearendezvouspointorRendez.
Manythreadlibrariesre-fertothesamestructureasaconditionvariable;inthatcontext,theoperationssleepandwakeuparecalledwaitandsignal.
Allofthesemechanismssharethesamea-vor:thesleepconditionisprotectedbysomekindoflockdroppedatomicallyduringsleep.
Theimplementationofwakeupwakesupallprocessesthatarewaitingonapar-ticularchannel,anditmightbethecasethatmanyprocessesarewaitingforthatpar-ticularchannel.
Theoperatingsystemwillschedulealltheseprocessesandtheywillracetocheckthesleepcondition.
Processesthatbehaveinthiswayaresometimescalledathunderingherd,anditisbestavoided.
Mostconditionvariableshavetwoprimitivesforwakeup:signal,whichwakesuponeprocess,andbroadcast,whichwakesupallprocesseswaiting.
Semaphoresareanothercommoncoordinationmechanism.
Asemaphoreisanintegervaluewithtwooperations,incrementanddecrement(orupanddown).
Itisawayspossibletoincrementasemaphore,butthesemaphorevalueisnotallowedtodropbelowzero:adecrementofazerosemaphoresleepsuntilanotherprocessincre-mentsthesemaphore,andthenthosetwooperationscancelout.
Theintegervaluetypicallycorrespondstoarealcount,suchasthenumberofbytesavailableinapipebuerorthenumberofzombiechildrenthataprocesshas.
Usinganexplicitcountaspartoftheabstractionavoidsthe''lostwakeup''problem:thereisanexplicitcountofthenumberofwakeupsthathaveoccurred.
Thecountalsoavoidsthespuriouswake-upandthunderingherdproblems.
Terminatingprocessesandcleaningthemupintroducesmuchcomplexityinxv6.
Inmostoperatingsystemsitisevenmorecomplex,because,forexample,thevictimprocessmaybedeepinsidethekernelsleeping,andunwindingitsstackrequiresmuchcarefulprogramming.
Manyoperatingsystemsunwindthestackusingexplicitmecha-nismsforexceptionhandling,suchaslongjmp.
Furthermore,thereareothereventsthatcancauseasleepingprocesstobewokenup,eventhoughtheeventitiswaitingforhasnothappenedyet.
Forexample,whenaUnixprocessissleeping,anotherpro-cessmaysendasignaltoit.
Inthiscase,theprocesswillreturnfromtheinterrupt-edsystemcallwiththevalue-1andwiththeerrorcodesettoEINTR.
Theapplicationcancheckforthesevaluesanddecidewhattodo.
Xv6doesn'tsupportsignalsandthiscomplexitydoesn'tarise.
Xv6'ssupportforkillisnotentirelysatisfactory:therearesleeploopswhichprobablyshouldcheckforp->killed.
Arelatedproblemisthat,evenforsleeploopsthatcheckp->killed,thereisaracebetweensleepandkill;thelattermaysetp->killedandtrytowakeupthevictimjustafterthevictim'sloopchecksp->killedbutbeforeitcallssleep.
Ifthisproblemoccurs,thevictimwon'tnoticethep->killeduntiltheconditionitiswaitingforoccurs.
Thismaybequiteabitlater(e.
g.
,whentheIDEdriverreturnsadiskblockthatthevictimiswaitingfor)ornever(e.
g.
,ifthevictimiswaitingfrominputfromtheconsole,buttheuserdoesn'ttypeanyin-DRAFTasofSeptember4,201873https://pdos.
csail.
mit.
edu/6.
828/xv6thunderingherdsignal+codeput).
Exercises1.
Sleephastochecklk!
=&ptable.
locktoavoidadeadlock(2890-2893).
Sup-posethespecialcasewereeliminatedbyreplacingif(lk!
=&ptable.
lock){acquire(&ptable.
lock);release(lk);}withrelease(lk);acquire(&ptable.
lock);Doingthiswouldbreaksleep.
How2.
Mostprocesscleanupcouldbedonebyeitherexitorwait,butwesawabovethatexitmustnotfreep->stack.
Itturnsoutthatexitmustbetheonetoclosetheopenles.
WhyTheanswerinvolvespipes.
3.
Implementsemaphoresinxv6.
Youcanusemutexesbutdonotusesleepandwakeup.
Replacetheusesofsleepandwakeupinxv6withsemaphores.
Judgethere-sult.
4.
Fixtheracementionedabovebetweenkillandsleep,sothatakillthatoc-cursafterthevictim'ssleeploopchecksp->killedbutbeforeitcallssleepresultsinthevictimabandoningthecurrentsystemcall.
5.
Designaplansothateverysleeploopchecksp->killedsothat,forexample,aprocessthatisintheIDEdrivercanreturnquicklyfromthewhileloopifanotherkillsthatprocess.
6.
Designaplanthatusesonlyonecontextswitchwhenswitchingfromoneuserprocesstoanother.
Thisplaninvolvesrunningtheschedulerprocedureonthekernelstackoftheuserprocess,insteadofthededicatedschedulerstack.
Themainchallengeistocleanupauserprocesscorrectly.
Measuretheperformancebenetofavoidingonecontextswitch.
7.
Modifyxv6toturnoaprocessorwhenitisidleandjustspinningintheloopinscheduler.
(Hint:lookatthex86HLTinstruction.
)8.
Thelockp->lockprotectsmanyinvariants,andwhenlookingataparticularpieceofxv6codethatisprotectedbyp->lock,itcanbediculttogureoutwhichinvariantisbeingenforced.
Designaplanthatismorecleanbyperhapssplittingp->lockinseverallocks.
DRAFTasofSeptember4,201874https://pdos.
csail.
mit.
edu/6.
828/xv6Chapter6FilesystemThepurposeofalesystemistoorganizeandstoredata.
Filesystemstypicallysupportsharingofdataamongusersandapplications,aswellaspersistencesothatdataisstillavailableafterareboot.
Thexv6lesystemprovidesUnix-likeles,directories,andpathnames(seeChap-ter0),andstoresitsdataonanIDEdiskforpersistence(seeChapter3).
Thelesys-temaddressesseveralchallenges:Thelesystemneedson-diskdatastructurestorepresentthetreeofnameddi-rectoriesandles,torecordtheidentitiesoftheblocksthatholdeachle'scon-tent,andtorecordwhichareasofthediskarefree.
Thelesystemmustsupportcrashrecovery.
Thatis,ifacrash(e.
g.
,powerfailure)occurs,thelesystemmuststillworkcorrectlyafterarestart.
Theriskisthatacrashmightinterruptasequenceofupdatesandleaveinconsistenton-diskdatastructures(e.
g.
,ablockthatisbothusedinaleandmarkedfree).
Dierentprocessesmayoperateonthelesystematthesametime,sothelesystemcodemustcoordinatetomaintaininvariants.
Accessingadiskisordersofmagnitudeslowerthanaccessingmemory,sothelesystemmustmaintainanin-memorycacheofpopularblocks.
Therestofthischapterexplainshowxv6addressesthesechallenges.
OverviewThexv6lesystemimplementationisorganizedinsevenlayers,showninFigure6-1.
ThedisklayerreadsandwritesblocksonanIDEharddrive.
Thebuercachelayercachesdiskblocksandsynchronizesaccesstothem,makingsurethatonlyonekernelprocessatatimecanmodifythedatastoredinanyparticularblock.
Thelog-ginglayerallowshigherlayerstowrapupdatestoseveralblocksinatransaction,andensuresthattheblocksareupdatedatomicallyinthefaceofcrashes(i.
e.
,allofthemareupdatedornone).
Theinodelayerprovidesindividualles,eachrepresentedasaninodewithauniquei-numberandsomeblocksholdingthele'sdata.
Thedirectorylayerimplementseachdirectoryasaspecialkindofinodewhosecontentisasequenceofdirectoryentries,eachofwhichcontainsale'snameandi-number.
Thepathnamelayerprovideshierarchicalpathnameslike/usr/rtm/xv6/fs.
c,andresolvesthemwithrecursivelookup.
TheledescriptorlayerabstractsmanyUnixresources(e.
g.
,pipes,devices,les,etc.
)usingthelesysteminterface,simplifyingthelivesofapplica-tionprogrammers.
Thelesystemmusthaveaplanforwhereitstoresinodesandcontentblocksonthedisk.
Todoso,xv6dividesthediskintoseveralsections,asshowninFigure6-2.
Thelesystemdoesnotuseblock0(itholdsthebootsector).
Block1iscalledtheDRAFTasofSeptember4,201875https://pdos.
csail.
mit.
edu/6.
828/xv6persistencecrashrecoverytransactioninodeDirectoryInodeLoggingBuffercacheFigure6-1.
Layersofthexv6lesystem.
superblock;itcontainsmetadataaboutthelesystem(thelesystemsizeinblocks,thenumberofdatablocks,thenumberofinodes,andthenumberofblocksinthelog).
Blocksstartingat2holdthelog.
Afterthelogaretheinodes,withmultipleinodesperblock.
Afterthosecomebitmapblockstrackingwhichdatablocksareinuse.
Theremainingblocksaredatablocks;eachiseithermarkedfreeinthebitmapblock,orholdscontentforaleordirectory.
Thesuperblockislledinbyaseparateprogram,calledmfks,whichbuildsaninitiallesystem.
Therestofthischapterdiscusseseachlayer,startingwiththebuercache.
Lookoutforsituationswherewell-chosenabstractionsatlowerlayerseasethedesignofhigherones.
BuercachelayerThebuercachehastwojobs:(1)synchronizeaccesstodiskblockstoensurethatonlyonecopyofablockisinmemoryandthatonlyonekernelthreadatatimeusesthatcopy;(2)cachepopularblockssothattheydon'tneedtobere-readfromtheslowdisk.
Thecodeisinbio.
c.
Themaininterfaceexportedbythebuercacheconsistsofbreadandbwrite;theformerobtainsabufcontainingacopyofablockwhichcanbereadormodiedinmemory,andthelatterwritesamodiedbuertotheappropriateblockonthedisk.
Akernelthreadmustreleaseabuerbycallingbrelsewhenitisdonewithit.
Thebuercacheusesaper-buersleep-locktoensurethatonlyonethreadatatimeuseseachbuer(andthuseachdiskblock);breadreturnsalockedbuer,andbrelsereleasesthelock.
Let'sreturntothebuercache.
Thebuercachehasaxednumberofbuerstoholddiskblocks,whichmeansthatifthelesystemasksforablockthatisnotal-readyinthecache,thebuercachemustrecycleabuercurrentlyholdingsomeotherblock.
Thebuercacherecyclestheleastrecentlyusedbuerforthenewblock.
TheassumptionisthattheleastrecentlyusedbueristheoneleastlikelytobeusedagainDRAFTasofSeptember4,201876https://pdos.
csail.
mit.
edu/6.
828/xv6superblockmfks+codebread+codebwrite+codebufbrelse+code0bootsuperinodesbitmapdatalog1data.
.
.
.
2Figure6-2.
Structureofthexv6lesystem.
Theheaderfs.
h(4050)containsconstantsanddatastruc-turesdescribingtheexactlayoutofthelesystem.
soon.
Code:BuercacheThebuercacheisadoubly-linkedlistofbuers.
Thefunctionbinit,calledbymain(1230),initializesthelistwiththeNBUFbuersinthestaticarraybuf(4450-4459).
Allotheraccesstothebuercacherefertothelinkedlistviabcache.
head,notthebufarray.
Abuerhastwostatebitsassociatedwithit.
B_VALIDindicatesthatthebuercontainsacopyoftheblock.
B_DIRTYindicatesthatthebuercontenthasbeenmod-iedandneedstobewrittentothedisk.
Bread(4502)callsbgettogetabuerforthegivensector(4506).
Ifthebuerneedstobereadfromdisk,breadcallsiderwtodothatbeforereturningthebuer.
Bget(4466)scansthebuerlistforabuerwiththegivendeviceandsectornum-bers(4472-4480).
Ifthereissuchabuer,bgetacquiresthesleep-lockforthebuer.
bgetthenreturnsthelockedbuer.
Ifthereisnocachedbuerforthegivensector,bgetmustmakeone,possiblyreusingabuerthatheldadierentsector.
Itscansthebuerlistasecondtime,lookingforabuerthatisnotlockedandnotdirty:anysuchbuercanbeused.
Bgeteditsthebuermetadatatorecordthenewdeviceandsectornumberandac-quiresitssleep-lock.
NotethattheassignmenttoflagsclearsB_VALID,thusensuringthatbreadwillreadtheblockdatafromdiskratherthanincorrectlyusingthebuer'spreviouscontents.
Itisimportantthatthereisatmostonecachedbuerperdisksector,toensurethatreadersseewrites,andbecausethelesystemuseslocksonbuersforsynchro-nization.
bgetensuresthisinvariantbyholdingthebache.
lockcontinuouslyfromtherstloop'scheckofwhethertheblockiscachedthroughthesecondloop'sdeclara-tionthattheblockisnowcached(bysettingdev,blockno,andrefcnt).
Thiscausesthecheckforablock'spresenceand(ifnotpresent)thedesignationofabuertoholdtheblocktobeatomic.
Itissafeforbgettoacquirethebuer'ssleep-lockoutsideofthebcache.
lockcriticalsection,sincethenon-zerob->refcntpreventsthebuerfrombeingre-usedforadierentdiskblock.
Thesleep-lockprotectsreadsandwritesoftheblock'sbueredcontent,whilethebcache.
lockprotectsinformationaboutwhichblocksarecached.
Ifallthebuersarebusy,thentoomanyprocessesaresimultaneouslyexecutinglesystemcalls;bgetpanics.
AmoregracefulresponsemightbetosleepuntilaDRAFTasofSeptember4,201877https://pdos.
csail.
mit.
edu/6.
828/xv6binit+codemain+codeNBUF+codebcache.
head+codeB_VALID+codeB_DIRTY+codebget+codeiderw+codebget+codebget+codeB_VALID+codebuerbecamefree,thoughtherewouldthenbeapossibilityofdeadlock.
Oncebreadhasreadthedisk(ifneeded)andreturnedthebuertoitscaller,thecallerhasexclusiveuseofthebuerandcanreadorwritethedatabytes.
Ifthecallerdoesmodifythebuer,itmustcallbwritetowritethechangeddatatodiskbeforereleasingthebuer.
Bwrite(4515)callsiderwtotalktothediskhardware,aftersettingB_DIRTYtoindicatethatiderwshouldwrite(ratherthanread).
Whenthecallerisdonewithabuer,itmustcallbrelsetoreleaseit.
(Thenamebrelse,ashorteningofb-release,iscrypticbutworthlearning:itoriginatedinUnixandisusedinBSD,Linux,andSolaristoo.
)Brelse(4526)releasesthesleep-lockandmovesthebuertothefrontofthelinkedlist(4537-4542).
Movingthebuercausesthelisttobeorderedbyhowrecentlythebuerswereused(meaningreleased):therstbuerinthelististhemostrecentlyused,andthelastistheleastrecentlyused.
Thetwoloopsinbgettakeadvantageofthis:thescanforanexistingbuermustprocesstheentirelistintheworstcase,butcheckingthemostrecentlyusedbuersrst(start-ingatbcache.
headandfollowingnextpointers)willreducescantimewhenthereisgoodlocalityofreference.
Thescantopickabuertoreusepickstheleastrecentlyusedbuerbyscanningbackward(followingprevpointers).
LogginglayerOneofthemostinterestingproblemsinlesystemdesigniscrashrecovery.
Theproblemarisesbecausemanylesystemoperationsinvolvemultiplewritestothedisk,andacrashafterasubsetofthewritesmayleavetheon-disklesysteminanincon-sistentstate.
Forexample,supposeacrashoccursduringletruncation(settingthelengthofaletozeroandfreeingitscontentblocks).
Dependingontheorderofthediskwrites,thecrashmayeitherleaveaninodewithareferencetoacontentblockthatismarkedfree,oritmayleaveanallocatedbutunreferencedcontentblock.
Thelatterisrelativelybenign,butaninodethatreferstoafreedblockislikelytocauseseriousproblemsafterareboot.
Afterreboot,thekernelmightallocatethatblocktoanotherle,andnowwehavetwodierentlespointingunintentionallytothesameblock.
Ifxv6supportedmultipleusers,thissituationcouldbeasecurityproblem,sincetheoldle'sownerwouldbeabletoreadandwriteblocksinthenewle,ownedbyadierentuser.
Xv6solvestheproblemofcrashesduringlesystemoperationswithasimpleformoflogging.
Anxv6systemcalldoesnotdirectlywritetheon-disklesystemdatastructures.
Instead,itplacesadescriptionofallthediskwritesitwishestomakeinalogonthedisk.
Oncethesystemcallhasloggedallofitswrites,itwritesaspecialcommitrecordtothediskindicatingthatthelogcontainsacompleteoperation.
Atthatpointthesystemcallcopiesthewritestotheon-disklesystemdatastructures.
Afterthosewriteshavecompleted,thesystemcallerasesthelogondisk.
Ifthesystemshouldcrashandreboot,thelesystemcoderecoversfromthecrashasfollows,beforerunninganyprocesses.
Ifthelogismarkedascontainingacompleteoperation,thentherecoverycodecopiesthewritestowheretheybelongintheon-disklesystem.
Ifthelogisnotmarkedascontainingacompleteoperation,therecoverycodeignoresthelog.
Therecoverycodenishesbyerasingthelog.
DRAFTasofSeptember4,201878https://pdos.
csail.
mit.
edu/6.
828/xv6bread+codebwrite+codeiderw+codeB_DIRTY+codebrelse+codelogcommitWhydoesxv6'slogsolvetheproblemofcrashesduringlesystemoperationsIfthecrashoccursbeforetheoperationcommits,thenthelogondiskwillnotbemarkedascomplete,therecoverycodewillignoreit,andthestateofthediskwillbeasiftheoperationhadnotevenstarted.
Ifthecrashoccursaftertheoperationcom-mits,thenrecoverywillreplayalloftheoperation'swrites,perhapsrepeatingthemiftheoperationhadstartedtowritethemtotheon-diskdatastructure.
Ineithercase,thelogmakesoperationsatomicwithrespecttocrashes:afterrecovery,eitheralloftheoperation'swritesappearonthedisk,ornoneofthemappear.
LogdesignThelogresidesataknownxedlocation,speciedinthesuperblock.
Itconsistsofaheaderblockfollowedbyasequenceofupdatedblockcopies(''loggedblocks'').
Theheaderblockcontainsanarrayofsectornumbers,oneforeachoftheloggedblocks,andthecountoflogblocks.
Thecountintheheaderblockondiskiseitherzero,indicatingthatthereisnotransactioninthelog,ornon-zero,indicatingthatthelogcontainsacompletecommittedtransactionwiththeindicatednumberofloggedblocks.
Xv6writestheheaderblockwhenatransactioncommits,butnotbefore,andsetsthecounttozeroaftercopyingtheloggedblockstothelesystem.
Thusacrashmidwaythroughatransactionwillresultinacountofzerointhelog'sheaderblock;acrashafteracommitwillresultinanon-zerocount.
Eachsystemcall'scodeindicatesthestartandendofthesequenceofwritesthatmustbeatomicwithrespecttocrashes.
Toallowconcurrentexecutionoflesystemoperationsbydierentprocesses,theloggingsystemcanaccumulatethewritesofmul-tiplesystemcallsintoonetransaction.
Thusasinglecommitmayinvolvethewritesofmultiplecompletesystemcalls.
Toavoidsplittingasystemcallacrosstransactions,theloggingsystemonlycommitswhennolesystemsystemcallsareunderway.
Theideaofcommittingseveraltransactionstogetherisknownasgroupcommit.
Groupcommitreducesthenumberofdiskoperationsbecauseitamortizesthexedcostofacommitovermultipleoperations.
Groupcommitalsohandsthedisksystemmoreconcurrentwritesatthesametime,perhapsallowingthedisktowritethemallduringasinglediskrotation.
Xv6'sIDEdriverdoesn'tsupportthiskindofbatching,butxv6'slesystemdesignallowsforit.
Xv6dedicatesaxedamountofspaceonthedisktoholdthelog.
Thetotalnumberofblockswrittenbythesystemcallsinatransactionmusttinthatspace.
Thishastwoconsequences.
Nosinglesystemcallcanbeallowedtowritemoredis-tinctblocksthanthereisspaceinthelog.
Thisisnotaproblemformostsystemcalls,buttwoofthemcanpotentiallywritemanyblocks:writeandunlink.
Alargelewritemaywritemanydatablocksandmanybitmapblocksaswellasaninodeblock;unlinkingalargelemightwritemanybitmapblocksandaninode.
Xv6'swritesys-temcallbreaksuplargewritesintomultiplesmallerwritesthattinthelog,andun-linkdoesn'tcauseproblemsbecauseinpracticethexv6lesystemusesonlyonebitmapblock.
Theotherconsequenceoflimitedlogspaceisthattheloggingsystemcannotallowasystemcalltostartunlessitiscertainthatthesystemcall'swriteswilltinthespaceremaininginthelog.
DRAFTasofSeptember4,201879https://pdos.
csail.
mit.
edu/6.
828/xv6groupcommitbatchingwrite+codeunlink+codeCode:loggingAtypicaluseoftheloginasystemcalllookslikethis:begin_op();.
.
.
bp=bread(.
.
.
);bp->data[log_write(bp);.
.
.
end_op();begin_op(4828)waitsuntiltheloggingsystemisnotcurrentlycommitting,anduntilthereisenoughunreservedlogspacetoholdthewritesfromthiscall.
log.
outstandingcountsthenumberofsystemcallsthathavereservedlogspace;thetotalreservedspaceislog.
outstandingtimesMAXOPBLOCKS.
Incrementinglog.
outstandingbothreservesspaceandpreventsacommitfromoccuringduringthissystemcall.
ThecodeconservativelyassumesthateachsystemcallmightwriteuptoMAXOPBLOCKSdistinctblocks.
log_write(4922)actsasaproxyforbwrite.
Itrecordstheblock'ssectornumberinmemory,reservingitaslotinthelogondisk,andmarksthebuerB_DIRTYtopre-venttheblockcachefromevictingit.
Theblockmuststayinthecacheuntilcommit-ted:untilthen,thecachedcopyistheonlyrecordofthemodication;itcannotbewrittentoitsplaceondiskuntilaftercommit;andotherreadsinthesametransactionmustseethemodications.
log_writenoticeswhenablockiswrittenmultipletimesduringasingletransaction,andallocatesthatblockthesameslotinthelog.
Thisop-timizationisoftencalledabsorption.
Itiscommonthat,forexample,thediskblockcontaininginodesofseverallesiswrittenseveraltimeswithinatransaction.
Byab-sorbingseveraldiskwritesintoone,thelesystemcansavelogspaceandcanachievebetterperformancebecauseonlyonecopyofthediskblockmustbewrittentodisk.
end_op(4853)rstdecrementsthecountofoutstandingsystemcalls.
Ifthecountisnowzero,itcommitsthecurrenttransactionbycallingcommit().
Therearefourstagesinthisprocess.
write_log()(4885)copieseachblockmodiedinthetransac-tionfromthebuercachetoitsslotinthelogondisk.
write_head()(4804)writestheheaderblocktodisk:thisisthecommitpoint,andacrashafterthewritewillre-sultinrecoveryreplayingthetransaction'swritesfromthelog.
install_trans(4772)readseachblockfromthelogandwritesittotheproperplaceinthelesystem.
Fi-nallyend_opwritesthelogheaderwithacountofzero;thishastohappenbeforethenexttransactionstartswritingloggedblocks,sothatacrashdoesn'tresultinrecoveryusingonetransaction'sheaderwiththesubsequenttransaction'sloggedblocks.
recover_from_log(4818)iscalledfrominitlog(4756),whichiscalledduringbootbeforetherstuserprocessruns.
(2865)Itreadsthelogheader,andmimicstheactionsofend_opiftheheaderindicatesthatthelogcontainsacommittedtransac-tion.
Anexampleuseofthelogoccursinfilewrite(6002).
Thetransactionlookslikethis:DRAFTasofSeptember4,201880https://pdos.
csail.
mit.
edu/6.
828/xv6begin_op+codelog_write+codebwrite+codeabsorptionend_op+codeinstall_trans+coderecover_from_log+codinitlog+codelewrite+codebegin_op();ilock(f->ip);r=writei(f->ip,.
.
.
);iunlock(f->ip);end_op();Thiscodeiswrappedinaloopthatbreaksuplargewritesintoindividualtransactionsofjustafewsectorsatatime,toavoidoverowingthelog.
Thecalltowriteiwritesmanyblocksaspartofthistransaction:thele'sinode,oneormorebitmapblocks,andsomedatablocks.
Code:BlockallocatorFileanddirectorycontentisstoredindiskblocks,whichmustbeallocatedfromafreepool.
xv6'sblockallocatormaintainsafreebitmapondisk,withonebitperblock.
Azerobitindicatesthatthecorrespondingblockisfree;aonebitindicatesthatitisinuse.
Theprogrammkfssetsthebitscorrespondingtothebootsector,su-perblock,logblocks,inodeblocks,andbitmapblocks.
Theblockallocatorprovidestwofunctions:ballocallocatesanewdiskblock,andbfreefreesablock.
BallocTheloopinballocat(5022)considerseveryblock,startingatblock0uptosb.
size,thenumberofblocksinthelesystem.
Itlooksforablockwhosebitmapbitiszero,indicatingthatitisfree.
Ifballocndssuchablock,itupdatesthebitmapandreturnstheblock.
Foreciency,theloopissplitintotwopieces.
Theouterloopreadseachblockofbitmapbits.
TheinnerloopchecksallBPBbitsinasinglebitmapblock.
Theracethatmightoccuriftwoprocessestrytoallocateablockatthesametimeispreventedbythefactthatthebuercacheonlyletsoneprocessuseanyonebitmapblockatatime.
Bfree(5052)ndstherightbitmapblockandclearstherightbit.
Againtheexclu-siveuseimpliedbybreadandbrelseavoidstheneedforexplicitlocking.
Aswithmuchofthecodedescribedintheremainderofthischapter,ballocandbfreemustbecalledinsideatransaction.
InodelayerTheterminodecanhaveoneoftworelatedmeanings.
Itmightrefertotheon-diskdatastructurecontainingale'ssizeandlistofdatablocknumbers.
Or''inode''mightrefertoanin-memoryinode,whichcontainsacopyoftheon-diskinodeaswellasextrainformationneededwithinthekernel.
Theon-diskinodesarepackedintoacontiguousareaofdiskcalledtheinodeblocks.
Everyinodeisthesamesize,soitiseasy,givenanumbern,tondthenthinodeonthedisk.
Infact,thisnumbern,calledtheinodenumberori-number,ishowinodesareidentiedintheimplementation.
Theon-diskinodeisdenedbyastructdinode(4078).
Thetypeelddistin-guishesbetweenles,directories,andspecialles(devices).
Atypeofzeroindicatesthatanon-diskinodeisfree.
Thenlinkeldcountsthenumberofdirectoryentriesthatrefertothisinode,inordertorecognizewhentheon-diskinodeanditsdataDRAFTasofSeptember4,201881https://pdos.
csail.
mit.
edu/6.
828/xv6writei+codeballoc+codebfree+codeinodestructdinode+codeblocksshouldbefreed.
Thesizeeldrecordsthenumberofbytesofcontentinthele.
Theaddrsarrayrecordstheblocknumbersofthediskblocksholdingthele'scontent.
Thekernelkeepsthesetofactiveinodesinmemory;structinode(4162)isthein-memorycopyofastructdinodeondisk.
ThekernelstoresaninodeinmemoryonlyifthereareCpointersreferringtothatinode.
TherefeldcountsthenumberofCpointersreferringtothein-memoryinode,andthekerneldiscardstheinodefrommemoryifthereferencecountdropstozero.
Theigetandiputfunctionsacquireandreleasepointerstoaninode,modifyingthereferencecount.
Pointerstoaninodecancomefromledescriptors,currentworkingdirectories,andtransientkernelcodesuchasexec.
Therearefourlockorlock-likemechanismsinxv6'sinodecode.
icache.
lockprotectstheinvariantthataninodeispresentinthecacheatmostonce,andthein-variantthatacachedinode'srefeldcountsthenumberofin-memorypointerstothecachedinode.
Eachin-memoryinodehasalockeldcontainingasleep-lock,whichensuresexclusiveaccesstotheinode'selds(suchaslelength)aswellastotheinode'sleordirectorycontentblocks.
Aninode'sref,ifitisgreaterthanzero,causesthesystemtomaintaintheinodeinthecache,andnotre-usethecacheentryforadierentinode.
Finally,eachinodecontainsanlinkeld(ondiskandcopiedinmemoryifitiscached)thatcountsthenumberofdirectoryentriesthatrefertoale;xv6won'tfreeaninodeifitslinkcountisgreaterthanzero.
Astructinodepointerreturnedbyiget()isguaranteedtobevaliduntilthecorrespondingcalltoiput();theinodewon'tbedeleted,andthememoryreferredtobythepointerwon'tbere-usedforadierentinode.
iget()providesnon-exclusiveaccesstoaninode,sothattherecanbemanypointerstothesameinode.
Manypartsofthelesystemcodedependonthisbehaviorofiget(),bothtoholdlong-termref-erencestoinodes(asopenlesandcurrentdirectories)andtopreventraceswhileavoidingdeadlockincodethatmanipulatesmultipleinodes(suchaspathnamelookup).
Thestructinodethatigetreturnsmaynothaveanyusefulcontent.
Inordertoensureitholdsacopyoftheon-diskinode,codemustcallilock.
Thislockstheinode(sothatnootherprocesscanilockit)andreadstheinodefromthedisk,ifithasnotalreadybeenread.
iunlockreleasesthelockontheinode.
Separatingacqui-sitionofinodepointersfromlockinghelpsavoiddeadlockinsomesituations,forex-ampleduringdirectorylookup.
MultipleprocessescanholdaCpointertoaninodereturnedbyiget,butonlyoneprocesscanlocktheinodeatatime.
TheinodecacheonlycachesinodestowhichkernelcodeordatastructuresholdCpointers.
Itsmainjobisreallysynchronizingaccessbymultipleprocesses;cachingissecondary.
Ifaninodeisusedfrequently,thebuercachewillprobablykeepitinmemoryifitisn'tkeptbytheinodecache.
Theinodecacheiswrite-through,whichmeansthatcodethatmodiesacachedinodemustimmediatelywriteittodiskwithiupdate.
Code:InodesDRAFTasofSeptember4,201882https://pdos.
csail.
mit.
edu/6.
828/xv6structinode+codeiget+codeiput+codeilock+codeToallocateanewinode(forexample,whencreatingale),xv6callsialloc(5204).
Iallocissimilartoballoc:itloopsovertheinodestructuresonthedisk,oneblockatatime,lookingforonethatismarkedfree.
Whenitndsone,itclaimsitbywritingthenewtypetothediskandthenreturnsanentryfromtheinodecachewiththetailcalltoiget(5218).
Thecorrectoperationofiallocdependsonthefactthatonlyoneprocessatatimecanbeholdingareferencetobp:ialloccanbesurethatsomeotherprocessdoesnotsimultaneouslyseethattheinodeisavailableandtrytoclaimit.
Iget(5254)looksthroughtheinodecacheforanactiveentry(ip->ref>0)withthedesireddeviceandinodenumber.
Ifitndsone,itreturnsanewreferencetothatinode.
(5263-5267).
Asigetscans,itrecordsthepositionoftherstemptyslot(5268-5269),whichitusesifitneedstoallocateacacheentry.
Codemustlocktheinodeusingilockbeforereadingorwritingitsmetadataorcontent.
Ilock(5303)usesasleep-lockforthispurpose.
Onceilockhasexclusiveac-cesstotheinode,itreadstheinodefromdisk(morelikely,thebuercache)ifneeded.
Thefunctioniunlock(5331)releasesthesleep-lock,whichmaycauseanyprocessessleepingtobewokenup.
Iput(5358)releasesaCpointertoaninodebydecrementingthereferencecount(5376).
Ifthisisthelastreference,theinode'sslotintheinodecacheisnowfreeandcanbere-usedforadierentinode.
IfiputseesthattherearenoCpointerreferencestoaninodeandthattheinodehasnolinkstoit(occursinnodirectory),thentheinodeanditsdatablocksmustbefreed.
Iputcallsitrunctotruncatetheletozerobytes,freeingthedatablocks;setstheinodetypeto0(unallocated);andwritestheinodetodisk(5366).
Thelockingprotocoliniputinthecaseinwhichitfreestheinodedeservesacloserlook.
Onedangeristhataconcurrentthreadmightbewaitinginilocktousethisinode(e.
g.
toreadaleorlistadirectory),andwon'tbepreparedtondthein-odeisnotlongerallocated.
Thiscan'thappenbecausethereisnowayforasystemcalltogetapointertoacachedinodeifithasnolinkstoitandip->refisone.
Thatonereferenceisthereferenceownedbythethreadcallingiput.
It'struethatiputchecksthatthereferencecountisoneoutsideofitsicache.
lockcriticalsection,butatthatpointthelinkcountisknowntobezero,sonothreadwilltrytoacquireanewrefer-ence.
Theothermaindangeristhataconcurrentcalltoiallocmightchoosethesameinodethatiputisfreeing.
Thiscanonlyhappenaftertheiupdatewritesthedisksothattheinodehastypezero.
Thisraceisbenign;theallocatingthreadwillpo-litelywaittoacquiretheinode'ssleep-lockbeforereadingorwritingtheinode,atwhichpointiputisdonewithit.
iput()canwritetothedisk.
Thismeansthatanysystemcallthatusesthelesystemmaywritethedisk,becausethesystemcallmaybethelastonehavingarefer-encetothele.
Evencallslikeread()thatappeartoberead-only,mayendupcallingiput().
This,inturn,meansthatevenread-onlysystemcallsmustbewrappedintransactionsiftheyusethelesystem.
Thereisachallenginginteractionbetweeniput()andcrashes.
iput()doesn'ttruncatealeimmediatelywhenthelinkcountfortheledropstozero,becausesomeprocessmightstillholdareferencetotheinodeinmemory:aprocessmightstillDRAFTasofSeptember4,201883https://pdos.
csail.
mit.
edu/6.
828/xv6ialloc+codeballoc+codeiget+codeiget+codeilock+codeilock+codeiunlock+codeiput+codeitrunc+codeiput+codetypemajorminornlinksizeaddress1.
.
.
.
.
address12indirectdinodeaddress1address128.
.
.
.
.
indirectblockdatadatadatadata.
.
.
.
.
.
Figure6-3.
Therepresentationofaleondisk.
bereadingandwritingtothele,becauseitsuccessfullyopenedit.
But,ifacrashhap-pensbeforethelastprocessclosestheledescriptorforthele,thenthelewillbemarkedallocatedondiskbutnodirectoryentrypointstoit.
Filesystemshandlethiscaseinoneoftwoways.
Thesimplesolutionisthatonrecovery,afterreboot,thelesystemscansthewholelesystemforlesthataremarkedallocated,buthavenodirectoryentrypointingtothem.
Ifanysuchleexists,thenitcanfreethoseles.
Thesecondsolutiondoesn'trequirescanningthelesystem.
Inthissolution,thelesystemrecordsondisk(e.
g.
,inthesuperblock)theinodeinumberofalewhoselinkcountdropstozerobutwhosereferencecountisn'tzero.
Ifthelesystemre-movesthelewhenitsreferencecountsreaches0,thenitupdatestheon-disklistbyremovingthatinodefromthelist.
Onrecovery,thelesystemfreesanyleinthelist.
Xv6implementsneithersolution,whichmeansthatinodesmaybemarkedallo-catedondisk,eventhoughtheyarenotinuseanymore.
Thismeansthatovertimexv6runstheriskthatitmayrunoutofdiskspace.
Code:InodecontentTheon-diskinodestructure,structdinode,containsasizeandanarrayofblocknumbers(seeFigure6-3).
Theinodedataisfoundintheblockslistedinthedinode'saddrsarray.
TherstNDIRECTblocksofdataarelistedintherstNDIRECTDRAFTasofSeptember4,201884https://pdos.
csail.
mit.
edu/6.
828/xv6structdinode+codeNDIRECT+codeentriesinthearray;theseblocksarecalleddirectblocks.
ThenextNINDIRECTblocksofdataarelistednotintheinodebutinadatablockcalledtheindirectblock.
Thelastentryintheaddrsarraygivestheaddressoftheindirectblock.
Thustherst6kB(NDIRECT*BSIZE)bytesofalecanbeloadedfromblockslistedintheinode,whilethenext64kB(NINDIRECT*BSIZE)bytescanonlybeloadedafterconsultingtheindi-rectblock.
Thisisagoodon-diskrepresentationbutacomplexoneforclients.
Thefunctionbmapmanagestherepresentationsothathigher-levelroutinessuchasreadiandwritei,whichwewillseeshortly.
Bmapreturnsthediskblocknumberofthebn'thdatablockfortheinodeip.
Ifipdoesnothavesuchablockyet,bmapallocatesone.
Thefunctionbmap(5410)beginsbypickingotheeasycase:therstNDIRECTblocksarelistedintheinodeitself(5415-5419).
ThenextNINDIRECTblocksarelistedintheindirectblockatip->addrs[NDIRECT].
Bmapreadstheindirectblock(5426)andthenreadsablocknumberfromtherightpositionwithintheblock(5427).
IftheblocknumberexceedsNDIRECT+NINDIRECT,bmappanics;writeicontainsthecheckthatpreventsthisfromhappening(5566).
Bmapallocatesblocksasneeded.
Anip->addrs[]orindirectentryofzeroindi-catesthatnoblockisallocated.
Asbmapencounterszeros,itreplacesthemwiththenumbersoffreshblocks,allocatedondemand.
(5416-5417,5424-5425).
itruncfreesale'sblocks,resettingtheinode'ssizetozero.
Itrunc(5456)startsbyfreeingthedirectblocks(5462-5467),thentheoneslistedintheindirectblock(5472-5475),andnallytheindirectblockitself(5477-5478).
Bmapmakesiteasyforreadiandwriteitogetataninode'sdata.
Readi(5503)startsbymakingsurethattheosetandcountarenotbeyondtheendofthele.
Readsthatstartbeyondtheendofthelereturnanerror(5514-5515)whilereadsthatstartatorcrosstheendofthelereturnfewerbytesthanrequested(5516-5517).
Themainloopprocesseseachblockofthele,copyingdatafromthebuerintodst(5519-5524).
writei(5553)isidenticaltoreadi,withthreeexceptions:writesthatstartatorcrosstheendofthelegrowthele,uptothemaximumlesize(5566-5567);theloopcopiesdataintothebuersinsteadofout(5572);andifthewritehasextendedthele,writeimustupdateitssize(5577-5580).
Bothreadiandwriteibeginbycheckingforip->type==T_DEV.
Thiscasehandlesspecialdeviceswhosedatadoesnotliveinthelesystem;wewillreturntothiscaseintheledescriptorlayer.
Thefunctionstati(5488)copiesinodemetadataintothestatstructure,whichisexposedtouserprogramsviathestatsystemcall.
Code:directorylayerAdirectoryisimplementedinternallymuchlikeale.
ItsinodehastypeT_DIRanditsdataisasequenceofdirectoryentries.
Eachentryisastructdirent(4115),whichcontainsanameandaninodenumber.
ThenameisatmostDIRSIZ(14)char-acters;ifshorter,itisterminatedbyaNUL(0)byte.
Directoryentrieswithinodenumberzeroarefree.
Thefunctiondirlookup(5611)searchesadirectoryforanentrywiththegivenDRAFTasofSeptember4,201885https://pdos.
csail.
mit.
edu/6.
828/xv6directblocksNINDIRECT+codeindirectblockBSIZE+codebmap+codereadi+codewritei+codebmap+codeNDIRECT+codeNINDIRECT+codeitrunc+codereadi+codewritei+codewritei+codereadi+codewritei+codereadi+codewritei+codeT_DEV+codestati+codestat+codeT_DIR+codestructdirent+codeDIRSIZ+codedirlookup+codename.
Ifitndsone,itreturnsapointertothecorrespondinginode,unlocked,andsets*pofftothebyteosetoftheentrywithinthedirectory,incasethecallerwishestoeditit.
Ifdirlookupndsanentrywiththerightname,itupdates*poff,releasestheblock,andreturnsanunlockedinodeobtainedviaiget.
Dirlookupisthereasonthatigetreturnsunlockedinodes.
Thecallerhaslockeddp,soifthelookupwasfor.
,analiasforthecurrentdirectory,attemptingtolocktheinodebeforereturningwouldtrytore-lockdpanddeadlock.
(Therearemorecomplicateddeadlockscenar-iosinvolvingmultipleprocessesand.
.
,analiasfortheparentdirectory;.
isnottheonlyproblem.
)Thecallercanunlockdpandthenlockip,ensuringthatitonlyholdsonelockatatime.
Thefunctiondirlink(5652)writesanewdirectoryentrywiththegivennameandinodenumberintothedirectorydp.
Ifthenamealreadyexists,dirlinkreturnsanerror(5658-5662).
Themainloopreadsdirectoryentrieslookingforanunallocatedentry.
Whenitndsone,itstopstheloopearly(5622-5623),withoffsettotheosetoftheavailableentry.
Otherwise,theloopendswithoffsettodp->size.
Eitherway,dirlinkthenaddsanewentrytothedirectorybywritingatosetoff(5672-5675).
Code:PathnamesPathnamelookupinvolvesasuccessionofcallstodirlookup,oneforeachpathcomponent.
Namei(5790)evaluatespathandreturnsthecorrespondinginode.
Thefunctionnameiparentisavariant:itstopsbeforethelastelement,returningtheinodeoftheparentdirectoryandcopyingthenalelementintoname.
Bothcallthegeneral-izedfunctionnamextodotherealwork.
Namex(5755)startsbydecidingwherethepathevaluationbegins.
Ifthepathbe-ginswithaslash,evaluationbeginsattheroot;otherwise,thecurrentdirectory(5759-5762).
Thenitusesskipelemtoconsidereachelementofthepathinturn(5764).
Eachiterationoftheloopmustlookupnameinthecurrentinodeip.
Theiterationbeginsbylockingipandcheckingthatitisadirectory.
Ifnot,thelookupfails(5765-5769).
(Lockingipisnecessarynotbecauseip->typecanchangeunderfoot—itcan't—butbecauseuntililockruns,ip->typeisnotguaranteedtohavebeenloadedfromdisk.
)Ifthecallisnameiparentandthisisthelastpathelement,theloopstopsearly,asperthedenitionofnameiparent;thenalpathelementhasalreadybeencopiedintoname,sonamexneedonlyreturntheunlockedip(5770-5774).
Finally,thelooplooksforthepathelementusingdirlookupandpreparesforthenextiterationbysettingip=next(5775-5780).
Whenthelooprunsoutofpathelements,itreturnsip.
Theprocedurenamexmaytakealongtimetocomplete:itcouldinvolveseveraldiskoperationstoreadinodesanddirectoryblocksforthedirectoriestraversedinthepathname(iftheyarenotinthebuercache).
Xv6iscarefullydesignedsothatifaninvocationofnamexbyonekernelthreadisblockedonadiskI/O,anotherkernelthreadlookingupadierentpathnamecanproceedconcurrently.
namexlockseachdirectoryinthepathseparatelysothatlookupsindierentdirectoriescanproceedinparallel.
Thisconcurrencyintroducessomechallenges.
Forexample,whileonekernelthreadislookingupapathnameanotherkernelthreadmaybechangingthedirectoryDRAFTasofSeptember4,201886https://pdos.
csail.
mit.
edu/6.
828/xv6iget+code.
+code.
.
+codedirlink+codedirlookup+codenameiparent+codenamex+codeskipelem+codeilock+codenameiparent+codenamex+codedirlookup+codetreebyunlinkingadirectory.
Apotentialriskisthatalookupmaybesearchingadi-rectorythathasbeendeletedbyanotherkernelthreadanditsblockshavebeenre-usedforanotherdirectoryorle.
Xv6avoidssuchraces.
Forexample,whenexecutingdirlookupinnamex,thelookupthreadholdsthelockonthedirectoryanddirlookupreturnsaninodethatwasobtainedusingiget.
igetincreasesthereferencecountoftheinode.
Onlyafterreceivingtheinodefromdirlookupdoesnamexreleasethelockonthedirectory.
Nowanotherthreadmayunlinktheinodefromthedirectorybutxv6willnotdeletetheinodeyet,becausethereferencecountoftheinodeisstilllargerthanzero.
Anotherriskisdeadlock.
Forexample,nextpointstothesameinodeasipwhenlookingup".
".
Lockingnextbeforereleasingthelockonipwouldresultinadeadlock.
Toavoidthisdeadlock,namexunlocksthedirectorybeforeobtainingalockonnext.
Hereagainweseewhytheseparationbetweenigetandilockisimportant.
FiledescriptorlayerAcoolaspectoftheUnixinterfaceisthatmostresourcesinUnixarerepresentedasles,includingdevicessuchastheconsole,pipes,andofcourse,realles.
Theledescriptorlayeristhelayerthatachievesthisuniformity.
Xv6giveseachprocessitsowntableofopenles,orledescriptors,aswesawinChapter0.
Eachopenleisrepresentedbyastructfile(4150),whichisawrapperaroundeitheraninodeorapipe,plusani/ooset.
Eachcalltoopencreatesanewopenle(anewstructfile):ifmultipleprocessesopenthesameleindependently,thedierentinstanceswillhavedierenti/oosets.
Ontheotherhand,asingleopenle(thesamestructfile)canappearmultipletimesinoneprocess'sletableandalsointheletablesofmultipleprocesses.
Thiswouldhappenifoneprocessusedopentoopentheleandthencreatedaliasesusingduporshareditwithachildusingfork.
Areferencecounttracksthenumberofreferencestoaparticularopenle.
Alecanbeopenforreadingorwritingorboth.
Thereadableandwritableeldstrackthis.
Alltheopenlesinthesystemarekeptinagloballetable,theftable.
Theletablehasafunctiontoallocateale(filealloc),createaduplicatereference(filedup),releaseareference(fileclose),andreadandwritedata(filereadandfilewrite).
Therstthreefollowthenow-familiarform.
Filealloc(5876)scanstheletableforanunreferencedle(f->ref==0)andreturnsanewreference;filedup(5902)in-crementsthereferencecount;andfileclose(5914)decrementsit.
Whenale'srefer-encecountreacheszero,fileclosereleasestheunderlyingpipeorinode,accordingtothetype.
Thefunctionsfilestat,fileread,andfilewriteimplementthestat,read,andwriteoperationsonles.
Filestat(5952)isonlyallowedoninodesandcallsstati.
Filereadandfilewritecheckthattheoperationisallowedbytheopenmodeandthenpassthecallthroughtoeitherthepipeorinodeimplementation.
Ifthelerepresentsaninode,filereadandfilewriteusethei/oosetastheosetfortheoperationandthenadvanceit(5975-5976,6015-6016).
Pipeshavenoconceptofo-DRAFTasofSeptember4,201887https://pdos.
csail.
mit.
edu/6.
828/xv6structle+codeopen+codedup+codefork+codeftable+codelealloc+codeledup+codeleclose+codeleread+codelewrite+codeledup+codeleclose+codelestat+codeleread+codelewrite+codestat+coderead+codewrite+codestati+codeset.
Recallthattheinodefunctionsrequirethecallertohandlelocking(5955-5957,5974-5977,6025-6028).
Theinodelockinghastheconvenientsideeectthatthereadandwriteosetsareupdatedatomically,sothatmultiplewritingtothesamelesimultaneouslycannotoverwriteeachother'sdata,thoughtheirwritesmayendupinterlaced.
Code:SystemcallsWiththefunctionsthatthelowerlayersprovidetheimplementationofmostsys-temcallsistrivial(seesysfile.
c).
Thereareafewcallsthatdeserveacloserlook.
Thefunctionssys_linkandsys_unlinkeditdirectories,creatingorremovingreferencestoinodes.
Theyareanothergoodexampleofthepowerofusingtransac-tions.
Sys_link(6202)beginsbyfetchingitsarguments,twostringsoldandnew(6207).
Assumingoldexistsandisnotadirectory(6211-6214),sys_linkincrementsitsip->nlinkcount.
Thensys_linkcallsnameiparenttondtheparentdirectoryandnalpathelementofnew(6227)andcreatesanewdirectoryentrypointingatold'sin-ode(6230).
Thenewparentdirectorymustexistandbeonthesamedeviceastheex-istinginode:inodenumbersonlyhaveauniquemeaningonasingledisk.
Ifanerrorlikethisoccurs,sys_linkmustgobackanddecrementip->nlink.
Transactionssimplifytheimplementationbecauseitrequiresupdatingmultiplediskblocks,butwedon'thavetoworryabouttheorderinwhichwedothem.
Theyei-therwillallsucceedornone.
Forexample,withouttransactions,updatingip->nlinkbeforecreatingalink,wouldputthelesystemtemporarilyinanunsafestate,andacrashinbetweencouldresultinhavoc.
Withtransactionswedon'thavetoworryaboutthis.
Sys_linkcreatesanewnameforanexistinginode.
Thefunctioncreate(6357)createsanewnameforanewinode.
Itisageneralizationofthethreelecreationsystemcalls:openwiththeO_CREATEagmakesanewordinaryle,mkdirmakesanewdirectory,andmkdevmakesanewdevicele.
Likesys_link,createstartsbycalingnameiparenttogettheinodeoftheparentdirectory.
Itthencallsdirlookuptocheckwhetherthenamealreadyexists(6367).
Ifthenamedoesexist,create'sbe-haviordependsonwhichsystemcallitisbeingusedfor:openhasdierentsemanticsfrommkdirandmkdev.
Ifcreateisbeingusedonbehalfofopen(type==T_FILE)andthenamethatexistsisitselfaregularle,thenopentreatsthatasasuccess,socreatedoestoo(6371).
Otherwise,itisanerror(6372-6373).
Ifthenamedoesnotal-readyexist,createnowallocatesanewinodewithialloc(6376).
Ifthenewinodeisadirectory,createinitializesitwith.
and.
.
entries.
Finally,nowthatthedataisinitializedproperly,createcanlinkitintotheparentdirectory(6389).
Create,likesys_link,holdstwoinodelockssimultaneously:ipanddp.
Thereisnopossibilityofdeadlockbecausetheinodeipisfreshlyallocated:nootherprocessinthesystemwillholdip'slockandthentrytolockdp.
Usingcreate,itiseasytoimplementsys_open,sys_mkdir,andsys_mknod.
Sys_open(6401)isthemostcomplex,becausecreatinganewleisonlyasmallpartofwhatitcando.
IfopenispassedtheO_CREATEag,itcallscreate(6414).
Otherwise,itcallsnamei(6420).
Createreturnsalockedinode,butnameidoesnot,sosys_openmustlocktheinodeitself.
ThisprovidesaconvenientplacetocheckthatdirectoriesDRAFTasofSeptember4,201888https://pdos.
csail.
mit.
edu/6.
828/xv6sys_link+codesys_unlink+codenameiparent+codesys_link+codecreate+codeopen+codeO_CREATE+codemkdir+codemkdev+codesys_link+codecreate+codenameiparent+codedirlookup+codemkdir+codemkdev+codeT_FILE+codeialloc+code.
+code.
.
+codecreate+codesys_link+codesys_open+codesys_mkdir+codesys_mknod+codeopen+codeO_CREATE+codenamei+codesys_open+codeareonlyopenedforreading,notwriting.
Assumingtheinodewasobtainedonewayortheother,sys_openallocatesaleandaledescriptor(6432)andthenllsinthele(6442-6446).
Notethatnootherprocesscanaccessthepartiallyinitializedlesinceitisonlyinthecurrentprocess'stable.
Chapter5examinedtheimplementationofpipesbeforeweevenhadalesys-tem.
Thefunctionsys_pipeconnectsthatimplementationtothelesystembypro-vidingawaytocreateapipepair.
Itsargumentisapointertospacefortwointegers,whereitwillrecordthetwonewledescriptors.
Thenitallocatesthepipeandin-stallstheledescriptors.
RealworldThebuercacheinareal-worldoperatingsystemissignicantlymorecomplexthanxv6's,butitservesthesametwopurposes:cachingandsynchronizingaccesstothedisk.
Xv6'sbuercache,likeV6's,usesasimpleleastrecentlyused(LRU)evictionpolicy;therearemanymorecomplexpoliciesthatcanbeimplemented,eachgoodforsomeworkloadsandnotasgoodforothers.
AmoreecientLRUcachewouldelimi-natethelinkedlist,insteadusingahashtableforlookupsandaheapforLRUevic-tions.
Modernbuercachesaretypicallyintegratedwiththevirtualmemorysystemtosupportmemory-mappedles.
Xv6'sloggingsystemisinecient.
Acommitcannotoccurconcurrentlywithlesystemsystemcalls.
Thesystemlogsentireblocks,evenifonlyafewbytesinablockarechanged.
Itperformssynchronouslogwrites,ablockatatime,eachofwhichislikelytorequireanentirediskrotationtime.
Realloggingsystemsaddressalloftheseproblems.
Loggingisnottheonlywaytoprovidecrashrecovery.
Earlylesystemsusedascavengerduringreboot(forexample,theUNIXfsckprogram)toexamineeveryleanddirectoryandtheblockandinodefreelists,lookingforandresolvinginconsisten-cies.
Scavengingcantakehoursforlargelesystems,andtherearesituationswhereitisnotpossibletoresolveinconsistenciesinawaythatcausestheoriginalsystemcallstobeatomic.
Recoveryfromalogismuchfasterandcausessystemcallstobeatomicinthefaceofcrashes.
Xv6usesthesamebasicon-disklayoutofinodesanddirectoriesasearlyUNIX;thisschemehasbeenremarkablypersistentovertheyears.
BSD'sUFS/FFSandLinux'sext2/ext3useessentiallythesamedatastructures.
Themostinecientpartofthelesystemlayoutisthedirectory,whichrequiresalinearscanoverallthediskblocksdur-ingeachlookup.
Thisisreasonablewhendirectoriesareonlyafewdiskblocks,butisexpensivefordirectoriesholdingmanyles.
MicrosoftWindows'sNTFS,MacOSX'sHFS,andSolaris'sZFS,justtonameafew,implementadirectoryasanon-diskbal-ancedtreeofblocks.
Thisiscomplicatedbutguaranteeslogarithmic-timedirectorylookups.
Xv6isnaiveaboutdiskfailures:ifadiskoperationfails,xv6panics.
Whetherthisisreasonabledependsonthehardware:ifanoperatingsystemssitsatopspecialhard-warethatusesredundancytomaskdiskfailures,perhapstheoperatingsystemseesfailuressoinfrequentlythatpanickingisokay.
Ontheotherhand,operatingsystemsDRAFTasofSeptember4,201889https://pdos.
csail.
mit.
edu/6.
828/xv6sys_pipe+codefsck+codeusingplaindisksshouldexpectfailuresandhandlethemmoregracefully,sothatthelossofablockinoneledoesn'taecttheuseoftherestofthelesystem.
Xv6requiresthatthelesystemtononediskdeviceandnotchangeinsize.
Aslargedatabasesandmultimedialesdrivestoragerequirementseverhigher,operatingsystemsaredevelopingwaystoeliminatethe''onediskperlesystem''bottleneck.
Thebasicapproachistocombinemanydisksintoasinglelogicaldisk.
HardwaresolutionssuchasRAIDarestillthemostpopular,butthecurrenttrendismovingtowardim-plementingasmuchofthislogicinsoftwareaspossible.
Thesesoftwareimplementa-tionstypicallyallowrichfunctionalitylikegrowingorshrinkingthelogicaldevicebyaddingorremovingdisksonthey.
Ofcourse,astoragelayerthatcangroworshrinkontheyrequiresalesystemthatcandothesame:thexed-sizearrayofin-odeblocksusedbyxv6wouldnotworkwellinsuchenvironments.
Separatingdiskmanagementfromthelesystemmaybethecleanestdesign,butthecomplexinter-facebetweenthetwohasledsomesystems,likeSun'sZFS,tocombinethem.
Xv6'slesystemlacksmanyotherfeaturesofmodernlesystems;forexample,itlackssupportforsnapshotsandincrementalbackup.
ModernUnixsystemsallowmanykindsofresourcestobeaccessedwiththesamesystemcallsason-diskstorage:namedpipes,networkconnections,remotely-ac-cessednetworklesystems,andmonitoringandcontrolinterfacessuchas/proc.
In-steadofxv6'sifstatementsinfilereadandfilewrite,thesesystemstypicallygiveeachopenleatableoffunctionpointers,oneperoperation,andcallthefunctionpointertoinvokethatinode'simplementationofthecall.
Networklesystemsandus-er-levellesystemsprovidefunctionsthatturnthosecallsintonetworkRPCsandwaitfortheresponsebeforereturning.
Exercises1.
WhypanicinballocCanxv6recover2.
WhypaniciniallocCanxv6recover3.
Whydoesn'tfileallocpanicwhenitrunsoutoflesWhyisthismorecommonandthereforeworthhandling4.
Supposethelecorrespondingtoipgetsunlinkedbyanotherprocessbetweensys_link'scallstoiunlock(ip)anddirlink.
WillthelinkbecreatedcorrectlyWhyorwhynot6.
createmakesfourfunctioncalls(onetoiallocandthreetodirlink)thatitrequirestosucceed.
Ifanydoesn't,createcallspanic.
WhyisthisacceptableWhycan'tanyofthosefourcallsfail7.
sys_chdircallsiunlock(ip)beforeiput(cp->cwd),whichmighttrytolockcp->cwd,yetpostponingiunlock(ip)untilaftertheiputwouldnotcausedeadlocks.
Whynot8.
Implementthelseeksystemcall.
Supportinglseekwillalsorequirethatyoumodifyfilewritetollholesinthelewithzeroiflseeksetsoffbeyondf->ip->size.
9.
AddO_TRUNCandO_APPENDtoopen,sothat>and>>operatorsworkintheshell.
DRAFTasofSeptember4,201890https://pdos.
csail.
mit.
edu/6.
828/xv6leread+codelewrite+code10.
Modifythelesystemtosupportsymboliclinks.
DRAFTasofSeptember4,201891https://pdos.
csail.
mit.
edu/6.
828/xv6Chapter7SummaryThistextintroducedthemainideasinoperatingsystemsbystudyingoneoperatingsystem,xv6,linebyline.
Somecodelinesembodytheessenceofthemainideas(e.
g.
,contextswitching,user/kernelboundary,locks,etc.
)andeachlineisimportant;othercodelinesprovideanillustrationofhowtoimplementaparticularoperatingsystemideaandcouldeasilybedoneindierentways(e.
g.
,abetteralgorithmforscheduling,betteron-diskdatastructurestorepresentles,betterloggingtoallowforconcurrenttransactions,etc.
).
Alltheideaswereillustratedinthecontextofoneparticular,verysuccessfulsystemcallinterface,theUnixinterface,butthoseideascarryovertothedesignofotheroperatingsystems.
DRAFTasofSeptember4,201893https://pdos.
csail.
mit.
edu/6.
828/xv6AppendixAPChardwareThisappendixdescribespersonalcomputer(PC)hardware,theplatformonwhichxv6runs.
APCisacomputerthatadherestoseveralindustrystandards,withthegoalthatagivenpieceofsoftwarecanrunonPCssoldbymultiplevendors.
ThesestandardsevolveovertimeandaPCfrom1990sdoesn'tlooklikeaPCnow.
Manyofthecur-rentstandardsarepublicandyoucannddocumentationforthemonline.
FromtheoutsideaPCisaboxwithakeyboard,ascreen,andvariousdevices(e.
g.
,CD-ROM,etc.
).
Insidetheboxisacircuitboard(the''motherboard'')withCPUchips,memorychips,graphicchips,I/Ocontrollerchips,andbussesthroughwhichthechipscommunicate.
Thebussesadheretostandardprotocols(e.
g.
,PCIandUSB)sothatdeviceswillworkwithPCsfrommultiplevendors.
Fromourpointofview,wecanabstractthePCintothreecomponents:CPU,memory,andinput/output(I/O)devices.
TheCPUperformscomputation,thememo-rycontainsinstructionsanddataforthatcomputation,anddevicesallowtheCPUtointeractwithhardwareforstorage,communication,andotherfunctions.
YoucanthinkofmainmemoryasconnectedtotheCPUwithasetofwires,orlines,someforaddressbits,somefordatabits,andsomeforcontrolags.
Toreadavaluefrommainmemory,theCPUsendshighorlowvoltagesrepresenting1or0bitsontheaddresslinesanda1onthe''read''lineforaprescribedamountoftimeandthenreadsbackthevaluebyinterpretingthevoltagesonthedatalines.
Towriteavaluetomainmemory,theCPUsendsappropriatebitsontheaddressanddatalinesanda1onthe''write''lineforaprescribedamountoftime.
Realmemoryinterfacesaremorecomplexthanthis,butthedetailsareonlyimportantifyouneedtoachievehighperformance.
ProcessorandmemoryAcomputer'sCPU(centralprocessingunit,orprocessor)runsaconceptuallysim-pleloop:itconsultsanaddressinaregistercalledtheprogramcounter,readsama-chineinstructionfromthataddressinmemory,advancestheprogramcounterpasttheinstruction,andexecutestheinstruction.
Repeat.
Iftheexecutionoftheinstructiondoesnotmodifytheprogramcounter,thisloopwillinterpretthememorypointedatbytheprogramcounterasasequenceofmachineinstructionstorunoneaftertheother.
Instructionsthatdochangetheprogramcounterincludebranchesandfunctioncalls.
Theexecutionengineisuselesswithouttheabilitytostoreandmodifyprogramdata.
Thefasteststoragefordataisprovidedbytheprocessor'sregisterset.
Aregisterisastoragecellinsidetheprocessoritself,capableofholdingamachineword-sizedDRAFTasofSeptember4,201895https://pdos.
csail.
mit.
edu/6.
828/xv6programcountervalue(typically16,32,or64bits).
Datastoredinregisterscantypicallybereadorwrittenquickly,inasingleCPUcycle.
PCshaveaprocessorthatimplementsthex86instructionset,whichwasoriginal-lydenedbyIntelandhasbecomeastandard.
Severalmanufacturersproduceproces-sorsthatimplementtheinstructionset.
LikeallotherPCstandards,thisstandardisalsoevolvingbutnewerstandardsarebackwardscompatiblewithpaststandards.
ThebootloaderhastodealwithsomeofthisevolutionbecauseeveryPCprocessorstartssimulatinganIntel8088,theCPUchipintheoriginalIBMPCreleasedin1981.
However,formostofxv6youwillbeconcernedwiththemodernx86instructionset.
Themodernx86provideseightgeneralpurpose32-bitregisters—%eax,%ebx,%ecx,%edx,%edi,%esi,%ebp,and%esp—andaprogramcounter%eip(theinstruc-tionpointer).
Thecommoneprexstandsforextended,astheseare32-bitextensionsofthe16-bitregisters%ax,%bx,%cx,%dx,%di,%si,%bp,%sp,and%ip.
Thetworegis-tersetsarealiasedsothat,forexample,%axisthebottomhalfof%eax:writingto%axchangesthevaluestoredin%eaxandviceversa.
Therstfourregistersalsohavenamesforthebottomtwo8-bitbytes:%aland%ahdenotethelowandhigh8bitsof%ax;%bl,%bh,%cl,%ch,%dl,and%dhcontinuethepattern.
Inadditiontothesereg-isters,thex86haseight80-bitoating-pointregistersaswellasahandfulofspecial-purposeregisterslikethecontrolregisters%cr0,%cr2,%cr3,and%cr4;thedebugregis-ters%dr0,%dr1,%dr2,and%dr3;thesegmentregisters%cs,%ds,%es,%fs,%gs,and%ss;andtheglobalandlocaldescriptortablepseudo-registers%gdtrand%ldtr.
Thecontrolregistersandsegmentregistersareimportanttoanyoperatingsystem.
Theoating-pointanddebugregistersarelessinterestingandnotusedbyxv6.
Registersarefastbutexpensive.
Mostprocessorsprovideatmostafewtensofgeneral-purposeregisters.
Thenextconceptuallevelofstorageisthemainrandom-ac-cessmemory(RAM).
Mainmemoryis10-100xslowerthanaregister,butitismuchcheaper,sotherecanbemoreofit.
Onereasonmainmemoryisrelativelyslowisthatitisphysicallyseparatefromtheprocessorchip.
Anx86processorhasafewdozenregisters,butatypicalPCtodayhasgigabytesofmainmemory.
Becauseoftheenor-mousdierencesinbothaccessspeedandsizebetweenregistersandmainmemory,mostprocessors,includingthex86,storecopiesofrecently-accessedsectionsofmainmemoryinon-chipcachememory.
Thecachememoryservesasamiddlegroundbe-tweenregistersandmemorybothinaccesstimeandinsize.
Today'sx86processorstypicallyhavethreelevelsofcache.
Eachcorehasasmallrst-levelcachewithaccesstimesrelativelyclosetotheprocessor'sclockrateandalargersecond-levelcache.
Sev-eralcoresshareanL3cache.
FigureA-1showsthelevelsinthememoryhierarchyandtheiraccesstimesforanInteli7Xeonprocessor.
Forthemostpart,x86processorshidethecachefromtheoperatingsystem,sowecanthinkoftheprocessorashavingjusttwokindsofstorage—registersandmemo-ry—andnotworryaboutthedistinctionsbetweenthedierentlevelsofthememoryhierarchy.
I/OProcessorsmustcommunicatewithdevicesaswellasmemory.
Thex86processorDRAFTasofSeptember4,201896https://pdos.
csail.
mit.
edu/6.
828/xv6instructionpointercontrolregisterssegmentregistersIntelCorei7Xeon5500at2.
4GHzMemoryAccesstimeSizeregister1cycle64bytesL1cache~4cycles64kilobytesL2cache~10cycles4megabytesL3cache~40-75cycles8megabytesremoteL3~100-300cyclesLocalDRAM~60nsecRemoteDRAM~100nsecFigureA-1.
LatencynumbersforanInteli7Xeonsystem,basedonhttp://software.
intel.
com/sites/products/collateral/hpc/vtune/performance_analysis_guide.
pdf.
providesspecialinandoutinstructionsthatreadandwritevaluesfromdevicead-dressescalledI/Oports.
Thehardwareimplementationoftheseinstructionsisessen-tiallythesameasreadingandwritingmemory.
Earlyx86processorshadanextraad-dressline:0meantread/writefromanI/Oportand1meantread/writefrommainmemory.
Eachhardwaredevicemonitorstheselinesforreadsandwritestoitsas-signedrangeofI/Oports.
Adevice'sportsletthesoftwarecongurethedevice,exam-ineitsstatus,andcausethedevicetotakeactions;forexample,softwarecanuseI/Oportreadsandwritestocausethediskinterfacehardwaretoreadandwritesectorsonthedisk.
Manycomputerarchitectureshavenoseparatedeviceaccessinstructions.
Insteadthedeviceshavexedmemoryaddressesandtheprocessorcommunicateswiththedevice(attheoperatingsystem'sbehest)byreadingandwritingvaluesatthosead-dresses.
Infact,modernx86architecturesusethistechnique,calledmemory-mappedI/O,formosthigh-speeddevicessuchasnetwork,disk,andgraphicscontrollers.
Forreasonsofbackwardscompatibility,though,theoldinandoutinstructionslinger,asdolegacyhardwaredevicesthatusethem,suchastheIDEdiskcontroller,whichxv6uses.
DRAFTasofSeptember4,201897https://pdos.
csail.
mit.
edu/6.
828/xv6I/Oportsmemory-mappedI/OCPUSegmentTranslationRAM0xGBSelectorOffsetLinearAddressPhysicalAddressLogicalAddressPageTranslationFigureB-1.
Therelationshipbetweenlogical,linear,andphysicaladdresses.
logicaladdresslinearaddressphysicaladdressAppendixBThebootloaderWhenanx86PCboots,itstartsexecutingaprogramcalledtheBIOS(BasicIn-put/OutputSystem),whichisstoredinnon-volatilememoryonthemotherboard.
TheBIOS'sjobistopreparethehardwareandthentransfercontroltotheoperatingsys-tem.
Specically,ittransferscontroltocodeloadedfromthebootsector,therst512-bytesectorofthebootdisk.
Thebootsectorcontainsthebootloader:instruc-tionsthatloadthekernelintomemory.
TheBIOSloadsthebootsectoratmemoryaddress0x7c00andthenjumps(setstheprocessor's%ip)tothataddress.
Whenthebootloaderbeginsexecuting,theprocessorissimulatinganIntel8088,andtheloader'sjobistoputtheprocessorinamoremodernoperatingmode,toloadthexv6kernelfromdiskintomemory,andthentotransfercontroltothekernel.
Thexv6bootload-ercomprisestwosourceles,onewritteninacombinationof16-bitand32-bitx86assembly(bootasm.
S;(9100))andonewritteninC(bootmain.
c;(9200)).
Code:AssemblybootstrapTherstinstructioninthebootloaderiscli(9112),whichdisablesprocessorin-terrupts.
Interruptsareawayforhardwaredevicestoinvokeoperatingsystemfunc-tionscalledinterrupthandlers.
TheBIOSisatinyoperatingsystem,anditmighthavesetupitsowninterrupthandlersaspartoftheinitializingthehardware.
ButtheBIOSisn'trunninganymore—thebootloaderis—soitisnolongerappropriateorsafetohandleinterruptsfromhardwaredevices.
Whenxv6isready(inChapter3),itwillre-enableinterrupts.
Theprocessorisinrealmode,inwhichitsimulatesanIntel8088.
Inrealmodethereareeight16-bitgeneral-purposeregisters,buttheprocessorsends20bitsofad-dresstomemory.
Thesegmentregisters%cs,%ds,%es,and%ssprovidetheadditionalbitsnecessarytogenerate20-bitmemoryaddressesfrom16-bitregisters.
Whenapro-gramreferstoamemoryaddress,theprocessorautomaticallyadds16timesthevalueofoneofthesegmentregisters;theseregistersare16bitswide.
Whichsegmentregis-terisusuallyimplicitinthekindofmemoryreference:instructionfetchesuse%cs,datareadsandwritesuse%ds,andstackreadsandwritesuse%ss.
DRAFTasofSeptember4,201899https://pdos.
csail.
mit.
edu/6.
828/xv6bootloaderrealmodeXv6pretendsthatanx86instructionusesavirtualaddressforitsmemoryoperands,butanx86instructionactuallyusesalogicaladdress(seeFigureB-1).
Alogicaladdressconsistsofasegmentselectorandanoset,andissometimeswrittenassegment:oset.
Moreoften,thesegmentisimplicitandtheprogramonlydirectlymanipulatestheoset.
Thesegmentationhardwareperformsthetranslationdescribedabovetogeneratealinearaddress.
Ifthepaginghardwareisenabled(seeChapter2),ittranslateslinearaddressestophysicaladdresses;otherwisetheprocessoruseslinearad-dressesasphysicaladdresses.
Thebootloaderdoesnotenablethepaginghardware;thelogicaladdressesthatitusesaretranslatedtolinearaddressesbythesegmentationharware,andthenuseddi-rectlyasphysicaladdresses.
Xv6conguresthesegmentationhardwaretotranslatelogicaltolinearaddresseswithoutchange,sothattheyarealwaysequal.
Forhistoricalreasonswehaveusedthetermvirtualaddresstorefertoaddressesmanipulatedbyprograms;anxv6virtualaddressisthesameasanx86logicaladdress,andisequaltothelinearaddresstowhichthesegmentationhardwaremapsit.
Oncepagingisen-abled,theonlyinterestingaddressmappinginthesystemwillbelineartophysical.
TheBIOSdoesnotguaranteeanythingaboutthecontentsof%ds,%es,%ss,sorstorderofbusinessafterdisablinginterruptsistoset%axtozeroandthencopythatzerointo%ds,%es,and%ss(9115-9118).
Avirtualsegment:osetcanyielda21-bitphysicaladdress,buttheIntel8088couldonlyaddress20bitsofmemory,soitdiscardedthetopbit:0xffff0+0xffff=0x10ffef,butvirtualaddress0xffff:0xffffonthe8088referredtophysicaladdress0x0ffef.
Someearlysoftwarereliedonthehardwareignoringthe21staddressbit,sowhenIntelintroducedprocessorswithmorethan20bitsofphysicaladdress,IBMpro-videdacompatibilityhackthatisarequirementforPC-compatiblehardware.
Ifthesecondbitofthekeyboardcontroller'soutputportislow,the21stphysicaladdressbitisalwayscleared;ifhigh,the21stbitactsnormally.
Thebootloadermustenablethe21staddressbitusingI/Otothekeyboardcontrolleronports0x64and0x60(9120-9136).
Realmode's16-bitgeneral-purposeandsegmentregistersmakeitawkwardforaprogramtousemorethan65,536bytesofmemory,andimpossibletousemorethanamegabyte.
x86processorssincethe80286haveaprotectedmode,whichallowsphysi-caladdressestohavemanymorebits,and(sincethe80386)a''32-bit''modethatcaus-esregisters,virtualaddresses,andmostintegerarithmetictobecarriedoutwith32bitsratherthan16.
Thexv6bootsequenceenablesprotectedmodeand32-bitmodeasfollows.
Inprotectedmode,asegmentregisterisanindexintoasegmentdescriptortable(seeFigureB-2).
Eachtableentryspeciesabasephysicaladdress,amaximumvirtualaddresscalledthelimit,andpermissionbitsforthesegment.
Thesepermissionsaretheprotectioninprotectedmode:thekernelcanusethemtoensurethataprogramusesonlyitsownmemory.
xv6makesalmostnouseofsegments;itusesthepaginghardwareinstead,asChapter2describes.
Thebootloadersetsupthesegmentdescriptortablegdt(9182-9185)sothatallsegmentshaveabaseaddressofzeroandthemaximumpossiblelimit(fourgigabytes).
Thetablehasanullentry,oneentryforexecutablecode,andoneen-DRAFTasofSeptember4,2018100https://pdos.
csail.
mit.
edu/6.
828/xv6bootloaderlogicaladdresslinearaddressvirtualaddressprotectedmodesegmentdescriptortablegdt+codeSelector16Offset32LogicalAddressLinearAddressBaseLimitFlags3220120816GDT/LDTFigureB-2.
Segmentsinprotectedmode.
protectedmodetrytodata.
Thecodesegmentdescriptorhasaagsetthatindicatesthatthecodeshouldrunin32-bitmode(0660).
Withthissetup,whenthebootloaderentersprotect-edmode,logicaladdressesmapone-to-onetophysicaladdresses.
Thebootloaderexecutesanlgdtinstruction(9141)toloadtheprocessor'sglobaldescriptortable(GDT)registerwiththevaluegdtdesc(9187-9189),whichpointstothetablegdt.
OnceithasloadedtheGDTregister,thebootloaderenablesprotectedmodebysettingthe1bit(CR0_PE)inregister%cr0(9142-9144).
Enablingprotectedmodedoesnotimmediatelychangehowtheprocessortranslateslogicaltophysicaladdresses;itisonlywhenoneloadsanewvalueintoasegmentregisterthattheprocessorreadstheGDTandchangesitsinternalsegmentationsettings.
Onecannotdirectlymodify%cs,soinsteadthecodeexecutesanljmp(farjump)instruction(9153),whichallowsacodesegmentselectortobespecied.
Thejumpcontinuesexecutionatthenextline(9156)butindoingsosets%cstorefertothecodedescriptorentryingdt.
Thatdescriptordescribesa32-bitcodesegment,sotheprocessorswitchesinto32-bitmode.
Thebootloaderhasnursedtheprocessorthroughanevolutionfrom8088through80286to80386.
Thebootloader'srstactionin32-bitmodeistoinitializethedatasegmentreg-isterswithSEG_KDATA(9158-9161).
Logicaladdressnowmapdirectlytophysicalad-dresses.
TheonlystepleftbeforeexecutingCcodeistosetupastackinanunusedregionofmemory.
Thememoryfrom0xa0000to0x100000istypicallylitteredwithdevicememoryregions,andthexv6kernelexpectstobeplacedat0x100000.
Thebootloaderitselfisat0x7c00through0x7e00(512bytes).
Essentiallyanyothersec-tionofmemorywouldbeanelocationforthestack.
Thebootloaderchooses0x7c00(knowninthisleas$start)asthetopofthestack;thestackwillgrowdownfromthere,toward0x0000,awayfromthebootloader.
FinallythebootloadercallstheCfunctionbootmain(9168).
Bootmain'sjobistoloadandrunthekernel.
Itonlyreturnsifsomethinghasgonewrong.
Inthatcase,thecodesendsafewoutputwordsonport0x8a00(9170-9176).
Onrealhardware,thereisnodeviceconnectedtothatport,sothiscodedoesnothing.
IfthebootloaderisrunninginsideaPCsimulator,port0x8a00isconnectedtothesimulatoritselfandcantransfercontrolbacktothesimulator.
Simulatorornot,thecodethenexecutesaninniteloop(9177-9178).
Arealbootloadermightattempttoprintanerrormessagerst.
DRAFTasofSeptember4,2018101https://pdos.
csail.
mit.
edu/6.
828/xv6bootloaderglobaldescriptortablegdtdesc+codegdt+codeCR0_PE+codegdt+codeSEG_KDATA+codebootmain+codeCode:CbootstrapTheCpartofthebootloader,bootmain.
c(9200),expectstondacopyofthekernelexecutableonthediskstartingatthesecondsector.
ThekernelisanELFfor-matbinary,aswehaveseeninChapter2.
TogetaccesstotheELFheaders,bootmainloadstherst4096bytesoftheELFbinary(9214).
Itplacesthein-memorycopyatad-dress0x10000.
ThenextstepisaquickcheckthatthisprobablyisanELFbinary,andnotanuninitializeddisk.
Bootmainreadsthesection'scontentstartingfromthedisklocationoffbytesafterthestartoftheELFheader,andwritestomemorystartingataddresspaddr.
Bootmaincallsreadsegtoloaddatafromdisk(9238)andcallsstosbtozerotheremainderofthesegment(9240).
Stosb(0492)usesthex86instructionrepstosbtoinitializeeverybyteofablockofmemory.
Thekernelhasbeencompiledandlinkedsothatitexpectstonditselfatvirtualaddressesstartingat0x80100000.
Thus,functioncallinstructionsmustmentiondesti-nationaddressesthatlooklike0x801xxxxx;youcanseeexamplesinkernel.
asm.
Thisaddressisconguredinkernel.
ld(9311).
0x80100000isarelativelyhighad-dress,towardstheendofthe32-bitaddressspace;Chapter2explainsthereasonsforthischoice.
Theremaynotbeanyphysicalmemoryatsuchahighaddress.
Oncethekernelstartsexecuting,itwillsetupthepaginghardwaretomapvirtualaddressesstartingat0x80100000tophysicaladdressesstartingat0x00100000;thekernelas-sumesthatthereisphysicalmemoryatthisloweraddress.
Atthispointinthebootprocess,however,pagingisnotenabled.
Instead,kernel.
ldspeciesthattheELFpaddrstartat0x00100000,whichcausesthebootloadertocopythekerneltothelowphysicaladdressestowhichthepaginghardwarewilleventuallypoint.
Thebootloader'snalstepistocallthekernel'sentrypoint,whichistheinstruc-tionatwhichthekernelexpectstostartexecuting.
Forxv6theentryaddressis0x10000c:#objdump-fkernelkernel:fileformatelf32-i386architecture:i386,flags0x00000112:EXEC_P,HAS_SYMS,D_PAGEDstartaddress0x0010000cByconvention,the_startsymbolspeciestheELFentrypoint,whichisdenedintheleentry.
S(1040).
Sincexv6hasn'tsetupvirtualmemoryyet,xv6'sentrypointisthephysicaladdressofentry(1044).
RealworldThebootloaderdescribedinthisappendixcompilestoaround470bytesofma-chinecode,dependingontheoptimizationsusedwhencompilingtheCcode.
Inor-dertotinthatsmallamountofspace,thexv6bootloadermakesamajorsimplify-ingassumption,thatthekernelhasbeenwrittentothebootdiskcontiguouslystartingatsector1.
Morecommonly,kernelsarestoredinordinarylesystems,wheretheymaynotbecontiguous,orareloadedoveranetwork.
ThesecomplicationsrequiretheDRAFTasofSeptember4,2018102https://pdos.
csail.
mit.
edu/6.
828/xv6readseg+codestosb+code_start+codeentry+codebootloadertobeabletodriveavarietyofdiskandnetworkcontrollersandunder-standvariouslesystemsandnetworkprotocols.
Inotherwords,thebootloaderitselfmustbeasmalloperatingsystem.
Sincesuchcomplicatedbootloaderscertainlywon'ttin512bytes,mostPCoperatingsystemsuseatwo-stepbootprocess.
First,asim-plebootloaderliketheoneinthisappendixloadsafull-featuredboot-loaderfromaknowndisklocation,oftenrelyingonthelessspace-constrainedBIOSfordiskaccessratherthantryingtodrivethediskitself.
Thenthefullloader,relievedofthe512-bytelimit,canimplementthecomplexityneededtolocate,load,andexecutethedesiredkernel.
ModernPCsavoidmanyoftheabovecomplexities,becausetheysupporttheUniedExtensibleFirmwareInterface(UEFI),whichallowsthePCtoreadalargerbootloaderfromthedisk(andstartitinprotectedand32-bitmode).
ThisappendixiswrittenasiftheonlythingthathappensbetweenpoweronandtheexecutionofthebootloaderisthattheBIOSloadsthebootsector.
InfacttheBIOSdoesahugeamountofinitializationinordertomakethecomplexhardwareofamoderncomputerlooklikeatraditionalstandardPC.
TheBIOSisreallyasmalloperatingsystemembeddedinthehardware,whichispresentafterthecomputerhasbooted.
Exercises1.
Duetosectorgranularity,thecalltoreadseginthetextisequivalenttoread-seg((uchar*)0x100000,0xb500,0x1000).
Inpractice,thissloppybehaviorturnsoutnottobeaproblemWhydoesn'tthesloppyreadsectcauseproblems2.
Supposeyouwantedbootmain()toloadthekernelat0x200000insteadof0x100000,andyoudidsobymodifyingbootmain()toadd0x100000tothevaofeachELFsection.
Somethingwouldgowrong.
What3.
ItseemspotentiallydangerousforthebootloadertocopytheELFheadertomem-oryatthearbitrarylocation0x10000.
Whydoesn'titcallmalloctoobtainthememo-ryitneedsDRAFTasofSeptember4,2018103https://pdos.
csail.
mit.
edu/6.
828/xv6Index.
,86,88.
.
,86,88/init,27,35_binary_initcode_size,25_binary_initcode_start,25_start,102absorption,80acquire,54,57addl,26addressspace,20allocproc,23allocuvm,26,35–36alltraps,42–43argc,36argfd,45argint,45argptr,45argstr,45argv,36atomic,54B_DIRTY,47–48,77–78B_VALID,47–48,77balloc,81,83batch,49batching,79bcache.
head,77begin_op,80bfree,81bget,77binit,77block,47bmap,85bootloader,22,99–101bootmain,101bread,76,78brelse,76,78BSIZE,85buf,76busywaiting,48,66bwrite,76,78,80chan,66,69childprocess,8cli,46commit,78conditionalsynchronization,65contexts,62controlregisters,96convoys,72copyout,36coroutines,64cp->tf,44cpu->scheduler,26,62–63CR0_PE,101CR0_PG,23CR_PSE,37crashrecovery,75create,88criticalsection,53currentdirectory,14deadlocked,67directblocks,85dirlink,86dirlookup,85–86,88DIRSIZ,85DPL_USER,25,42driver,46dup,87ELFformat,35ELF_MAGIC,35EMBRYO,23end_op,80entry,22–23,102entrypgdir,23exception,39exec,9–11,26,36,42exit,8,27,63–64,71fetchint,45ledescriptor,10filealloc,87fileclose,87filedup,87fileread,87,90filestat,87filewrite,80,87,90FL_IF,25fork,8,10–11,87forkret,24,26,64freerange,33fsck,89ftable,87gdt,100–101gdtdesc,101getcmd,10globaldescriptortable,101groupcommit,79I/Oports,97ialloc,83,88IDE_BSY,47IDE_DRDY,47IDE_IRQ,47ideinit,47ideintr,48,56idelock,55–56iderw,47–48,55–56,77–78idestart,48idewait,47idt,42idtinit,46IF,42,46iget,82–83,86ilock,82–83,86indirectblock,85initcode,27initcode.
S,25–26,41initlog,80initproc,26inituvm,25inode,15,75,81insl,48install_trans,80instructionpointer,96int,40–42interfacedesign,7interrupt,39interrupthandler,40ioapicenable,47iput,82–83iret,26,41,44IRQ_TIMER,,46isolation,17itrunc,83,85iunlock,83kalloc,34KERNBASE,23kernel,7,19kernelmode,18,40kernelspace,7,19kfree,33DRAFTasofSeptember4,2018105https://pdos.
csail.
mit.
edu/6.
828/xv6kinit1,33kinit2,33kmap,32kvmalloc,30,32lapicinit,46linearaddress,99–100links,15loaduvm,35lock,51log,78log_write,80logicaladdress,99–100main,23,26,32–33,42,47,77malloc,10mappages,32memory-mappedI/O,97mfks,76microkernel,19–20mkdev,88mkdir,88monolithickernel,17,19mpmain,25multiplexing,61mutualexclusion,53mycpu,65myproc,65namei,25,35,88nameiparent,86,88namex,86NBUF,77NDIRECT,84–85NINDIRECT,85O_CREATE,88open,87–88p->context,24,26,64p->cwd,25p->kstack,21,71p->name,25p->pgdir,22,71p->state,22p->sz,45p->xxx,21page,29pagedirectory,29pagetableentries(PTEs),29pagetablepages,29panic,44parentprocess,8path,14persistence,75PGROUNDUP,33physicaladdress,20,99PHYSTOP,32–33pid,8,23pipe,13piperead,70pipewrite,70polling,48,66popal,26popcli,57popl,26printf,9priorityinversion,72privilegedinstructions,18proc->killed,44process,7–8,20programcounter,95protectedmode,100–101ptable,56ptable.
lock,63–64,69PTE_P,29PTE_U,26,30,32PTE_W,30pushcli,57racecondition,52read,87readi,35,85readseg,102realmode,99recover_from_log,80recursivelocks,58release,54,57ret,26root,14roundrobin,72RUNNABLE,25,64,68–70sbrk,10,34sched,62–64,68,71scheduler,25,63–64sector,47SEG_KDATA,101SEG_TSS,25SEG_UCODE,25SEG_UDATA,25seginit,37segmentdescriptortable,100segmentregisters,96sequencecoordination,65serializing,53setupkvm,25,32,35shell,8signal,73skipelem,86sleep,63,66,68sleep-locks,58SLEEPING,68–69stat,85,87stati,85,87sti,46stosb,102structbuf,47structcontext,62structcpu,65structdinode,81,84structdirent,85structelfhdr,35structfile,87structinode,82structpipe,70structproc,21,71structrun,33structspinlock,54structtrapframe,25superblock,76switchuvm,25,42,46,64swtch,25–26,62–64,71SYS_exec,26,44sys_exec,42sys_link,88sys_mkdir,88sys_mknod,88sys_open,88sys_pipe,89sys_sleep,56sys_unlink,88syscall,44systemcall,7T_DEV,85T_DIR,85T_FILE,88T_SYSCALL,26,42,44tf->trapno,44thread,21thunderingherd,73ticks,56tickslock,56time-share,8,17transaction,75TranslationLook-asideBuer(TLB),35,59DRAFTasofSeptember4,2018106https://pdos.
csail.
mit.
edu/6.
828/xv6trap,40trap,43–44,46,48,62trapret,24,26,44tvinit,42typecast,33unlink,79usermemory,20usermode,18,40userspace,7,19userinit,24–26ustack,36V2P_WO,23vectors[i],42virtualaddress,20,100waitchannel,66wait,8–9,64,71wakeup,46,56,66,68–69wakeup1,69walkpgdir,32,35write,79,87writei,81,85xchg,54,57yield,62–64ZOMBIE,71DRAFTasofSeptember4,2018107https://pdos.
csail.
mit.
edu/6.
828/xv6
活动方案:美国洛杉矶 E5 2696V2 2核4G20M带宽100G流量20元/月美国洛杉矶E5 2696V2 2核4G100M带宽1000G流量99元/季香港CN2 E5 2660V2 2核2G30M CN2500G流量119元/季日本CN2E5 2660 2核2G30M CN2 500G流量119元/季美国300G高防 真实防御E5 2696V2 2核2G30M...
Digital-VM商家目前也在凑热闹的发布六月份的活动,他们家的机房蛮多的有提供8个数据中心,包括日本、洛杉矶、新加坡等。这次六月份的促销活动全场VPS主机六折优惠。Digital-VM商家还是有一点点特点的,有提供1Gbps和10Gbps带宽的VPS主机,如果有需要大带宽的VPS主机可以看看。第一、商家优惠码优惠码:June40全场主机六折优惠,不过仅可以月付、季付。第二、商家VPS主机套餐1...
georgedatacenter怎么样?georgedatacenter这次其实是两个促销,一是促销一款特价洛杉矶E3-1220 V5独服,性价比其实最高;另外还促销三款特价vps,大家可以根据自己的需要入手。georgedatacenter是一家成立于2019年的美国vps商家,主营美国洛杉矶、芝加哥、达拉斯、新泽西、西雅图机房的VPS、邮件服务器和托管独立服务器业务。georgedatacen...
userinit为你推荐
2020双十一成绩单如何查找2020年小考六年级的成绩?mathplayer如何学好理科嘉兴商标注册我想注册个商标怎么注册啊?同ip站点同IP网站具体是什么意思,能换独立的吗5xoy.comhttp://www.5yau.com (舞与伦比),以前是这个地址,后来更新了,很长时间没玩了,谁知道现在的地址? 谢谢,www.niuav.com在那能找到免费高清电影网站呢 ?www.baitu.com韩国片爱人.欲望的观看地址百度指数词什么是百度指数sesehu.com68lolita com是真的吗bbs2.99nets.com让(bbs www)*****.cn进入同一个站
万网域名空间 贝锐花生壳域名 lamp blackfriday 美国主机推荐 windows主机 wordpress技巧 光棍节日志 realvnc win8.1企业版升级win10 标准机柜尺寸 dd444 cpanel空间 腾讯云分析 howfile 129邮箱 服务器是干什么的 空间合租 空间登入 丽萨 更多