Utilizing Linux Swap with Intel Optane DC SSDs as a Memory Overcommit Technique
Solutions Blueprint
June 2019, Version 1
Document Number: 340608-001US

Team Contacts:
Andrzej Jakowski, andrzej.jakowski@intel.com, Kernel Development
Tim C. Chen, tim.c.chen@intel.com, Kernel Development
Ying Huang, ying.huang@intel.com, Kernel Development
Frank Ober, frank.ober@intel.com, Testing and Outreach
David J. Leone, david.j.leone@intel.com, Testing and Outreach
Andrew Ruffin, andrew.ruffin@intel.com, Market Analysis and Outreach
Pragathi Narendra, pragathi.narendra@intel.com, Performance Test and Test Development
Mariusz Barczak, mariusz.barczak@intel.com, Kernel Development
Gert Pauwels, gert.pauwels@intel.com, Field Technical Support EMEA Region
Steven Briscoe, steven.briscoe@intel.com, Field Technical Support EMEA Region
Farib Khondoker, farib.khondoker@intel.com, Testing and Support

Revision History
Revision Number    Description         Revision Date
001                Initial release.    June 2019

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel, the Intel logo, Optane, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Intel Corporation

Contents
Introduction
    Scope
Memory Overcommit Use Cases
    Example Server Cost Model
The Kernel Build Process
    Development Tools Required for menuconfig (Possible Pre-requisites)
Appendix A Automation Scripts and How-to Guide
Appendix B Memory Management Fundamentals
    B.1 Memory Management System Overview
Appendix C Linux Kernel Innovations to Leverage Fast SSDs as Memory Extension
    C.1 Swap Improvements Completed in v4.14 of Linux Kernel
Appendix D Swap Improvements Patch Lists
    D.1 References

Introduction
This solutions blueprint explains how to use Intel Optane DC SSDs in memory extension configurations, or as memory replacement. We'll describe recent performance improvements that were first introduced in version 4.11 and completed in version 4.14 of the Linux* kernel. For simplicity, we will refer to version 4.14 or newer as the kernel version needed to evaluate high performance swap usage.
Very high endurance and low latency devices like Intel Optane DC SSDs can be efficiently used as swap devices, thereby enabling the system to exceed its minimum required system level performance in various memory overcommit use cases. Intel Optane SSDs used as swap devices are expected to have a long lifespan of five or more years in this usage.
For those who intend to immediately implement and test the use cases outlined in this document, please jump to the Appendix sections, and visit the following GitHub link for tools, instructions, and test code.
http://github.com/fxober/LinuxSwap

Scope
We will focus on how the Linux operating system (OS) can utilize Intel Optane DC SSDs as swap devices, thereby allowing storage device capacity to be used in conjunction with DRAM to store memory pages on both DRAM and non-volatile memory type media. The process of moving memory pages between the storage device and main memory is called paging. Paging allows system administrators to perform efficient management of system resources (memory, CPU, storage) at desired cost and service levels.
With recent advancements in storage media and Linux kernel improvements, Intel Optane DC SSDs provide a new opportunity to offset DRAM costs and allow for more flexible process memory oversubscription, at higher performance levels than before. This solutions blueprint will explore those usages.
Target Audience
This document is intended for system administrators, system operators, DevOps teams, and application developers who want to configure their underlying software and hardware resources to maximize system performance at a better cost. It assumes familiarity with basic computer architecture terminology and with the techniques an OS uses to manage physical resources such as CPU, memory and storage. It also explains fundamental concepts of the memory management techniques utilized in modern OSs, focusing on the Linux environment. The improved implementations of Linux Swap* and higher endurance memory media, such as Intel Optane memory, are essentially what enable such a solution to be effective in a modern data center environment.

Document Organization
First, this document introduces use cases in which the Intel Optane DC SSD is used as memory augmentation. Later, a server cost model is presented, which can be adopted or adjusted to calculate potential cost savings when leveraging an Intel Optane DC SSD as DRAM replacement.
Next, we describe the OS upgrades necessary to maximize system performance when using an Intel Optane DC SSD as a swap device. Specifically, we provide guidance on the minimum required versions of common Linux distributions that include the swap and memory management subsystem improvements, along with details on building the Linux kernel manually to maximize swap performance. The Additional Considerations for Software Configuration section explores system configuration details for maximizing swap performance. Then we compare the performance of the different swap devices.
The Appendix sections explain the details of the memory management subsystem and of the Linux kernel innovations that improve swap performance. Finally, a kernel patch list is provided for advanced users willing to backport the changes into their own kernel fork.

Glossary
Term: Definition

Physical memory: Fast memory, byte addressable (as opposed to disk storage, which is sector or block addressable). This fast, dynamic system memory is typically provided by DRAM technology.

Swap device: Dedicated space on a storage device for storing memory pages of process data or process code. It can be a whole block storage device, one of its partitions, or a file in a file system (swap file).

Virtual memory: Memory management technique implemented in modern OSs. It provides an illusion to the running process that it operates on a contiguous block of memory, while in reality hardware and the OS manage translations from virtual addresses to physical addresses, and transfers of memory pages from the storage device to physical memory. OS virtual memory hides those complexities from the application programmer.

Total Cost of Ownership (TCO): A defined, but often not standardized, approach to analyzing the financial impact of a purchase, and perhaps the ongoing expenses of hardware and software infrastructure over its life cycle. TCO models typically include various factors impacting cost, e.g. the cost to purchase HW (capital spending) and the operational cost of the electricity used to power and cool a building and the Data Center equipment. This paper focuses on a simplified server cost model. You can consider it a Bill of Material optimization, since the target is not a full analysis of all server operation or acquisition costs.

Memory Overcommit Use Cases
This chapter introduces example use cases in which an Intel Optane DC SSD can be used as memory extension, or as memory replacement, by using the Linux swap mechanism. It also provides an example server cost model that has been developed to illustrate potential cost savings when considering the purchase of new HW infrastructure. Use this server cost model as a framework to calculate potential cost savings at the server capital expenditure level.

Memory Overcommit for Virtualized Environments
One common technique widely used among cloud service providers (CSPs) is to over-commit physical resources, including physical CPU, storage, and memory. The following figure illustrates virtual machine differentiation based on the overcommit ratio, that is, how much of the guest physical memory is actually backed by physical DRAM. For example, the guest physical memory of "Gold" VMs is fully backed by DRAM, while for "Silver" VMs half of the guest physical memory is backed by DRAM and the remaining portion is backed by the swap device. Finally, for "Bronze" VMs, a quarter of the guest physical memory is backed by DRAM and the remaining portion can be paged out to the swap device. With a Linux based hypervisor (KVM), this type of differentiation can be achieved using the mechanism called control groups (cgroup), which controls resource usage (e.g. system memory) for a group of processes, in this case a class of VMs.
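As a rough sketch of the Gold/Silver/Bronze tiering described above, the DRAM limit for a VM class can be derived from the guest memory size and the tier's backing ratio. The function name tier_dram_bytes and the cgroup directory in the usage comment are illustrative assumptions and not part of this blueprint's tooling. The limit file itself is memory.limit_in_bytes under cgroup v1 and memory.max under cgroup v2.

```shell
#!/bin/sh
# Sketch only: compute the DRAM limit (in bytes) for a VM class.
# Backing ratios follow the example in the text: Gold 1/1, Silver 1/2, Bronze 1/4.
tier_dram_bytes() {
    tier=$1          # gold, silver or bronze
    guest_bytes=$2   # guest physical memory size in bytes
    case "$tier" in
        gold)   divisor=1 ;;
        silver) divisor=2 ;;
        bronze) divisor=4 ;;
        *) echo "unknown tier: $tier" >&2; return 1 ;;
    esac
    echo $(( guest_bytes / divisor ))
}

# Hypothetical usage with a cgroup v1 layout (the directory name is an assumption):
#   tier_dram_bytes silver 8589934592 > /sys/fs/cgroup/memory/silver_vms/memory.limit_in_bytes
```

The remainder of each guest's memory is then eligible to be paged out to the swap device when the class exceeds its DRAM limit.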
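Figure 1: Example of Virtual Machine Differentiation Based on Memory Overcommit Ratio

Example Server Cost Model
This chapter focuses on deriving an example server cost model, from a system memory hardware costs perspective, for two example configurations of servers: server "A" and server "B." The server cost model does not take into account the varied and unique operational expenses or other capital expenditures related to the larger scope of running a data center. For simplicity of our comparison, differences in space, power, operating costs, and other variable factors are ignored.
Server "A" and server "B" configurations are almost identical with regards to CPU, networking, and storage (both boot disks and data volumes). There are only 2 differences between them:
- Server "A" total physical DRAM is 384 GiB (24 x 16 GB RDIMMs), while server "B" is populated with only 192 GiB (12 x 16 GB RDIMMs) of physical DRAM.
- Server "A" does not use an Intel Optane DC SSD as a swap device; server "B" uses Intel Optane DC SSDs (2 x 100 GiB devices) as swap devices.
One of the data points most interesting to a system administrator is the relative cost of server "B" to server "A", which illustrates the potential hardware component cost savings on the purchase or lease of new servers for the data center. Additional server cost calculations focus on the relative costs of the server "B" configuration compared to server "A". For simplicity, this costing model takes into account only the memory components (DRAM + Intel Optane DC SSD capacities), because all other components of those server configurations are identical.
Relative cost comparison of server "B" configuration to server "A" configuration can be defined as follows, where DRAM_A and DRAM_B are the DRAM capacities of the two servers, Optane_B is the Intel Optane DC SSD capacity of server "B" (all in GiB), and cost_DRAM and cost_Optane are the respective prices per GiB:

\[
\text{Relative\_cost} = \frac{\text{cost}_{B}}{\text{cost}_{A}} = \frac{\text{DRAM}_{B} \cdot \text{cost}_{\text{DRAM}} + \text{Optane}_{B} \cdot \text{cost}_{\text{Optane}}}{\text{DRAM}_{A} \cdot \text{cost}_{\text{DRAM}}}
\]

Now simply dividing the numerator and denominator of the above equation by cost_Optane leads to the following formula:

\[
\text{Relative\_cost} = \frac{\text{DRAM}_{B} \cdot \frac{\text{cost}_{\text{DRAM}}}{\text{cost}_{\text{Optane}}} + \text{Optane}_{B}}{\text{DRAM}_{A} \cdot \frac{\text{cost}_{\text{DRAM}}}{\text{cost}_{\text{Optane}}}}
\]

Substitution of cost_DRAM / cost_Optane with the normalized per GiB DRAM to Optane price ratio (DRAM_to_Optane) will lead to this final formula:

\[
\text{Relative\_cost} = \frac{\text{DRAM}_{B} \cdot \text{DRAM\_to\_Optane} + \text{Optane}_{B}}{\text{DRAM}_{A} \cdot \text{DRAM\_to\_Optane}}
\]

Note: Please do your own price calculations using the formula above to calculate your server cost savings.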
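The relative cost described above can be evaluated with a few lines of shell. This is a minimal sketch that assumes the final formula has the form (DRAM_B x DRAM_to_Optane + Optane_B) / (DRAM_A x DRAM_to_Optane). The function name relative_cost and the price ratio of 3.0 are illustrative assumptions only and are not actual prices; substitute your own per GiB price ratio.

```shell
#!/bin/sh
# Sketch only: relative memory cost of server "B" versus server "A".
#   $1  DRAM capacity of server A (GiB)
#   $2  DRAM capacity of server B (GiB)
#   $3  Intel Optane DC SSD swap capacity of server B (GiB)
#   $4  per GiB DRAM-to-Optane price ratio (an input you must supply)
relative_cost() {
    awk -v a="$1" -v b="$2" -v o="$3" -v r="$4" \
        'BEGIN { printf "%.3f\n", (b * r + o) / (a * r) }'
}

# Example configurations from the text, with a purely hypothetical ratio of 3.0:
relative_cost 384 192 200 3.0   # prints 0.674
```

A result below 1 means that the memory components of server "B" cost less than those of server "A" at the assumed price ratio.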

The Kernel Build Process

Recommended Software Upgrades
In order to maximize Intel Optane DC SSD performance in a memory extension configuration (as a swap device), Intel recommends upgrading your Linux distribution to a recent version containing the backported series of patches that were added to the upstream Linux kernel in versions 4.11 and later. The following table contains information on the common Linux distribution versions that adopted performance improvements pertaining to swap performance.

Table 1: Linux Distribution Containing Swap Performance Improvements
Linux Distribution    OS Version
RHEL/CentOS           Starting version 7.5 and forward; starting version 8.0 and forward
Ubuntu                Starting version 18.10 and forward
SLES                  Starting version SLES 15, SLES 12 SP4 and forward
Oracle* Linux         Starting version Oracle Linux 7.5 and later with UEK R5 and RHCK

How to Build your Kernel Based on Upstream Linux Kernel
This section provides instructions on building a Linux kernel image based on the upstream Linux kernel project. This may be especially useful for those interested in further exploration of Linux kernel improvements relating to swap device performance, and who are willing to upgrade their infrastructure's Linux kernel. Please note that these instructions are based on an Ubuntu* server 18.04.2 system build. The exact steps may differ between Linux distributions, e.g. in the usage of the distribution package manager.
Approximate time needed: 1 hour

Development Tools Required for menuconfig (Possible Pre-requisites)
In order to clone, compile, and build a new kernel/driver, the following packages must be installed. You must be logged in as root to install these packages.

## Dependencies needed to run kernel menuconfig
# apt-get install flex bison
# apt-get install libncurses5-dev libncursesw5-dev
## Dependencies needed to perform kernel build
# apt-get install libssl-dev libelf-dev
# dpkg -i linux-*.deb

Build New Linux Kernel with RCU Setting for Swap
Download Linux kernel 4.14, 5.x, or newer from this repository: https://www.kernel.org/pub/linux/kernel/ into your Linux distribution. It is best to choose the latest stable kernel.
From a working directory:

## Use wget to download the kernel and unpack it (here the example is 4.18.20)
# wget https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/linux-4.18.20.tar.xz
# tar -xvf linux-4.18.20.tar.xz
# cd linux-4.18.20
## Alternatively clone whole Linux kernel git repository and check out specific branch
# git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
# cd linux
# git checkout -b v4.18.20_local v4.18.20

Build and install
To create the kernel configuration file (.config) based on the running kernel, and use the default setting for all new options, run the following command:

# yes "" | make oldconfig

To obtain maximum performance, avoid read-copy-update (RCU) callback processing on the CPUs that handle paging, as this may introduce delays. To make this possible, set CONFIG_RCU_NOCB_CPU=y in your local kernel .config file. See Offloading RCU Processing to Dedicated Kernel Threads for details on the related boot settings. Alternatively, you can make the change by running menuconfig and selecting that option in the user interface.

# make menuconfig

Under "General Setup and Features > RCU Subsystem" set the "Offload RCU callback…" flag, then Save and Exit menuconfig.
Build the kernel and kernel modules, and install the new kernel on the system.

## To build kernel image and loadable kernel modules invoke
# make
# make modules_install
## Install newly built kernel into operating system
# make install

After a successful install, reboot the system to load the new kernel image and kernel modules. Usually the new kernel becomes the default boot selection. After booting the OS, use "uname -a" to verify that the running kernel version matches the newly installed kernel version. If a different kernel version is loaded, you can change this by reconfiguring the system loader, usually grub2. Refer to the system loader documentation for your specific distribution.
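The version check after reboot can also be scripted. The sketch below compares a kernel release string, such as the output of "uname -r", against the 4.14 minimum used throughout this document. The function name kernel_at_least is an illustrative assumption. Only the major and minor numbers are compared, so a distribution kernel that carries backported patches under an older version number would not be recognized.

```shell
#!/bin/sh
# Sketch only: succeed if a kernel release string is at least MAJOR.MINOR.
#   $1  release string, e.g. "4.15.0-46-generic"
#   $2  required major version
#   $3  required minor version
kernel_at_least() {
    version=$1
    want_major=$2
    want_minor=$3
    major=${version%%.*}        # text before the first dot
    rest=${version#*.}          # text after the first dot
    minor=${rest%%.*}           # text up to the next dot
    minor=${minor%%[!0-9]*}     # drop suffixes such as "-rc1"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 4 14; then
    echo "running kernel is 4.14 or newer"
else
    echo "running kernel is older than 4.14"
fi
```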

Additional Considerations for OS Configuration
This section explores OS configuration considerations for maximizing performance of the swap device(s).

Offloading RCU Processing to Dedicated Kernel Threads
To offload RCU processing to dedicated kernel threads, edit the kernel command line option in the system loader. When using Grub2 as the system loader, open the /etc/default/grub file and add "rcu_nocbs=" to the GRUB_CMDLINE_LINUX_DEFAULT entry. See the /etc/default/grub file listing below for an example:

...
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="rcu_nocbs=0-n maybe-ubiquity"
GRUB_CMDLINE_LINUX=""
...

Note: n is the highest CPU (or HW thread) number in your system, that is, the number of CPUs minus one.
After saving edits, run either the "update-grub" or "grub2-mkconfig" command to update your grub2 settings in the boot partition. Reboot the system and verify that the new settings have been applied to the kernel.

# dmesg | grep -i offload
[ 0.000000] Offload RCU callbacks from CPUs: 0-63.

The reason for this step is to avoid RCU processing in an IO completion path, as RCU processing will likely increase paging latency.
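The rcu_nocbs value can be generated from the CPU count instead of being typed by hand. This minimal sketch assumes that the range should end at the highest CPU index, which matches the "0-63" range in the dmesg output above. The function name rcu_nocbs_arg is an illustrative assumption.

```shell
#!/bin/sh
# Sketch only: build the rcu_nocbs= boot argument for a given CPU count.
# CPU indexes start at 0, so the range ends at (count - 1).
rcu_nocbs_arg() {
    ncpus=$1
    echo "rcu_nocbs=0-$(( ncpus - 1 ))"
}

# Hypothetical usage on the target system:
#   rcu_nocbs_arg "$(getconf _NPROCESSORS_CONF)"
rcu_nocbs_arg 64   # prints rcu_nocbs=0-63
```

The printed string is what goes into the GRUB_CMDLINE_LINUX_DEFAULT entry, next to any options already present there.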

Turning Off Transparent Hugepages
To minimize the overhead of coalescing memory pages into huge pages and later breaking them up on the swap device, perform the following commands:

# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
# echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag

Watermark Scale Factor
It is important to increase the watermark scale factor in /proc/sys/vm, as this is the level where available memory is checked by kswapd. We recommend setting it to 400, or 4% of available memory. Doing so will set kswapd to automatically kick off swapping at 4% of available system memory.

# echo '400' > /proc/sys/vm/watermark_scale_factor

NUMA Considerations
When dealing with multiple swap devices on a multi-socket system, we recommend distributing swap devices evenly among the different CPU sockets to avoid QPI/UPI transfers. Moreover, to avoid software overhead, we recommend creating many swap devices on a partitioned NVMe device. Each swap partition must have the same priority. In most cases there can be at least 28 partitions, depending on the kernel configuration. When setting up your system, we recommend adhering to the NUMA locality rules for maximum performance.
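To illustrate the equal-priority recommendation above, the sketch below prints one "swapon" command per partition rather than running them, so the output can be reviewed first. The function name emit_swapon, the device name /dev/nvme0n1 and the priority value 10 are illustrative assumptions. The partitions are assumed to exist already and to have been initialized with mkswap.

```shell
#!/bin/sh
# Sketch only: print swapon commands that give every partition the same priority.
#   $1  NVMe namespace device, e.g. /dev/nvme0n1 (partitions are named <device>pN)
#   $2  number of swap partitions
#   $3  swap priority shared by all partitions
emit_swapon() {
    device=$1
    count=$2
    priority=$3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "swapon -p $priority ${device}p$i"
        i=$(( i + 1 ))
    done
}

# Hypothetical example with 4 partitions on one device:
emit_swapon /dev/nvme0n1 4 10
```

With equal priorities, the kernel allocates swap pages across the partitions in round-robin fashion instead of filling one partition before moving to the next.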

Performance Data of 4.18.20 Linux Swap
We used the pmbench utility to test the allocation and access of 4 KiB memory pages on a Linux system. Our test system utilized an Ubuntu 18.04.2 distribution of Linux, which we initially upgraded to the 4.18.20 version of the kernel, as the Ubuntu release comes with a 4.15.x kernel version. We upgraded using the methods noted in Appendix A - Automation Scripts and How-to Guide. There should be no issue running kernel 4.14 or newer, as the kernel patches to Linux swap are upstreamed (public on kernel.org) in 4.14. You cannot gain this level of performance on kernels prior to 4.14. We tested the in-box kernel of Ubuntu 18.04.2 (kernel 4.15.0-46-generic) and saw minimal difference (

Here is an example variable setting from /etc/default/grub, CPU count specific:

GRUB_CMDLINE_LINUX_DEFAULT="rcu_nocbs=0-[n] maybe-ubiquity"

Where [n] is the highest CPU number in your system, that is, the total number of CPU cores or virtual CPU threads minus one. Configure the kernel with these .config settings if you are able to compile your own kernel.
4. EXPERIMENTAL: Generally speaking, it is best to set the NVMe scheduler to [none] on the NVMe SSDs which you are testing, rather than the mq block or kyber scheduler. In most cases your build shows [none], which is fine.

# more /sys/block/nvme1n1/queue/scheduler
[none]

5. Newer kernels allow an NVMe queue size of 1,023, which is sufficient and recommended.
6. If you are seeing NVMe block merges, change your NVMe block size to 4 KiB (not 512 B) sectors. If block merges are still occurring after making this change, try the following. First, check the nomerges value:

# cat /sys/block/queue/nomerges

The nomerges value should be set to 2. Verify and change if necessary:

# echo 2 > /sys/block/queue/nomerges

Appendix B Memory Management Fundamentals
This chapter introduces the basic memory management concepts used in the Linux kernel. It explains system level bottlenecks observed when Intel Optane DC SSDs are used as swap devices with Linux versions prior to v4.14 of the upstream Linux kernel. Finally, it explains techniques to overcome those bottlenecks in version 4.14, so users can experience improved performance and utilize Intel Optane DC SSDs as swap devices.

B.1 Memory Management System Overview
Modern operating systems implement a virtual memory model which provides many advantages to application developers. The virtual memory model simplifies software development because it leaves physical memory allocation and data placement complexity to the underlying operating system. The operating system kernel deals with that complexity by giving any running process the impression that it has a big chunk of memory available (usually 4 GiB) for its exclusive use. In reality, the OS kernel maps process virtual memory to physical DRAM, and potentially overflows to a swap device, which extends available physical memory. The process of transferring data between the swap device and physical memory is called paging. It consists of page-ins, when data is read from the swap device into physical memory, and page-outs, when data is moved out of memory. It should be noted that page-outs may require data to be written out to the swap device, based on the state of the page. Figure 2 below provides a conceptual diagram of virtual memory and paging.

Figure 2: Virtual Memory Concept through Paging

The paging process is managed by the OS and is heavily supported by CPU hardware through the memory management unit (MMU). For example, the MMU contains a translation lookaside buffer (TLB) cache, which holds recent information on virtual-to-physical memory translations. This enables a significant reduction in the time needed to access data in memory. Another CPU feature that assists the OS with memory management is a mechanism called page fault. A page fault is an exception raised by CPU hardware when a process tries to access a virtual memory location that is not mapped to a physical address. There are different types of page faults:
- Minor: raised when a page exists in main memory but there is no entry indicating the virtual-to-physical address mapping. The page fault handler implemented in the OS creates a new mapping entry.
- Major: raised when a page does not exist in main memory. The page fault handler needs to bring the required data from the swap device into memory and create the corresponding mapping entry. For example, this happens in a freshly loaded process, because the OS kernel delays loading the whole program into memory. This technique, called on-demand paging, accelerates process startup.
A major page fault is a performance draining procedure. It requires the OS page fault handler to find an available location in physical memory, which can potentially involve paging-out, and to load the content of the program from the swap device into memory before the process can continue its execution.
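The minor and major fault counters described above are exposed per process on Linux in /proc/<pid>/stat, as the fields named minflt and majflt in the proc(5) manual page. The sketch below extracts both from one stat line. The function name fault_counts is an illustrative assumption.

```shell
#!/bin/sh
# Sketch only: print "<minor faults> <major faults>" from one /proc/<pid>/stat line.
# The command name (field 2) may contain spaces, so everything up to the last
# ") " is removed first. After that, state is field 1, minflt is field 8 and
# majflt is field 10.
fault_counts() {
    printf '%s\n' "$1" | awk '{ sub(/^.*\) /, ""); print $8, $10 }'
}

# Hypothetical usage on a Linux system, for the current shell:
#   fault_counts "$(cat /proc/$$/stat)"
```

A major fault count that keeps growing under memory pressure indicates that the process is repeatedly reading pages back from storage.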
There are two different types of pages:
- File system pages, or pages backed by files. These are memory pages that contain file data; for example, database files directly mapped into the process address space, or library files containing executable program code. These pages can be paged-in to physical memory; for example, when the program starts executing instructions stored on the disk (such as program usage of a shared library). The Linux page cache is a cache of these file-backed pages, both resident pages to be read and changed (dirty) pages that need to be synchronized to some storage device. Direct access IO routines, which do not use the page cache, are also available on Linux. Since the page cache is an opportunistic and general usage cache, it is not appropriate for all usages.
- Anonymous pages. These are memory pages that contain private process information, that is heap or stack, and have no device or file system backing them.
When the system is running into low memory conditions (high memory pressure), anonymous pages can be paged-out (swapped out) to the swap file or swap device by the OS process kswapd and its related kernel threads. This process can be more or less aggressive based on the configuration of the swappiness parameter, as this parameter sets the target of when swapping should become more active. The parameter can be set from 0 to 200; the higher the value, the more swap is utilized over page cache memory reclamation. In our performance study the OS is configured to its default value of 60, which is the typical production recommended setting. A value of 100 means that the OS will reclaim memory pages using page cache and swap equally. You can print out the proc variable /proc/sys/vm/swappiness to view its current value. Another important parameter, used to control when the kswapd kernel threads are activated, is watermark_scale_factor. The user can set a lower limit of available memory that specifies when kswapd activity will be started. More details are available in the Watermark Scale Factor section.
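The watermark_scale_factor value is expressed in units of 1/10,000 of memory according to the kernel's sysctl documentation, which is why the recommended value of 400 corresponds to 4%. The following sketch converts between the two representations. The function names are illustrative assumptions, and the first function handles whole percentages only.

```shell
#!/bin/sh
# Sketch only: convert between a percentage and a watermark_scale_factor value.
# The sysctl is expressed in fractions of 10,000, so 1% equals 100 units.
wsf_from_percent() {
    percent=$1                      # whole number of percent, e.g. 4
    echo $(( percent * 100 ))
}

wsf_to_percent() {
    awk -v f="$1" 'BEGIN { printf "%.2f\n", f / 100 }'
}

wsf_from_percent 4     # prints 400
wsf_to_percent 400     # prints 4.00

# Hypothetical usage to inspect the live settings on a Linux system:
#   wsf_to_percent "$(cat /proc/sys/vm/watermark_scale_factor)"
#   cat /proc/sys/vm/swappiness
```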

Appendix C Linux Kernel Innovations to Leverage Fast SSDs as Memory Extension
Until recently the Linux kernel had been primarily optimized for rotational disks because they were the predominant storage devices. One of the techniques used to maximize swap performance for rotational hard disk drives (HDDs) was to maintain swap data in a contiguous location on the disk to minimize disk seek time. The performance yields of this technique were fine for HDDs but inadequate for solid state drives (SSDs). With recent advancements in non-volatile memory (NVM) technologies like Intel Optane technology, new techniques and methods are needed to take advantage of the increased performance of the media and devices. While testing Linux swap against these new devices, many system-level bottlenecks were discovered in Linux swap. Kernel developers have addressed some of the performance bottlenecks in the release of Linux kernel 4.14. In this section we explore some of those enhancements.
A swap device in the Linux kernel is represented by a dedicated data structure (swap_info_struct) that contains information on how memory pages are stored on the swap device, see Figure 3 below. This information is stored in an array called swap_map, which is part of swap_info_struct. Swap_map stores information on the usage count for a page stored on the swap device. Swap_map entries are aggregated into clusters; these clusters effectively assign specific portions of the swap device to a specific CPU core. Updates to the usage count of individual swap_map entries require per cluster locks to be taken instead of holding a single lock protecting the whole swap_map.

Figure 3: Primary Swap Device Data Structures

Even though there are dedicated swap entries per CPU cluster, accesses to the swap_map are protected by a single lock, which is a scalability and performance limiter when concurrent accesses to the swap device are attempted. The negative impact of this single lock is especially visible in high memory pressure conditions. When the single lock is used to protect critical information in the swap_info_struct data structure, latencies for handling page faults from the swap device are significantly increased. This heavily impacts end user performance and renders the latest HW latency improvements ineffective due to system level bottlenecks. The next section explains techniques to minimize lock contention on the single lock that protects the swap_info_struct data structure, and to improve system level latencies. As previously discussed in the Performance Data section, access latencies on swap average below 20 microseconds when utilizing a higher performance drive.

C.1 Swap Improvements Completed in v4.14 of Linux Kernel
There are many software techniques to address performance problems related to lock contention. These approaches typically rely on the following principles:
- Replacement of a single coarse-grained lock on the swap partition with multiple finer-grained locks on the swap clusters. When many pieces of data are protected from concurrent accesses by a single, big lock, the concurrent threads that are attempting to read or write data are serialized in a queue while awaiting their turn. In such cases, to improve parallelism, a big lock can be split into many smaller locks that protect independent sub-pieces of data. This approach may yield significant performance improvements, especially when multiple threads access independent pieces of data. However, when more than one thread attempts to access the same piece of data, those attempts will still be serialized in a queue.
- Reduction of the time spent holding a lock (the time spent in the critical section). When multiple threads attempt to access a critical section that is protected by an exclusive lock held by another thread, they are all paused until the lock is released. The longer the critical section is, the longer the other threads will wait before they can continue. Reducing the time that a given thread spends in the critical section is another useful technique for increasing parallelism and reducing latency.
Kernel developers determined that the increased system level latencies observed while swapping to an Intel Optane DC SSD were caused by a single lock protecting the swap_info_struct data structure. They have applied the principles discussed above in the series of swap improvements that are available in Linux kernel version 4.14 and later. The following techniques have been developed to reduce lock contention on the swap_info_struct lock.
1. Bulk operations and per CPU lock cluster improvements: multiple swap_map entries that represent free space on the swap device have been aggregated into larger units and stored in a swap slot cache. A swap slot cache is managed by a specific CPU core, and because of that it is called a "per cpu swap slot cache". When a SW thread requests new swap space, it first tries to allocate it from the swap slot cache on the given CPU. This operation does not require locking. Because a single swap slot cache contains multiple swap_map entries, it is likely that a swap_map entry will be successfully allocated from it. When allocation from the swap slot cache is not possible, the swap software needs to perform a bulk allocation of multiple swap_map entries from swap_map, and assign those entries to the swap slot cache. Swap_info_lock is acquired when doing bulk operations on the swap_map data structure. Please refer to Figure 4 below for details of the changes.

Figure 4: Swap Bulk Operations Improvements

2. Radix tree split: another source of lock contention that existed in the Linux kernel prior to version 4.14 was the radix tree used for the swap cache. The swap cache is an optimization of swapping behavior that reduces the number of writes to the swap device or swap file, and maintains the mapping between a memory page and its swap map entry when the memory page is swapped in or swapped out. A swap write is considered unnecessary when a page exists in a swap device or swap file as well as in main memory, because both of those locations contain the same data. When Linux considers a page for reclamation, it can simply check whether the page exists both in the swap device or swap file and in main memory, and whether the data in those two locations match. In such a case the page in main memory can simply be marked as invalid and reclaimed. A radix tree data structure is used to check whether a swap entry has a corresponding page stored in main memory. Prior to version 4.14 of Linux, the swap cache radix tree was protected by a single swap cache lock, which reduced parallelism. In version 4.14 the single swap cache radix tree has been split into multiple smaller trees. This modification introduced a separate lock for each smaller radix tree and increased parallelism.
The current design is best implemented with many swap partitions on the physical swap device. See Appendix A and the automation scripts on GitHub to implement the maximum number of Linux swap partitions, typically 28.
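To give a feel for the scale of the radix tree split described above, the sketch below estimates how many smaller swap cache trees, each with its own lock, cover a swap device. It assumes that each tree covers 64 MiB of swap space, as the title of the related patch in Appendix D suggests, and that the count is rounded up. The function name swap_cache_trees is an illustrative assumption.

```shell
#!/bin/sh
# Sketch only: estimate the number of swap cache trees for a swap device,
# assuming one tree (and one lock) per 64 MiB of swap space, rounded up.
swap_cache_trees() {
    size_mib=$1                     # swap device size in MiB
    echo $(( (size_mib + 63) / 64 ))
}

swap_cache_trees 102400   # one 100 GiB swap device, prints 1600
```

Under this assumption, concurrent swap-ins and swap-outs that touch different 64 MiB regions no longer contend on the same swap cache lock.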

Appendix D Swap Improvements Patch Lists
This section provides a list of kernel patches pertaining to swap improvements that were introduced in the Linux kernel 4.11 and in 4.14. This list of patches may be useful when considering creating a unique kernel image based on kernel versions older than 4.11, and backporting swap improvements into it.

commit 322b8afe4a65906c133102532e63a278775cc5f0
Author: Huang Ying
Date: Wed May 3 14:52:49 2017 -0700
mm, swap: Fix a race in free_swap_and_cache()

commit 0ccfece6ed507738c0e7e4414c3688b78d4e3756
Author: Huang Ying
Date: Wed May 3 14:56:16 2017 -0700
mm/swapfile.c: fix swap space leak in error path of swap_free_entries()

commit ba81f83842549871cbd7226fc11530dc464500bb
Author: Huang Ying
Date: Wed Feb 22 15:45:46 2017 -0800
mm/swap: skip readahead only when swap slot cache is enabled

commit 039939a65059852242c823ece685579370bc574f
Author: Tim Chen
Date: Wed Feb 22 15:45:43 2017 -0800
mm/swap: enable swap slots cache usage

commit 67afa38e012e9581b9b42f2a41dfc56b1280794d
Author: Tim Chen
Date: Wed Feb 22 15:45:39 2017 -0800
mm/swap: add cache for swap slots allocation

commit 7c00bafee87c7bac7ed9eced7c161f8e5332cb4e
Author: Tim Chen
Date: Wed Feb 22 15:45:36 2017 -0800
mm/swap: free swap slots in batch

commit 36005bae205da3eef0016a5c96a34f10a68afa1e
Author: Tim Chen
Date: Wed Feb 22 15:45:33 2017 -0800
mm/swap: allocate swap slots in batches

commit e8c26ab60598558ec3a626e7925b06e7417d7710
Author: Tim Chen
Date: Wed Feb 22 15:45:29 2017 -0800
mm/swap: skip readahead for unreferenced swap slots

commit 4b3ef9daa4fc0bba742a79faecb17fdaaead083b
Author: Huang, Ying
Date: Wed Feb 22 15:45:26 2017 -0800
mm/swap: split swap cache into 64MB trunks

commit 235b62176712b970c815923e36b9a9cc05d4d901
Author: Huang, Ying
Date: Wed Feb 22 15:45:22 2017 -0800
mm/swap: add cluster lock

commit 6a991fc72d1243b8da0c644d3147d3ec41a0b281
Author: Huang, Ying
Date: Wed Feb 22 15:45:19 2017 -0800
mm/swap: fix kernel message in swap_info_get()

commit f6498b3f33123a6ee1c81a1b29b9c07964cb95c1
Author: Huang Ying
Date: Fri Oct 8 16:59:30 2016 -0700
mm: don't use radix tree writeback tags for pages in swap cache

D.1 References
See the following links for important reference information.
Most of the original patches: https://kernelnewbies.org/Linux_4.11#Memory_management
Second step swap optimization notes: https://kernelnewbies.org/Linux_4.14#Memory_management
Whitepaper on PMBench (2018): https://www.semanticscholar.org/paper/Pmbench%3A-A-Micro-Benchmark-for-Profiling-Paging-on-Yang-Seymour/dd0adcde7d074a414a9df76fb20d52a0d8aa8c71#paper-header
Whitepaper with deeper analysis of persistent memory's applicability to memory page access performance: https://web.cs.unlv.edu/jisooy/paper/yang_pmbench.pdf
