Red Hat Enterprise Linux Network Performance Tuning Guide
Authors: Jamie Bainbridge and Jon Maxwell
Reviewer: Noah Davids
Editors: Dayle Parker and Chris Negus
03/25/2015

Tuning a network interface card (NIC) for optimum throughput and latency is a complex process with many factors to consider. These factors include the capabilities of the network interface, driver features and options, the system hardware that Red Hat Enterprise Linux is installed on, CPU-to-memory architecture, the number of CPU cores, the version of the Red Hat Enterprise Linux kernel (which implies the driver version), the workload the network interface has to handle, and which factors (speed or latency) are most important to that workload.

There is no generic configuration that can be broadly applied to every system, as the above factors are always different. The aim of this document is not to provide specific tuning information, but to introduce the reader to the process of packet reception within the Linux kernel, then to demonstrate available tuning methods which can be applied to a given system.
PACKET RECEPTION IN THE LINUX KERNEL

The NIC ring buffer

Receive ring buffers are shared between the device driver and the NIC. The card assigns a transmit (TX) and receive (RX) ring buffer. As the name implies, the ring buffer is a circular buffer where an overflow simply overwrites existing data. It should be noted that there are two ways to move data from the NIC to the kernel: hardware interrupts and software interrupts, also called SoftIRQs.

The RX ring buffer is used to store incoming packets until they can be processed by the device driver. The device driver drains the RX ring, typically via SoftIRQs, which puts the incoming packets into a kernel data structure called an sk_buff or "skb" to begin its journey through the kernel and up to the application which owns the relevant socket. The TX ring buffer is used to hold outgoing packets which are destined for the wire. These ring buffers reside at the bottom of the stack and are a crucial point at which packet drop can occur, which in turn will adversely affect network performance.
Interrupts and Interrupt Handlers

Interrupts from the hardware are known as "top-half" interrupts. When a NIC receives incoming data, it copies the data into kernel buffers using DMA. The NIC notifies the kernel of this data by raising a hard interrupt. These interrupts are processed by interrupt handlers which do minimal work, as they have already interrupted another task and cannot be interrupted themselves. Hard interrupts can be expensive in terms of CPU usage, especially when holding kernel locks. The hard interrupt handler then leaves the majority of packet reception to a software interrupt, or SoftIRQ, process which can be scheduled more fairly.

Hard interrupts can be seen in /proc/interrupts where each queue has an interrupt vector in the 1st column assigned to it. These are initialized when the system boots or when the NIC device driver module is loaded. Each RX and TX queue is assigned a unique vector, which informs the interrupt handler as to which NIC/queue the interrupt is coming from.
The columns represent the number of incoming interrupts as a counter value:

# egrep "CPU0|eth2" /proc/interrupts
      CPU0    CPU1    CPU2    CPU3    CPU4  CPU5
105:  141606  0       0       0       0     0   IR-PCI-MSI-edge  eth2-rx-0
106:  0       141091  0       0       0     0   IR-PCI-MSI-edge  eth2-rx-1
107:  2       0       163785  0       0     0   IR-PCI-MSI-edge  eth2-rx-2
108:  3       0       0       194370  0     0   IR-PCI-MSI-edge  eth2-rx-3
109:  0       0       0       0       0     0   IR-PCI-MSI-edge  eth2-tx

SoftIRQs

Also known as "bottom-half" interrupts, software interrupt requests (SoftIRQs) are kernel routines which are scheduled to run at a time when other tasks will not be interrupted. The SoftIRQ's purpose is to drain the network adapter receive ring buffers. These routines run in the form of ksoftirqd/cpu-number processes and call driver-specific code functions. They can be seen in process monitoring tools such as ps and top.
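The per-CPU spread of a NIC's hard interrupts can also be totalled with a small helper. This is a sketch under our own naming (nic_irq_spread is not a standard tool); it reads /proc/interrupts-formatted text on stdin and sums each CPU column for the lines matching one interface:

```shell
# nic_irq_spread: sum each CPU column of /proc/interrupts for one NIC.
# Reads /proc/interrupts-formatted text on stdin; $1 is the NIC name.
nic_irq_spread() {
  grep "$1" | awk '{
    # Columns 2..N hold per-CPU counters until a non-numeric field appears.
    for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) {
      sum[i] += $i
      if (i > max) max = i
    }
  }
  END {
    for (i = 2; i <= max; i++) printf "CPU%d %d\n", i - 2, sum[i] + 0
  }'
}

# Usage: nic_irq_spread eth2 < /proc/interrupts
```

A heavily skewed result (all counts on one CPU) is the same symptom shown later in the irqbalance discussion.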
The following call stack, read from the bottom up, is an example of a SoftIRQ polling a Mellanox card. The functions marked [mlx4_en] are the Mellanox polling routines in the mlx4_en.ko driver kernel module, called by the kernel's generic polling routines such as net_rx_action. After moving from the driver to the kernel, the traffic being received will then move up to the socket, ready for the application to consume:

mlx4_en_complete_rx_desc [mlx4_en]
mlx4_en_process_rx_cq [mlx4_en]
mlx4_en_poll_rx_cq [mlx4_en]
net_rx_action
__do_softirq
run_ksoftirqd
smpboot_thread_fn
kthread
kernel_thread_starter
kernel_thread_starter
1 lock held by ksoftirqd

SoftIRQs can be monitored as follows.
Each column represents a CPU:

# watch -n1 grep RX /proc/softirqs
# watch -n1 grep TX /proc/softirqs

NAPI Polling

NAPI, or New API, was written to make processing packets of incoming cards more efficient. Hard interrupts are expensive because they cannot be interrupted. Even with interrupt coalescence (described later in more detail), the interrupt handler will monopolize a CPU core completely. The design of NAPI allows the driver to go into a polling mode instead of being hard-interrupted for every required packet receive.

Under normal operation, an initial hard interrupt or IRQ is raised, followed by a SoftIRQ handler which polls the card using NAPI routines. The polling routine has a budget which determines the CPU time the code is allowed. This is required to prevent SoftIRQs from monopolizing the CPU. On completion, the kernel will exit the polling routine and re-arm, then the entire procedure will repeat itself.

Figure 1: SoftIRQ mechanism using NAPI poll to receive data

Network Protocol Stacks

Once traffic has been received from the NIC into the kernel, it is then processed by protocol handlers such as Ethernet, ICMP, IPv4, IPv6, TCP, UDP, and SCTP.
Finally, the data is delivered to a socket buffer where an application can run a receive function, moving the data from kernel space to user space and ending the kernel's involvement in the receive process.

Packet egress in the Linux kernel

Another important aspect of the Linux kernel is network packet egress. Although simpler than the ingress logic, the egress is still worth acknowledging. The process works when skbs are passed down from the protocol layers through to the core kernel network routines.
Each skb contains a dev field which contains the address of the net_device which it will be transmitted through:

int dev_queue_xmit(struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;

Kernel tunables can be made persistent by writing them to /etc/sysctl.conf. The values specified in the configuration files are applied at boot, and can be re-applied any time afterwards with the sysctl -p command. This document will show the runtime configuration changes for kernel tunables. Persisting desirable changes across reboots is an exercise for the reader, accomplished as described above.
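As a sketch of that persistence mechanism (the tunable and value below are only illustrative, borrowed from the netdev_budget discussion later in this document):

```shell
# Illustrative only: append one tunable to the persistent configuration,
# then re-apply everything in /etc/sysctl.conf immediately.
echo 'net.core.netdev_budget = 600' >> /etc/sysctl.conf
sysctl -p
```

Running sysctl -p avoids waiting for a reboot to confirm the persisted value is accepted.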
Identifying the bottleneck

Packet drops and overruns typically occur when the RX buffer on the NIC cannot be drained fast enough by the kernel. When the rate at which data is coming off the network exceeds the rate at which the kernel is draining packets, the NIC discards incoming packets once the NIC buffer is full and increments a discard counter. The corresponding counter can be seen in ethtool statistics. The main criteria here are interrupts and SoftIRQs, which respond to hardware interrupts and receive traffic, then poll the card for traffic for the duration specified by net.core.netdev_budget.

The correct method to observe packet loss at a hardware level is ethtool. The exact counter varies from driver to driver; please consult the driver vendor or driver documentation for the appropriate statistic. As a general rule, look for counters with names like fail, miss, error, discard, buf, fifo, full or drop. Statistics may be upper or lower case.

For example, this driver increments various rx_*_errors statistics:

# ethtool -S eth3
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
rx_length_errors: 0
rx_over_errors: 3295
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 3295
rx_missed_errors: 3295

There are various tools available to isolate a problem area. Locate the bottleneck by investigating the following points:

- The adapter firmware level - Observe drops in ethtool -S ethX statistics
- The adapter driver level
- The Linux kernel, IRQs or SoftIRQs - Check /proc/interrupts and /proc/net/softnet_stat
- The protocol layers IP, TCP, or UDP - Use netstat -s and look for error counters
Here are some common examples of bottlenecks:

IRQs are not getting balanced correctly. In some cases the irqbalance service may not be working correctly or running at all. Check /proc/interrupts and make sure that interrupts are spread across multiple CPU cores. Refer to the irqbalance manual, or manually balance the IRQs. In the following example, interrupts are getting processed by only one processor:

# egrep "CPU0|eth2" /proc/interrupts
      CPU0     CPU1  CPU2  CPU3  CPU4  CPU5
105:  1430000  0     0     0     0     0   IR-PCI-MSI-edge  eth2-rx-0
106:  1200000  0     0     0     0     0   IR-PCI-MSI-edge  eth2-rx-1
107:  1399999  0     0     0     0     0   IR-PCI-MSI-edge  eth2-rx-2
108:  1350000  0     0     0     0     0   IR-PCI-MSI-edge  eth2-rx-3
109:  80000    0     0     0     0     0   IR-PCI-MSI-edge  eth2-tx

See if any of the columns besides the 1st column of /proc/net/softnet_stat are increasing. In the following example, the counter is large for CPU0 and the budget needs to be increased:

# cat /proc/net/softnet_stat
0073d76b 00000000 000049ae 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000000d2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000015c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

SoftIRQs may not be getting enough CPU time to poll the adapter, as per Figure 1.
Use tools like sar, mpstat, or top to determine what is consuming CPU runtime.

Use ethtool -S ethX to check a specific adapter for errors:

# ethtool -S eth3
rx_over_errors: 399
rx_fifo_errors: 399
rx_missed_errors: 399

Data is making it up to the socket buffer queue but not getting drained fast enough. Monitor the ss -nmp command and look for full RX queues. Use the netstat -s command and look for buffer pruning errors or UDP errors. The following example shows UDP receive errors:

# netstat -su
Udp:
    4218 packets received
    111999 packet receive errors
    333 packets sent

Increase the application's socket receive buffer or use buffer auto-tuning by not specifying a socket buffer size in the application. Check whether the application calls setsockopt(SO_RCVBUF), as that will override the default socket buffer settings.

Application design is an important factor. Look at streamlining the application to make it more efficient at reading data off the socket. One possible solution is to have separate processes draining the socket queues, using Inter-Process Communication (IPC) to hand the data to another process that does the background work like disk I/O.

Use multiple TCP streams. More streams are often more efficient at transferring data.
Use netstat -neopa to check how many connections an application is using:

tcp  0       0    0.0.0.0:12345  0.0.0.0:*      LISTEN       0  305800  27840/./server  off (0.00/0/0)
tcp  163428  580  1.0.0.8:12345  1.0.0.6:57786  ESTABLISHED  0  305821  27840/./server  off (0.00/0/0)

Use larger TCP or UDP packet sizes.
Each individual network packet has a certain amount of overhead, such as headers. Sending data in larger contiguous blocks will reduce that overhead. This is done by specifying a larger buffer size with the send() and recv() function calls; please see the man page of these functions for details.

In some cases, there may be a change in driver behavior after upgrading to a newer kernel version of Red Hat Enterprise Linux. If adapter drops occur after an upgrade, open a support case with Red Hat Global Support Services to determine whether tuning is required, or whether this is a driver bug.
Performance Tuning

SoftIRQ Misses

If the SoftIRQs do not run for long enough, the rate of incoming data could exceed the kernel's capability to drain the buffer fast enough. As a result, the NIC buffers will overflow and traffic will be lost. Occasionally, it is necessary to increase the time that SoftIRQs are allowed to run on the CPU. This is known as the netdev_budget. The default value of the budget is 300. This will cause the SoftIRQ process to drain 300 messages from the NIC before getting off the CPU:

# sysctl net.core.netdev_budget
net.core.netdev_budget = 300

This value can be doubled if the 3rd column in /proc/net/softnet_stat is increasing, which indicates that the SoftIRQ did not get enough CPU time. Small increments are normal and do not require tuning. This level of tuning is seldom required on a system with only gigabit interfaces. However, a system passing upwards of 10Gbps may need this tunable increased.
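The softnet_stat counters are printed in hexadecimal, which makes eyeballing them error-prone. The helper below is a sketch (the function name is our own); it converts the 1st, 2nd, and 3rd columns to decimal per CPU, relying on printf accepting hexadecimal constants (true of bash and most /bin/sh implementations):

```shell
# softnet_decode: print total frames, backlog drops, and budget exhaustions
# ("squeezes") per CPU in decimal. Reads /proc/net/softnet_stat on stdin.
softnet_decode() {
  n=0
  while read -r total dropped squeezed rest; do
    printf 'CPU%d total=%d dropped=%d squeezed=%d\n' \
      "$n" "0x$total" "0x$dropped" "0x$squeezed"
    n=$((n + 1))
  done
}

# Usage: softnet_decode < /proc/net/softnet_stat
```

A growing squeezed column suggests increasing netdev_budget; a growing dropped column points at netdev_max_backlog, covered later in this document.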
# cat softnet_stat
0073d76b 00000000 000049ae 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000000d2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000015c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000002a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

For example, tuning the value on this NIC from 300 to 600 will allow soft interrupts to run for double the default CPU time:

# sysctl -w net.core.netdev_budget=600

Tuned

Tuned is an adaptive system tuning daemon. It can be used to apply a variety of system settings gathered together into a collection called a profile. A tuned profile can contain instructions such as CPU governor, IO scheduler, and kernel tunables such as CPU scheduling or virtual memory management. Tuned also incorporates a monitoring daemon which can control or disable the power saving ability of CPUs, disks, and network devices. The aim of performance tuning is to apply settings which enable the most desirable performance. Tuned can automate a large part of this work.
First, install tuned, start the tuning daemon service, and enable the service on boot:

# yum -y install tuned
# service tuned start
# chkconfig tuned on

List the performance profiles:

# tuned-adm list
Available profiles:
- throughput-performance
- default
- desktop-powersave
- enterprise-storage
...

The contents of each profile can be viewed in the /etc/tune-profiles/ directory. We are concerned with setting a performance profile such as throughput-performance, latency-performance, or enterprise-storage.

Set a profile:

# tuned-adm profile throughput-performance
Switching to profile 'throughput-performance'
...

The selected profile will apply every time the tuned service starts. The tuned service is described further in man tuned.
Numad

Similar to tuned, numad is a daemon which can assist with process and memory management on systems with a Non-Uniform Memory Access (NUMA) architecture. Numad achieves this by monitoring system topology and resource usage, then attempting to place processes for efficient NUMA locality, where a process has a sufficiently large memory size and CPU load. The numad service also requires cgroups (Linux kernel control groups) to be enabled.

# service cgconfig start
Starting cgconfig service: [ OK ]
# service numad start
Starting numad: [ OK ]

By default, as of Red Hat Enterprise Linux 6.5, numad will manage any process with over 300Mb of memory usage and 50% of one core's CPU usage, and try to use any given NUMA node up to 85% capacity. Numad can be more finely tuned with the directives described in man numad. Please refer to the Understanding NUMA architecture section later in this document to see whether your system is a NUMA system or not.
CPU Power States

The ACPI specification defines various levels of processor power states or "C-states", with C0 being the operating state and C1 being the halt state, plus processor manufacturers implementing various additional states to provide additional power savings and related advantages such as lower temperatures. Unfortunately, transitioning between power states is costly in terms of latency. As we are concerned with making the responsiveness of the system as high as possible, it is desirable to disable all processor "deep sleep" states, leaving only operating and halt.

This must be accomplished first in the system BIOS or EFI firmware. Any states such as C6, C3, C1E or similar should be disabled. We can ensure the kernel never requests a C-state below C1 by adding processor.max_cstate=1 to the kernel line in the GRUB bootloader configuration. In some instances, the kernel is able to override the hardware setting and the additional parameter intel_idle.max_cstate=0 must be added to systems with Intel processors.

The sleep state of the processor can be confirmed with:

# cat /sys/module/intel_idle/parameters/max_cstate
0

A higher value indicates that additional sleep states may be entered. The powertop utility's Idle Stats page can show how much time is being spent in each C-state.
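For illustration, a GRUB kernel line with both parameters appended might look like the following (the kernel version and root device are placeholders, not taken from this document):

```shell
# /boot/grub/grub.conf excerpt - placeholder kernel version and root device
kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=/dev/sda1 processor.max_cstate=1 intel_idle.max_cstate=0
```

The two parameters are cumulative: the first caps the C-state the kernel requests, and the second prevents the intel_idle driver from overriding that cap on Intel systems.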
IRQBalance

IRQBalance is a service which can automatically balance interrupts across CPU cores, based on real-time system conditions. It is vital that the correct version of irqbalance is running for a particular kernel. For NUMA systems, irqbalance-1.0.4-8.el6_5 or greater is required for Red Hat Enterprise Linux 6.5, and irqbalance-1.0.4-6.el6_4 or greater is required for Red Hat Enterprise Linux 6.4. See the Understanding NUMA architecture section later in this document for manually balancing irqbalance for NUMA systems.

# rpm -q irqbalance
irqbalance-0.55-29.el6.x86_64

Manual balancing of interrupts

The IRQ affinity can also be manually balanced if desired.
Red Hat strongly recommends using irqbalance to balance interrupts, as it dynamically balances interrupts depending on system usage and other factors. However, manually balancing interrupts can be used to determine whether irqbalance is not balancing IRQs in an optimum manner and therefore causing packet loss. There may be some very specific cases where manually balancing interrupts permanently can be beneficial. For this case, the interrupts will be manually associated with a CPU using SMP affinity. There are two ways to do this: with a bitmask, or using smp_affinity_list, which is available from Red Hat Enterprise Linux 6 onwards.

To manually balance interrupts, the irqbalance service needs to be stopped and persistently disabled:

# chkconfig irqbalance off
# service irqbalance stop
Stopping irqbalance: [ OK ]

View the CPU cores where a device's interrupt is allowed to be received:

# egrep "CPU0|eth3" /proc/interrupts
      CPU0  CPU1  CPU2  CPU3  CPU4  CPU5
110:  1136  0     0     0     0     0   IR-PCI-MSI-edge  eth3-rx-0
111:  2     0     0     0     0     0   IR-PCI-MSI-edge  eth3-rx-1
112:  0     0     0     0     0     0   IR-PCI-MSI-edge  eth3-rx-2
113:  0     0     0     0     0     0   IR-PCI-MSI-edge  eth3-rx-3
114:  0     0     0     0     0     0   IR-PCI-MSI-edge  eth3-tx

# cat /proc/irq/110/smp_affinity_list
0-5

One way to manually balance the CPU cores is with a script.
The following script is a simple proof-of-concept example:

#!/bin/bash
# nic_balance.sh
# usage: nic_balance.sh <nic> <last cpu>
cpu=0
grep $1 /proc/interrupts | awk '{print $1}' | sed 's/://' |
while read a
do
  echo $cpu > /proc/irq/$a/smp_affinity_list
  echo "echo $cpu > /proc/irq/$a/smp_affinity_list"
  if [ $cpu = $2 ]
  then
    cpu=0
  fi
  let cpu=cpu+1
done

The above script reports the commands it ran as follows:

# sh balance.sh eth3 5
echo 0 > /proc/irq/110/smp_affinity_list
echo 1 > /proc/irq/111/smp_affinity_list
echo 2 > /proc/irq/112/smp_affinity_list
echo 3 > /proc/irq/113/smp_affinity_list
echo 4 > /proc/irq/114/smp_affinity_list
echo 5 > /proc/irq/131/smp_affinity_list

The above script is provided under a Creative Commons Zero license.
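After running such a script, the result can be verified by listing each of the NIC's IRQ numbers and reading the matching smp_affinity_list files. The helper below is a sketch (nic_irqs is our own name); it extracts IRQ numbers from /proc/interrupts-formatted text on stdin:

```shell
# nic_irqs: print the IRQ number of every /proc/interrupts line
# matching the NIC name given in $1. Reads the file on stdin.
nic_irqs() {
  awk -v nic="$1" '$0 ~ nic { sub(/:$/, "", $1); print $1 }'
}

# Usage:
# for irq in $(nic_irqs eth3 < /proc/interrupts); do
#   echo "IRQ $irq -> CPUs $(cat /proc/irq/$irq/smp_affinity_list)"
# done
```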
Ethernet Flow Control (a.k.a. Pause Frames)

Pause frames are Ethernet-level flow control between the adapter and the switch port. The adapter will send "pause frames" when the RX or TX buffers become full. The switch will stop data flowing for a time span in the order of milliseconds or less. This is usually enough time to allow the kernel to drain the interface buffers, thus preventing the buffer overflow and subsequent packet drops or overruns. Ideally, the switch will buffer the incoming data during the pause time. However, it is important to realize that this level of flow control is only between the switch and the adapter. If packets are dropped, the higher layers such as TCP, or the application in the case of UDP and/or multicast, should initiate recovery.

Pause frames and Flow Control need to be enabled on both the NIC and switch port for this feature to take effect. Please refer to your network equipment manual or vendor for instructions on how to enable Flow Control on a port.

In this example, Flow Control is disabled:

# ethtool -a eth3
Pause parameters for eth3:
Autonegotiate: off
RX: off
TX: off

To enable Flow Control:

# ethtool -A eth3 rx on
# ethtool -A eth3 tx on

To confirm Flow Control is enabled:

# ethtool -a eth3
Pause parameters for eth3:
Autonegotiate: off
RX: on
TX: on

Interrupt Coalescence (IC)

Interrupt coalescence refers to the amount of traffic that a network interface will receive, or the time that passes after receiving traffic, before issuing a hard interrupt.
Interrupting too soon or too frequently results in poor system performance, as the kernel stops (or "interrupts") a running task to handle the interrupt request from the hardware. Interrupting too late may result in traffic not being taken off the NIC soon enough. More traffic may arrive, overwriting the previous traffic still waiting to be received into the kernel, resulting in traffic loss.

Most modern NICs and drivers support IC, and many allow the driver to automatically moderate the number of interrupts generated by the hardware. The IC settings usually comprise two main components: time and number of packets. Time is the number of microseconds (u-secs) that the NIC will wait before interrupting the kernel, and number is the maximum number of packets allowed to be waiting in the receive buffer before interrupting the kernel.

A NIC's interrupt coalescence can be viewed using the ethtool -c ethX command, and tuned using the ethtool -C ethX command. Adaptive mode enables the card to auto-moderate the IC. In adaptive mode, the driver will inspect traffic patterns and kernel receive patterns, and estimate coalescing settings on-the-fly which aim to prevent packet loss. This is useful when many small packets are being received.

Higher interrupt coalescence favors bandwidth over latency. A VOIP application (latency-sensitive) may require less coalescence than a file transfer protocol (throughput-sensitive). Different brands and models of network interface cards have different capabilities and default settings, so please refer to the manufacturer's documentation for the adapter and driver.

On this system adaptive RX is enabled by default:

# ethtool -c eth3
Coalesce parameters for eth3:
Adaptive RX: on  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 16
rx-frames: 44
rx-usecs-irq: 0
rx-frames-irq: 0

The following command turns adaptive IC off, and tells the adapter to interrupt the kernel immediately upon reception of any traffic:

# ethtool -C eth3 adaptive-rx off rx-usecs 0 rx-frames 0

A realistic setting is to allow at least some packets to buffer in the NIC, and at least some time to pass, before interrupting the kernel. Valid ranges may be from 1 to hundreds, depending on system capabilities and traffic received.
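As a hedged illustration of such a middle ground (the numbers are arbitrary examples, not recommendations; valid values and supported options depend on the adapter and driver):

```shell
# Interrupt after at most 64 buffered packets or 50 microseconds,
# whichever comes first - example values only.
ethtool -C eth3 adaptive-rx off rx-usecs 50 rx-frames 64
```

Re-check the drop counters from the Identifying the bottleneck section after each change, since the right trade-off is workload-specific.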
The Adapter Queue

The netdev_max_backlog is a queue within the Linux kernel where traffic is stored after reception from the NIC, but before processing by the protocol stacks (IP, TCP, etc). There is one backlog queue per CPU core. A given core's queue can grow automatically, containing a number of packets up to the maximum specified by the netdev_max_backlog setting. The netif_receive_skb() kernel function will find the corresponding CPU for a packet, and enqueue packets in that CPU's queue. If the queue for that processor is full and already at maximum size, packets will be dropped.

To tune this setting, first determine whether the backlog needs increasing. The /proc/net/softnet_stat file contains a counter in the 2nd column that is incremented when the netdev backlog queue overflows. If this value is incrementing over time, then netdev_max_backlog needs to be increased.

Each line of the softnet_stat file represents a CPU core starting from CPU0:

Line 1 = CPU0
Line 2 = CPU1
Line 3 = CPU2

and so on. The following system has 12 CPU cores:

# wc -l /proc/net/softnet_stat
12

When a packet is unable to be placed into a backlog queue, the following code is executed, where get_cpu_var identifies the appropriate processor queue:

__get_cpu_var(netdev_rx_stat).dropped++;

The above code then increments the dropped statistic for that queue.
Each line in the softnet_stat file represents the netif_rx_stats structure for that CPU. That data structure contains:

struct netif_rx_stats
{
    unsigned total;
    unsigned dropped;
    unsigned time_squeeze;
    unsigned cpu_collision;
    unsigned received_rps;
};

The 1st column is the number of frames received by the interrupt handler. The 2nd column is the number of frames dropped due to netdev_max_backlog being exceeded. The 3rd column is the number of times ksoftirqd ran out of netdev_budget or CPU time when there was still work to be done. The other columns may vary depending on the version of Red Hat Enterprise Linux.
Using the following example, the counters for CPU0 and CPU1 are the first two lines:

# cat softnet_stat
0073d76b 00000000 000049ae 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000000d2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000015c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000002a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...

For the above example, netdev_max_backlog does not need to be changed as the number of drops has remained at 0:

For CPU0
Total     dropped   no_budget  lock_contention
0073d76b  00000000  000049ae   00000000

For CPU1
Total     dropped   no_budget  lock_contention
000000d2  00000000  00000000   00000000

The statistics in each column are provided in hexadecimal.
The default netdev_max_backlog value is 1000. However, this may not be enough for multiple interfaces operating at 1Gbps, or even a single interface at 10Gbps. Try doubling this value and observing the /proc/net/softnet_stat file. If doubling the value reduces the rate at which drops increment, double again and test again. Repeat this process until the optimum size is established and drops do not increment.

The backlog can be changed with the following command, where X is the desired value to be set:

# sysctl -w net.core.netdev_max_backlog=X

Adapter RX and TX Buffer Tuning

Adapter buffer defaults are commonly set to a smaller size than the maximum.
Often, increasing the receive buffer size alone is enough to prevent packet drops, as it can allow the kernel slightly more time to drain the buffer. As a result, this can prevent possible packet loss.

The following interface has the space for 8 kilobytes of buffer but is only using 1 kilobyte:

# ethtool -g eth3
Ring parameters for eth3:
Pre-set maximums:
RX: 8192
RX Mini: 0
RX Jumbo: 0
TX: 8192
Current hardware settings:
RX: 1024
RX Mini: 0
RX Jumbo: 0
TX: 512

Increase both the RX and TX buffers to the maximum:

# ethtool -G eth3 rx 8192 tx 8192

This change can be made whilst the interface is online, though a pause in traffic will be seen. These settings can be persisted by writing a script at /sbin/ifup-local. This is documented on the knowledgebase at: How do I run a script or program immediately after my network interface goes up - https://access.redhat.com/knowledge/solutions/8694

Adapter Transmit Queue Length

The transmit queue length value determines the number of packets that can be queued before being transmitted.
The default value of 1000 is usually adequate for today's high speed 10Gbps or even 40Gbps networks. However, if the number of transmit errors is increasing on the adapter, consider doubling it. Use ip -s link to see whether there are any drops on the TX queue for an adapter:

# ip -s link
2: em1: mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT group default qlen 1000
    link/ether f4:ab:cd:1e:4c:c7 brd ff:ff:ff:ff:ff:ff
    RX: bytes       packets   errors  dropped overrun mcast
    71017768832     60619524  0       0       0       1098117
    TX: bytes       packets   errors  dropped carrier collsns
    10373833340     36960190  0       0       0       0

The queue length can be modified with the ip link command:

# ip link set dev em1 txqueuelen 2000
# ip link
2: em1: mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT group default qlen 2000
    link/ether f4:ab:cd:1e:4c:c7 brd ff:ff:ff:ff:ff:ff

To persist this value across reboots, a udev rule can be written to apply the queue length to the interface as it is created, or the network scripts can be extended with a script at /sbin/ifup-local as described on the knowledgebase at: How do I run a script or program immediately after my network interface goes up - https://access.redhat.com/knowledge/solutions/8694

Module parameters

Each network interface driver usually comes as a loadable kernel module.
Modules can be loaded and unloaded using the modprobe command. These modules usually contain parameters that can be used to further tune the device driver and NIC. The modinfo command can be used to view these parameters. Documenting specific driver parameters is beyond the scope of this document. Please refer to the hardware manual, driver documentation, or hardware vendor for an explanation of these parameters.

The Linux kernel exports the current settings for module parameters via the sysfs path /sys/module/<module_name>/parameters

For example, given the driver parameters:

# modinfo mlx4_en
filename: /lib/modules/2.6.32-246.el6.x86_64/kernel/drivers/net/mlx4/mlx4_en.ko
version: 2.0 (Dec 2011)
license: Dual BSD/GPL
description: Mellanox ConnectX HCA Ethernet driver
author: Liran Liss, Yevgeny Petrilin
depends: mlx4_core
vermagic: 2.6.32-246.el6.x86_64 SMP mod_unload modversions
parm: inline_thold:treshold for using inline data (int)
parm: tcp_rss:Enable RSS for incomming TCP traffic or disabled (0) (uint)
parm: udp_rss:Enable RSS for incomming UDP traffic or disabled (0) (uint)
parm: pfctx:Priority based Flow Control policy on TX[7:0]. Per priority bit mask (uint)
parm: pfcrx:Priority based Flow Control policy on RX[7:0]. Per priority bit mask (uint)

The current values of each driver parameter can be checked in sysfs. For example, to check the current setting for the udp_rss parameter:

# ls /sys/module/mlx4_en/parameters
inline_thold  num_lro  pfcrx  pfctx  rss_mask  rss_xor  tcp_rss  udp_rss
# cat /sys/module/mlx4_en/parameters/udp_rss
1

Some drivers allow these values to be modified whilst loaded, but many values require the driver module to be unloaded and reloaded to apply a module option. Loading and unloading of a driver module is done with the modprobe command:

# modprobe -r <module_name>
# modprobe <module_name>

For non-persistent use, a module parameter can also be enabled as the driver is loaded:

# modprobe -r <module_name>
# modprobe <module_name> <parameter>=<value>

In the event a module cannot be unloaded, a reboot will be required.

For example, to use RPS instead of RSS, disable RSS as follows:

# echo 'options mlx4_en udp_rss=0' >> /etc/modprobe.d/mlx4_en.conf

Unload and reload the driver:

# modprobe -r mlx4_en
# modprobe mlx4_en

This parameter could also be loaded just this time:

# modprobe -r mlx4_en
# modprobe mlx4_en udp_rss=0

Confirm whether that parameter change took effect:

# cat /sys/module/mlx4_en/parameters/udp_rss
0

In some cases, driver parameters can also be controlled via the ethtool command.
TheupstreamLinuxkerneldriverandtheRedHatEnterpriseLinuxdriverdonotexposethisparameterviaamoduleoption.
Instead,thesamefunctionalitycaninsteadbetunedviaethtool:#ethtool-CethXrx-usecs1000AdapterOffloadingInordertoreduceCPUloadfromthesystem,modernnetworkadaptershaveoffloadingfeatureswhichmovesomenetworkprocessingloadontothenetworkinterfacecard.
Forexample,thekernelcansubmitlarge(upto64k)TCPsegmentstotheNIC,whichtheNICwillthenbreakdownintoMTU-sizedsegments.
ThisparticularfeatureiscalledTCPSegmentationOffload(TSO).
Offloadingfeaturesareoftenenabledbydefault.
Itisbeyondthescopeofthisdocumenttocovereveryoffloadingfeaturein-depth,however,turningthesefeaturesoffisagoodtroubleshootingstepwhenasystemissufferingfrompoornetworkperformanceandre-test.
Ifthereisanperformanceimprovement,ideallynarrowthechangetoaspecificoffloadingparameter,thenreportthistoRedHatGlobalSupportServices.
Itisdesirabletohaveoffloadingenabledwhereverpossible.
Offloadingsettingsaremanagedbyethtool-KethX.
Commonsettingsinclude:GRO:GenericReceiveOffloadLRO:LargeReceiveOffloadTSO:TCPSegmentationOffloadRXcheck-summing=ProcessingofreceivedataintegrityTXcheck-summing=Processingoftransmitdataintegrity(requiredforTSO)#ethtool-keth0Featuresforeth0:rx-checksumming:ontx-checksumming:onscatter-gather:ontcp-segmentation-offload:onudp-fragmentation-offload:offgeneric-segmentation-offload:ongeneric-receive-offload:onlarge-receive-offload:onrx-vlan-offload:ontx-vlan-offload:onntuple-filters:offreceive-hashing:onJumboFramesThedefault802.
The Ethernet header consumes 18 bytes of this (or 22 bytes with a VLAN tag), leaving an effective maximum payload of 1500 bytes. Jumbo Frames are an unofficial extension to Ethernet which network equipment vendors have made a de-facto standard, increasing the payload from 1500 to 9000 bytes.

With regular Ethernet frames there is an overhead of 18 bytes for every 1500 bytes of data placed on the wire, or 1.2% overhead. With Jumbo Frames there is an overhead of 18 bytes for every 9000 bytes of data placed on the wire, or 0.2% overhead. The above calculations assume no VLAN tag, however such a tag will add 4 bytes to the overhead, making the efficiency gains even more desirable.
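The overhead percentages above can be reproduced with a one-line calculation (the helper name is our own):

```shell
# overhead_pct: percentage overhead of an 18-byte Ethernet header
# for a given payload size in bytes.
overhead_pct() {
  awk -v payload="$1" 'BEGIN { printf "%.1f%%\n", 100 * 18 / payload }'
}

overhead_pct 1500   # standard frame
overhead_pct 9000   # jumbo frame
```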
When transferring large amounts of contiguous data, such as sending large files between two systems, the above efficiency can be gained by using Jumbo Frames. When transferring small amounts of data, such as web requests which are typically below 1500 bytes, there is likely no gain from using a larger frame size, as data passing over the network will be contained within small frames anyway.

For Jumbo Frames to be configured, all interfaces and network equipment in a network segment (i.e. broadcast domain) must support Jumbo Frames and have the increased frame size enabled. Refer to your network switch vendor for instructions on increasing the frame size. On Red Hat Enterprise Linux, increase the frame size with MTU=9000 in the /etc/sysconfig/network-scripts/ifcfg-<interface> file for the interface.

The MTU can be checked with the ip link command:

# ip link
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 9000 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:36:b2:d1 brd ff:ff:ff:ff:ff:ff

TCP Timestamps

TCP Timestamps are an extension to the TCP protocol, defined in RFC 1323 - TCP Extensions for High Performance - http://tools.ietf.org/html/rfc1323
ietf.
org/html/rfc1323TCPTimestampsprovideamonotonicallyincreasingcounter(onLinux,thecounterismillisecondssincesystemboot)whichcanbeusedtobetterestimatetheround-trip-timeofaTCPconversation,resultinginmoreaccurateTCPWindowandbuffercalculations.
Mostimportantly,TCPTimestampsalsoprovideProtectionAgainstWrappedSequenceNumbersastheTCPheaderdefinesaSequenceNumberasa32-bitfield.
Givenasufficientlyfastlink,thisTCPSequenceNumbernumbercanwrap.
Thisresultsinthereceiverbelievingthatthesegmentwiththewrappednumberactuallyarrivedearlierthanitsprecedingsegment,andincorrectlydiscardingthesegment.
Ona1gigabitpersecondlink,TCPSequenceNumberscanwrapin17seconds.
Ona10gigabitpersecondlink,thisisreducedtoaslittleas1.
7seconds.
Onfastlinks,enablingTCPTimestampsshouldbeconsideredmandatory.
TCPTimestampsprovideanalternative,non-wrapping,methodtodeterminetheageandorderofasegment,preventingwrappedTCPSequenceNumbersfrombeingaproblem.
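The wrap times quoted above can be reproduced with a short calculation. The figures follow RFC 1323's analysis, which treats half the 32-bit sequence space (2^31 bytes) as the point at which sequence-number comparison becomes ambiguous; this interpretation is an assumption here, inferred from the 17-second figure:

```python
def seq_wrap_seconds(link_bits_per_sec):
    """Seconds to transmit 2**31 bytes, after which 32-bit TCP
    sequence-number comparison becomes ambiguous (per RFC 1323)."""
    bytes_per_sec = link_bits_per_sec / 8
    return (2 ** 31) / bytes_per_sec

print(round(seq_wrap_seconds(1e9), 1))   # 1 Gbit/s  -> ~17.2 seconds
print(round(seq_wrap_seconds(1e10), 1))  # 10 Gbit/s -> ~1.7 seconds
```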
Ensure TCP Timestamps are enabled:

# sysctl net.ipv4.tcp_timestamps
net.ipv4.tcp_timestamps = 1

If the above command indicates that tcp_timestamps = 0, enable TCP Timestamps:

# sysctl -w net.ipv4.tcp_timestamps=1

TCP SACK

TCP Selective Acknowledgments (SACK) is a TCP extension defined in RFC 2018 - TCP Selective Acknowledgment Options - http://tools.ietf.org/html/rfc2018

A basic TCP Acknowledgment (ACK) only allows the receiver to advise the sender which bytes have been received. When packet loss occurs, this requires the sender to retransmit all bytes from the point of loss, which can be inefficient. SACK allows a receiver to specify which bytes have been lost and which bytes have been received, so the sender can retransmit only the lost bytes.

There is some research available in the networking community which shows that enabling SACK on high-bandwidth links can cause unnecessary CPU cycles to be spent calculating SACK values, reducing the overall efficiency of TCP connections. This research implies these links are so fast that the overhead of retransmitting small amounts of data is less than the overhead of calculating the data to provide as part of a Selective Acknowledgment. Unless there is high latency or high packet loss, it is most likely better to keep SACK turned off over a high-performance network. SACK can be turned off with kernel tunables:

# sysctl -w net.ipv4.tcp_sack=0

TCP Window Scaling

TCP Window Scaling is an extension to the TCP protocol, defined in RFC 1323 - TCP Extensions for High Performance - http://tools.ietf.org/html/rfc1323

In the original TCP definition, the TCP segment header only contains a 16-bit value for the TCP Window Size, which is insufficient for the link speeds and memory capabilities of modern computing. The TCP Window Scaling extension was introduced to allow a larger TCP Receive Window. This is achieved by adding a scaling value to the TCP options which are added after the TCP header. The real TCP Receive Window is bit-shifted left by the value of the Scaling Factor, up to a maximum size of 1,073,725,440 bytes, or close to one gigabyte.

TCP Window Scaling is negotiated during the three-way TCP handshake (SYN, SYN+ACK, ACK) which opens every TCP conversation. Both sender and receiver must support TCP Window Scaling for the Window Scaling option to work. If either or both participants do not advertise Window Scaling ability in their handshake, the conversation falls back to using the original 16-bit TCP Window Size.

TCP Window Scaling is enabled by default on Red Hat Enterprise Linux. The status of Window Scaling can be confirmed with the command:

# sysctl net.ipv4.tcp_window_scaling
net.ipv4.tcp_window_scaling = 1

TCP Window Scaling negotiation can be viewed by taking a packet capture of the TCP handshake which opens a conversation. In the packet capture, check the TCP Options field of the three handshake packets. If either system's handshake packets do not contain the TCP Window Scaling option, it may be necessary to enable TCP Window Scaling on that system.
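The one-gigabyte figure quoted above can be checked directly: with the maximum scale factor of 14 permitted by RFC 1323, the largest 16-bit advertised window is shifted left by 14 bits:

```python
MAX_WINDOW_FIELD = 0xFFFF   # largest value of the 16-bit Window Size field (65535)
MAX_SCALE_FACTOR = 14       # maximum shift permitted by RFC 1323

# The real receive window is the advertised window shifted left
# by the negotiated scale factor.
real_window = MAX_WINDOW_FIELD << MAX_SCALE_FACTOR
print(real_window)  # 1073725440 bytes, just under 1 GiB
```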
TCP Buffer Tuning

Once network traffic is processed from the network adapter, reception directly into the application is attempted. If that is not possible, data is queued on the application's socket buffer. There are 3 queue structures in the socket:

sk_rmem_alloc = {counter = 121948},
sk_wmem_alloc = {counter = 553},
sk_omem_alloc = {counter = 0}

sk_rmem_alloc is the receive queue
sk_wmem_alloc is the transmit queue
sk_omem_alloc is the out-of-order queue; skbs which are not within the current TCP Window are placed in this queue

There is also the sk_rcvbuf variable, which is the limit, measured in bytes, that the socket can receive. In this case:

sk_rcvbuf = 125336

From the above output it can be calculated that the receive queue is almost full. When sk_rmem_alloc > sk_rcvbuf the TCP stack will call a routine which "collapses" the receive queue. This is a kind of house-keeping where the kernel will try to free space in the receive queue by reducing overhead. However, this operation comes at a CPU cost. If collapsing fails to free sufficient space for additional traffic, then data is "pruned", meaning the data is dropped from memory and the packet is lost. Therefore, it is best to tune around this condition and avoid the buffer collapsing and pruning altogether. The first step is to identify whether buffer collapsing and pruning is occurring. Run the following command to determine whether this is occurring or not:

# netstat -sn | egrep "prune|collap"; sleep 30; netstat -sn | egrep "prune|collap"
17671 packets pruned from receive queue because of socket buffer overrun
18671 packets pruned from receive queue because of socket buffer overrun

If "pruning" has increased during this interval, then tuning is required. The first step is to increase the network and TCP receive buffer settings. This is a good time to check whether the application calls setsockopt(SO_RCVBUF). If the application does call this function, this will override the default settings and turn off the socket's ability to auto-tune its size. The size of the receive buffer will be the size specified by the application and no greater. Consider removing the setsockopt(SO_RCVBUF) function call from the application and allowing the buffer size to auto-tune instead.
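The effect described above can be observed with the standard socket API. This is a minimal sketch (the requested size of 65536 bytes is arbitrary); note that when SO_RCVBUF is set, the Linux kernel roughly doubles the requested value to leave room for bookkeeping overhead, so getsockopt() reports a figure at least as large as was requested:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Setting SO_RCVBUF pins the receive buffer size and disables
# the kernel's receive-buffer auto-tuning for this socket.
requested = 65536
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

# On Linux the kernel stores roughly double the requested size
# (clamped by net.core.rmem_max), so the reported value is at
# least what was asked for.
actual = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(actual)
s.close()
```

Removing such a call restores auto-tuning, letting the buffer grow up to the tcp_rmem maximum described below.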
Tuning tcp_rmem

The socket memory tunable has three values, describing the minimum, default, and maximum values in bytes. The default maximum on most Red Hat Enterprise Linux releases is 4 MiB. To view these settings, then increase them by a factor of 4:

# sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096 87380 4194304
# sysctl -w net.ipv4.tcp_rmem="16384 349520 16777216"
# sysctl net.core.rmem_max
net.core.rmem_max = 4194304
# sysctl -w net.core.rmem_max=16777216

If the application cannot be changed to remove setsockopt(SO_RCVBUF), then increase the maximum socket receive buffer size which may be set by using the SO_RCVBUF socket option.

A restart of an application is only required when the middle value of tcp_rmem is changed, as the sk_rcvbuf value in the socket is initialized to this when the socket is created. Changing the third and maximum value of tcp_rmem does not require an application restart, as these values are dynamically assigned by auto-tuning.
TCP Listen Backlog

When a TCP socket is opened by a server in LISTEN state, that socket has a maximum number of unaccepted client connections it can handle. If an application is slow at processing client connections, or the server gets many new connections rapidly (commonly known as a SYN flood), the new connections may be lost, or specially crafted reply packets known as "SYN cookies" may be sent. If the system's normal workload is such that SYN cookies are being entered into the system log regularly, the system and application should be tuned to avoid them.

The maximum backlog an application can request is dictated by the net.core.somaxconn kernel tunable. An application can always request a larger backlog, but it will only get a backlog as large as this maximum. This parameter can be checked and changed as follows:

# sysctl net.core.somaxconn
net.core.somaxconn = 128
# sysctl -w net.core.somaxconn=2048
net.core.somaxconn = 2048
# sysctl net.core.somaxconn
net.core.somaxconn = 2048

After changing the maximum allowed backlog, an application must be restarted for the change to take effect. Additionally, after changing the maximum allowed backlog, the application must be modified to actually set a larger backlog on its listening socket. The following is an example in the C language of the change required to increase the socket backlog:

- rc = listen(sockfd, 128);
+ rc = listen(sockfd, 2048);
  if (rc < 0) ...

NUMA Locality

The NUMA node a PCIe network adapter belongs to can be checked with:

# cat /sys/class/net/<interface>/device/numa_node

For example:

# cat /sys/class/net/eth3/device/numa_node
1

This command will display the NUMA node number; interrupts for the device should be directed to the NUMA node that the PCIe device belongs to.
This command may display -1, which indicates that the hardware platform is not actually non-uniform and the kernel is just emulating or "faking" NUMA, or that the device is on a bus which does not have any NUMA locality, such as a PCI bridge.

Identifying Interrupts to Balance

Check the number of RX and TX queues on the adapter:

# egrep "CPU0|eth3" /proc/interrupts
          CPU0  CPU1  CPU2  CPU3  CPU4  CPU5
 110:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-0
 111:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-1
 112:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-2
 113:        2     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-3
 114:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-tx

Queues are allocated when the NIC driver module is loaded. In some cases, the number of queues can be dynamically allocated online using the ethtool -L command. The above device has 4 RX queues and one TX queue.
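Counting queues like this can also be done programmatically. The following sketch parses /proc/interrupts-style output (using the sample above as embedded input) and tallies the per-queue interrupt vectors for an interface:

```python
import re

# Sample lines in the style of the /proc/interrupts excerpt above.
SAMPLE = """\
110:  0 0 0 0 0 0  IR-PCI-MSI-edge  eth3-rx-0
111:  0 0 0 0 0 0  IR-PCI-MSI-edge  eth3-rx-1
112:  0 0 0 0 0 0  IR-PCI-MSI-edge  eth3-rx-2
113:  2 0 0 0 0 0  IR-PCI-MSI-edge  eth3-rx-3
114:  0 0 0 0 0 0  IR-PCI-MSI-edge  eth3-tx
"""

def count_queues(interrupts_text, iface):
    """Count RX and TX interrupt vectors named after the interface."""
    rx = len(re.findall(rf"{re.escape(iface)}-rx", interrupts_text))
    tx = len(re.findall(rf"{re.escape(iface)}-tx", interrupts_text))
    return rx, tx

print(count_queues(SAMPLE, "eth3"))  # (4, 1)
```

On a live system, the same function can be fed open("/proc/interrupts").read(), though queue naming conventions vary between drivers.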
Statistics are different for every network driver, but if a network driver provides separate queue statistics, these can be seen with the command ethtool -S ethX, where ethX is the interface in question:

# ethtool -S eth3
rx0_packets: 0
rx0_bytes: 0
rx1_packets: 0
rx1_bytes: 0
rx2_packets: 0
rx2_bytes: 0
rx3_packets: 2
rx3_bytes: 120

GLOSSARY

RSS: Receive Side Scaling

RSS is supported by many common network interface cards. On reception of data, a NIC can send data to multiple queues. Each queue can be serviced by a different CPU, allowing for efficient data retrieval. RSS acts as an API between the driver and the card firmware to determine how packets are distributed across CPU cores, the idea being that multiple queues directing traffic to different CPUs allows for faster throughput and lower latency.

RSS controls which receive queue gets any given packet, whether or not the card listens to specific unicast Ethernet addresses, which multicast addresses it listens to, which queue pairs or Ethernet queues get copies of multicast packets, etc.

RSS Considerations

Does the driver allow the number of queues to be configured? Some drivers will automatically generate the number of queues during boot depending on hardware resources. For others it's configurable via ethtool -L.

How many cores does the system have? RSS should be configured so each queue goes to a different CPU core.

RPS: Receive Packet Steering

Receive Packet Steering is a kernel-level software implementation of RSS. It resides in the higher layers of the network stack, above the driver. RSS and RPS should be mutually exclusive: use one or the other. RPS is disabled by default. RPS uses a 2-tuple or 4-tuple hash saved in the rxhash field of the packet definition, which is used to determine the CPU queue which should process a given packet.
ThisavoidscachemisseswhentrafficarrivesonadifferentCPUcoretowheretheapplicationisrunning.
ReceiveSteeringReferenceFormoredetailsontheabovesteeringmechanisms,pleasereferto:https://www.
kernel.
org/doc/Documentation/networking/scaling.
txtNAPI:NewAPIThesoftwaremethodwhereadeviceispolledfornewnetworktraffic,insteadofthedeviceconstantlyraisinghardwareinterrupts.
skb,sk_buff:SocketbufferTherearedatabufferswhichareusedtotransportnetworkheadersandpayloaddatathroughtheLinuxkernel.
MTU:MaximumTransmissionUnitMTUdefinesthelargestcontiguousblockofdatawhichcanbesentacrossatransmissionmedium.
Ablockofdataistransmittedasasingleunitcommonlyreferredtoasaframeorpacket.
Eachdataunitwhichwillhaveaheadersizewhichdoesnotchange,makingitmoreefficienttosentasmuchdataaspossibleinagivendataunit.
Forexample,anEthernetheaderwithoutaVLANtagis18bytes.
Itismoreefficienttosend1500bytesofdataplusan18-byteheaderandlessefficienttosend1byteofdataplusan18-byteheader.
NUMA:NonUniformMemoryAccessAhardwarelayoutwhereprocessors,memory,anddevicesdonotallshareacommonbus.
Instead,somecomponentssuchasCPUsandmemoryaremorelocalormoredistantincomparisontoeachother.
NICTuningSummaryThefollowingisasummaryofpointswhichhavebeencoveredbythisdocumentindetail:SoftIRQmisses(netdevbudget)"tuned"tuningdaemon"numad"NUMAdaemonCPUpowerstatesInterruptbalancingRedHatEnterpriseLinuxNetworkPerformanceTuningGuide|Bainbridge,Maxwell28PauseframesInterruptCoalescenceAdapterqueue(netdevbacklog)AdapterRXandTXbuffersAdapterTXqueueModuleparametersAdapteroffloadingJumboFramesTCPandUDPprotocoltuningNUMAlocalityRedHatEnterpriseLinuxNetworkPerformanceTuningGuide|Bainbridge,Maxwell29
