Red Hat Enterprise Linux Network Performance Tuning Guide
Authors: Jamie Bainbridge and Jon Maxwell
Reviewer: Noah Davids
Editors: Dayle Parker and Chris Negus
03/25/2015

Tuning a network interface card (NIC) for optimum throughput and latency is a complex process with many factors to consider. These factors include the capabilities of the network interface, driver features and options, the system hardware that Red Hat Enterprise Linux is installed on, CPU-to-memory architecture, the number of CPU cores, the version of the Red Hat Enterprise Linux kernel (which implies the driver version), the workload the network interface has to handle, and which factors (speed or latency) are most important to that workload.

There is no generic configuration that can be broadly applied to every system, as the above factors are always different. The aim of this document is not to provide specific tuning information, but to introduce the reader to the process of packet reception within the Linux kernel, then to demonstrate available tuning methods which can be applied to a given system.
PACKET RECEPTION IN THE LINUX KERNEL

The NIC ring buffer

Receive ring buffers are shared between the device driver and the NIC. The card assigns a transmit (TX) and receive (RX) ring buffer. As the name implies, the ring buffer is a circular buffer where an overflow simply overwrites existing data. There are two ways to move data from the NIC to the kernel: hardware interrupts and software interrupts, also called SoftIRQs.

The RX ring buffer is used to store incoming packets until they can be processed by the device driver. The device driver drains the RX ring, typically via SoftIRQs, which puts the incoming packets into a kernel data structure called an sk_buff or "skb" to begin its journey through the kernel and up to the application which owns the relevant socket. The TX ring buffer is used to hold outgoing packets which are destined for the wire. These ring buffers reside at the bottom of the stack and are a crucial point at which packet drops can occur, which in turn will adversely affect network performance.
Interrupts and Interrupt Handlers

Interrupts from the hardware are known as "top-half" interrupts. When a NIC receives incoming data, it copies the data into kernel buffers using DMA. The NIC notifies the kernel of this data by raising a hard interrupt. These interrupts are processed by interrupt handlers which do minimal work, as they have already interrupted another task and cannot themselves be interrupted. Hard interrupts can be expensive in terms of CPU usage, especially when holding kernel locks. The hard interrupt handler then leaves the majority of packet reception to a software interrupt, or SoftIRQ, process which can be scheduled more fairly.

Hard interrupts can be seen in /proc/interrupts, where each queue has an interrupt vector in the 1st column assigned to it. These are initialized when the system boots or when the NIC device driver module is loaded. Each RX and TX queue is assigned a unique vector, which informs the interrupt handler as to which NIC/queue the interrupt is coming from. The columns represent the number of incoming interrupts as a counter value:

# egrep "CPU0|eth2" /proc/interrupts
       CPU0    CPU1    CPU2    CPU3    CPU4  CPU5
 105:  141606  0       0       0       0     0     IR-PCI-MSI-edge  eth2-rx-0
 106:  0       141091  0       0       0     0     IR-PCI-MSI-edge  eth2-rx-1
 107:  2       0       163785  0       0     0     IR-PCI-MSI-edge  eth2-rx-2
 108:  3       0       0       194370  0     0     IR-PCI-MSI-edge  eth2-rx-3
 109:  0       0       0       0       0     0     IR-PCI-MSI-edge  eth2-tx

SoftIRQs

Also known as "bottom-half" interrupts, software interrupt requests (SoftIRQs) are kernel routines which are scheduled to run at a time when other tasks will not be interrupted.
The SoftIRQ's purpose is to drain the network adapter receive ring buffers. These routines run in the form of ksoftirqd/cpu-number processes and call driver-specific code functions. They can be seen in process monitoring tools such as ps and top.

The following call stack, read from the bottom up, is an example of a SoftIRQ polling a Mellanox card. The functions marked [mlx4_en] are the Mellanox polling routines in the mlx4_en.ko driver kernel module, called by the kernel's generic polling routines such as net_rx_action. After moving from the driver to the kernel, the traffic being received will then move up to the socket, ready for the application to consume:

mlx4_en_complete_rx_desc [mlx4_en]
mlx4_en_process_rx_cq [mlx4_en]
mlx4_en_poll_rx_cq [mlx4_en]
net_rx_action
__do_softirq
run_ksoftirqd
smpboot_thread_fn
kthread
kernel_thread_starter
kernel_thread_starter
1 lock held by ksoftirqd

SoftIRQs can be monitored as follows. Each column represents a CPU:

# watch -n1 grep RX /proc/softirqs
# watch -n1 grep TX /proc/softirqs

NAPI Polling

NAPI, or New API, was written to make processing of incoming packets more efficient.
Hard interrupts are expensive because they cannot be interrupted. Even with interrupt coalescence (described later in more detail), the interrupt handler will monopolize a CPU core completely. The design of NAPI allows the driver to go into a polling mode instead of being hard-interrupted for every required packet receive.

Under normal operation, an initial hard interrupt or IRQ is raised, followed by a SoftIRQ handler which polls the card using NAPI routines. The polling routine has a budget which determines the CPU time the code is allowed. This is required to prevent SoftIRQs from monopolizing the CPU. On completion, the kernel will exit the polling routine and re-arm, then the entire procedure will repeat itself.

Figure 1: SoftIRQ mechanism using NAPI poll to receive data

Network Protocol Stacks

Once traffic has been received from the NIC into the kernel, it is then processed by protocol handlers such as Ethernet, ICMP, IPv4, IPv6, TCP, UDP, and SCTP.
Finally, the data is delivered to a socket buffer where an application can run a receive function, moving the data from kernel space to user space and ending the kernel's involvement in the receive process.

Packet egress in the Linux kernel

Another important aspect of the Linux kernel is network packet egress. Although simpler than the ingress logic, egress is still worth acknowledging. The process works when skbs are passed down from the protocol layers through to the core kernel network routines. Each skb contains a dev field which contains the address of the net_device through which it will be transmitted:

int dev_queue_xmit(struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;

... > /etc/sysctl.conf

The values specified in the configuration files are applied at boot, and can be re-applied any time afterwards with the sysctl -p command. This document will show the runtime configuration changes for kernel tunables. Persisting desirable changes across reboots is an exercise for the reader, accomplished by following the above example.
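As a sketch of that persistence pattern (the tunable name and value below are placeholders for illustration, not recommendations):

```shell
# Append the tunable to /etc/sysctl.conf so it survives a reboot,
# then re-apply the file immediately. Requires root.
# net.core.netdev_max_backlog=2000 is an example value only.
echo 'net.core.netdev_max_backlog = 2000' >> /etc/sysctl.conf
sysctl -p
```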
Identifying the bottleneck

Packet drops and overruns typically occur when the RX buffer on the NIC cannot be drained fast enough by the kernel. When the rate at which data is coming off the network exceeds the rate at which the kernel is draining packets, the NIC discards incoming packets once the NIC buffer is full, and increments a discard counter. The corresponding counter can be seen in ethtool statistics. The main criteria here are interrupts and SoftIRQs, which respond to hardware interrupts and receive traffic, then poll the card for traffic for the duration specified by net.core.netdev_budget.
The correct method to observe packet loss at a hardware level is ethtool. The exact counter varies from driver to driver; please consult the driver vendor or driver documentation for the appropriate statistic. As a general rule, look for counters with names like fail, miss, error, discard, buf, fifo, full, or drop. Statistics may be upper or lower case.
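That general rule can be turned into a quick filter. The helper below is a sketch of our own, not part of ethtool; it simply greps a counter dump (such as ethtool -S output) for the suspicious names listed above:

```shell
# suspect_counters: filter a statistics dump for counter names that commonly
# indicate packet loss. The name list is the general rule from this guide,
# not an exhaustive or driver-specific set.
suspect_counters() {
    grep -iE 'fail|miss|error|discard|buf|fifo|full|drop'
}

# Typical use (eth3 is an example device name):
#   ethtool -S eth3 | suspect_counters
```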
For example, this driver increments various rx_*_errors statistics:

# ethtool -S eth3
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     rx_length_errors: 0
     rx_over_errors: 3295
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 3295
     rx_missed_errors: 3295

There are various tools available to isolate a problem area. Locate the bottleneck by investigating the following points:

- The adapter firmware level - observe drops in ethtool -S ethX statistics
- The adapter driver level
- The Linux kernel, IRQs or SoftIRQs - check /proc/interrupts and /proc/net/softnet_stat
- The protocol layers IP, TCP, or UDP - use netstat -s and look for error counters
Here are some common examples of bottlenecks:

IRQs are not getting balanced correctly. In some cases the irqbalance service may not be working correctly, or not running at all. Check /proc/interrupts and make sure that interrupts are spread across multiple CPU cores. Refer to the irqbalance manual, or manually balance the IRQs.
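One way to spot poor spread is to find, for each matching interrupt line, the CPU column holding the largest count. This helper is a sketch of our own (not part of irqbalance); it assumes the /proc/interrupts layout shown in this guide, with counters starting in the 2nd field:

```shell
# irq_hot_cpu <interrupts-file> <device-pattern>
# For each matching interrupt line, print the CPU whose counter is largest.
# If every line reports the same hot CPU, interrupts are not being spread.
irq_hot_cpu() {
    grep "$2" "${1:-/proc/interrupts}" | awk '{
        max = 0; hot = 0
        # counter columns run from field 2 until the first non-numeric field
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            if ($i + 0 > max) { max = $i + 0; hot = i - 2 }
        printf "%s hottest=CPU%d count=%d\n", $1, hot, max
    }'
}

# Typical use:
#   irq_hot_cpu /proc/interrupts eth2
```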
In the following example, interrupts are getting processed by only one processor:

# egrep "CPU0|eth2" /proc/interrupts
       CPU0     CPU1  CPU2  CPU3  CPU4  CPU5
 105:  1430000  0     0     0     0     0     IR-PCI-MSI-edge  eth2-rx-0
 106:  1200000  0     0     0     0     0     IR-PCI-MSI-edge  eth2-rx-1
 107:  1399999  0     0     0     0     0     IR-PCI-MSI-edge  eth2-rx-2
 108:  1350000  0     0     0     0     0     IR-PCI-MSI-edge  eth2-rx-3
 109:  80000    0     0     0     0     0     IR-PCI-MSI-edge  eth2-tx

See if any of the columns besides the 1st column of /proc/net/softnet_stat are increasing. In the following example, the counter is large for CPU0 and the budget needs to be increased:

# cat /proc/net/softnet_stat
0073d76b 00000000 000049ae 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000000d2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000015c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

SoftIRQs may not be getting enough CPU time to poll the adapter, as per Figure 1. Use tools like sar, mpstat, or top to determine what is consuming CPU runtime. Use ethtool -S ethX to check a specific adapter for errors:

# ethtool -S eth3
     rx_over_errors: 399
     rx_fifo_errors: 399
     rx_missed_errors: 399

Data is making it up to the socket buffer queue but not getting drained fast enough.
Monitor the ss -nmp command and look for full RX queues. Use the netstat -s command and look for buffer pruning errors or UDP errors. The following example shows UDP receive errors:

# netstat -su
Udp:
    4218 packets received
    111999 packet receive errors
    333 packets sent

Increase the application's socket receive buffer, or use buffer auto-tuning by not specifying a socket buffer size in the application. Check whether the application calls setsockopt(SO_RCVBUF), as that will override the default socket buffer settings.
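The kernel-side limits are worth checking at the same time, since SO_RCVBUF requests are capped by rmem_max. The helper below is a sketch of our own, reading the limits straight from procfs (these files back the net.core.rmem_default and net.core.rmem_max sysctls):

```shell
# rmem_limits [dir]: print the kernel's default and maximum socket receive
# buffer sizes in bytes. The directory argument exists only so the function
# can be pointed at a test copy of the files.
rmem_limits() {
    dir=${1:-/proc/sys/net/core}
    printf 'rmem_default=%s\n' "$(cat "$dir/rmem_default" 2>/dev/null)"
    printf 'rmem_max=%s\n'     "$(cat "$dir/rmem_max" 2>/dev/null)"
}
rmem_limits

# Raising the ceiling (requires root; 16MB is only an illustrative value):
#   sysctl -w net.core.rmem_max=16777216
```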
Application design is an important factor. Look at streamlining the application to make it more efficient at reading data off the socket. One possible solution is to have separate processes draining the socket queues, using Inter-Process Communication (IPC) to pass the data to another process that does the background work like disk I/O.

Use multiple TCP streams. More streams are often more efficient at transferring data.
Use netstat -neopa to check how many connections an application is using:

tcp        0  0  0.0.0.0:12345  0.0.0.0:*      LISTEN       0  305800  27840/./server  off (0.00/0/0)
tcp 16342858  0  1.0.0.8:12345  1.0.0.6:57786  ESTABLISHED  0  305821  27840/./server  off (0.00/0/0)

Use larger TCP or UDP packet sizes.
Each individual network packet has a certain amount of overhead, such as headers. Sending data in larger contiguous blocks will reduce that overhead. This is done by specifying a larger buffer size with the send() and recv() function calls; please see the man page of these functions for details.

In some cases, there may be a change in driver behavior after upgrading to a new kernel version of Red Hat Enterprise Linux. If adapter drops occur after an upgrade, open a support case with Red Hat Global Support Services to determine whether tuning is required, or whether this is a driver bug.
Performance Tuning

SoftIRQ Misses

If the SoftIRQs do not run for long enough, the rate of incoming data could exceed the kernel's capability to drain the buffer fast enough. As a result, the NIC buffers will overflow and traffic will be lost. Occasionally, it is necessary to increase the time that SoftIRQs are allowed to run on the CPU. This is known as the netdev_budget. The default value of the budget is 300. This will cause the SoftIRQ process to drain 300 messages from the NIC before getting off the CPU:

# sysctl net.core.netdev_budget
net.core.netdev_budget = 300

This value can be doubled if the 3rd column in /proc/net/softnet_stat is increasing, which indicates that the SoftIRQ did not get enough CPU time. Small increments are normal and do not require tuning. This level of tuning is seldom required on a system with only gigabit interfaces; however, a system passing upwards of 10Gbps may need this tunable increased.
# cat softnet_stat
0073d76b 00000000 000049ae 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000000d2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000015c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000002a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

For example, tuning the value on this NIC from 300 to 600 will allow soft interrupts to run for double the default CPU time:

# sysctl -w net.core.netdev_budget=600

Tuned

Tuned is an adaptive system tuning daemon.
It can be used to apply a variety of system settings gathered together into a collection called a profile. A tuned profile can contain instructions such as CPU governor, IO scheduler, and kernel tunables such as CPU scheduling or virtual memory management. Tuned also incorporates a monitoring daemon which can control or disable the power saving ability of CPUs, disks, and network devices.

The aim of performance tuning is to apply settings which enable the most desirable performance. Tuned can automate a large part of this work. First, install tuned, start the tuning daemon service, and enable the service on boot:

# yum -y install tuned
# service tuned start
# chkconfig tuned on

List the performance profiles:

# tuned-adm list
Available profiles:
- throughput-performance
- default
- desktop-powersave
- enterprise-storage
...

The contents of each profile can be viewed in the /etc/tune-profiles/ directory. We are concerned with setting a performance profile such as throughput-performance, latency-performance, or enterprise-storage.

Set a profile:

# tuned-adm profile throughput-performance
Switching to profile 'throughput-performance'
...

The selected profile will apply every time the tuned service starts. The tuned service is described further in man tuned.
Numad

Similar to tuned, numad is a daemon which can assist with process and memory management on systems with Non-Uniform Memory Access (NUMA) architecture. Numad achieves this by monitoring system topology and resource usage, then attempting to locate processes for efficient NUMA locality, where a process has a sufficiently large memory size and CPU load. The numad service also requires cgroups (Linux kernel control groups) to be enabled.

# service cgconfig start
Starting cgconfig service: [ OK ]
# service numad start
Starting numad: [ OK ]

By default, as of Red Hat Enterprise Linux 6.5, numad will manage any process with over 300MB of memory usage and 50% of one core CPU usage, and try to use any given NUMA node up to 85% capacity. Numad can be more finely tuned with the directives described in man numad. Please refer to the Understanding NUMA architecture section later in this document to see whether your system is a NUMA system or not.
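Pending that section, a quick first check is possible from sysfs, where each NUMA node appears as a node&lt;N&gt; directory; more than one entry means the system is NUMA. This helper is a sketch of our own, not a numad feature:

```shell
# numa_nodes [dir]: count the NUMA node directories exposed by the kernel.
# A count of 1 means a single memory node (not NUMA); 2 or more means NUMA.
numa_nodes() {
    ls -d "${1:-/sys/devices/system/node}"/node[0-9]* 2>/dev/null | wc -l
}
numa_nodes
```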
CPU Power States

The ACPI specification defines various levels of processor power states, or "C-states", with C0 being the operating state and C1 being the halt state; processor manufacturers implement various additional states to provide further power savings and related advantages such as lower temperatures. Unfortunately, transitioning between power states is costly in terms of latency. As we are concerned with making the responsiveness of the system as high as possible, it is desirable to disable all processor "deep sleep" states, leaving only operating and halt.

This must be accomplished first in the system BIOS or EFI firmware. Any states such as C6, C3, C1E or similar should be disabled. We can ensure the kernel never requests a C-state below C1 by adding processor.max_cstate=1 to the kernel line in the GRUB bootloader configuration. In some instances, the kernel is able to override the hardware setting, and the additional parameter intel_idle.max_cstate=0 must be added on systems with Intel processors.

The sleep state of the processor can be confirmed with:

# cat /sys/module/intel_idle/parameters/max_cstate
0

A higher value indicates that additional sleep states may be entered. The powertop utility's Idle Stats page can show how much time is being spent in each C-state.
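The same residency information is also exposed through the kernel's cpuidle sysfs interface. The helper below is a sketch; the state*/name and state*/time files are standard cpuidle attributes (time is in microseconds), but which states appear depends on the idle driver in use:

```shell
# cstate_report [cpuidle-dir]: list each idle state CPU0 can enter and the
# cumulative time spent there, as reported by the cpuidle framework.
cstate_report() {
    for d in "${1:-/sys/devices/system/cpu/cpu0/cpuidle}"/state*; do
        [ -d "$d" ] || continue
        printf '%s: %s us\n' "$(cat "$d/name")" "$(cat "$d/time")"
    done
}
cstate_report
```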
IRQ Balance

IRQ Balance is a service which can automatically balance interrupts across CPU cores, based on real-time system conditions. It is vital that the correct version of irqbalance is running for a particular kernel. For NUMA systems, irqbalance-1.0.4-8.el6_5 or greater is required for Red Hat Enterprise Linux 6.5, and irqbalance-1.0.4-6.el6_4 or greater is required for Red Hat Enterprise Linux 6.4. See the Understanding NUMA architecture section later in this document for manually balancing irqbalance for NUMA systems.

# rpm -q irqbalance
irqbalance-0.55-29.el6.x86_64

Manual balancing of interrupts

The IRQ affinity can also be manually balanced if desired.
Red Hat strongly recommends using irqbalance to balance interrupts, as it dynamically balances interrupts depending on system usage and other factors. However, manually balancing interrupts can be used to determine whether irqbalance is not balancing IRQs in an optimum manner and therefore causing packet loss. There may be some very specific cases where manually balancing interrupts permanently can be beneficial. For this case, the interrupts will be manually associated with a CPU using SMP affinity.

There are two ways to do this: with a bitmask, or using smp_affinity_list, which is available from Red Hat Enterprise Linux 6 onwards. To manually balance interrupts, the irqbalance service needs to be stopped and persistently disabled:

# chkconfig irqbalance off
# service irqbalance stop
Stopping irqbalance: [ OK ]

View the CPU cores where a device's interrupt is allowed to be received:

# grep "CPU0|eth3" /proc/interrupts
       CPU0  CPU1  CPU2  CPU3  CPU4  CPU5
 110:  1136  0     0     0     0     0     IR-PCI-MSI-edge  eth3-rx-0
 111:  2     0     0     0     0     0     IR-PCI-MSI-edge  eth3-rx-1
 112:  0     0     0     0     0     0     IR-PCI-MSI-edge  eth3-rx-2
 113:  0     0     0     0     0     0     IR-PCI-MSI-edge  eth3-rx-3
 114:  0     0     0     0     0     0     IR-PCI-MSI-edge  eth3-tx

# cat /proc/irq/110/smp_affinity_list
0-5

One way to manually balance the CPU cores is with a script. The following script is a simple proof-of-concept example:

#!/bin/bash
# nic_balance.sh
# usage: nic_balance.sh <interface> <highest CPU number>
cpu=0
grep $1 /proc/interrupts | awk '{print $1}' | sed 's/://' |
while read a
do
  echo $cpu > /proc/irq/$a/smp_affinity_list
  echo "echo $cpu > /proc/irq/$a/smp_affinity_list"
  if [ $cpu = $2 ]
  then
    cpu=0
  else
    let cpu=cpu+1
  fi
done

The above script reports the commands it ran as follows:

# sh balance.sh eth3 5
echo 0 > /proc/irq/110/smp_affinity_list
echo 1 > /proc/irq/111/smp_affinity_list
echo 2 > /proc/irq/112/smp_affinity_list
echo 3 > /proc/irq/113/smp_affinity_list
echo 4 > /proc/irq/114/smp_affinity_list
echo 5 > /proc/irq/131/smp_affinity_list

The above script is provided under a Creative Commons Zero license.
Ethernet Flow Control (a.k.a. Pause Frames)

Pause frames are Ethernet-level flow control between the adapter and the switch port. The adapter will send "pause frames" when the RX or TX buffers become full. The switch will stop data flowing for a time span in the order of milliseconds or less. This is usually enough time to allow the kernel to drain the interface buffers, thus preventing the buffer overflow and subsequent packet drops or overruns. Ideally, the switch will buffer the incoming data during the pause time.

However, it is important to realize that this level of flow control is only between the switch and the adapter. If packets are dropped, the higher layers such as TCP, or the application in the case of UDP and/or multicast, should initiate recovery.

Pause frames and Flow Control need to be enabled on both the NIC and the switch port for this feature to take effect. Please refer to your network equipment manual or vendor for instructions on how to enable Flow Control on a port.

In this example, Flow Control is disabled:

# ethtool -a eth3
Pause parameters for eth3:
Autonegotiate: off
RX: off
TX: off

To enable Flow Control:

# ethtool -A eth3 rx on
# ethtool -A eth3 tx on

To confirm Flow Control is enabled:

# ethtool -a eth3
Pause parameters for eth3:
Autonegotiate: off
RX: on
TX: on

Interrupt Coalescence (IC)

Interrupt coalescence refers to the amount of traffic that a network interface will receive, or the time that passes after receiving traffic, before issuing a hard interrupt.
Interrupting too soon or too frequently results in poor system performance, as the kernel stops (or "interrupts") a running task to handle the interrupt request from the hardware. Interrupting too late may result in traffic not being taken off the NIC soon enough. More traffic may arrive, overwriting the previous traffic still waiting to be received into the kernel, resulting in traffic loss.

Most modern NICs and drivers support IC, and many allow the driver to automatically moderate the number of interrupts generated by the hardware. The IC settings usually comprise two main components: time and number of packets. Time is the number of microseconds (usecs) that the NIC will wait before interrupting the kernel, and number is the maximum number of packets allowed to be waiting in the receive buffer before interrupting the kernel.

A NIC's interrupt coalescence can be viewed using the ethtool -c ethX command, and tuned using the ethtool -C ethX command. Adaptive mode enables the card to auto-moderate the IC. In adaptive mode, the driver will inspect traffic patterns and kernel receive patterns, and estimate coalescing settings on-the-fly which aim to prevent packet loss. This is useful when many small packets are being received.

Higher interrupt coalescence favors bandwidth over latency. A VOIP application (latency-sensitive) may require less coalescence than a file transfer protocol (throughput-sensitive). Different brands and models of network interface cards have different capabilities and default settings, so please refer to the manufacturer's documentation for the adapter and driver.

On this system adaptive RX is enabled by default:

# ethtool -c eth3
Coalesce parameters for eth3:
Adaptive RX: on  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 16
rx-frames: 44
rx-usecs-irq: 0
rx-frames-irq: 0

The following command turns adaptive IC off, and tells the adapter to interrupt the kernel immediately upon reception of any traffic:

# ethtool -C eth3 adaptive-rx off rx-usecs 0 rx-frames 0

A realistic setting is to allow at least some packets to buffer in the NIC, and at least some time to pass, before interrupting the kernel. Valid ranges may be from 1 to hundreds, depending on system capabilities and traffic received.
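For instance, a middle-ground setting along those lines might look like the following. The numbers are illustrative assumptions only; the accepted ranges depend on the driver, so check ethtool -c output for your adapter first:

```shell
# Illustrative only: interrupt after at most 50 microseconds or 32 buffered
# packets, whichever comes first. eth3 is an example device name.
ethtool -C eth3 adaptive-rx off rx-usecs 50 rx-frames 32
```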
The Adapter Queue

The netdev_max_backlog is a queue within the Linux kernel where traffic is stored after reception from the NIC, but before processing by the protocol stacks (IP, TCP, etc). There is one backlog queue per CPU core. A given core's queue can grow automatically, containing a number of packets up to the maximum specified by the netdev_max_backlog setting. The netif_receive_skb() kernel function will find the corresponding CPU for a packet, and enqueue packets in that CPU's queue. If the queue for that processor is full and already at maximum size, packets will be dropped.

To tune this setting, first determine whether the backlog needs increasing. The /proc/net/softnet_stat file contains a counter in the 2nd column that is incremented when the netdev backlog queue overflows. If this value is incrementing over time, then netdev_max_backlog needs to be increased.

Each line of the softnet_stat file represents a CPU core starting from CPU0:
Line 1 = CPU0
Line 2 = CPU1
Line 3 = CPU2
and so on.
The following system has 12 CPU cores:

# wc -l /proc/net/softnet_stat
12

When a packet is unable to be placed into a backlog queue, the following code is executed, where get_cpu_var identifies the appropriate processor queue:

__get_cpu_var(netdev_rx_stat).dropped++;

The above code then increments the dropped statistic for that queue. Each line in the softnet_stat file represents the netif_rx_stats structure for that CPU. That data structure contains:

struct netif_rx_stats
{
    unsigned total;
    unsigned dropped;
    unsigned time_squeeze;
    unsigned cpu_collision;
    unsigned received_rps;
};

The 1st column is the number of frames received by the interrupt handler. The 2nd column is the number of frames dropped due to netdev_max_backlog being exceeded. The 3rd column is the number of times ksoftirqd ran out of netdev_budget or CPU time when there was still work to be done. The other columns may vary depending on the version of Red Hat Enterprise Linux.

Using the following example, the counters for CPU0 and CPU1 are the first two lines:

# cat softnet_stat
0073d76b 00000000 000049ae 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000000d2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000015c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000002a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...

For the above example, netdev_max_backlog does not need to be changed as the number of drops has remained at 0:

For CPU0
Total     dropped   no_budget lock_contention
0073d76b  00000000  000049ae  00000000

For CPU1
Total     dropped   no_budget lock_contention
000000d2  00000000  00000000  00000000

The statistics in each column are provided in hexadecimal.
However,thismaynotbeenoughformultipleinterfacesoperatingat1Gbps,orevenasingleinterfaceat10Gbps.
Trydoublingthisvalueandobservingthe/proc/net/softnet_statfile.
Ifdoublingthevaluereducestherateatwhichdropsincrement,doubleagainandtestagain.
Repeatthisprocessuntiltheoptimumsizeisestablishedanddropsdonotincrement.
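The observation step of that procedure can be scripted. The helper below is a sketch of our own: it sums the dropped column (2nd, in hex) across all CPUs, so two samples taken some time apart show whether drops are still incrementing:

```shell
# backlog_drops [file]: sum of the per-CPU "dropped" counters (2nd column,
# hex) in softnet_stat, printed in decimal.
backlog_drops() {
    total=0
    while read -r _first dropped _rest; do
        total=$(( total + 0x$dropped ))
    done < "${1:-/proc/net/softnet_stat}"
    echo "$total"
}

# Sample twice with a pause in between; if the value grows, consider
# doubling net.core.netdev_max_backlog as described above:
#   before=$(backlog_drops); sleep 60; after=$(backlog_drops)
```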
The backlog can be changed with the following command, where X is the desired value to be set:

# sysctl -w net.core.netdev_max_backlog=X

Adapter RX and TX Buffer Tuning

Adapter buffer defaults are commonly set to a smaller size than the maximum.
Often, increasing the receive buffer size alone is enough to prevent packet drops, as it can allow the kernel slightly more time to drain the buffer. As a result, this can prevent possible packet loss.

The following interface has the space for 8 kilobytes of buffer but is only using 1 kilobyte:

# ethtool -g eth3
Ring parameters for eth3:
Pre-set maximums:
RX:        8192
RX Mini:   0
RX Jumbo:  0
TX:        8192
Current hardware settings:
RX:        1024
RX Mini:   0
RX Jumbo:  0
TX:        512

Increase both the RX and TX buffers to the maximum:

# ethtool -G eth3 rx 8192 tx 8192

This change can be made whilst the interface is online, though a pause in traffic will be seen. These settings can be persisted by writing a script at /sbin/ifup-local.
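A minimal sketch of such a script, assuming the interface name arrives as the first argument (as the network scripts provide) and reusing the example ring sizes from above:

```shell
#!/bin/bash
# /sbin/ifup-local - called by the Red Hat network scripts with the
# interface name as $1 after an interface comes up.
# Example values only: re-apply ring buffer sizes for one interface.
case "$1" in
eth3)
    ethtool -G eth3 rx 8192 tx 8192
    ;;
esac
```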
This is documented on the knowledgebase at: How do I run a script or program immediately after my network interface goes up - https://access.redhat.com/knowledge/solutions/8694

Adapter Transmit Queue Length

The transmit queue length value determines the number of packets that can be queued before being transmitted.
The default value of 1000 is usually adequate for today's high speed 10Gbps or even 40Gbps networks. However, if the number of transmit errors is increasing on the adapter, consider doubling it. Use ip -s link to see whether there are any drops on the TX queue for an adapter.

# ip -s link
2: em1: mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT group default qlen 1000
    link/ether f4:ab:cd:1e:4c:c7 brd ff:ff:ff:ff:ff:ff
    RX: bytes    packets   errors  dropped  overrun  mcast
    71017768832  60619524  0       0        0        1098117
    TX: bytes    packets   errors  dropped  carrier  collsns
    10373833340  36960190  0       0        0        0

The queue length can be modified with the ip link command:

# ip link set dev em1 txqueuelen 2000
# ip link
2: em1: mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT group default qlen 2000
    link/ether f4:ab:cd:1e:4c:c7 brd ff:ff:ff:ff:ff:ff

To persist this value across reboots, a udev rule can be written to apply the queue length to the interface as it is created, or the network scripts can be extended with a script at /sbin/ifup-local, as described on the knowledgebase at: How do I run a script or program immediately after my network interface goes up - https://access.redhat.com/knowledge/solutions/8694

Module parameters

Each network interface driver usually comes as a loadable kernel module.
Modules can be loaded and unloaded using the modprobe command. These modules usually contain parameters that can be used to further tune the device driver and NIC. The modinfo <driver> command can be used to view these parameters. Documenting specific driver parameters is beyond the scope of this document. Please refer to the hardware manual, driver documentation, or hardware vendor for an explanation of these parameters.

The Linux kernel exports the current settings for module parameters via the sysfs path /sys/module/<driver>/parameters. For example, given the driver parameters:

# modinfo mlx4_en
filename:    /lib/modules/2.6.32-246.el6.x86_64/kernel/drivers/net/mlx4/mlx4_en.ko
version:     2.0 (Dec 2011)
license:     Dual BSD/GPL
description: Mellanox ConnectX HCA Ethernet driver
author:      Liran Liss, Yevgeny Petrilin
depends:     mlx4_core
vermagic:    2.6.32-246.el6.x86_64 SMP mod_unload modversions
parm:        inline_thold: treshold for using inline data (int)
parm:        tcp_rss: Enable RSS for incomming TCP traffic or disabled (0) (uint)
parm:        udp_rss: Enable RSS for incomming UDP traffic or disabled (0) (uint)
parm:        pfctx: Priority based Flow Control policy on TX[7:0]. Per priority bit mask (uint)
parm:        pfcrx: Priority based Flow Control policy on RX[7:0]. Per priority bit mask (uint)

The current values of each driver parameter can be checked in sysfs. For example, to check the current setting for the udp_rss parameter:

# ls /sys/module/mlx4_en/parameters
inline_thold  num_lro  pfcrx  pfctx  rss_mask  rss_xor  tcp_rss  udp_rss
# cat /sys/module/mlx4_en/parameters/udp_rss
1

Some drivers allow these values to be modified whilst loaded, but many values require the driver module to be unloaded and reloaded to apply a module option.
Forexample,touseRPSinsteadofRSS,disableRSSasfollows:#echo'optionsmlx4_enudp_rss=0'>>/etc/modprobe.
d/mlx4_en.
confUnloadandreloadthedriver:#modprobe-rmlx4_en#modprobemlx4_enThisparametercouldalsobeloadedjustthistime:#modprobe-rmlx4_en#modprobemlx4_enudp_rss=0Confirmwhetherthatparameterchangetookeffect:#cat/sys/module/mlx4_en/parameters/udp_rss0Insomecases,driverparameterscanalsobecontrolledviatheethtoolcommand.
RedHatEnterpriseLinuxNetworkPerformanceTuningGuide|Bainbridge,Maxwell18Forexample,theIntelSourceforgeigbdriverhastheinterruptmoderationparameterInterruptThrottleRate.
TheupstreamLinuxkerneldriverandtheRedHatEnterpriseLinuxdriverdonotexposethisparameterviaamoduleoption.
Instead,thesamefunctionalitycaninsteadbetunedviaethtool:#ethtool-CethXrx-usecs1000AdapterOffloadingInordertoreduceCPUloadfromthesystem,modernnetworkadaptershaveoffloadingfeatureswhichmovesomenetworkprocessingloadontothenetworkinterfacecard.
Forexample,thekernelcansubmitlarge(upto64k)TCPsegmentstotheNIC,whichtheNICwillthenbreakdownintoMTU-sizedsegments.
ThisparticularfeatureiscalledTCPSegmentationOffload(TSO).
Offloadingfeaturesareoftenenabledbydefault.
Itisbeyondthescopeofthisdocumenttocovereveryoffloadingfeaturein-depth,however,turningthesefeaturesoffisagoodtroubleshootingstepwhenasystemissufferingfrompoornetworkperformanceandre-test.
Ifthereisanperformanceimprovement,ideallynarrowthechangetoaspecificoffloadingparameter,thenreportthistoRedHatGlobalSupportServices.
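When narrowing the change down, it helps to first list which features are currently on. The filter below is a sketch of our own, operating on ethtool -k style "feature: on/off" output:

```shell
# enabled_offloads: print the names of features reported as exactly "on".
# Lines with extra annotations (e.g. "on [fixed]") are deliberately skipped,
# since fixed features cannot be toggled anyway.
enabled_offloads() {
    awk -F': ' '$2 == "on" { print $1 }'
}

# Typical use (eth0 is an example device name):
#   ethtool -k eth0 | enabled_offloads
```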
It is desirable to have offloading enabled wherever possible. Offloading settings are managed by ethtool -K ethX. Common settings include:

GRO: Generic Receive Offload
LRO: Large Receive Offload
TSO: TCP Segmentation Offload
RX check-summing: processing of receive data integrity
TX check-summing: processing of transmit data integrity (required for TSO)

# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on

Jumbo Frames

The default 802.3 Ethernet frame size is 1518 bytes, or 1522 bytes with a VLAN tag.
The Ethernet header consumes 18 bytes of this (or 22 bytes with a VLAN tag), leaving an effective maximum payload of 1500 bytes. Jumbo Frames are an unofficial extension to Ethernet which network equipment vendors have made a de-facto standard, increasing the payload from 1500 to 9000 bytes.

With regular Ethernet frames there is an overhead of 18 bytes for every 1500 bytes of data placed on the wire, or 1.2% overhead. With Jumbo Frames there is an overhead of 18 bytes for every 9000 bytes of data placed on the wire, or 0.2% overhead. The above calculations assume no VLAN tag; such a tag will add 4 bytes to the overhead, making the efficiency gains even more desirable.

When transferring large amounts of contiguous data, such as sending large files between two systems, the above efficiency can be gained by using Jumbo Frames. When transferring small amounts of data, such as web requests which are typically below 1500 bytes, there is likely no gain to be seen from using a larger frame size, as data passing over the network will be contained within small frames anyway.

For Jumbo Frames to be configured, all interfaces and network equipment in a network segment (i.e. broadcast domain) must support Jumbo Frames and have the increased frame size enabled. Refer to your network switch vendor for instructions on increasing the frame size. On Red Hat Enterprise Linux, increase the frame size with MTU=9000 in the /etc/sysconfig/network-scripts/ifcfg-<interface> file for the interface.
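Once configured, the effective MTU of every interface can also be read back from sysfs. This helper is a sketch of our own; /sys/class/net/&lt;interface&gt;/mtu is a standard kernel attribute:

```shell
# mtu_report [netdir]: print "<interface>: <mtu>" for every interface.
mtu_report() {
    for d in "${1:-/sys/class/net}"/*; do
        [ -e "$d/mtu" ] || continue
        printf '%s: %s\n' "$(basename "$d")" "$(cat "$d/mtu")"
    done
}
mtu_report
```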
The MTU can be checked with the ip link command:

# ip link
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 9000 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:36:b2:d1 brd ff:ff:ff:ff:ff:ff

TCP Timestamps

TCP Timestamps are an extension to the TCP protocol, defined in RFC 1323 - TCP Extensions for High Performance - http://tools.ietf.org/html/rfc1323

TCP Timestamps provide a monotonically increasing counter (on Linux, the counter is milliseconds since system boot) which can be used to better estimate the round-trip time of a TCP conversation, resulting in more accurate TCP Window and buffer calculations.
Mostimportantly,TCPTimestampsalsoprovideProtectionAgainstWrappedSequenceNumbersastheTCPheaderdefinesaSequenceNumberasa32-bitfield.
Givenasufficientlyfastlink,thisTCPSequenceNumbernumbercanwrap.
Thisresultsinthereceiverbelievingthatthesegmentwiththewrappednumberactuallyarrivedearlierthanitsprecedingsegment,andincorrectlydiscardingthesegment.
Ona1gigabitpersecondlink,TCPSequenceNumberscanwrapin17seconds.
Ona10gigabitpersecondlink,thisisreducedtoaslittleas1.
7seconds.
Onfastlinks,enablingTCPTimestampsshouldbeconsideredmandatory.
TCPTimestampsprovideanalternative,non-wrapping,methodtodeterminetheageandorderofasegment,preventingwrappedTCPSequenceNumbersfrombeingaproblem.
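The quoted times can be reproduced by noting that sequence numbers become ambiguous once half of the 32-bit sequence space (2^31 bytes) has been sent on a connection; the following sketch (illustrative Python) assumes that interpretation:

```python
# Time until TCP Sequence Numbers become ambiguous at a given line rate.
# The 17 s / 1.7 s figures above correspond to sending 2**31 bytes,
# half the 32-bit sequence space, which is the point at which old and
# new segments can no longer be distinguished by sequence number alone.

def wrap_seconds(link_bits_per_sec: float) -> float:
    ambiguous_bytes = 2 ** 31
    return ambiguous_bytes / (link_bits_per_sec / 8)

print(f"1 Gb/s:  {wrap_seconds(1e9):.1f} s")   # ~17.2 s
print(f"10 Gb/s: {wrap_seconds(1e10):.1f} s")  # ~1.7 s
```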
Ensure TCP Timestamps are enabled:

# sysctl net.ipv4.tcp_timestamps
net.ipv4.tcp_timestamps = 1

If the above command indicates that tcp_timestamps = 0, enable TCP Timestamps:

# sysctl -w net.ipv4.tcp_timestamps=1

TCP SACK
TCP Selective Acknowledgments (SACK) is a TCP extension defined in RFC 2018 - TCP Selective Acknowledgment Options - http://tools.ietf.org/html/rfc2018
A basic TCP Acknowledgment (ACK) only allows the receiver to advise the sender which bytes have been received.
When packet loss occurs, this requires the sender to retransmit all bytes from the point of loss, which can be inefficient.
SACK allows the receiver to inform the sender which bytes have been received and which are missing, so the sender can retransmit only the lost bytes.
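As an illustration of the saving, consider a toy calculation (hypothetical byte ranges; this is a sketch of the bookkeeping, not a real TCP implementation):

```python
# Toy model of the retransmission saving provided by SACK.
# The receiver has a cumulative ACK point, plus selectively-acknowledged
# (lo, hi) byte ranges above it; everything else up to send_end is missing.

def retransmit_bytes(send_end, ack_point, sacked_ranges):
    """Bytes the sender must resend given the receiver's SACK information."""
    beyond_ack = send_end - ack_point
    sacked = sum(hi - lo for lo, hi in sacked_ranges)
    return beyond_ack - sacked

# 10000 bytes sent; bytes 1000-2000 were lost, everything after arrived.
without_sack = retransmit_bytes(10000, 1000, [])            # resend all past the ACK
with_sack = retransmit_bytes(10000, 1000, [(2000, 10000)])  # resend only the hole

print(without_sack, with_sack)  # 9000 1000
```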
There is some research available in the networking community which shows that enabling SACK on high-bandwidth links can cause unnecessary CPU cycles to be spent calculating SACK values, reducing the overall efficiency of TCP connections.
This research implies these links are so fast that the overhead of retransmitting small amounts of data is less than the overhead of calculating the data to provide as part of a Selective Acknowledgment.
Unless there is high latency or high packet loss, it is most likely better to keep SACK turned off over a high performance network.
SACK can be turned off with kernel tunables:

# sysctl -w net.ipv4.tcp_sack=0

TCP Window Scaling
TCP Window Scaling is an extension to the TCP protocol, defined in RFC 1323 - TCP Extensions for High Performance - http://tools.ietf.org/html/rfc1323
In the original TCP definition, the TCP segment header only contains a 16-bit value for the TCP Window Size, which is insufficient for the link speeds and memory capabilities of modern computing.
The TCP Window Scaling extension was introduced to allow a larger TCP Receive Window. This is achieved by adding a scaling value to the TCP options which are added after the TCP header.
The real TCP Receive Window is bit-shifted left by the value of the Scaling Factor, up to a maximum size of 1,073,725,440 bytes, or close to one gigabyte.
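This maximum follows from the arithmetic: RFC 1323 defines the window field as 16 bits (at most 65,535) and caps the scale factor at 14 bit-shifts. A quick check (illustrative Python):

```python
# Maximum TCP Receive Window with Window Scaling (RFC 1323).
# The 16-bit Window Size field holds at most 65535, and the
# Window Scale option is limited to a shift count of 14.

MAX_WINDOW_FIELD = 2 ** 16 - 1  # 65535
MAX_SHIFT = 14

max_window = MAX_WINDOW_FIELD << MAX_SHIFT
print(max_window)  # 1073725440 bytes, just under 1 GiB
```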
TCP Window Scaling is negotiated during the three-way TCP handshake (SYN, SYN+ACK, ACK) which opens every TCP conversation.
Both sender and receiver must support TCP Window Scaling for the Window Scaling option to work.
If either or both participants do not advertise Window Scaling ability in their handshake, the conversation falls back to using the original 16-bit TCP Window Size.
TCP Window Scaling is enabled by default on Red Hat Enterprise Linux.
The status of Window Scaling can be confirmed with the command:

# sysctl net.ipv4.tcp_window_scaling
net.ipv4.tcp_window_scaling = 1

TCP Window Scaling negotiation can be viewed by taking a packet capture of the TCP handshake which opens a conversation.
In the packet capture, check the TCP Options field of the three handshake packets.
If either system's handshake packets do not contain the TCP Window Scaling option, it may be necessary to enable TCP Window Scaling on that system.
TCP Buffer Tuning
Once network traffic is processed from the network adapter, reception directly into the application is attempted.
If that is not possible, data is queued on the application's socket buffer.
There are 3 queue structures in the socket:

sk_rmem_alloc = {counter = 121948},
sk_wmem_alloc = {counter = 553},
sk_omem_alloc = {counter = 0}

sk_rmem_alloc is the receive queue.
sk_wmem_alloc is the transmit queue.
sk_omem_alloc is the out-of-order queue; skbs which are not within the current TCP Window are placed in this queue.

There is also the sk_rcvbuf variable, which is the limit, measured in bytes, that the socket can receive. In this case:

sk_rcvbuf = 125336

From the above output it can be calculated that the receive queue is almost full.
When sk_rmem_alloc > sk_rcvbuf, the TCP stack will call a routine which "collapses" the receive queue. This is a kind of housekeeping where the kernel tries to free space in the receive queue by reducing overhead. However, this operation comes at a CPU cost.
If collapsing fails to free sufficient space for additional traffic, then data is "pruned", meaning the data is dropped from memory and the packet is lost.
Therefore, it is best to tune around this condition and avoid the buffer collapsing and pruning altogether.
The first step is to identify whether buffer collapsing and pruning is occurring. Run the following command to determine whether this is the case:

# netstat -sn | egrep "prune|collap"; sleep 30; netstat -sn | egrep "prune|collap"
17671 packets pruned from receive queue because of socket buffer overrun
18671 packets pruned from receive queue because of socket buffer overrun

If "pruning" has increased during this interval, then tuning is required.
The first step is to increase the network and TCP receive buffer settings.
This is a good time to check whether the application calls setsockopt(SO_RCVBUF). If the application does call this function, it will override the default settings and turn off the socket's ability to auto-tune its size.
The size of the receive buffer will be the size specified by the application and no greater.
Consider removing the setsockopt(SO_RCVBUF) function call from the application and allowing the buffer size to auto-tune instead.
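The effect of SO_RCVBUF can be observed with a small socket program. This sketch is illustrative only (the socket type and requested size are arbitrary); on Linux, the kernel doubles the requested value to allow for bookkeeping overhead and caps the result according to net.core.rmem_max:

```python
import socket

# Setting SO_RCVBUF pins the receive buffer to a fixed size and
# disables auto-tuning for this socket, as described above.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

requested = 65536  # arbitrary illustrative size
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

# On Linux, getsockopt() reports roughly double the requested value
# (the kernel reserves headroom for metadata), capped by rmem_max.
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective)
s.close()
```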
Tuning tcp_rmem
The socket memory tunable has three values, describing the minimum, default, and maximum values in bytes.
The default maximum on most Red Hat Enterprise Linux releases is 4 MiB.
To view these settings, and then increase them by a factor of 4:

# sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096 87380 4194304
# sysctl -w net.ipv4.tcp_rmem="16384 349520 16777216"
# sysctl net.core.rmem_max
net.core.rmem_max = 4194304
# sysctl -w net.core.rmem_max=16777216

If the application cannot be changed to remove setsockopt(SO_RCVBUF), then increase the maximum socket receive buffer size which may be set by using the SO_RCVBUF socket option.
A restart of an application is only required when the middle value of tcp_rmem is changed, as the sk_rcvbuf value in the socket is initialized to this when the socket is created.
Changing the 3rd and maximum value of tcp_rmem does not require an application restart, as these values are dynamically assigned by auto-tuning.
TCP Listen Backlog
When a TCP socket is opened by a server in LISTEN state, that socket has a maximum number of unaccepted client connections it can handle.
If an application is slow at processing client connections, or the server gets many new connections rapidly (commonly known as a SYN flood), the new connections may be lost, or specially crafted reply packets known as "SYN cookies" may be sent.
If the system's normal workload is such that SYN cookies are being entered into the system log regularly, the system and application should be tuned to avoid them.
The maximum backlog an application can request is dictated by the net.core.somaxconn kernel tunable. An application can always request a larger backlog, but it will only get a backlog as large as this maximum.
This parameter can be checked and changed as follows:

# sysctl net.core.somaxconn
net.core.somaxconn = 128
# sysctl -w net.core.somaxconn=2048
net.core.somaxconn = 2048
# sysctl net.core.somaxconn
net.core.somaxconn = 2048

After changing the maximum allowed backlog, an application must be restarted for the change to take effect.
Additionally, after changing the maximum allowed backlog, the application must be modified to actually set a larger backlog on its listening socket.
The following is an example in the C language of the change required to increase the socket backlog:

-rc = listen(sockfd, 128);
+rc = listen(sockfd, 2048);
 if (rc < 0) {
     ...
 }

NUMA Locality
The NUMA node which a PCIe network device belongs to can be checked in sysfs:

# cat /sys/class/net/<interface>/device/numa_node

For example:

# cat /sys/class/net/eth3/device/numa_node
1

This command will display the NUMA node number; interrupts for the device should be directed to the NUMA node that the PCIe device belongs to.
This command may display -1, which indicates that the hardware platform is not actually non-uniform and the kernel is just emulating or "faking" NUMA, or that the device is on a bus which does not have any NUMA locality, such as a PCI bridge.

Identifying Interrupts to Balance
Check the number of RX and TX queues on the adapter:

# egrep "CPU0|eth3" /proc/interrupts
          CPU0  CPU1  CPU2  CPU3  CPU4  CPU5
 110:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-0
 111:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-1
 112:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-2
 113:        2     0     0     0     0     0  IR-PCI-MSI-edge  eth3-rx-3
 114:        0     0     0     0     0     0  IR-PCI-MSI-edge  eth3-tx

Queues are allocated when the NIC driver module is loaded.
In some cases, the number of queues can be dynamically allocated online using the ethtool -L command.
Statisticsaredifferentforeverynetworkdriver,butifanetworkdriverprovidesseparatequeuestatistics,thesecanbeseenwiththecommandethtool-SethXwhereethXistheinterfaceinquestion:#ethtool-Seth3rx0_packets:0rx0_bytes:0rx1_packets:0rx1_bytes:0rx2_packets:0rx2_bytes:0rx3_packets:2rx3_bytes:120GLOSSARYRSS:ReceiveSideScalingRSSissupportedbymanycommonnetworkinterfacecards.
Onreceptionofdata,aNICcansenddatatomultiplequeues.
EachqueuecanbeservicedbyadifferentCPU,allowingforefficientdataretrieval.
TheRSSactsasanAPIbetweenthedriverandthecardfirmwaretodeterminehowpacketsaredistributedacrossCPUcores,theideabeingthatmultiplequeuesdirectingtraffictodifferentCPUsallowsforfasterthroughputandlowerlatency.
RSScontrolswhichreceivequeuegetsanygivenpacket,whetherornotthecardlistenstospecificunicastEthernetaddresses,whichmulticastaddressesitlistensto,whichqueuepairsorEthernetqueuesgetcopiesofmulticastpackets,etc.
RSSConsiderationsDoesthedriverallowthenumberofqueuestobeconfiguredSomedriverswillautomaticallygeneratethenumberofqueuesduringbootdependingonhardwareresources.
Forothersit'sconfigurableviaethtool-L.
HowmanycoresdoesthesystemhaveRSSshouldbeconfiguredsoeachqueuegoestoadifferentCPUcore.
RPS:ReceivePacketSteeringReceivePacketSteeringisakernel-levelsoftwareimplementationofRSS.
Itresidesthehigherlayersofthenetworkstackabovethedriver.
RSSorRPSshouldbemutuallyexclusive.
RPSisdisabledbydefault.
RPSusesa2-tupleor4-tuplehashsavedintherxhashfieldofthepacketdefinition,whichisusedtodeterminetheCPUqueuewhichshouldprocessagivenpacket.
RedHatEnterpriseLinuxNetworkPerformanceTuningGuide|Bainbridge,Maxwell27RFS:ReceiveFlowSteeringReceiveFlowSteeringtakesapplicationlocalityintoconsiderationwhensteeringpackets.
ThisavoidscachemisseswhentrafficarrivesonadifferentCPUcoretowheretheapplicationisrunning.
ReceiveSteeringReferenceFormoredetailsontheabovesteeringmechanisms,pleasereferto:https://www.
kernel.
org/doc/Documentation/networking/scaling.
txtNAPI:NewAPIThesoftwaremethodwhereadeviceispolledfornewnetworktraffic,insteadofthedeviceconstantlyraisinghardwareinterrupts.
skb,sk_buff:SocketbufferTherearedatabufferswhichareusedtotransportnetworkheadersandpayloaddatathroughtheLinuxkernel.
MTU:MaximumTransmissionUnitMTUdefinesthelargestcontiguousblockofdatawhichcanbesentacrossatransmissionmedium.
Ablockofdataistransmittedasasingleunitcommonlyreferredtoasaframeorpacket.
Eachdataunitwhichwillhaveaheadersizewhichdoesnotchange,makingitmoreefficienttosentasmuchdataaspossibleinagivendataunit.
Forexample,anEthernetheaderwithoutaVLANtagis18bytes.
Itismoreefficienttosend1500bytesofdataplusan18-byteheaderandlessefficienttosend1byteofdataplusan18-byteheader.
NUMA:NonUniformMemoryAccessAhardwarelayoutwhereprocessors,memory,anddevicesdonotallshareacommonbus.
Instead,somecomponentssuchasCPUsandmemoryaremorelocalormoredistantincomparisontoeachother.
NICTuningSummaryThefollowingisasummaryofpointswhichhavebeencoveredbythisdocumentindetail:SoftIRQmisses(netdevbudget)"tuned"tuningdaemon"numad"NUMAdaemonCPUpowerstatesInterruptbalancingRedHatEnterpriseLinuxNetworkPerformanceTuningGuide|Bainbridge,Maxwell28PauseframesInterruptCoalescenceAdapterqueue(netdevbacklog)AdapterRXandTXbuffersAdapterTXqueueModuleparametersAdapteroffloadingJumboFramesTCPandUDPprotocoltuningNUMAlocalityRedHatEnterpriseLinuxNetworkPerformanceTuningGuide|Bainbridge,Maxwell29
