Intel Omni-Path IP and LNet Router Design Guide
Rev. 10.0
January 2020
Doc. No.: H99668, Rev.: 10.0

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel, the Intel logo, Intel Xeon Phi, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright 2015-2020, Intel Corporation. All rights reserved.

Revision History

January 2020, Revision 10.0
Updates for this revision include:
- Updated links to Intel Omni-Path documentation set in Intel Omni-Path Documentation Library.
- Updated Single Router.
- Updated software download link in RHEL* and SLES* Installation.

October 2019, Revision 9.0
Updates for this revision include:
- Updated Preface to include new Best Practices.
- Added new section, Prerequisites, to Router Redundancy/Failover with VRRP v3.
- Added new section, IP Routing Guidance.

December 2018, Revision 8.0
Updates for this revision include:
- Updated Introduction
- Updated Linux IP Router, renamed and reorganized several subsections
- Updated Hardware Requirements
- Updated Software Requirements, renamed and reorganized several subsections
- Updated Installation Steps
- Reordered MTU material and added new subsections: Concerning Maximum Transmission Units (MTUs)
- Reordered and updated the OPA Interface Tuning section and added new material for Accelerated Datagram Mode and Datagram Mode or Connected Mode - Which is Better
- Updated Ethernet Interface Tuning
- Updated Measuring Baseline IP Performance
- Updated LNet Router Deployment Checklist
- Updated Overview of Configuring LNet
- Updated Software Compatibility
- Updated Example 1: Legacy Storage with InfiniBand* Card Connected to New Compute Nodes Using Intel OPA
- Updated Configure LNet Routers (Example 1)
- Updated Example 2: New Storage with Intel OPA Connected to Legacy Compute Nodes on InfiniBand* Cards
- Updated Configure LNet Routers (Example 2)

September 2018, Revision 7.0
Updates for this revision include:
- Rewrote and reorganized the bulk of Performance Benchmarking and Tuning.
- Added a note to LNet Troubleshooting to specify Lustre* version.
- Added the Preface.
- Updated the Introduction.
- Updated Virtual Routers with Fault-Tolerance.
- Updated Multiple Active Routers with Fault Tolerance.

April 2018, Revision 6.0
- Updated the example figures.
- Updated the LNet material in Section 3.
- Removed all references to Load Balancing using IPVS.
- Changed build keepalived to pre-built keepalived.
- Appendix: gratarp workaround for keepalived.

May 2017, Revision 5.0
- Changed title of document to include "IP".

January 2017, Revision 4.0
- Updated Section 3.

August 2016, Revision 3.0
- Minor correction in Section 2.5.2.1 Adding Static Routes.

February 2016, Revision 2.0
- Added LNet Router content.
- Added RHEL 7.2, SLES 12 SP1.

November 2015, Revision 1.0
- Initial release.

Contents

Revision History
Preface
    Intended Audience
    Intel Omni-Path Documentation Library
    How to Search the Intel Omni-Path Documentation Set
    Cluster Configurator for Intel Omni-Path Fabric
    Documentation Conventions
    Best Practices
    License Agreements
    Technical Support
1.0 Introduction
    1.1 Special Conventions Used
2.0 Linux IP Router
    2.1 IP Over Fabric
    2.2 Illustrative Linux IP Router Use Cases
        2.2.1 Single Router
        2.2.2 Virtual Routers with Fault-Tolerance
        2.2.3 Multiple Active Routers with Fault Tolerance
    2.3 Hardware Requirements
    2.4 Software Requirements
        2.4.1 BIOS Settings
        2.4.2 RHEL* and SLES* Installation
        2.4.3 Intel Omni-Path Software Installation
    2.5 Router Configuration
        2.5.1 Network Interface Naming Consistency
        2.5.2 10/40/100 Gbit Ethernet
    2.6 Client Configuration
        2.6.1 Intel Omni-Path Node Build
        2.6.2 IB Node Build
    2.7 Router Redundancy/Failover with VRRP v3
        2.7.1 Prerequisites
        2.7.2 keepalived Installation
        2.7.3 IP Routing Guidance
        2.7.4 Ensuring keepalived Successfully Launches at Boot Time
        2.7.5 Configuring keepalived
    2.8 Performance Benchmarking and Tuning
        2.8.1 Concerning Maximum Transmission Units (MTUs)
        2.8.2 BIOS Tuning
        2.8.3 OPA Interface Tuning
        2.8.4 Ethernet Interface Tuning
        2.8.5 Persisting Ethernet Tuning
        2.8.6 Measuring Baseline IP Performance
        2.8.7 Router Saturation
3.0 LNet Router
    3.1 Lustre.org
        3.1.1 LNet Router Deployment Checklist
    3.2 Overview
    3.3 Overview of Configuring LNet
    3.4 LNet Troubleshooting
    3.5 Designing LNet Routers to Connect Intel OPA and InfiniBand*
        3.5.1 Hardware Design and Tuning
        3.5.2 CPU Selection
        3.5.3 Memory Considerations
        3.5.4 Software Compatibility
    3.6 Practical Implementations
        3.6.1 Example 1: Legacy Storage with InfiniBand* Card Connected to New Compute Nodes Using Intel OPA
        3.6.2 Example 2: New Storage with Intel OPA Connected to Legacy Compute Nodes on InfiniBand* Cards
Appendix A Firewall Configuration for VRRP on RHEL
Appendix B Using keepalived Version 1.3.5

Figures

1 Direct Fabric Attached Multi-homed Storage
2 Lustre* (LNet) Routing to InfiniBand* or Ethernet
3 IP Routing to InfiniBand* or Ethernet
4 Single Router Example
5 Fault-tolerant Router Pair Example
6 Load Distribution Across Multiple Routers Example
7 Mapping of Network Layers and IP Stack Layers
8 Larger MTU Size Enables Full OPA-size Payload in IPoFabric Datagram Packets
9 Heterogeneous Topology
10 LNet Router
11 Network Troubleshooting Using LNet
12 CPU Utilization for Two LNet Routers Routing Between an Intel OPA Network and a Mellanox FDR Network
13 PCIe* Slot Allocation
14 Network Topology for Example 1 Configuration
15 Network Topology for Adding New OPA-Connected Servers

Tables

1 CPUs Assignment in Example of IP Router (Accelerated IPoFabric Disabled)
2 LNet Router CPU Tuning
3 Sample Table

Preface

This manual is part of the documentation set for the Intel Omni-Path Fabric (Intel OP Fabric), which is an end-to-end solution consisting of Intel Omni-Path Host Fabric Interfaces (HFIs), Intel Omni-Path switches, and fabric management and development tools.
The Intel OP Fabric delivers the next generation, High-Performance Computing (HPC) network solution that is designed to cost-effectively meet the growth, density, and reliability requirements of large-scale HPC clusters.
Both the Intel OP Fabric and standard InfiniBand* (IB) are able to send Internet Protocol (IP) traffic over the fabric, or IPoFabric. In this document, however, it may also be referred to as IP over IB or IPoIB. From a software point of view, IPoFabric behaves the same way as IPoIB, and in fact uses an ib_ipoib driver to send IP traffic over the ib0/ib1 ports.

Intended Audience

The intended audience for the Intel Omni-Path (Intel OP) document set is network administrators and other qualified personnel.

Intel Omni-Path Documentation Library

Intel Omni-Path publications are available at the following URL, under Latest Release Library:
https://www.intel.com/content/www/us/en/design/products-and-solutions/networking-and-io/fabric-products/omni-path/downloads.html
Use the tasks listed in this table to find the corresponding Intel Omni-Path document.

Task: Using the Intel OPA documentation set
Document Title: Intel Omni-Path Fabric Quick Start Guide
Description: A roadmap to Intel's comprehensive library of publications describing all aspects of the product family. This document outlines the most basic steps for getting your Intel Omni-Path Architecture (Intel OPA) cluster installed and operational.

Task: Setting up an Intel OPA cluster
Document Title: Intel Omni-Path Fabric Setup Guide
Description: Provides a high level overview of the steps required to stage a customer-based installation of the Intel Omni-Path Fabric. Procedures and key reference documents, such as Intel Omni-Path user guides and installation guides, are provided to clarify the process. Additional commands and best known methods are defined to facilitate the installation process and troubleshooting.

Task: Installing hardware
Document Title: Intel Omni-Path Fabric Switches Hardware Installation Guide
Description: Describes the hardware installation and initial configuration tasks for the Intel Omni-Path Switches 100 Series. This includes: Intel Omni-Path Edge Switches 100 Series, 24 and 48-port configurable Edge switches, and Intel Omni-Path Director Class Switches 100 Series.
Document Title: Intel Omni-Path Host Fabric Interface Installation Guide
Description: Contains instructions for installing the HFI in an Intel OPA cluster.

Task: Installing host software; Installing HFI firmware; Installing switch firmware (externally-managed switches)
Document Title: Intel Omni-Path Fabric Software Installation Guide
Description: Describes using a Text-based User Interface (TUI) to guide you through the installation process. You have the option of using command line interface (CLI) commands to perform the installation or install using the Linux* distribution software.

Task: Managing a switch using Chassis Viewer GUI; Installing switch firmware (managed switches)
Document Title: Intel Omni-Path Fabric Switches GUI User Guide
Description: Describes the graphical user interface (GUI) of the Intel Omni-Path Fabric Chassis Viewer GUI. This document provides task-oriented procedures for configuring and managing the Intel Omni-Path Switch family.
Help: GUI embedded help files

Task: Managing a switch using the CLI; Installing switch firmware (managed switches)
Document Title: Intel Omni-Path Fabric Switches Command Line Interface Reference Guide
Description: Describes the command line interface (CLI) task information for the Intel Omni-Path Switch family.
Help: -help for each CLI

Task: Managing a fabric using FastFabric
Document Title: Intel Omni-Path Fabric Suite FastFabric User Guide
Description: Provides instructions for using the set of fabric management tools designed to simplify and optimize common fabric management tasks. The management tools consist of Text-based User Interface (TUI) menus and command line interface (CLI) commands.
Help: -help and man pages for each CLI. Also, all host CLI commands can be accessed as console help in the Fabric Manager GUI.

Task: Managing a fabric using Fabric Manager
Document Title: Intel Omni-Path Fabric Suite Fabric Manager User Guide
Description: The Fabric Manager uses a well defined management protocol to communicate with management agents in every Intel Omni-Path Host Fabric Interface (HFI) and switch. Through these interfaces the Fabric Manager is able to discover, configure, and monitor the fabric.
Document Title: Intel Omni-Path Fabric Suite Fabric Manager GUI User Guide
Description: Provides an intuitive, scalable dashboard and set of analysis tools for graphically monitoring fabric status and configuration. This document is a user-friendly alternative to traditional command-line tools for day-to-day monitoring of fabric health.
Help: Fabric Manager GUI embedded help files

Task: Configuring and administering Intel HFI and IPoIB driver; Running MPI applications on Intel OPA
Document Title: Intel Omni-Path Fabric Host Software User Guide
Description: Describes how to set up and administer the Host Fabric Interface (HFI) after the software has been installed. The audience for this document includes cluster administrators and Message-Passing Interface (MPI) application programmers.

Task: Writing and running middleware that uses Intel OPA
Document Title: Intel Performance Scaled Messaging 2 (PSM2) Programmer's Guide
Description: Provides a reference for programmers working with the Intel PSM2 Application Programming Interface (API). The Performance Scaled Messaging 2 API (PSM2 API) is a low-level user-level communications interface.

Task: Optimizing system performance
Document Title: Intel Omni-Path Fabric Performance Tuning User Guide
Description: Describes BIOS settings and parameters that have been shown to ensure best performance, or make performance more consistent, on Intel Omni-Path Architecture. If you are interested in benchmarking the performance of your system, these tips may help you obtain better performance.

Task: Designing an IP or LNet router on Intel OPA
Document Title: Intel Omni-Path IP and LNet Router Design Guide
Description: Describes how to install, configure, and administer an IPoIB router solution (Linux* IP or LNet) for inter-operating between Intel Omni-Path and a legacy InfiniBand* fabric.

Task: Building Containers for Intel OPA fabrics
Document Title: Building Containers for Intel Omni-Path Fabrics using Docker* and Singularity* Application Note
Description: Provides basic information for building and running Docker* and Singularity* containers on Linux*-based computer platforms that incorporate Intel Omni-Path networking technology.

Task: Writing management applications that interface with Intel OPA
Document Title: Intel Omni-Path Management API Programmer's Guide
Description: Contains a reference for programmers working with the Intel Omni-Path Architecture Management (Intel OPAMGT) Application Programming Interface (API). The Intel OPAMGT API is a C-API permitting in-band and out-of-band queries of the FM's Subnet Administrator and Performance Administrator.

Task: Using NVMe* over Fabrics on Intel OPA
Document Title: Configuring Non-Volatile Memory Express* (NVMe*) over Fabrics on Intel Omni-Path Architecture Application Note
Description: Describes how to implement a simple Intel Omni-Path Architecture-based point-to-point configuration with one target and one host server.

Task: Learning about new release features, open issues, and resolved issues for a particular release
Document Titles:
- Intel Omni-Path Fabric Software Release Notes
- Intel Omni-Path Fabric Manager GUI Release Notes
- Intel Omni-Path Fabric Switches Release Notes (includes managed and externally-managed switches)
- Intel Omni-Path Fabric Unified Extensible Firmware Interface (UEFI) Release Notes
- Intel Omni-Path Fabric Thermal Management Microchip (TMM) Release Notes
- Intel Omni-Path Fabric Firmware Tools Release Notes

How to Search the Intel Omni-Path Documentation Set

Many PDF readers, such as Adobe* Reader and Foxit* Reader, allow you to search across multiple PDFs in a folder. Follow these steps:
1. Download and unzip all the Intel Omni-Path PDFs into a single folder.
2. Open your PDF reader and use CTRL-SHIFT-F to open the Advanced Search window.
3. Select All PDF documents in...
4. Select Browse for Location in the dropdown menu and navigate to the folder containing the PDFs.
5. Enter the string you are looking for and click Search.
Use advanced features to further refine your search criteria. Refer to your PDF reader Help for details.

Cluster Configurator for Intel Omni-Path Fabric

The Cluster Configurator for Intel Omni-Path Fabric is available at:
http://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-configurator.html
This tool generates sample cluster configurations based on key cluster attributes, including a side-by-side comparison of up to four cluster configurations. The tool also generates parts lists and cluster diagrams.

Documentation Conventions

The following conventions are standard for Intel Omni-Path documentation:
- Note: provides additional information.
- Caution: indicates the presence of a hazard that has the potential of causing damage to data or equipment.
- Warning: indicates the presence of a hazard that has the potential of causing personal injury.
- Text in blue font indicates a hyperlink (jump) to a figure, table, or section in this guide. Links to websites are also shown in blue. For example:
  See License Agreements on page 13 for more information.
  For more information, visit www.intel.com.
- Text in bold font indicates user interface elements such as menu items, buttons, check boxes, key names, key strokes, or column headings. For example:
  Click the Start button, point to Programs, point to Accessories, and then click Command Prompt.
  Press CTRL+P and then press the UP ARROW key.
- Text in Courier font indicates a file name, directory path, or command line text. For example:
  Enter the following command: sh ./install.bin
- Text in italics indicates terms, emphasis, variables, or document titles. For example:
  Refer to Intel Omni-Path Fabric Software Installation Guide for details.
  In this document, the term chassis refers to a managed switch.

Procedures and information may be marked with one of the following qualifications:
- (Linux): Tasks are only applicable when Linux* is being used.
- (Host): Tasks are only applicable when Intel Omni-Path Fabric Host Software or Intel Omni-Path Fabric Suite is being used on the hosts.
- (Switch): Tasks are applicable only when Intel Omni-Path Switches or Chassis are being used.
- Tasks that are generally applicable to all environments are not marked.

Best Practices

- Intel recommends that users update to the latest versions of Intel Omni-Path firmware and software to obtain the most recent functional and security updates.
- To improve security, the administrator should log out users and disable multi-user logins prior to performing provisioning and similar tasks.

License Agreements

This software is provided under one or more license agreements. Please refer to the license agreement(s) provided with the software for specific detail. Do not install or use the software until you have carefully read and agree to the terms and conditions of the license agreement(s). By loading or using the software, you agree to the terms of the license agreement(s). If you do not wish to so agree, do not install or use the software.

Technical Support

Technical support for Intel Omni-Path products is available 24 hours a day, 365 days a year. Please contact Intel Customer Support or visit http://www.intel.com/omnipath/support for additional detail.

1.0 Introduction

An Intel Omni-Path Architecture (OPA) fabric often requires connectivity to a parallel file system. It is recommended to attach the storage units directly to the OPA fabric using a host fabric interface to enable the use of RDMA for bulk data transfers. When storage cannot be directly attached to the OPA fabric, a gateway/router may be deployed to provide storage connectivity. For Lustre*-based file systems, LNet routers can be used to connect the OPA fabric to another network with the storage units. In other cases, IP routers may be considered.

Figure 1. Direct Fabric Attached Multi-homed Storage
Figure 2. Lustre* (LNet) Routing to InfiniBand* or Ethernet
Figure 3. IP Routing to InfiniBand* or Ethernet

This design guide describes several possible routing options:
- LNet Router - The LNet router is a node that can forward Lustre* traffic from one type of network to another using the Lustre* stack installed in the node. It is specifically designed to route Lustre* traffic natively from one network type to another in addition to the ability to perform load-balancing and failover. When storage nodes cannot be attached to the OPA fabric with the Lustre* file system, an LNet router may be configured to provide necessary connectivity. LNet routing is introduced in Section 3.0, LNet Router.
- IP Router - The IP router is a node that can forward IP packets from one type of network to another using a default IP stack in Linux. It can be used to provide IP connectivity between OPA and other network types, when storage nodes cannot be attached to the OPA fabric. Based on card types installed in the IP router node, IP-routing between OPA fabrics and 10/25/40/50/100/200/400 Gbps Ethernet and/or to SDR/DDR/FDR/EDR/HDR InfiniBand* networks using IPoIB may be considered. IP routing, for both OPA-InfiniBand* and OPA-Ethernet, is introduced in Section 2.0, Linux IP Router.

Appendices provide additional information related to firewall and keepalived configuration.

1.1 Special Conventions Used

Special conventions used in this document include:
- # preceding a command indicates the command is to be entered as root.
- $ indicates a command is to be entered as a user.
- < > indicates the placeholder text that appears between the angle brackets is to be replaced with an appropriate value.

2.0 Linux IP Router

This section discusses the use of gateway/router nodes for IP routing between OPA fabrics and existing InfiniBand* (IB) fabrics, over high-speed network connections. The basic design can also be used for IP routing between OPA fabrics and existing Ethernet* networks.
End users include systems administrators, network administrators, cluster administrators and other qualified personnel tasked with coupling the Intel Omni-Path Fabric to existing IB fabrics or Ethernet networks.

2.1 IP Over Fabric

The traditional method for sending IP traffic over an InfiniBand* fabric is IP over IB or IPoIB (RFC 4391, 4392). In this document, running IPoIB on an Intel Omni-Path fabric is referred to as IP over Fabric or IPoFabric.
IPoFabric supports both datagram mode and connected mode configuration and operation. Starting with the IFS 10.9 Intel Omni-Path software release, improvements were introduced to make IPoFabric run better in datagram mode. Improvements include:
- A multi-queue network interface for IPoFabric that better leverages existing OPA hardware features to accelerate parallel sends and receives.
- The ability to support larger jumbo datagram mode MTU sizes, up to approximately 10K, for improved performance.
These IPoFabric performance improvements can directly result in performance improvement for IP routers, especially when routing to an Ethernet network that employs a Jumbo MTU size.
Though IPoFabric offers features not found in standard IPoIB, a user already used to working with IPoIB should find working with IPoFabric familiar; it is generally configured and operates similarly.
Both IPoIB and IPoFabric provide a standard network device interface to the Linux networking stack. Thus, it is possible to configure a Linux node to act as a gateway/router node for routing IP traffic between an InfiniBand interface and an OPA interface on the same host, named as an IP router in this document. With an IP router, applications can share data and files between IB and OPA fabrics using standard network protocols, like TCP/IP.
In addition to routing TCP/IP packets between an IB and an OPA fabric, the same methodology can be used to route between an OPA fabric and an Ethernet network. The biggest difference is the interface-specific configuration used on the router, but the general configuration and routing setup is similar regardless of the type of the non-OPA network.
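
Because the IP router relies on the standard Linux networking stack, packets are only forwarded between the two interfaces when the kernel's IPv4 forwarding flag is set. The sketch below is illustrative and is not taken from this guide: the helper name forwarding_enabled is hypothetical, and the optional path argument exists only so the check can be tried without root. The flag file /proc/sys/net/ipv4/ip_forward and the sysctl key net.ipv4.ip_forward are standard Linux.

```shell
#!/bin/sh
# Report whether IPv4 forwarding is enabled by reading the kernel flag file.
# The optional argument overrides the path (hypothetical convenience so the
# function can be exercised against a scratch file).
forwarding_enabled() {
    flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is disabled; as root, run: sysctl -w net.ipv4.ip_forward=1"
fi
```

A setting made with sysctl -w does not survive a reboot; persisting it is a separate step.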

2.2 Illustrative Linux IP Router Use Cases

The following three examples demonstrate some ways in which an IP router, or group of IP routers, may be used to connect an OPA fabric to management or storage on an external network. These cases are illustrative, and are not intended to be exhaustive; other cases may be possible.

2.2.1 Single Router

The figure below illustrates a case with a single router connecting the Intel Omni-Path fabric to the IB fabric. This is the simplest setup possible to gain connectivity between the Intel Omni-Path and IB fabrics. There is no fail-over and no load-balancing. This is a typical setup used for testing and development.

Figure 4. Single Router Example

It is possible to increase reliability of a single router configuration by adding a second network interface and using IPoIB bonding in active/standby mode. In the case of an adapter or cable failure, the traffic will use the secondary IPoFabric interface. Refer to the IPoIB Bonding section of the Intel Omni-Path Fabric Host Software User Guide for details.

2.2.2 Virtual Routers with Fault-Tolerance

This setup includes two routers running Virtual Router Redundancy Protocol (VRRP) to provide fault-tolerance and fail-over. If one router fails, the virtual IP address used for routing between the subnets automatically fails over from that physical router node to the other physical router node.

Figure 5. Fault-tolerant Router Pair Example

A variant of this configuration uses two virtual addresses. Each router is master for one of the addresses, with the opposite one being the backup. This allows full utilization of both routers when both are active. A failure of either one moves the failed router's primary virtual address to the remaining router.

2.2.3 Multiple Active Routers with Fault Tolerance

This setup contains three routers: two masters and one backup. In this scenario the traffic load is distributed across the two master routers to allow for greater aggregate throughput. The routers are on the same IP subnet, and traffic is directed to one router or the other using static routes in the end nodes. As in previous examples, keepalived is used to provide redundancy, this time in an N+1 fashion.
In this example, router 1 has VIPs 192.168.100.1 and 192.168.200.1, while router 2 has VIPs 192.168.100.2 and 192.168.200.2. The backup router sits in standby mode to accommodate the failure of one or both master routers.

NOTE: There are other schemes that could be used to distribute load across multiple routers; this example is just one scheme.
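
As a rough illustration of directing traffic with static routes in the end nodes, the sketch below prints (without running) the route an end node on the 192.168.100.0/24 side might add to reach 192.168.200.0/24. The VIPs come from the example above; the /24 prefix length, the even/odd split, and the function name route_for_node are assumptions made for illustration only.

```shell
#!/bin/sh
# Print (do not run) the static route an end node on 192.168.100.0/24 would
# add to reach 192.168.200.0/24.
# Hypothetical policy: nodes with an even index use router 1's VIP, nodes
# with an odd index use router 2's VIP.
route_for_node() {
    node_index="$1"
    if [ $((node_index % 2)) -eq 0 ]; then
        vip="192.168.100.1"   # router 1 VIP on this subnet
    else
        vip="192.168.100.2"   # router 2 VIP on this subnet
    fi
    echo "ip route add 192.168.200.0/24 via $vip"
}

for i in 0 1 2 3; do
    printf 'node %s: %s\n' "$i" "$(route_for_node "$i")"
done
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; on a real end node the printed ip route command would be run as root.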

Figure 6. Load Distribution Across Multiple Routers Example

A variant of this configuration uses three virtual IP addresses. The backup router for the first two virtual addresses can serve as primary router for the third address. One of the first two routers can serve as backup for the third. This allows full utilization of all routers while they are active.

2.3 Hardware Requirements

A server with at least one Intel Xeon Processor with eight or more cores, PCIe* Gen3 x16 link interfaces, and a minimum of 8 GB RAM may be required for the minimum testing requirement. For best performance, all IO channels on the memory controller should be populated equally, in a balanced configuration. Systems using dual-port or additional HFI/HCA adapters may be able to take advantage of a second Intel Xeon Processor.
The needs of IP routing applications generally differ from the needs of compute applications for performance. For IP routing applications, Intel generally recommends use of servers that are able to run at a higher per-core clock frequency, compared to servers that may have more cores that run at a lower per-core clock frequency. However, these are simply generalizations, and not specific recommendations.
As a knowledgeable reader can appreciate, server and platform technology evolves quickly, and a router's performance will be dependent on many factors. These factors may include types and speed of memory used, types of processor(s), number of sockets, number of cores per socket, clock speeds of cores, link speeds, characteristics of the platform's PCIe and bus architecture, physical interface characteristics, BIOS settings, kernel and driver software versions and tunings, test configuration, packet sizes, and workload characteristics, just to name a few. This is not an exhaustive list.
Many factors, including contention for available PCIe, interrupt, and memory bandwidth, in addition to availability of suitable card slots in a platform and other factors, may limit the practical benefits of adding additional HFI and/or other network adapter cards to create a router node. For these reasons, it is advised to consult with an applications specialist when planning your router nodes.
When planning for a router, in addition to the platform considerations, additional components need to be considered:
- Intel Omni-Path Fabric HFI (one or more for each router)
- QDR/FDR/EDR HCA, or 10/25/40/50/100 Gbps Ethernet cards (one or more for each router)
- Interface cables to connect to the router's OPA port(s)
- Interface cables to connect to the router's IB/Ethernet port(s)
- Available switch ports to connect those cables to in each routed network
- Rack space and power for the routers

2.4 Software Requirements

2.4.1 BIOS Settings

Setting the system BIOS is an important step in configuring a cluster to provide the best mix of application performance and power efficiency. In the following, we specify settings that should maximize the Intel Omni-Path Fabric and application performance. Optimally, settings similar to these should be used during a cluster bring-up and validation phase to show that the fabric is performing as expected. For the long term, you may want to set the BIOS to provide more power savings, even though that will reduce overall application and fabric performance to some extent.
For BIOS settings, refer to the Intel Omni-Path Fabric Performance Tuning User Guide.

2.4.2 RHEL* and SLES* Installation

The Red Hat* Enterprise Linux (RHEL) or SUSE* SPx (SLES) software version used must be on the list of versions supported by Intel OPA, available from the Intel Resource & Design Center (https://www.intel.com/content/www/us/en/design/products-and-solutions/networking-and-io/fabric-products/omni-path/downloads.html).
For additional OS software requirements, refer to the Intel Omni-Path Fabric Software Release Notes and Intel Omni-Path Fabric Software Installation Guide.

NOTE: Instructions for CentOS and Scientific Linux are the same as for RHEL.

NOTE: You may be required to disable the firewall and SELinux during installation. See Appendix A, Firewall Configuration for VRRP on RHEL, for additional information.

1. Disable firewalld during the Intel Omni-Path Basic installation period:
    # systemctl disable firewalld
    # systemctl stop firewalld
2. Disable SELinux during the install period:
    # setenforce 0
3. Edit /etc/sysconfig/selinux and change SELINUX=enforcing to SELINUX=disabled to keep SELinux disabled on reboot.
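
The edit in step 3 can also be scripted. The following is a minimal sketch, assuming a POSIX shell with sed and mktemp available; the function name disable_selinux_in_file is hypothetical, and the demonstration runs on a scratch file rather than the real configuration.

```shell
#!/bin/sh
# Rewrite SELINUX=enforcing to SELINUX=disabled in the given config file.
# The result is written back with cat (rather than replacing the file) so
# that, if the path is a symlink, the link itself is left in place.
disable_selinux_in_file() {
    cfg="$1"
    tmp="$(mktemp)" || return 1
    sed 's/^SELINUX=enforcing/SELINUX=disabled/' "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demonstration on a scratch copy. On a real router one would pass
# /etc/sysconfig/selinux and run as root.
demo="$(mktemp)"
printf '# sample\nSELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
disable_selinux_in_file "$demo"
cat "$demo"
rm -f "$demo"
```

Only a line that begins with SELINUX=enforcing is changed; other settings, such as SELINUXTYPE, are left untouched.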

2.4.3 Intel Omni-Path Software Installation

2.4.3.1 Software Requirements

- Intel Omni-Path IFS on Intel Omni-Path node
- Intel Omni-Path Basic on Coexist routers

When routing between OPA and IB, each fabric requires its own subnet manager, and neither fabric's subnet manager should be run on the router node. The Intel Omni-Path Fabric Suite Fabric Manager should NOT be enabled on the coexist routers, nor should the IB subnet manager. When installing software, the Intel Omni-Path Basic install package should be used on the router node because it does not contain the Intel Omni-Path Fabric Suite Fabric Manager, nor the InfiniBand* opensm service.

NOTE: Unless specifically identified, the instructions for Intel Omni-Path software installation are for both RHEL* and SLES*.

2.4.3.2 Installation Steps

1. Install the Intel Omni-Path Basic build on the routers using instructions from the Intel Omni-Path Fabric Software Installation Guide and apply tunings for IPoFabric as recommended by the Intel Omni-Path Fabric Performance Tuning User Guide.

   NOTE: To ensure that the mlx4_ib module loads correctly at boot time, add the following line to /etc/modprobe.d/mlx4.conf:

    install mlx4_ib /usr/sbin/modprobe --ignore-install -f mlx4_ib

   This is necessary to ensure that modprobe skips the check of symbol signatures that are different from IFS-built ib_mad, ib_sa, ib_umad modules sharing the same API.

   You should also see the following mlx modules:

    # lsmod | grep mlx
    mlx4_ib     158552  2
    mlx4_en      94530  0
    vxlan        37409  1 mlx4_en
    ib_sa        33949  5 rdma_cm,ib_cm,mlx4_ib,rdma_ucm,ib_ipoib
    ib_mad       61179  5 hfi1,ib_cm,ib_sa,mlx4_ib,ib_umad
    ptp          18933  2 igb,mlx4_en
    mlx4_core   254286  2 mlx4_en,mlx4_ib
    ib_core      88311 12 hfi1,rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib

   NOTE: mlx4_ib might be replaced by mlx5_ib on later versions of the software. Consult InfiniBand* vendor documentation for more details.

2. With Intel Omni-Path software installed and the HFI1 and mlx drivers loaded, confirm that both devices are active on their respective fabrics.

   a. To check the Intel Omni-Path HFI, run opainfo:

    # opainfo
    hfi1_0:1 PortGID:0xfe80000000000000:0011750101574238
       PortState: Active
       LinkSpeed Act: 25Gb En: 25Gb
       LinkWidth Act: 4 En: 4
       LinkWidthDnGrd ActTx: 4 Rx: 4 En: 3,4
       LCRC Act: 14-bit En: 14-bit,16-bit,48-bit Mgmt: True
       LID: 0x00000006-0x00000006 SM LID: 0x00000001 SL: 0
       QSFP: PassiveCu, 2m TE Connectivity P/N 2821076-2 Rev B
       Xmit Data: 961 MB Pkts: 13540087
       Recv Data: 28702 MB Pkts: 15932547
       Link Quality: 5 (Excellent)
       Integrity Err: 0
       Err Recovery 0

      Notice that the PortState is Active.

   b. To check the InfiniBand* HCA, run ibstat:

    # ibstat
    CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.34.5000
        Hardware version: 0
        Node GUID: 0x0002c903003d5bf0
        System image GUID: 0x0002c903003d5bf3
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 56
            Base lid: 17
            LMC: 0
            SM lid: 5
            Capability mask: 0x02514868
            Port GUID: 0x0002c903003d5bf1
            Link layer: InfiniBand

      Notice that State is Active.

3. If the Intel Omni-Path HFI is inactive, confirm that the Intel Omni-Path Fabric Suite Fabric Manager (opafm) is running on either the Intel Omni-Path switch or an Intel Omni-Path node. If the IB HCA is inactive, check that the IB subnet manager (opensm) is running on the IB switch or an IB node.
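
The two port checks in step 2 can be combined into a small filter. This is a sketch under stated assumptions, not part of the Intel tooling: the function name port_is_active is hypothetical, and the pattern assumes the output keeps the "PortState: Active" (opainfo) and "State: Active" (ibstat) lines shown in the samples above.

```shell
#!/bin/sh
# Succeed if the text on stdin reports an Active port. Matches the
# "PortState: Active" line of opainfo and the "State: Active" line of
# ibstat; the "Physical state:" line of ibstat is deliberately not matched.
port_is_active() {
    grep -Eq '^[[:space:]]*(PortState|State):[[:space:]]*Active[[:space:]]*$'
}

# On a router with both stacks installed, one might run (as root):
#   opainfo | port_is_active && echo "OPA port active"
#   ibstat  | port_is_active && echo "IB port active"
# Here the function is exercised on a fragment of sample output instead.
printf 'hfi1_0:1\n   PortState:     Active\n' | port_is_active &&
    echo "sample OPA port is active"
```

If either check fails, step 3 above describes where to look next (opafm for the OPA side, opensm for the IB side).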
RouterConfigurationNetworkInterfaceNamingConsistencyNetworkInterfaceNamingUsingExplicitDriverLoadingOnLinux,theorderinwhichtheHFI/HCAdriversareloadeddetermineswhichinterfacename(ib0,ib1…)isassociatedwiththedriver.
Thenetworkinterfaceconfigurationfilesin/etc/sysconfig/network-scriptsforRHEL*or/etc/sysconfig/networkforSLES*dependontheappropriatedriverbeingassignedtothecorrectinterface.
Youcaneitherobservethebehavioronyoursystemafterthecardshavebeeninstalledoryoucanexplicitlycontroltheorderinwhichthedriversareinstalled.
For controlling driver installation, you must first "blacklist" the drivers involved by using a text file located in /etc/modprobe.d containing something similar to the following (example is for routing OPA-IB):

# Blacklist the InfiniBand/Intel Omni-Path drivers to prevent the
# system automatically loading them at startup.
blacklist mlx4_core
blacklist hfi1

A script stored in /etc/sysconfig/modules is then used to start the drivers in a particular order:

#!/bin/sh
# Influence the order of the interfaces - the first one loaded
# gets ib0 the next ib1, etc.
if [ ! -c /dev/ipath0 ]; then
    /sbin/modprobe --force mlx4_core > /dev/null 2>&1
fi
if [ ! -c /dev/hfi1 ]; then
    /sbin/modprobe hfi1 > /dev/null 2>&1
fi

Network Interface Naming Using udev

Another method for setting device names is through udev.
In the following example, you see the /etc/udev/rules.d/70-persistent-ipoib.rules file. The directions in the file are very clear and to the point.

# This is a sample udev rules file that demonstrates how to get udev to
# set the name of IPoIB interfaces to whatever you want. There is a
# 16 character limit on network device names.
#
# Important items to note: ATTR{type}=="32" is IPoIB interfaces, and the
# ATTR{address} match must start with * and only reference the last 8
# bytes of the address or else the address might not match on any given
# start of the IPoIB stack
#
# Note: as of rhel7, udev is case sensitive on the address field match
# and all addresses need to be in lower case.
#
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="*", ATTR{type}=="32", ATTR{address}=="*00:11:75:01:01:57:51:59", NAME="ib0"
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="*", ATTR{type}=="32", ATTR{address}=="*00:02:c9:03:00:33:13:81", NAME="ib1"

mtu 65520 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:02:fe:80:00:00:00:00:00:00:00:11:75:01:01:57:42:38 brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.200.11/24 brd 192.168.200.255 scope global ib0
       valid_lft forever preferred_lft forever
    inet6 fe80::211:7501:157:4238/64 scope link
       valid_lft forever preferred_lft forever
5: ib1: mtu 65520 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:3d:5b:f1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.100.11/24 brd 192.168.100.255 scope global ib1
       valid_lft forever preferred_lft forever
    inet 192.168.100.1/32 scope global ib1
       valid_lft forever preferred_lft forever
    inet6 fe80::202:c903:3d:5bf1/64 scope link
       valid_lft forever preferred_lft forever

Enable forwarding on the routers:

1. The coexisting routers need to be able to forward packets. For a quick test of the functionality:

# echo 1 > /proc/sys/net/ipv4/conf/ib0/forwarding
# echo 1 > /proc/sys/net/ipv4/conf/ib1/forwarding

2. For persistence through a reboot, add the following entries to /usr/lib/sysctl.d/00-system.conf:

net.ipv4.conf.ib0.forwarding=1
net.ipv4.conf.ib1.forwarding=1

This will limit IP forwarding to only the IB/Intel Omni-Path interfaces.
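
On a router with more than two fabric interfaces, the per-interface entries can be generated rather than typed. The following Python sketch is only an illustration: the helper name forwarding_entries is hypothetical, and it simply reproduces the /proc path and sysctl key formats shown above for each interface name it is given.

```python
def forwarding_entries(interfaces):
    """Return (proc_paths, sysctl_lines) that enable IPv4 forwarding per interface."""
    proc_paths = []
    sysctl_lines = []
    for name in interfaces:
        # Interface names containing "/" or "." would corrupt the path or key.
        if not name or "/" in name or "." in name:
            raise ValueError(f"unsupported interface name: {name!r}")
        proc_paths.append(f"/proc/sys/net/ipv4/conf/{name}/forwarding")
        sysctl_lines.append(f"net.ipv4.conf.{name}.forwarding=1")
    return proc_paths, sysctl_lines


if __name__ == "__main__":
    paths, lines = forwarding_entries(["ib0", "ib1"])
    print("\n".join(lines))
```

Because the entries name individual interfaces, forwarding stays limited to the fabric interfaces, which is the property the text above relies on.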
2.5.1.4 Configuring Network Devices with yast2 for SLES*

The Intel Omni-Path Basic installation will ask if you want to configure the IPoIB network scripts. You can accept this or use yast2 to configure the interfaces or create the files manually in /etc/sysconfig/network/.

The following is an example of the /etc/sysconfig/network/ifcfg-ib0 file created by yast2:

BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='192.168.100.10/24'
IPOIB_MODE='connected'
MTU='65520'
NAME=''
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'

2.5.2 10/40/100 Gbit Ethernet

The Intel Omni-Path Fabric Linux router can be used with 10/40/100 Gbit Ethernet devices. The following command uses the nmcli command to configure a 10/40/100 Gbit Ethernet device:

# nmcli con add type ethernet con-name eth2 ifname eth2 mtu 9000 ip4 192.168.100.20/24

NOTE
Intel recommends an MTU of 9000 (jumbo frame) for 10/40/100 Gbit Ethernet.
2.6 Client Configuration

To test the routers, a minimum of one Intel Omni-Path node and one IB node (as standalone representatives of their respective fabrics, apart from the subnet manager node) need to be configured and connected to the same switched environment that the routers are on. The network configuration on RHEL* and SLES* uses the same setup as mentioned previously to configure the Intel Omni-Path HFI or the IB HCA network devices.

NOTE
The router node should not be running the Intel Omni-Path Fabric Suite Fabric Manager or the IB subnet manager.
2.6.1 Intel Omni-Path Node Build

The Intel Omni-Path node in this environment should be built from the Intel Omni-Path IFS package. This provides all the packages from the Intel Omni-Path Basic build and adds the Intel Omni-Path Fabric Suite Fabric Manager. If the Fabric Manager is running on the Intel Omni-Path switch then the installation of the Intel Omni-Path Basic software will be sufficient.

2.6.2 IB Node Build

The IB node network interface works directly from a base build of RHEL* or SLES*, and a subnet manager can be added with the addition of the opensm package if the subnet manager is not running on the InfiniBand* switch.

For both the Intel Omni-Path and IB nodes, a static route needs to be added to the network configuration for the client nodes to use the Linux router as a gateway to the opposing fabric. Based on the example network topology provided in this document, the routes will be added as described in the following sections.
2.6.2.1 Adding Static Routes

On the IB node:

# ip route add 192.168.200.0/24 via 192.168.100.1

This can be written to /etc/sysconfig/network-scripts/route- to keep the routing tables persistent through a reboot. Example:

# cat /etc/sysconfig/network-scripts/route-ib0
192.168.100.0/24 via 192.168.200.1 dev ib0

NOTE
Sometimes the above route is NOT added at system boot. This happens when the Intel Omni-Path and IB interfaces are not yet fully operational on the fabric at the time the NetworkManager service starts, because the fabric or subnet manager has not yet brought them to an active state. The static route configuration then fails.
A workaround is to add the following to /etc/rc.d/rc.local on RHEL* or /etc/rc.d/after.local on SLES*:

1. Set execute permissions on /etc/rc.d/rc.local:

# chmod +x /etc/rc.d/rc.local

The rc.local file:

sleep 30 # Delays adding the static route until the HFI/HCA is up.
/usr/sbin/ip route add 192.168.100.0/24 via 192.168.200.1
logger -p info "STATIC ROUTE ADDED" # Use this to confirm that the rc.local script runs.

2. Start the rc-local service before rebooting the system to activate the service.

# systemctl start rc-local.service

At this point, with both routers up and connected to the fabrics, you should be able to ping nodes from the Intel Omni-Path fabric subnet through the router to the IB fabric subnet.
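
A static route only works if the gateway is reachable on the client's own subnet and the destination is the opposing fabric. The Python sketch below checks both conditions with the standard ipaddress module before producing a route-file line. It is an illustration under assumptions: the helper name static_route_line is hypothetical, and the client address 192.168.200.21/24 is made up for the example; the networks and gateway addresses are the ones used in this section.

```python
import ipaddress


def static_route_line(local_addr, remote_net, gateway, device=None):
    """Validate a static route and return it in route-file syntax."""
    local = ipaddress.ip_interface(local_addr)
    remote = ipaddress.ip_network(remote_net)
    gw = ipaddress.ip_address(gateway)
    if gw not in local.network:
        # The router's virtual IP must sit on the client's own fabric subnet.
        raise ValueError(f"gateway {gw} is not on local network {local.network}")
    if local.ip in remote:
        raise ValueError(f"{remote} is directly connected; no route needed")
    line = f"{remote} via {gw}"
    if device:
        line += f" dev {device}"
    return line


if __name__ == "__main__":
    # Hypothetical Intel Omni-Path client reaching the IB subnet.
    print(static_route_line("192.168.200.21/24", "192.168.100.0/24",
                            "192.168.200.1", device="ib0"))
```

The printed line has the same form as the route-ib0 example above.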
2.7 Router Redundancy/Failover with VRRP v3

Virtual Router Redundancy Protocol (VRRP) v3 is used to provide fault-tolerance, failover and load-balancing on the routers. This documentation uses VRRP v3 as implemented by the keepalived service. VRRP v3 is specified in RFC 5798, available here: https://tools.ietf.org/html/rfc5798.
2.7.1 Prerequisites

Prior to using router redundancy/failover with VRRP, the failover routes need to be specified on the routers themselves. Then when one router fails, communication will transfer to the HFI on the other router.

Example

Two routers, each containing two HFIs:

router01:
ib0, 192.168.1.1 (intended to failover to 192.168.1.3)
ib1, 192.168.1.2 (intended to failover to 192.168.1.4)

router02:
ib0, 192.168.1.3
ib1, 192.168.1.4

The following routes are specified on the routers:

router01:
ip route add 192.168.1.3 dev ib0
ip route add 192.168.1.4 dev ib1

router02:
ip route add 192.168.1.1 dev ib0
ip route add 192.168.1.2 dev ib1

2.7.2 keepalived Installation

keepalived must be version 1.3.6 or later. If keepalived is less than 1.5.0, then see Using keepalived Version 1.3.5 on page 64 for additional configuration information regarding gratuitous arps.

Installing keepalived on RHEL using yum:

# yum install keepalived
# systemctl enable keepalived

Specific versions of keepalived can be installed by using rpm. The user will need to know the list of package dependencies for keepalived.

Additionally, the user may download and compile any version of keepalived. The sources are available at http://www.keepalived.org/download.htm. To compile keepalived from source, it is possible that other libraries used by keepalived may need to be installed. Instructions and documentation are available at www.keepalived.org.
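
The two version thresholds stated above (1.3.6 as the minimum, 1.5.0 as the point below which extra gratuitous ARP configuration applies) can be expressed as a small check. This is a Python sketch with hypothetical names (keepalived_requirement and its return strings); it only compares dotted version strings and does not query the installed package.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.3.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def keepalived_requirement(version):
    """Classify a keepalived version against the thresholds in this section."""
    parsed = parse_version(version)
    if parsed < (1, 3, 6):
        return "unsupported"
    if parsed < (1, 5, 0):
        return "supported, extra gratuitous ARP configuration applies"
    return "supported"


if __name__ == "__main__":
    for candidate in ("1.3.5", "1.4.2", "2.0.10"):
        print(candidate, "->", keepalived_requirement(candidate))
```

Comparing integer tuples avoids the usual string-comparison trap, where "1.10.0" would sort before "1.5.0".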
2.7.3 IP Routing Guidance

In scenarios where two IPoFabric interfaces on the same host (for example, ib0/ib1) are configured on the same subnet, care must be taken to define the Linux routing tables for the desired behavior. This is because Linux may always use ib0 to send return packets even if the communication request is initiated over ib1. This also should be configured for proper VRRP operation.

Host A:
ib0: 192.168.1.1
ib1: 192.168.2.1

Host B:
ib0: 192.168.1.2
ib1: 192.168.2.2

If a netmask, such as 255.255.0.0, is defined for the above hosts, both interfaces fall into the same subnet and communication over ib1 between the hosts may not work correctly. In this scenario, define the following routes on both hosts:

ip route add 192.168.1.0/24 dev ib0
ip route add 192.168.2.0/24 dev ib1

Alternatively, you can add these definitions to route-ib0/ib1 files.
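
Whether two interfaces end up on the same subnet depends only on the netmask, which can be confirmed with the standard ipaddress module. The following Python sketch uses the Host A addresses from the example above; the helper name same_subnet is hypothetical.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """Return True if both addresses belong to the same network under netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b


if __name__ == "__main__":
    # Host A: ib0 = 192.168.1.1, ib1 = 192.168.2.1
    print(same_subnet("192.168.1.1", "192.168.2.1", "255.255.0.0"))
    print(same_subnet("192.168.1.1", "192.168.2.1", "255.255.255.0"))
```

With 255.255.0.0 both interfaces resolve to 192.168.0.0/16, which is the situation where the explicit per-interface routes are needed; with 255.255.255.0 they are on separate networks.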
Related Links
Adding Static Routes on page 27

2.7.4 Ensuring keepalived Successfully Launches at Boot Time

Due to the delay caused by the subnet manager/fabric manager services in bringing the IB and Intel Omni-Path devices to an active state, VRRP may fail at boot time. The solution is to add this line to the /etc/systemd/system/keepalived.xml file:

"After=NetworkManager-wait-online.service"

This delays the start of keepalived service until the network devices are in an active state.

Enable the NetworkManager-wait-online.service:

# systemctl enable NetworkManager-wait-online.service

The following is an example of the keepalived.xml file:

[Unit]
Description=LVS and VRRP High Availability Monitor
After=syslog.target network.target
After=NetworkManager-wait-online.service

[Service]
Type=forking
KillMode=process
EnvironmentFile=-/etc/sysconfig/keepalived
ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

Place the keepalived.xml file in /etc/systemd/system/.
2.7.5 Configuring keepalived

There are two files involved in configuring keepalived. The first is /etc/keepalived/keepalived.conf. This is the configuration file that defines keepalived keyword settings. See the man page keepalived.conf(5) for complete information.

The second file is /etc/sysconfig/keepalived. This file sets the runtime options for keepalived. Two useful options are -D for detailed log messages and -P to run only the VRRP service. These are both recommended settings when building and testing the routers for fault-tolerance.

The basic setup for fault-tolerance and automatic failover is to configure one router as the master for both subnet virtual IPs and the other as a backup. In the event the master router fails, the backup router will immediately assume the master role.

Another option is to configure the routers so that they split the traffic, with one router as the master on one subnet and the second router as the backup on the second subnet. This setup has the potential to reduce load on an individual router if traffic is going back and forth between the fabrics.

Below is a configuration example for a simple master/backup router pair.
/etc/keepalived/keepalived.conf for router1 (master):

vrrp_instance VI_1 {
    state MASTER
    interface ib0
    virtual_router_id 1
    priority 250
    authentication {
        auth_type PASS
        auth_pass password
    }
    virtual_ipaddress {
        192.168.200.1
    }
}
vrrp_instance VI_2 {
    interface ib1
    state MASTER
    virtual_router_id 2
    priority 250
    authentication {
        auth_type PASS
        auth_pass password
    }
    virtual_ipaddress {
        192.168.100.1
    }
}

The following is an example of /etc/keepalived/keepalived.conf for router2 (backup):

vrrp_instance VI_1 {
    state BACKUP
    interface ib0
    virtual_router_id 1
    priority 100
    authentication {
        auth_type PASS
        auth_pass password
    }
    virtual_ipaddress {
        192.168.200.1
    }
}
vrrp_instance VI_2 {
    interface ib3
    state BACKUP
    virtual_router_id 2
    priority 100
    authentication {
        auth_type PASS
        auth_pass password
    }
    virtual_ipaddress {
        192.168.100.1
    }
}

NOTE
In the above keepalived.conf file the auth_type value is PASS, which transmits the password in plain text. Another option is auth_type AH, which uses IPSEC and transmits an encrypted password. Discussion of overall VRRP protocol security is outside the scope of this document. Please see RFC 5798 for more information at https://tools.ietf.org/html/rfc5798.
2.8 Performance Benchmarking and Tuning

2.8.1 Concerning Maximum Transmission Units (MTUs)

There are Maximum Transmission Units (MTUs) that apply both to the datagrams of the Internet or IP layer and to the packets of the Link layer on either side of the IP router. The following diagram shows the layered TCP/IP protocol stack.

Figure 7. Mapping of Network Layers and IP Stack Layers

The transport layer performs host-to-host communications on either the same or different hosts and on either the local network or remote networks separated by routers. The Transmission Control Protocol (TCP) provides flow-control, connection establishment, and reliable transmission of data.

The internetwork layer has the task of exchanging datagrams (a.k.a. packets) across network boundaries. This layer defines the addressing and routing structures used for the TCP/IP protocol suite. The primary protocol in this scope is the Internet Protocol (IP), which defines IP addresses. IPoIB and IPoFabric provide network interfaces at this layer.
The link layer, implemented by the Intel Omni-Path fabric on one side of the IP router, and by InfiniBand* or Ethernet or another non-OPA technology on the other side of the IP router, includes the protocols used to describe the local network topology and the interfaces needed to transmit packets to next-neighbor hosts or switches.

2.8.1.1 Path MTU

In the context of IP routing, the quantity to maximize for efficient bulk data transfers is the path MTU: the effective size of the largest transmission unit that can be sent end to end without fragmentation between two communicating Internet Protocol (IP) endpoints.

By definition, the effective path MTU between communicating endpoints cannot be larger than the MTU of either endpoint, and this information is exchanged during the SYN message of TCP connections, so the use of larger MTU sizes on all communicating endpoints is encouraged.

The largest link level MTU size supported on the IPoFabric interface depends on whether IPoFabric is operating in datagram mode or connected mode.
2.8.1.2 Datagram Mode MTU

Configuring "jumbo" Ethernet datagram sizes, and operating the router in datagram mode can help with router efficiency.

For IPoFabric bandwidth benchmarks, a prerequisite for good throughput performance is to utilize a link layer MTU that at least matches the link layer MTU of what is on the non-OPA side of the router. For example, when an IP router is routing from OPA to an Ethernet link with a configured jumbo MTU size of 9000 bytes, it is beneficial to increase the IPoFabric MTU size on the OPA link as well, greater than its default of 2044 bytes.

Prior to the OPA IFS 10.9 software release, the largest size configurable was 4092 bytes, constrained by what is defined for InfiniBand interfaces. With the IFS 10.9 release, the datagram mode MTU for IPoFabric can be specified larger, up to what fits in an OPA packet. Accommodating the 4 byte IPoIB header, IPoFabric datagram MTU sizes of up to 10236 bytes can now be configured. Details on how to configure this larger datagram mode IPoFabric MTU size are provided in the IPoFabric section of the Intel Omni-Path Fabric Performance Tuning User Guide.
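
The three datagram MTU values quoted above follow one pattern: the link-layer payload size minus the 4 byte IPoIB header. The Python sketch below makes that arithmetic explicit. The payload sizes 2048, 4096 and 10240 bytes are an assumption inferred from the quoted MTUs plus the header size; they are not stated in this section, and the function name is hypothetical.

```python
IPOIB_HEADER_BYTES = 4  # header size stated in the text above


def ipofabric_datagram_mtu(link_payload_bytes):
    """IP-layer MTU left after the IPoIB header is placed in the link payload."""
    if link_payload_bytes <= IPOIB_HEADER_BYTES:
        raise ValueError("link payload too small to carry the IPoIB header")
    return link_payload_bytes - IPOIB_HEADER_BYTES


if __name__ == "__main__":
    # Assumed link payload sizes: default, pre-10.9 maximum, OPA maximum.
    for payload in (2048, 4096, 10240):
        print(payload, "->", ipofabric_datagram_mtu(payload))
```

Under these assumptions the results are 2044, 4092 and 10236 bytes, matching the default, the pre-10.9 limit and the IFS 10.9 limit. The last value exceeds a 9000 byte Ethernet jumbo frame, so the OPA side no longer has to be the smaller MTU on a routed path to jumbo-frame Ethernet.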
Figure 8. Larger MTU Size Enables Full OPA-size Payload in IPoFabric Datagram Packets

2.8.1.3 Connected Mode MTU

Another way to use a larger effective MTU size for transfers across the OPA fabric is to utilize IPoFabric in connected mode.
When operating in connected mode, data is segmented and re-assembled, as necessary, to deliver a payload from one OPA node to another across the fabric. The underlying segmentation and re-assembly allows for an IP layer MTU to be specified larger than what can fit in a single OPA packet, up to the maximum size supported by the networking stack, which is 65520 bytes. When operating IPoFabric in Connected Mode, a 65520 byte IP layer MTU is recommended. This can be beneficial for bulk data transfers over an OPA fabric; however, note that connected mode becomes less efficient for bulk data transfers using smaller segment sizes. With the introduction of Accelerated IPoFabric for datagram mode in IFS 10.9, it may be advantageous to use datagram mode when implementing an IP router (unless the routing is to IPoIB also in connected mode).

NOTE
The Intel Omni-Path and IB link level fabric interfaces on an IP router are independently configurable interfaces. Connected mode for IPoFabric on the Intel Omni-Path fabric HFI interface can be set independently from the mode of IPoIB configured on the IB HCA interface, and independently of ib_ipoib's ipoib_enhanced setting that may have to be turned off to allow connected mode on the IPoIB interface.

The sections on Router Configuration on page 23 and Client Configuration on page 27 discuss methods for configuring IPoIB to automatically start upon reboot and how to configure the nodes for IPoIB Connected Mode operation.
One way to do a quick check if you are in connected mode with the corresponding IP MTU size is to cat two files:

# cat /sys/class/net/ib0/mode
connected
# cat /sys/class/net/ib0/mtu
65520

2.8.2 BIOS Tuning

Refer to BIOS Settings on page 20 for information about the desired BIOS settings for good performance for client or router nodes. Here we just emphasize that for the router nodes, Intel recommends setting Hyper-Threading Technology to Disabled.

2.8.3 OPA Interface Tuning

2.8.3.1 Datagram Mode or Connected Mode - Which is Better

Prior to the IFS 10.9 software release, performance with connected mode generally offered better aggregate bandwidth performance than with datagram mode. With the introduction of Accelerated IPoFabric for datagram mode in the IFS 10.9 release, datagram mode may be better performing than connected mode for some applications. The answer to which mode is better for a fabric may differ based on the application. For IP routing, with the advent of Accelerated IPoFabric, datagram mode may be preferred.
By default, the OPA FastFabric tools will install and configure connected mode for new installations. Installers may want to try a few target application runs in datagram mode after the initial installation completes in connected mode to help decide what mode would be best to leave the system in for operations.
2.8.3.2 Accelerated Datagram Mode

Accelerated IPoFabric is enabled by default for all OPA IPoFabric interfaces on a host.

An IPoFabric interface operates in datagram mode under a variety of conditions. Datagram mode communications are used for broadcast and multicast sends and receives, and when either or both of a pair of communicating nodes do not support connected mode. In datagram mode, the MTU is determined by the lesser of the MTU configured on the interface, or the multicast MTU size defined by the Fabric Manager for the virtual fabric being used for IPoFabric communication on the fabric. To increase the MTU allowed for datagram communications beyond the default 2044 value, the multicast MTU size for IPoFabric networking needs to be increased from the Fabric Manager, as described in the Intel Omni-Path Fabric Performance Tuning User Guide.

If a user needs to revert to non-accelerated default IPoFabric for some reason, it is possible to do so by setting options hfi1 ipoib_accel=0 in the local /etc/modprobe.d/hfi1.conf configuration file, and then reloading the hfi1 driver. To make this change persistent across reboots it is advised to recreate initrd using:

# dracut -f

The current status of the Accelerated IPoFabric module parameter for an IPoFabric interface can be determined by examining the /sys/module/hfi1/parameters/ipoib_accel value, where a return value of 1 indicates available and enabled.
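
The status check described above can be wrapped in a few lines of Python for use in a health-check script. This is a sketch under the stated convention that a value of 1 means available and enabled; the function names are hypothetical, and on a host without the hfi1 driver loaded the parameter file does not exist, which the sketch reports as None.

```python
from pathlib import Path

ACCEL_PARAM = "/sys/module/hfi1/parameters/ipoib_accel"  # path from the text above


def ipoib_accel_enabled(param_text):
    """Interpret the parameter file contents: '1' means available and enabled."""
    return param_text.strip() == "1"


def read_ipoib_accel(path=ACCEL_PARAM):
    """Return True/False from the parameter file, or None if it cannot be read."""
    try:
        return ipoib_accel_enabled(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("Accelerated IPoFabric:", read_ipoib_accel())
```

Keeping the parsing separate from the file access makes the logic testable on machines that have no HFI installed.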
2.8.3.3 Accelerated IPoFabric Tunings

An IP router is a specialized application. When configuring an IP router for best performance, there are some recommendations for IPoFabric that differ from what is described in the performance tuning guide. Those differences are described here.

It is not recommended to set XPS and RPS on the router if you are using Datagram Mode with Accelerated IPoFabric, which already employs core spreading.

IP router nodes will have interrupts to process from InfiniBand* or Ethernet interfaces in addition to interrupts to process from OPA interfaces. To conserve IRQs on the router, adjustments to krcvqs and num_sdma are recommended. When Accelerated IPoFabric is enabled (as discussed in the Intel Omni-Path Fabric Performance Tuning User Guide), allocated kernel receive queues are not used the way they are in connected mode, and their count should be minimal to free cores for other purposes (set krcvqs=1). In addition, for proper operation of the router, ensure that num_sdma is limited to 8 instead of the default 16, for the IP router's IPoFabric interface. There are a few options for how this limit can be configured.
Options include:

- Enable the Networking virtual fabric at the FM. This is an advanced technique that results in IPoFabric traffic being segregated into its own virtual fabric with its own service level separate from MPI and other traffic types on the fabric. Enabling this second virtual fabric with its own QoS requirements results in a subset of the available sdma engines being reserved at each interface for IPoFabric communications.
- Locally configure the num_sdma module parameter and set it equal to 8. (This is the preferred approach, unless multiple virtual fabrics are enabled at the Fabric Manager.)

$ cat /etc/modprobe.d/hfi1.conf
options hfi1 krcvqs=1 num_sdma=8

2.8.3.4 Connected Mode Tunings

Starting with the IFS 10.9 release, Intel recommends considering operating the IP router in datagram mode with Accelerated IPoFabric. If the IP router is to be operated in connected mode, then it is still recommended to design how the cores should be used on the IP router, and to enable RPS as a way to further spread some of the IPoFabric processing load across those available cores.

Receive Packet Steering (RPS) and Transmit Packet Steering (XPS) distribute packet processing among multiple CPUs to improve the performance of the network stack when the underlying network interface is not inherently multi-queue. This is the case with default IPoIB used in connected mode, and when Accelerated IPoFabric has been disabled.
This section discusses how to apply RPS and XPS in an IP router for both OPA and non-OPA interfaces (e.g., Ethernet or InfiniBand*). For more information about RPS and XPS, please refer to the "Scaling in the Linux Networking Stack" document available at: https://www.kernel.org/doc/Documentation/networking/scaling.txt

The following rules are recommended to define a mask of CPUs used for XPS and RPS:

- RPS settings
  - RPS settings for OPA interfaces should be configured based on a recommendation in the Intel Omni-Path Fabric Performance Tuning User Guide: RPS should be configured to direct the workload to the same CPUs that are servicing the receive context interrupts.
  - RPS settings for non-OPA cards (e.g., Ethernet or InfiniBand*) should be configured to forward packets to CPUs where an OPA card is connected, but not used by receive context interrupts or receive and send completion queue processes.
  - RPS for OPA and non-OPA interfaces should not be configured when using Accelerated IPoFabric and Datagram Mode.
- XPS settings
  - No performance gain is expected for settings of OPA XPS and non-OPA XPS. The recommendation is to keep them as default values, 0.

For more details of how to set the RPS, please refer to the Intel Omni-Path Fabric Performance Tuning User Guide.

In case of non-OPA cards with multiple receive and transmit queues, e.g., Ethernet, the RPS and XPS settings must be applied to each queue in the card. In general, RPS and XPS settings for a device with queue can be found in the following paths:

RPS: /sys/class/net//queues/rx-/rps_cpus
XPS: /sys/class/net//queues/tx-/xps_cpus

Where is the name of the Ethernet or OPA interface in Linux.
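
Because the RPS mask has to be written to every receive queue of a multi-queue card, it is convenient to enumerate the queue files instead of listing them by hand. The Python sketch below does this with pathlib. It assumes the standard Linux sysfs layout of one rx-N directory per queue under the interface's queues directory; the function names and the sysfs_root parameter (which allows a dry run against a scratch directory) are hypothetical.

```python
from pathlib import Path


def rps_files(interface, sysfs_root="/sys/class/net"):
    """Return the rps_cpus file of every RX queue, ordered by queue number."""
    queues = Path(sysfs_root) / interface / "queues"
    files = queues.glob("rx-*/rps_cpus")
    # Sort numerically so that rx-10 comes after rx-2.
    return sorted(files, key=lambda p: int(p.parent.name.split("-")[1]))


def apply_rps_mask(interface, mask, sysfs_root="/sys/class/net"):
    """Write the same hex CPU mask to all RX queues; return the files written."""
    written = []
    for path in rps_files(interface, sysfs_root):
        path.write_text(mask)
        written.append(path)
    return written


if __name__ == "__main__":
    # Listing is harmless; writing to the real sysfs requires root privileges.
    for path in rps_files("lo"):
        print(path)
```

Writing to the real /sys/class/net tree changes the running system immediately, so trying the function against a temporary directory first is a reasonable precaution.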
Example: Accelerated IPoIB Disabled

NOTE
The following example is for illustrative purposes only. Your conditions may vary.

To determine CPUs whose IRQs OPA uses for kctxt, please use the method mentioned below. For more details, please refer to the Driver IRQ Affinity Assignments section of the Intel Omni-Path Fabric Performance Tuning User Guide.

First, identify the CPU socket where the adapter is connected using the following command:

cat /sys/class/net//device/local_cpus

For example, with two 18-core CPUs where an OPA adapter is located in socket 1 and a non-OPA adapter is at socket 0, the output of the command is as follows:

opa: 00000000,00000000,00000000,00000000,0000000f,fffc0000
non-opa: 00000000,00000000,00000000,00000000,00000000,0003ffff

Next, identify CPUs for kernel receive contexts of the OPA adapter (hfi1_0 in this example) by typing the following command:

dmesg | grep hfi1_0 | grep IRQ | grep ctxt
[79.710022] hfi1 0000:81:00.0: hfi1_0: IRQ: 259, type RCVCTXT ctxt 0 -> cpu: 18
[79.718650] hfi1 0000:81:00.0: hfi1_0: IRQ: 260, type RCVCTXT ctxt 1 -> cpu: 19
[79.727298] hfi1 0000:81:00.0: hfi1_0: IRQ: 261, type RCVCTXT ctxt 2 -> cpu: 20
[79.735938] hfi1 0000:81:00.0: hfi1_0: IRQ: 262, type RCVCTXT ctxt 3 -> cpu: 21
[79.744592] hfi1 0000:81:00.0: hfi1_0: IRQ: 263, type RCVCTXT ctxt 4 -> cpu: 22
[79.753220] hfi1 0000:81:00.0: hfi1_0: IRQ: 264, type RCVCTXT ctxt 5 -> cpu: 23

For this example, the system has been configured with 5 OPA hfi1 kernel receive contexts (parameter krcvqs=5 to the hfi1 kernel module). RCVCTXT ctxt 0 is for management traffic, which is assigned to CPU 18. The other 5 receive contexts are for workloads and they are assigned to CPU 19-23.
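
The local_cpus output is a bitmask written as comma-separated 32-bit hexadecimal groups, most significant group first. Decoding it shows directly which CPU numbers belong to each socket. The following Python sketch decodes the two masks from the example above; the function name cpus_from_mask is hypothetical.

```python
def cpus_from_mask(mask):
    """Decode a comma-separated 32-bit-group hex CPU mask into a sorted CPU list."""
    value = 0
    for group in mask.strip().split(","):
        # Each group holds 32 CPUs; earlier groups are more significant.
        value = (value << 32) | int(group, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


if __name__ == "__main__":
    opa = "00000000,00000000,00000000,00000000,0000000f,fffc0000"
    non_opa = "00000000,00000000,00000000,00000000,00000000,0003ffff"
    print("opa:", cpus_from_mask(opa))
    print("non-opa:", cpus_from_mask(non_opa))
```

For this example the OPA adapter is local to CPUs 18 through 35 (socket 1) and the non-OPA adapter to CPUs 0 through 17 (socket 0), consistent with the receive contexts landing on CPUs 18-23.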
In addition to CPU sockets and CPUs for kernel receive contexts, CPUs used for processing receive and send completion queues need to be identified to map the non-OPA RPS. Those completion queue processing CPUs can be identified by launching qperf or iperf as described in Measuring Baseline IP Performance on page 42 in order to send packets from compute nodes to a storage server, and by observing CPU behaviors of the IP router. While running the benchmark, a CPU with high utilization among non-krcvq CPUs means that the CPU is assigned for the receive completion queue processing (OPA rx-cq in the table below), while the next core is typically used for the send completion queue processing (OPA tx-cq in the table below). The assignment of OPA tx-cq can also be verified by launching benchmarks in the reverse direction, i.e., by sending packets from storage servers to compute nodes. Please note that, when the number of compute nodes increases, these CPUs can be highly utilized and may become the bottleneck of the router. For this example, CPU 27 and 28 are used by receive and send completion queue processes, respectively.

Once CPU sockets, CPUs for the receive contexts and CPUs for receive and send completion queue processes are identified, XPS and RPS can be configured based on the rules described above. For this example, if and are the name for non-OPA and IPoFabric interfaces respectively, OPA RPS is assigned to CPU 19-23, while non-OPA RPS is assigned to CPU 24-26, 29-35 with the following commands:

echo "0,00f80000" > /sys/class/net//queues/rx-0/rps_cpus
echo "f,e7000000" > /sys/class/net//queues/rx-/rps_cpus

The table below summarizes all RPS assignments (last column), and the additional information (3rd column), such as receive context interrupts and TX/RX completion queue processes.
Table 1. CPUs Assignment in Example of IP Router (Accelerated IPoFabric Disabled)

Core  Socket  Load
18    1       OPA krcvq 0
19    1       OPA krcvq 1    OPA RPS
20    1       OPA krcvq 2    OPA RPS
21    1       OPA krcvq 3    OPA RPS
22    1       OPA krcvq 4    OPA RPS
23    1       OPA krcvq 5    OPA RPS
24    1                      Non-OPA RPS
25    1                      Non-OPA RPS
26    1                      Non-OPA RPS
27    1       OPA rx-cq
28    1       OPA tx-cq
29    1                      Non-OPA RPS
30    1                      Non-OPA RPS
31    1                      Non-OPA RPS
32    1                      Non-OPA RPS
33    1                      Non-OPA RPS
34    1                      Non-OPA RPS
35    1                      Non-OPA RPS

To make the above tuning persist after reboot, refer to Persisting Ethernet Tuning on page 41.
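
The hexadecimal masks written to rps_cpus in this example can be derived from the CPU lists rather than worked out by hand. The Python sketch below builds the mask from a list of CPU numbers; the function name mask_from_cpus is hypothetical. It pads every 32-bit group to eight digits, so the non-OPA mask prints as 0000000f,e7000000, which is the same value as the f,e7000000 used above.

```python
def mask_from_cpus(cpus):
    """Encode CPU numbers as comma-separated 32-bit hex groups, high group first."""
    value = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        value |= 1 << cpu
    groups = []
    while True:
        groups.append(value & 0xFFFFFFFF)
        value >>= 32
        if value == 0:
            break
    return ",".join(f"{group:08x}" for group in reversed(groups))


if __name__ == "__main__":
    opa_rps = list(range(19, 24))                           # CPUs 19-23
    non_opa_rps = list(range(24, 27)) + list(range(29, 36))  # CPUs 24-26, 29-35
    print("OPA RPS mask:    ", mask_from_cpus(opa_rps))
    print("non-OPA RPS mask:", mask_from_cpus(non_opa_rps))
```

CPUs 27 and 28 are left out of the non-OPA list because they serve the OPA completion queues in this example, which is what produces the gap in the middle of the mask.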
2.8.4 Ethernet Interface Tuning

IP router performance when routing to Ethernet may be improved with tunings applied to the Ethernet interface. The Ethernet tunings recommended in this section mainly focus on allowing packets to be forwarded with minimal processing as soon as they arrive at the Ethernet interface of an IP router.

NOTE
Please note that tuning recommended in this section may increase the load on CPUs by disabling offloading capabilities of the Ethernet interface or by increasing the number of system interrupts. This high CPU utilization may be acceptable for a router, where routing packets is the primary workload for the node. However, using these tunings for other nodes, such as storage servers and compute nodes, may significantly reduce performance of the system. Therefore, tunings described here should only be taken into consideration for routers.

2.8.4.1 TCP/IP Segmentation and Receive Offloads

Various Ethernet offloading capabilities implemented in the Linux* network stack are known to improve the performance of TCP/IP endpoints on a network. However, when they are used in an IP router with an OPA adapter, they can create unnecessary overhead to process IPoFabric in the OPA adapter. A full description of the offloading techniques used in the Linux kernel can be found at https://www.kernel.org/doc/Documentation/networking/segmentation-offloads.txt. Disabling some of them may increase performance of an IP router.

Consider disabling the following technologies:

- Generic Segmentation Offload - GSO
- Generic Receive Offload - GRO
- TCP Segmentation Offload - TSO
- Partial Generic Segmentation Offload - GSO_PARTIAL

To turn them off, please use the command below:

# ethtool -K gro off tso off gso off tx-gso-partial off

NOTE
The operation described above may negatively impact the performance of an Ethernet interface with small MTU (e.g., 1500). Please verify that the tuning works as intended on your target network.

To make the above tuning persist after reboot, refer to Persisting Ethernet Tuning on page 41.
2.8.4.2 Adaptive RX and TX

Similar to Ethernet offloading, enabling interrupt coalescence parameters on Ethernet may introduce extra delays to IP packets to be delivered to the OPA adapter. These parameters are sometimes enabled by default in Linux. To check the coalescence parameters of the Ethernet, use the command below:

# ethtool -c
Coalesce parameters for enp6s0f1:
Adaptive RX: on TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
[…]

Where is the name of the Ethernet interface in Linux.

Disabling the adaptive RX and TX and reducing their timeouts to 0 may increase the performance of the IP router:

sudo ethtool -C adaptive-tx off adaptive-rx off
sudo ethtool -C rx-usecs 0 tx-usecs 0

For further information about Adaptive RX and TX, please refer to the Linux kernel documentation available at: https://www.kernel.org/doc/Documentation/networking/scaling.txt

To make the above tuning persist after reboot, refer to Persisting Ethernet Tuning on page 41.
2.8.4.3 IP Router with 40GbE Intel Interface (XL710)

The Intel Ethernet flow director is described in detail in the Introduction to Intel Ethernet Flow Director and Memcached Performance white paper. The flow director has two main modes: the Externally Programmed mode and the automatic Application Targeted Routing (ATR) mode. Use the command below to check the current state:

# ethtool --show-priv-flags
Private flags for ens865f0:
MFP: off
LinkPolling: off
flow-director-atr: on
veb-stats: off
hw-atr-eviction: off
legacy-rx: off
disable-source-pruning: off
vf-true-promisc-support: off

It is recommended to disable ATR on the Intel Ethernet interfaces for better performance of the IP router. To disable ATR, use the following command:

# ethtool --set-priv-flags flow-director-atr off

Where is the name of the Ethernet interface in Linux.
To make the above tuning persist after reboot, refer to Persisting Ethernet Tuning on page 41.

2.8.5 Persisting Ethernet Tuning

Previous sections describe how to tune Ethernet settings and RPS/XPS settings for OPA and non-OPA interfaces. The following sections describe how to make the configuration persistent so that the changes are still applied after a system reboot.
2.8.5.1 Persisting with Connected Mode

Perform the following steps using root privileges:

1. Ensure the Ethernet driver, OPA hfi1 driver and ipoib driver are configured to autostart.

2. Add the following lines to the file /etc/rc.local:

ethtool -K gro off tso off gso off tx-gso-partial off
ethtool -C adaptive-tx off adaptive-rx off
ethtool -C rx-usecs 0 tx-usecs 0
# following line should be set for all rx queues for ETH interface
echo "00000xyz" > /sys/class/net//queues/rx-/rps_cpus
# following line should be set for all rx queues for ETH interface
echo "00000abc" > /sys/class/net//queues/rx-0/rps_cpus
# following line should be set only for 40GbE Intel interface (XL710)
ethtool --set-priv-flags flow-director-atr off

Where:
is name of Ethernet interface in Linux.
is often ib0, but could be ib1, opa, opa_ib0, and so on.
"00000xyz", "00000abc" represent the hex mask (number) for your situation. Refer to Connected Mode Tunings on page 36 for instructions on how to construct the hex mask.
is a RX queue id. The queue numbers start with 0. Please note that an Ethernet interface may contain multiple RX queues, and the RPS mask settings should be applied to all RX queues in the interface.

3. Make sure the /etc/rc.local script file is executable by issuing:

# chmod +x /etc/rc.local

4. Reboot to activate the changes.
2.8.5.2 Persisting with Datagram Mode and IPoFabric

Perform the following steps using root privileges:

1. Ensure the Ethernet driver, OPA hfi1 driver and ipoib driver are configured to autostart.

2. Add the following lines to the file /etc/rc.local:

ethtool -K gro off tso off gso off tx-gso-partial off
ethtool -C adaptive-tx off adaptive-rx off
ethtool -C rx-usecs 0 tx-usecs 0
# following line should be set only for 40GbE Intel interface (XL710)
ethtool --set-priv-flags flow-director-atr off

Where:
is name of Ethernet interface in Linux.
is often ib0, but could be ib1, opa, opa_ib0, and so on.
is a RX queue id. The queue numbers start with 0. Please note that an Ethernet interface may contain multiple RX queues, and the RPS mask settings should be applied to all RX queues in the interface.

3. Make sure the /etc/rc.local script file is executable by issuing:

# chmod +x /etc/rc.local

4. Reboot to activate the changes.
2.8.6 Measuring Baseline IP Performance

Before running storage benchmarks, which run over IP protocols, it is important to measure the performance of IP over the Intel Omni-Path and IB fabrics: between Intel Omni-Path clients, between storage servers, and between an Intel Omni-Path client and a storage server through the router. Point-to-point tools such as qperf or iperf, as described in the IPoFabric performance section of the Intel Omni-Path Fabric Performance Tuning User Guide, may be used for this purpose.

2.8.7 Router Saturation

To identify the router saturation point, multiple iperf3 processes can be launched simultaneously across multiple compute nodes and a single storage server. The number of compute nodes needed to saturate a router depends on the configuration of the compute nodes. It is recommended to start testing with 4 compute nodes and 2 iperf3 processes per compute node, and to increase the number of compute nodes up to 16 until the router is saturated.

Before launching iperf3 client processes on the compute nodes, the storage server should have iperf3 server processes running to receive messages from the iperf3 clients.

iperf3 example for a server process (storage server):

    iperf3 -s -1 -p <port> -A <cpu>

Once the iperf3 servers are launched on the storage server, iperf3 clients can be launched across multiple compute nodes.

iperf3 example for a client process (compute node):

    iperf3 -c <server> -p <port> -t 20 -Z -l 1M -A <cpu>

In the above example, a single iperf3 process runs for 20 seconds with a 1M message size. While iperf3 is running, the storage server should have a server process to receive messages from the client. Note that each (client, server) iperf3 pair should use the same <port>, and the total number of iperf3 server processes on the storage server should equal the total number of iperf3 client processes across all compute nodes.

Once the saturation point is identified from multiple compute nodes to a storage server, it is recommended to test the saturation point in the reverse direction, that is, from the storage server to the compute nodes, by launching iperf3 servers on the compute nodes and iperf3 clients on the storage server.

Router saturation points will vary depending on router usage. If the saturation points do not meet the requirements of the system, it is recommended to add more routers to improve performance.
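Keeping the ports of each (client, server) pair in step by hand becomes error-prone as the node count grows from 4 toward 16. The following is a minimal sketch of how the matching command lines could be generated. The function name, host names, starting port (5201, the iperf3 default), and the round-robin core assignment are assumptions made for illustration; the iperf3 options are the ones used in the examples above.

```python
from itertools import count


def iperf3_plan(server, nodes, procs_per_node=2, base_port=5201,
                server_cores=(0,), client_cores=(0,)):
    """Return (node, server_cmd, client_cmd) tuples, one per iperf3 pair.

    Each pair gets its own port, so the number of server processes always
    equals the number of client processes.
    """
    plan = []
    ports = count(base_port)
    for node in nodes:
        for i in range(procs_per_node):
            port = next(ports)
            # Spread processes over the given cores in round-robin order.
            s_core = server_cores[len(plan) % len(server_cores)]
            c_core = client_cores[i % len(client_cores)]
            server_cmd = f"iperf3 -s -1 -p {port} -A {s_core}"
            client_cmd = (f"iperf3 -c {server} -p {port} "
                          f"-t 20 -Z -l 1M -A {c_core}")
            plan.append((node, server_cmd, client_cmd))
    return plan


if __name__ == "__main__":
    # Hypothetical starting point: 4 compute nodes, 2 processes each.
    for node, s_cmd, c_cmd in iperf3_plan("10.10.0.5",
                                          ["c1", "c2", "c3", "c4"]):
        print(f"storage server: {s_cmd}")
        print(f"{node}: {c_cmd}")
```

Start all server commands on the storage server first, then the client commands on their nodes; swapping the roles gives the reverse-direction test.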
3.0 LNet Router

3.1 Lustre.org

The primary documentation on setting up an LNet router can be found on Lustre.org. The documentation found on Lustre.org is maintained and updated by the Lustre* community as new developments are released and should be considered the primary configuration guide for LNet Routers and Lustre. This guide is meant to supplement the material found on Lustre.org and to provide some additional details on different aspects of configuration.

http://wiki.lustre.org/LNet_Router_Config_Guide

This document includes an overview of an LNet router, additional details about configuring an LNet router, LNet router troubleshooting, LNet router server design considerations, and a few configuration examples.

3.1.1 LNet Router Deployment Checklist

To make efficient use of both Lustre.org and this guide, the following checklist aids in defining a set of logical steps to set up and deploy an LNet router. It assumes the Lustre file system is set up, and compute nodes and fabric are installed.

- Design the router solution
  See Designing LNet Routers to Connect Intel OPA and InfiniBand* or speak with your vendor of choice.
- Set up the LNet Router servers
  - Network card installation and OS installs
  - Install OPA/OFED drivers and Lustre drivers
    http://wiki.lustre.org/LNet_Router_Config_Guide%23Software_Installations
- Set up and configure the LNet Router
  http://wiki.lustre.org/LNet_Router_Config_Guide (and Overview of Configuring LNet)
  LNet configuration can be done in three ways:
  1. Network configuration by adding module parameters in lustre.conf
  2. Dynamic network configuration using the lnetctl command
  3. Importing/exporting configurations using a YAML file format
- (Optional) Perform fine-grained routing if multiple routes are available
  http://wiki.lustre.org/LNet_Router_Config_Guide%23Fine-Grained_Routing
- (Optional) Tune the LNet parameters
  http://wiki.lustre.org/LNet_Router_Config_Guide#LNet_Tuning
  For Mellanox FDR/EDR cards based on the mlx5 driver in the router, make changes to the tunables.
- Configure Lustre clients / mount Lustre
- Test LNet routing with LNet SelfTest
  http://wiki.lustre.org/LNET_Selftest
- Troubleshooting
  See LNet Troubleshooting.

The practical implementation examples at the end of the document also provide worked examples of LNet Router configuration.
3.2 Overview

Lustre file systems have the unique capability to run the same global namespace across several different network topologies. The LNet components of Lustre provide this abstraction layer. LNet is an independent project from Lustre and is used for other projects beyond the Lustre file system. LNet was originally based on the Sandia Portals project.

LNet can support Ethernet*, InfiniBand*, Intel Omni-Path, legacy fabrics (ELAN and MyriNet*), and specific compute fabrics such as Cray Gemini*, Aries*, and Cascade*.

LNet is part of the Linux kernel space and allows for full RDMA throughput and zero-copy communications when available. Lustre can initiate a multi-OST read or write using a single Remote Procedure Call (RPC), which allows the client to access data using RDMA, regardless of the amount of data being transmitted.

LNet was developed to provide the maximum flexibility for connecting different network topologies using LNet routing. LNet's routing capabilities provide an efficient protocol to enable bridging between different networks, for example from Ethernet to InfiniBand, or the use of different fabric technologies such as Intel Omni-Path Architecture (Intel OPA) and InfiniBand*.

The following figure shows an example of how to connect an existing InfiniBand* network (storage and compute nodes) to new Intel OPA compute nodes.
Figure 9. Heterogeneous Topology

3.3 Overview of Configuring LNet

To prepare for installation:

1. Obtain the PCIe bus number and device number of the Intel Omni-Path/InfiniBand* cards:

   For Intel Omni-Path:

    $ lspci | grep Omni-Path

   For InfiniBand*:

    $ lspci | grep Mellanox

2. Check that the Intel Omni-Path/InfiniBand* cards are installed and note the interface names. Use the PCIe bus number and device number to identify the correct card:

    $ ls -la /sys/class/net/

NOTE
Turbo mode on the LNet router must be disabled for consistent performance. The router CPU frequency has an almost negligible effect on LNet router performance.

An LNet router is a specialized Lustre client where only LNet is running. An industry-standard, Intel-based server equipped with two sockets is appropriate for this role. The Lustre file system is not mounted on the router, and a single LNet router can serve different file systems. In the context of LNet routing between two RDMA-enabled networks, in-memory zero-copy capability is used to optimize latency and performance.

Figure 10. LNet Router

Consider the simple example shown in the figure above, where:

- Storage servers are on a Mellanox-based InfiniBand fabric: 10.10.0.0/24
- Clients are on an Intel OPA fabric: 10.20.0.0/24
- The router is between the Intel Omni-Path fabric and the InfiniBand* fabric at 10.10.0.1 and 10.20.0.1

For the purpose of setting up the lustre.conf files in this example, the Lustre network on the Intel Omni-Path fabric is named o2ib0 and the Lustre network on the InfiniBand* fabric is named o2ib2.

The network configuration on the servers (typically created in /etc/modprobe.d/lustre.conf) will be:

    options lnet networks="o2ib2(ib2)" routes="o2ib0 10.10.0.1@o2ib2"

The network configuration on the LNet router (typically created in /etc/modprobe.d/lustre.conf) will be:

    options lnet networks="o2ib0(opa_ib),o2ib2(edr_ib)" forwarding="enabled"

The network configuration on the clients (typically created in /etc/modprobe.d/lustre.conf) will be:

    options lnet networks="o2ib0(ib0)" routes="o2ib2 10.20.0.1@o2ib0"

Restarting LNet is necessary to apply the new configuration.
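The three lustre.conf lines in this example follow one pattern: a list of local networks, an optional list of routes to remote networks through a gateway NID, and the forwarding flag on the router only. The following is a minimal sketch that assembles such lines, which may help when the same layout has to be produced for many nodes. The function name and argument layout are hypothetical; the sketch emits the router flag in the form forwarding="enabled".

```python
def lnet_options(networks, routes=None, forwarding=False):
    """Build an 'options lnet ...' line for /etc/modprobe.d/lustre.conf.

    networks: list of (lnet_name, interface) pairs local to this node.
    routes:   list of (remote_lnet_name, gateway_nid) pairs.
    forwarding: True only on the LNet router.
    """
    nets = ",".join(f"{net}({iface})" for net, iface in networks)
    parts = [f'options lnet networks="{nets}"']
    if routes:
        # Multiple route entries are separated by semicolons.
        entries = "; ".join(f"{net} {gw}" for net, gw in routes)
        parts.append(f'routes="{entries}"')
    if forwarding:
        parts.append('forwarding="enabled"')
    return " ".join(parts)


if __name__ == "__main__":
    # Values taken from the example topology above.
    print(lnet_options([("o2ib2", "ib2")], [("o2ib0", "10.10.0.1@o2ib2")]))
    print(lnet_options([("o2ib0", "opa_ib"), ("o2ib2", "edr_ib")],
                       forwarding=True))
    print(lnet_options([("o2ib0", "ib0")], [("o2ib2", "10.20.0.1@o2ib0")]))
```

Note that each node routes to the remote network through the router address on its own fabric: servers use 10.10.0.1@o2ib2 and clients use 10.20.0.1@o2ib0.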
Clients will mount the Lustre file system using the usual command line (assuming mgs1 and mgs2 are the IP addresses of the two Lustre servers hosting the MGS service on the o2ib0 network in the example):

    mount -t lustre mgs1@o2ib0:/<fsname> /<mount_point>

The Lustre mount point is stored in /etc/fstab.

Reload the Lustre module (for further debugging, refer to the LNet Dynamic Configuration section of the LNET Router Configuration Guide on the Lustre.org wiki to understand lnetctl commands):

    $ modprobe -v lnet

3.4 LNet Troubleshooting

NOTE
The troubleshooting information in this section is valid for Lustre* version 2.11 and earlier.
LNet provides several metrics to troubleshoot a network.

Figure 11. Network Troubleshooting Using LNet

Referencing the figure above, consider the following configuration:

- Six Lustre servers are on LAN0 (o2ib0), a Mellanox-based InfiniBand network: 192.168.3.[1-6]
- Sixteen clients are on LAN1 (o2ib1), an Intel OPA network: 192.168.5.[100-115]
- Two routers are on LAN0 and LAN1 at 192.168.3.[7-8] and 192.168.5.[7-8]

On each Lustre client we can see the status of the connections using the /proc/sys/lnet/peers metric file. This file shows all NIDs known to this node, and provides information on the queue state:

    # cat /proc/sys/lnet/peers
    nid                refs  state  last  max  rtr  min  tx  min   queue
    192.168.5.8@o2ib1  4     up     -1    8    8    8    8   -505  0
    192.168.5.7@o2ib1  4     up     -1    8    8    8    8   -473  0

Here, "state" is the status of the routers. In the case of a failure of one path, I/O will be routed through the surviving path. When both paths are available, RPCs will use both paths in round-robin.

Here, "max" is the maximum number of concurrent sends from this peer and "tx" is the number of peer credits currently available for this peer.

Notice the negative number in the second "min" column (the one following "tx"). This negative value means that the number of slots on the LNet was not sufficient and the queue was overloaded. This is an indication to increase the number of peer credits and credits (see LNet Tuning). Increasing the credits value has some drawbacks, including increased memory requirements and possible congestion in networks with a very large number of peers.
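Scanning the peers file by eye gets tedious on a node with many peers. The following is a minimal sketch that parses text in the /proc/sys/lnet/peers layout shown above and reports the peers whose minimum credit counters have gone negative. The sample rows are the two router entries from the client output; the field names rtr_min and tx_min are labels chosen here to tell the two "min" columns apart and are not names used by LNet.

```python
PEERS_SAMPLE = """\
nid refs state last max rtr min tx min queue
192.168.5.8@o2ib1 4 up -1 8 8 8 8 -505 0
192.168.5.7@o2ib1 4 up -1 8 8 8 8 -473 0
"""

# Column order of the peers file; the two "min" columns are renamed.
FIELDS = ("nid", "refs", "state", "last", "max",
          "rtr", "rtr_min", "tx", "tx_min", "queue")


def parse_peers(text):
    """Parse peers-file text into a list of dicts, skipping the header."""
    rows = []
    for line in text.splitlines()[1:]:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # ignore blank or truncated lines
        row = dict(zip(FIELDS, parts))
        for key in FIELDS:
            if key not in ("nid", "state"):
                row[key] = int(row[key])
        rows.append(row)
    return rows


def starved_peers(rows):
    """Return NIDs whose minimum tx or rtr credits dropped below zero."""
    return [r["nid"] for r in rows if r["tx_min"] < 0 or r["rtr_min"] < 0]


if __name__ == "__main__":
    for nid in starved_peers(parse_peers(PEERS_SAMPLE)):
        print(f"{nid}: credits were exhausted at some point")
```

On a live node the text would come from reading /proc/sys/lnet/peers instead of the embedded sample.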
The status of the routing table can be obtained from the /proc/fs/lnet/routes file from a client:

    # cat /proc/fs/lnet/routes
    Routing disabled
    net   hops  priority  state  router
    o2ib  1     0         up     192.168.5.8@o2ib1
    o2ib  1     0         up     192.168.5.7@o2ib1

The status of the routers can be verified from the /proc/fs/lnet/routers file from a client:

    # cat /proc/fs/lnet/routers
    ref  rtr_ref  alive_cnt  state  last_ping  ping_sent  deadline  down_ni  router
    4    1        3          up     47         1          NA        0        192.168.5.7@o2ib1
    4    1        1          up     47         1          NA        0        192.168.5.8@o2ib1

On each LNet router, the /proc/sys/lnet/peers metric shows all NIDs known to this node, and provides the following information (values are examples and not all information is shown):

    # cat /proc/sys/lnet/peers
    nid                  refs  state  last  max  rtr  min  tx  min  queue
    192.168.3.4@o2ib0    1     up     165   8    8    -8   8   -15  0
    192.168.3.1@o2ib0    1     up     47    8    8    -6   8   -8   0
    192.168.3.6@o2ib0    1     up     165   8    8    -8   8   -15  0
    192.168.3.3@o2ib0    1     down   115   8    8    -8   8   -12  0
    192.168.3.5@o2ib0    1     up     153   8    8    -8   8   -8   0
    192.168.3.2@o2ib0    1     up     83    8    8    8    8   7    0
    192.168.5.113@o2ib1  1     up     65    8    8    -8   8   -6   0
    192.168.3.104@o2ib1  1     down   9999  8    8    -8   8   -1   0
    192.168.5.109@o2ib1  1     up     127   8    8    -4   8   -13  0
    192.168.5.111@o2ib1  1     up     67    8    8    -8   8   -26  0
    192.168.5.114@o2ib1  1     up     170   8    8    -3   8   -12  0
    192.168.5.108@o2ib1  1     up     151   8    8    -4   8   -7   0
    192.168.3.106@o2ib1  1     down   9999  8    8    4    8   4    0
    192.168.5.101@o2ib1  1     up     58    8    8    -3   8   -9   0
    192.168.5.103@o2ib1  1     up     178   8    8    -8   8   -14  0
    192.168.5.102@o2ib1  1     up     63    8    8    -4   8   -18  0
    ...

In the output above, we can see some Lustre clients on LNet0 are down.

Credits are initialized to allow a certain number of operations. In the example in the above table, this value is 8 (eight), shown under the max column. LNet keeps track of the minimum number of credits ever seen over time, showing the peak congestion that has occurred during the time monitored. Fewer available credits indicates a more congested resource.

The number of credits currently in flight (number of transmit credits) is shown in the "tx" column. The maximum number of send credits available is shown in the "max" column and never changes. The number of router buffers available for consumption by a peer is shown in the "rtr" column. Therefore, rtr - tx is the number of transmits in flight. Typically, rtr == max, although a configuration can be set such that max >= rtr. A ratio of routing buffer credits to send credits (rtr/tx) that is less than max indicates operations are in progress. If the ratio rtr/tx is greater than max, operations are blocking.

LNet also limits concurrent sends and the number of router buffers allocated to a single peer, so that no peer can occupy all these resources.

Real-time statistics of the LNet router can be obtained using the routerstat command. Routerstat watches LNet router statistics. If no interval is specified, stats are sampled and printed only once; otherwise, stats are sampled and printed every interval.
Output includes the following fields:

    M - msgs_alloc(msgs_max)
    E - errors
    S - send_count/send_length
    R - recv_count/recv_length
    F - route_count/route_length
    D - drop_count/drop_length

3.5 Designing LNet Routers to Connect Intel OPA and InfiniBand*

The LNet router can be deployed using an industry-standard server with enough network cards and the LNet software stack. Designing a complete solution for a production environment is not an easy task, but Intel provides tools (LNet Self-Test) to test and validate the configuration and performance in advance.

The goal is to design LNet routers with enough bandwidth to satisfy the throughput requirements of the back-end storage. The number of compute nodes connected to an LNet router normally does not change the design of the solution.

The bandwidth available to an LNet router is limited by the slowest network technology connected to the router. Typically, Intel has observed a 10-15% decline in bandwidth from the nominal hardware bandwidth of the slowest card, due to the LNet router.

Multiple LNet Routers can be used to connect storage between different fabrics. This provides a degree of load balancing, failover, and performance scaling. The performance scaling tends to be linear with large-block I/O and a balanced fabric, so adding routers increases throughput to the existing Lustre storage in a linear fashion.
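The sizing rule described here can be turned into a first estimate of the router count: take the nominal bandwidth of the slowest card, subtract the observed 10-15% decline, and divide the storage throughput requirement by the result. The following is a minimal sketch under the assumption that scaling is linear, as stated above for large-block I/O on a balanced fabric. The function name and the numbers in the example are illustrative only; any estimate should still be validated with LNet Self-Test.

```python
import math


def routers_needed(required_gbytes_per_s, slowest_link_gbytes_per_s,
                   decline=0.15):
    """Estimate the number of LNet routers for a throughput target.

    decline is the fraction of nominal bandwidth lost in the router
    (0.10 to 0.15 per the observation above); linear scaling is assumed.
    """
    if not 0.0 <= decline < 1.0:
        raise ValueError("decline must be in [0, 1)")
    usable = slowest_link_gbytes_per_s * (1.0 - decline)
    return math.ceil(required_gbytes_per_s / usable)


if __name__ == "__main__":
    # Illustrative numbers: 40 GB/s of storage throughput, with a slowest
    # card of 12.5 GB/s nominal (100 Gb/s).
    print(routers_needed(40, 12.5))        # pessimistic 15% decline
    print(routers_needed(40, 12.5, 0.10))  # optimistic 10% decline
```

Using the pessimistic end of the range gives some headroom; redundancy for failover would add routers on top of this estimate.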
In every case, Intel encourages validating the implemented solution using tools provided by the network interface maker and/or the LNet Self-Test utility, which is available with Lustre.

LNet routers can be congested if the number of credits (peer_credits and credits) is not set properly. For communication to routers, not only must a credit and peer credit be tuned, but a global router buffer and peer router buffer credit are also needed.

To design an LNet router in this context, we need to consider the following topics:

- Hardware design and tuning
- Software compatibility

3.5.1 Hardware Design and Tuning

When designing an LNet router between two different network technologies such as Mellanox InfiniBand and Intel OPA, one should consider that LNet was developed to take advantage of the RDMA zero-copy capability. This makes the LNet router extremely efficient.

To achieve higher performance from Intel OPA in a Lustre file system, one must tune the LNet stack as described in LNet Tuning and in the Intel Omni-Path Fabric Performance Tuning User Guide.

3.5.2 CPU Selection

Generally speaking, CPU performance is not critical for the LNet router code, and the recent SMP affinity implementation enables the LNet code to scale on NUMA servers.

The following figure shows the CPU utilization of two LNet routers configured for load balancing and routing between an Intel OPA client network and a Mellanox FDR storage network during a large IOR test. The activity of the two routers is symmetric and balanced. The CPU utilization needed to sustain an FDR card is below 10%. CPU activity during WRITE is twice that during READ. Both routers were equipped with two Intel Xeon Processors (E5-2697 v2) clocked at 2.7 GHz. Further testing with different CPU frequencies showed that the CPU frequency had no effect on LNet router performance.

Figure 12. CPU Utilization for Two LNet Routers Routing Between an Intel OPA Network and a Mellanox FDR Network

To obtain higher performance, Intel suggests turning off the Hyper-Threading Technology and Frequency Scaling capabilities of the CPU (see table below). Tests using different CPU frequencies also indicate LNet routing is not clock-frequency sensitive. Selecting an Intel Xeon Processor with a moderate number of cores and moderate frequency is recommended.

Table 2. LNet Router CPU Tuning

    Hardware               Recommendation
    CPU                    Cost-effective current processor such as the Intel Xeon Gold 6130 Processor
    Hyperthreading         OFF
    CPU Frequency Scaling  DISABLED

It is important to select the right PCIe* slot in the server for the Intel OPA and IB cards to avoid long-distance paths in the NUMA architecture. For an optimal configuration, ensure the network cards are in PCIe buses connected to the same CPU socket. See the following figure.

Figure 13. PCIe* Slot Allocation

3.5.3 Memory Considerations

An LNet router uses additional credit accounting when it needs to forward a packet for another peer:

- Peer Router Credit: This credit manages the number of concurrent receives from a single peer and prevents a single peer from using all router buffer resources. By default this value should be 0. If this value is 0, the LNet router uses peer_credits.
- Router Buffer Credit: This credit allows messages to be queued and selects non-data-payload RPCs versus data RPCs to avoid congestion. In fact, an LNet Router has a limited number of buffers:
  - tiny_router_buffers: size of buffer for messages of less than 1 page in size
  - small_router_buffers: size of buffer for messages of 1 page in size
  - large_router_buffers: size of buffer for messages of more than 1 page in size

These LNet kernel module parameters can be monitored using the /proc/sys/lnet/buffers file and are available per CPT:

    pages  count  credits  min
    0      512    512      503
    0      512    512      504
    0      512    512      497
    0      512    512      504
    1      4096   4096     4055
    1      4096   4096     4050
    1      4096   4096     4048
    1      4096   4096     4072
    256    256    256      244
    256    256    256      248
    256    256    256      240
    256    256    256      246

Negative numbers in the "min" column would indicate that the buffers have been oversubscribed (none are negative in this sample); in that case, we can increase the number of router buffers for a particular size to avoid stalling.
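The buffers file has one row per CPT for each buffer size, so the figure of interest is the lowest "min" value seen for each size. The following is a minimal sketch that parses text in the /proc/sys/lnet/buffers layout and summarizes it. The sample is the output shown above laid out one row per line; the function names are hypothetical.

```python
BUFFERS_SAMPLE = """\
pages count credits min
0 512 512 503
0 512 512 504
0 512 512 497
0 512 512 504
1 4096 4096 4055
1 4096 4096 4050
1 4096 4096 4048
1 4096 4096 4072
256 256 256 244
256 256 256 248
256 256 256 240
256 256 256 246
"""


def parse_buffers(text):
    """Parse buffers-file text into dicts with integer fields."""
    lines = text.splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        parts = line.split()
        if len(parts) != len(header):
            continue  # ignore blank or truncated lines
        rows.append({name: int(value) for name, value in zip(header, parts)})
    return rows


def low_water_by_size(rows):
    """Return {pages: lowest 'min' across all CPTs} for each buffer size."""
    result = {}
    for row in rows:
        pages = row["pages"]
        result[pages] = min(result.get(pages, row["min"]), row["min"])
    return result


def oversubscribed_sizes(rows):
    """Return the buffer sizes (in pages) whose 'min' went negative."""
    return sorted(p for p, low in low_water_by_size(rows).items() if low < 0)


if __name__ == "__main__":
    rows = parse_buffers(BUFFERS_SAMPLE)
    print(low_water_by_size(rows))
    print(oversubscribed_sizes(rows))
```

An empty result from oversubscribed_sizes means no buffer pool has run out during the monitored period, which is the case for this sample.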
The memory utilization of the LNet router stack is driven by the Peer Router Credit and Router Buffer Credit parameters. An LNet router with a RAM size of 32 GB or more has enough memory to sustain very large configurations for these parameters. In every case, the memory consumption of the LNet stack can be measured using the /proc/sys/lnet/lnet_memused metrics file.

Table 3. Sample Table

    Hardware    Recommendation
    RAM         32 GB
    Technology  DDR3 or DDR4 ECC

3.5.4 Software Compatibility

This section discusses compatibility considerations for the software stack to be used:

- The Intel Fabric Suite (IFS) for Intel OPA supports RHEL* 7.4.
- The Mellanox OFED 3.x stack is supported in Lustre* software version 2.4 or later.

3.6 Practical Implementations

This section covers two of the most common network interface configurations:

- Example 1: Legacy storage with InfiniBand* card ConnectX-3/IB/4 connected to new compute nodes using Intel OPA.
- Example 2: New storage with Intel OPA connected to legacy compute nodes on InfiniBand* card ConnectX-3/IB/4.

NOTE
Throughout these examples, IP addresses are examples only.

In these two example configurations, we assume that all the components can be upgraded. Please consult an Intel Lustre specialist for non-destructive methods to upgrade Lustre. We will use Dynamic LNet Configuration (DLC) as much as possible.

3.6.1 Example 1: Legacy Storage with InfiniBand* Card Connected to New Compute Nodes Using Intel OPA

The following figure shows this simplified network topology:

- a Lustre client equipped with an Intel Omni-Path fabric card
- an LNet router equipped with an Intel OPA card and an InfiniBand* card
- a legacy Lustre server equipped with an InfiniBand* card
Figure 14. Network Topology for Example 1 Configuration

To achieve this configuration, perform the following procedures:

1. Upgrade all Lustre servers, Lustre clients, and LNet routers to Lustre* software version 2.10 or later.
2. Perform the steps in Configure Lustre* Clients (Example 1).
3. Perform the steps in Configure LNet Routers (Example 1).
4. Perform the steps in Configure Lustre* Servers (Example 1).

3.6.1.1 Configure Lustre* Clients (Example 1)

The following commands are based on the topology in Figure 14.
    # modprobe lnet
    # lnetctl lnet configure
    # lnetctl net add --net o2ib1 --if ib1
    # lnetctl route add --net o2ib0 --gateway 192.168.5.152@o2ib1
    # lnetctl net show --verbose
    net:
        - net: lo
          nid: 0@lo
          status: up
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
              CPT: "[0,0,0,0]"
        - net: o2ib1
          nid: 192.168.5.151@o2ib1
          status: up
          interfaces:
              0: ib1
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
              CPT: "[0,0,0,0]"
    # lnetctl route show --verbose
    route:
        - net: o2ib
          gateway: 192.168.5.152@o2ib1
          hop: 1
          priority: 0
          state: up

To make the configuration permanent:

    # lnetctl export > /etc/sysconfig/lnet.conf
    # echo "o2ib0: { gateway: 192.168.5.152@o2ib1 }" > /etc/sysconfig/lnet_routes.conf
    # systemctl enable lnet

3.6.1.2 Configure LNet Routers (Example 1)

By default, Lustre* software will deploy the following ko2iblnd configuration (/etc/modprobe.d/ko2iblnd.conf) to optimize any existing Intel OPA cards:

    alias ko2iblnd-opa ko2iblnd
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
    install ko2iblnd /usr/sbin/ko2iblnd-probe

Some of the above parameters are not compatible with InfiniBand* cards, so we will use DLC to set per-card parameters using the following procedure:

    # modprobe lnet
    # lnetctl lnet configure
    # lnetctl net add --net o2ib1 --if ib1
    # lnetctl net add --net o2ib0 --if ib0
    # lnetctl set routing 1
    # lnetctl net show --verbose
    net:
        - net: lo
          nid: 0@lo
          status: up
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
              CPT: "[0,0,0,0]"
        - net: o2ib1
          nid: 192.168.5.152@o2ib1
          status: up
          interfaces:
              0: ib1
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
              CPT: "[0,0,0,0]"
        - net: o2ib
          nid: 192.168.3.104@o2ib
          status: up
          interfaces:
              0: ib0
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
              CPT: "[0,0,0,0]"

To edit the configuration and to make it permanent, export it:

    # lnetctl export > /etc/sysconfig/lnet.conf

InfiniBand* cards based on the mlx5 driver are not compatible with map_on_demand=32 and other parameters.
For the InfiniBand* card, edit the lnet.conf file and apply the following parameters. In the listing below, credits is set to 256 rather than the exported value of 1024.

        - net: o2ib
          nid: 192.168.3.104@o2ib
          status: up
          interfaces:
              0: ib0
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 256
              CPT: "[0,0,0,0]"

To enable the configuration at startup:

    # systemctl enable lnet

3.6.1.3 Configure Lustre* Servers (Example 1)

The Lustre servers should already be configured; however, we need to change the configuration in order to add the routing path to the Intel OPA network and enable the new Intel OPA clients through the LNet routers.
Remove any LNet configuration, normally in /etc/modprobe.d/lustre.conf.

    # modprobe lnet
    # lnetctl lnet configure
    # lnetctl net add --net o2ib0 --if ib0
    # lnetctl route add --net o2ib1 --gateway 192.168.3.104@o2ib0
    # lnetctl net show --verbose
    net:
        - net: lo
          nid: 0@lo
          status: up
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
              CPT: "[0,0,0,0]"
        - net: o2ib
          nid: 192.168.3.106@o2ib
          status: up
          interfaces:
              0: ib0
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              CPT: "[0,0,0,0]"
    # lnetctl route show --verbose
    route:
        - net: o2ib1
          gateway: 192.168.3.104@o2ib
          hop: 1
          priority: 0
          state: up

To make the configuration permanent:

    # lnetctl export > /etc/sysconfig/lnet.conf
    # echo "o2ib1: { gateway: 192.168.3.104@o2ib0 }" > /etc/sysconfig/lnet_routes.conf
    # systemctl enable lnet

3.6.2 Example 2: New Storage with Intel OPA Connected to Legacy Compute Nodes on InfiniBand* Cards

The following figure shows another common example of a simplified network topology:

- a legacy Lustre* client connected via an InfiniBand* card
- an LNet Router implementing Intel OPA and InfiniBand* cards
- a Lustre server connected via an Intel OPA card
Figure 15. Network Topology for Adding New OPA-Connected Servers

To achieve this configuration, perform the following procedures:

1. Upgrade all Lustre clients, Lustre servers, and LNet Routers to Lustre* software 2.10 or later.

3.6.2.1 Configure Lustre* Clients (Example 2)

The following commands are based on the topology in Figure 15.
Reconfigure the clients after the upgrade, removing /etc/modprobe.d/lustre.conf:

    # modprobe lnet
    # lnetctl lnet configure
    # lnetctl net add --net o2ib0 --if ib0
    # lnetctl route add --net o2ib1 --gateway 192.168.3.104@o2ib0
    # lnetctl net show --verbose
    net:
        - net: lo
          nid: 0@lo
          status: up
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
              CPT: "[0,0,0,0]"
        - net: o2ib
          nid: 192.168.3.102@o2ib
          status: up
          interfaces:
              0: ib0
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              CPT: "[0,0,0,0]"
    # lnetctl route show --verbose
    route:
        - net: o2ib1
          gateway: 192.168.3.104@o2ib
          hop: 1
          priority: 0
          state: up

To make the configuration permanent:

    # lnetctl export > /etc/sysconfig/lnet.conf
    # echo "o2ib1: { gateway: 192.168.3.104@o2ib0 }" > /etc/sysconfig/lnet_routes.conf
    # systemctl enable lnet

3.6.2.2 Configure LNet Routers (Example 2)

By default, Lustre* software will deploy the following ko2iblnd configuration (/etc/modprobe.d/ko2iblnd.conf) to optimize any existing OPA card:

    alias ko2iblnd-opa ko2iblnd
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
    install ko2iblnd /usr/sbin/ko2iblnd-probe

Some of the tunables are not compatible with InfiniBand* cards, so we will use DLC to set per-card tunables using the following procedure:

    # modprobe lnet
    # lnetctl lnet configure
    # lnetctl net add --net o2ib1 --if ib1
    # lnetctl net add --net o2ib0 --if ib0
    # lnetctl set routing 1
    # lnetctl net show --verbose
    net:
        - net: lo
          nid: 0@lo
          status: up
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
              CPT: "[0,0,0,0]"
        - net: o2ib1
          nid: 192.168.5.152@o2ib1
          status: up
          interfaces:
              0: ib1
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
              CPT: "[0,0,0,0]"
        - net: o2ib
          nid: 192.168.3.104@o2ib
          status: up
          interfaces:
              0: ib0
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
              CPT: "[0,0,0,0]"

To edit the configuration and to make the configuration permanent, we will export it:

    # lnetctl export > /etc/sysconfig/lnet.conf

InfiniBand* cards based on the mlx5 driver are not compatible with map_on_demand=32 and other parameters.
For the InfiniBand* card, edit the lnet.conf file and apply the following parameters. In the listing below, credits is set to 256 rather than the exported value of 1024.

        - net: o2ib
          nid: 192.168.3.104@o2ib
          status: up
          interfaces:
              0: ib0
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 256
              CPT: "[0,0,0,0]"

To enable the configuration at startup:

    # systemctl enable lnet

3.6.2.3 Configure Lustre* Servers (Example 2)

    # modprobe lnet
    # lnetctl lnet configure
    # lnetctl net add --net o2ib1 --if ib1
    # lnetctl route add --net o2ib0 --gateway 192.168.5.152@o2ib1
    # lnetctl net show --verbose
    net:
        - net: lo
          nid: 0@lo
          status: up
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
              CPT: "[0,0,0,0]"
        - net: o2ib1
          nid: 192.168.5.153@o2ib1
          status: up
          interfaces:
              0: ib1
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
              CPT: "[0,0,0,0]"
    # lnetctl route show --verbose
    route:
        - net: o2ib
          gateway: 192.168.5.152@o2ib1
          hop: 1
          priority: 0
          state: up

To make the configuration permanent:

    # lnetctl export > /etc/sysconfig/lnet.conf
    # echo "o2ib0: { gateway: 192.168.5.152@o2ib1 }" > /etc/sysconfig/lnet_routes.conf
    # systemctl enable lnet

Appendix A Firewall Configuration for VRRP on RHEL

VRRP uses multicast address 224.0.0.18 for VRRP Advertisement messages between master and backup routers.
This traffic can be observed with tcpdump. The following is a tcpdump example of a VRRP advertisement.

    Router1# tcpdump -i ib0 host 224.0.0.18 -v -v
    11:26:47.454956 IP (tos 0xc0, ttl 255, id 8057, offset 0, flags [none], proto VRRP (112), length 40)
        192.168.100.10 > vrrp.mcast.net: vrrp 192.168.100.10 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 2, prio 250, authtype simple, intvl 1s, length 20, addrs: 192.168.100.1 auth "password"

1. To allow VRRP Advertisements through the firewall on a Red Hat* Enterprise Linux* system using firewalld:

    # firewall-cmd --permanent --zone=public --add-rich-rule='rule family=ipv4 destination address=224.0.0.18 protocol value=ip accept'

2. Reload firewalld to commit the change:

    # firewall-cmd --reload

3. Verify the new firewall entry:

    # firewall-cmd --list-all
    public (default, active)
      interfaces: eth0 eth1 ib0 ib1
      sources: 224.0.0.8
      services: dhcpv6-client ssh
      ports:
      masquerade: no
      forward-ports:
      icmp-blocks:
      rich rules:
        rule family="ipv4" destination address="224.0.0.18" protocol value="ip" accept

In a network where multiple router pairs are in use on the same subnets, it is critical that the VRRP advertisements are isolated within each router pair. As can be seen in the above output from tcpdump, a password is used in each advertisement. To ensure that advertisements from other router pairs do not interfere, a unique password should be used for each router pair.
Appendix B Using keepalived Version 1.3.5

When a keepalived node becomes master, it adds the master IP address to the specified interface and issues gratuitous ARP packets to tell the nodes connected to that interface to update their ARP cache with the MAC address of the new master. keepalived prior to version 1.5.0 is not able to do this on an IPoIB interface, so a workaround is required. Version 1.3.5 of keepalived was tested with this workaround.

In the /etc/keepalived/keepalived.conf file, for each master instance block with an "ibX" named interface, add a notify_master line to issue the gratuitous ARPs. An example is shown below for illustrative purposes. Adjust as applicable for your installation.

    vrrp_instance VI_1 {
        state MASTER
        interface ib0
        virtual_router_id 1
        priority 250
        authentication {
            auth_type PASS
            auth_pass password
        }
        virtual_ipaddress {
            192.168.200.1
        }
        notify_master "/sbin/arping -q -U -c 5 -I ib0 192.168.200.1" root root
    }
