MLX5 poll mode driver
=====================
-The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support for
-**Mellanox ConnectX-4 EN** and **Mellanox ConnectX-4 Lx EN** families of
-10/25/40/50/100 Gb/s adapters as well as their virtual functions (VF) in
-SR-IOV context.
+The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support
+for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx** and **Mellanox
+ConnectX-5** families of 10/25/40/50/100 Gb/s adapters as well as their
+virtual functions (VF) in SR-IOV context.
Information and documentation about these adapters can be found on the
`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the
be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and
recompiling DPDK.
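+
+As a rough sketch, assuming the legacy make-based build and that the option
+lives in ``config/common_base`` with a default of ``n`` (the build target
+below is only an example):
+
+.. code-block:: console
+
+   sed -i 's/CONFIG_RTE_LIBRTE_MLX5_PMD=n/CONFIG_RTE_LIBRTE_MLX5_PMD=y/' config/common_base
+   make config T=x86_64-native-linuxapp-gcc
+   make
+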
-.. warning::
-
- ``CONFIG_RTE_BUILD_COMBINE_LIBS`` with ``CONFIG_RTE_BUILD_SHARED_LIB``
- is not supported and thus the compilation will fail with this configuration.
-
Implementation details
----------------------
- Multiple TX and RX queues.
- Support for scattered TX and RX frames.
-- IPv4, TCPv4 and UDPv4 RSS on any number of queues.
+- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
- Several RSS hash keys, one for each flow type.
+- Configurable RETA table.
- Support for multiple MAC addresses.
- VLAN filtering.
+- RX VLAN stripping.
+- TX VLAN insertion.
+- RX CRC stripping configuration.
- Promiscuous mode.
+- Multicast promiscuous mode.
+- Hardware checksum offloads.
+- Flow director (RTE_FDIR_MODE_PERFECT, RTE_FDIR_MODE_PERFECT_MAC_VLAN and
+ RTE_ETH_FDIR_REJECT).
+- Flow API.
+- Secondary process TX is supported.
+- KVM and VMware ESX SR-IOV modes are supported.
+- RSS hash result is supported.
+- Hardware TSO.
+- Hardware checksum TX offload for VXLAN and GRE.
Limitations
-----------
-- IPv6 and inner VXLAN RSS are not supported yet.
+- Inner RSS for VXLAN frames is not supported yet.
- Port statistics through software counters only.
-- No allmulticast mode.
-- Hardware checksum offloads are not supported yet.
+- Hardware checksum RX offloads for VXLAN inner header are not supported yet.
+- Secondary process RX is not supported.
Configuration
-------------
adds additional run-time checks and debugging messages at the cost of
lower performance.
-- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**)
-
- Number of scatter/gather elements (SGEs) per work request (WR). Lowering
- this number improves performance but also limits the ability to receive
- scattered packets (packets that do not fit a single mbuf). The default
- value is a safe tradeoff.
-
-- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)
-
- Amount of data to be inlined during TX operations. Improves latency but
- lowers throughput.
-
- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**)
Maximum number of cached memory pools (MPs) per TX queue. Each MP from
This value is always 1 for RX queues since they use a single MP.
+Environment variables
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``MLX5_PMD_ENABLE_PADDING``
+
+ Enables HW packet padding in PCI bus transactions.
+
+ When packet size is cache aligned and CRC stripping is enabled, 4 fewer
+ bytes are written to the PCI bus. Enabling padding makes such packets
+ aligned again.
+
+ In cases where PCI bandwidth is the bottleneck, padding can improve
+ performance by 10%.
+
+ Padding is disabled by default since it can also decrease performance for
+ unaligned packet sizes.
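+
+A minimal sketch of enabling padding for a single run, assuming the PMD
+reads the variable as an integer and treats a nonzero value as enabled:
+
+.. code-block:: console
+
+   MLX5_PMD_ENABLE_PADDING=1 testpmd -l 8-15 -n 4 -w 05:00.0 -- --rxq=2 --txq=2 -i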
+
Run-time configuration
~~~~~~~~~~~~~~~~~~~~~~
- **ethtool** operations on related kernel interfaces also affect the PMD.
+- ``rxq_cqe_comp_en`` parameter [int]
+
+ A nonzero value enables the compression of CQE on RX side. This feature
+ saves PCI bandwidth and improves performance at the cost of slightly
+ higher CPU usage. Enabled by default.
+
+ Supported on:
+
+ - x86_64 with ConnectX-4 and ConnectX-4 Lx
+ - Power8 with ConnectX-4 Lx
+
+- ``txq_inline`` parameter [int]
+
+ Amount of data to be inlined during TX operations. Inlining improves
+ latency and can improve PPS performance when PCI back pressure is
+ detected. It may be useful for scenarios involving heavy traffic on many
+ queues.
+
+ It is not enabled by default (set to 0) since the additional software
+ logic necessary to handle this mode can lower performance when back
+ pressure is not expected.
+
+- ``txqs_min_inline`` parameter [int]
+
+ Enable inline send only when the number of TX queues is greater than or
+ equal to this value.
+
+ This option should be used in combination with ``txq_inline`` above.
+
+- ``txq_mpw_en`` parameter [int]
+
+ A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and
+ enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the
+ TX burst function to pack multiple packets into a single descriptor
+ session in order to save PCI bandwidth and improve performance at the
+ cost of slightly higher CPU usage. When ``txq_inline`` is set along
+ with ``txq_mpw_en``, the TX burst function copies the entire packet data
+ into the TX descriptor, instead of storing only a pointer to the packet,
+ provided that enough room remains in the descriptor. ``txq_inline`` sets
+ the per-descriptor space for either pointers or inlined packets. In
+ addition, Enhanced MPS supports a hybrid mode that mixes inlined packets
+ and pointers in the same descriptor.
+
+ This option cannot be used in conjunction with ``tso`` below. When ``tso``
+ is set, ``txq_mpw_en`` is disabled.
+
+ It is currently only supported on the ConnectX-4 Lx and ConnectX-5
+ families of adapters. Enabled by default.
+
+- ``txq_mpw_hdr_dseg_en`` parameter [int]
+
+ A nonzero value enables including two pointers in the first block of the
+ TX descriptor. This can be used to lessen the CPU load of memory copies.
+
+ Effective only when Enhanced MPS is supported. Disabled by default.
+
+- ``txq_max_inline_len`` parameter [int]
+
+ Maximum size of a packet to be inlined. If a packet is larger than the
+ configured value, it is not inlined even if enough space remains in the
+ descriptor. Instead, the packet is referenced by pointer.
+
+ Effective only when Enhanced MPS is supported. The default value is 256.
+
+- ``tso`` parameter [int]
+
+ A nonzero value enables hardware TSO.
+ When hardware TSO is enabled, packets marked with TCP segmentation
+ offload will be divided into segments by the hardware.
+
+ Disabled by default.
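+
+The parameters above can be illustrated with a short sketch. It assumes they
+are passed as device arguments appended to the PCI address given to ``-w``;
+the values are arbitrary examples, not recommendations:
+
+.. code-block:: console
+
+   testpmd -l 8-15 -n 4 -w 05:00.0,txq_inline=200,txqs_min_inline=4 -w 05:00.1,tso=1 -- --rxq=4 --txq=4 -i
+
+Under these assumptions, the first port would use inline send because four
+TX queues are configured, and the second port would enable hardware TSO,
+which in turn disables ``txq_mpw_en``.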
+
Prerequisites
-------------
- **libmlx5**
- Low-level user space driver library for Mellanox ConnectX-4 devices,
- it is automatically loaded by libibverbs.
+ Low-level user space driver library for Mellanox ConnectX-4/ConnectX-5
+ devices. It is automatically loaded by libibverbs.
This library basically implements send/receive calls to the hardware
queues.
Unlike most other PMDs, these modules must remain loaded and bound to
their devices:
- - mlx5_core: hardware driver managing Mellanox ConnectX-4 devices and
- related Ethernet kernel network devices.
+ - mlx5_core: hardware driver managing Mellanox ConnectX-4/ConnectX-5
+ devices and related Ethernet kernel network devices.
- mlx5_ib: InfiniBand device driver.
- ib_uverbs: user space driver for Verbs (entry point for libibverbs).
- **Firmware update**
- Mellanox OFED releases include firmware updates for ConnectX-4 adapters.
+ Mellanox OFED releases include firmware updates for ConnectX-4/ConnectX-5
+ adapters.
Because each release provides new features, these updates must be applied to
match the kernel modules and libraries they come with.
Currently supported by DPDK:
-- Mellanox OFED **3.1**.
-- Minimum firmware version:
- - ConnectX-4: **12.12.0780**.
- - ConnectX-4 Lx: **14.12.0780**.
+- Mellanox OFED version: **4.0-1.0.1.0**
+- Firmware version:
+
+ - ConnectX-4: **12.18.1000**
+ - ConnectX-4 Lx: **14.18.1000**
+ - ConnectX-5: **16.18.1000**
+ - ConnectX-5 Ex: **16.18.1000**
Getting Mellanox OFED
~~~~~~~~~~~~~~~~~~~~~
this DPDK release was developed and tested against is strongly
recommended. Please check the `prerequisites`_.
+Supported NICs
+--------------
+
+* Mellanox(R) ConnectX(R)-4 10G MCX4111A-XCAT (1x10G)
+* Mellanox(R) ConnectX(R)-4 10G MCX4121A-XCAT (2x10G)
+* Mellanox(R) ConnectX(R)-4 25G MCX4111A-ACAT (1x25G)
+* Mellanox(R) ConnectX(R)-4 25G MCX4121A-ACAT (2x25G)
+* Mellanox(R) ConnectX(R)-4 40G MCX4131A-BCAT (1x40G)
+* Mellanox(R) ConnectX(R)-4 40G MCX413A-BCAT (1x40G)
+* Mellanox(R) ConnectX(R)-4 40G MCX415A-BCAT (1x40G)
+* Mellanox(R) ConnectX(R)-4 50G MCX4131A-GCAT (1x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX413A-GCAT (1x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX414A-BCAT (2x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX415A-GCAT (1x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX416A-BCAT (2x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX416A-GCAT (2x50G)
+* Mellanox(R) ConnectX(R)-4 100G MCX415A-CCAT (1x100G)
+* Mellanox(R) ConnectX(R)-4 100G MCX416A-CCAT (2x100G)
+* Mellanox(R) ConnectX(R)-4 Lx 10G MCX4121A-XCAT (2x10G)
+* Mellanox(R) ConnectX(R)-4 Lx 25G MCX4121A-ACAT (2x25G)
+* Mellanox(R) ConnectX(R)-5 100G MCX556A-ECAT (2x100G)
+* Mellanox(R) ConnectX(R)-5 Ex EN 100G MCX516A-CDAT (2x100G)
+
+Notes for testpmd
+-----------------
+
+Unlike librte_pmd_mlx4, which implements a single RSS configuration per
+port, librte_pmd_mlx5 supports per-protocol RSS configuration.
+
+Since ``testpmd`` defaults to IP RSS mode and there is currently no
+command-line parameter to enable additional protocols (UDP and TCP as well
+as IP), the following commands must be entered from its CLI to get the same
+behavior as librte_pmd_mlx4:
+
+.. code-block:: console
+
+ > port stop all
+ > port config all rss all
+ > port start all
+
Usage example
-------------
-This section demonstrates how to launch **testpmd** with Mellanox ConnectX-4
-devices managed by librte_pmd_mlx5.
+This section demonstrates how to launch **testpmd** with Mellanox
+ConnectX-4/ConnectX-5 devices managed by librte_pmd_mlx5.
#. Load the kernel modules:
modprobe -a ib_uverbs mlx5_core mlx5_ib
+ Alternatively, if MLNX_OFED is fully installed, the following script can
+ be run:
+
+ .. code-block:: console
+
+ /etc/init.d/openibd restart
+
.. note::
User space I/O kernel modules (uio and igb_uio) are not used and do
.. code-block:: console
- testpmd -c 0xff00 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i
+ testpmd -l 8-15 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i
Example output: