- Hardware LRO.
- Hairpin.
- Multiple-thread flow insertion.
+- Matching on IPv4 Internet Header Length (IHL).
- Matching on GTP extension header with raw encap/decap action.
- Matching on Geneve TLV option header with raw encap/decap action.
- RSS support in sample action.
- E-Switch mirroring and modify.
- 21844 flow priorities for ingress or egress flow groups greater than 0 and for any transfer
flow group.
+- Flow metering, including meter policy API.
+- Flow meter hierarchy.
+- Flow integrity offload API.
+- Connection tracking.
+- Sub-Function representors.
+- Sub-Function.
+
Limitations
-----------
- IPv4/UDP with CVLAN filtering
- Unicast MAC filtering
+ - Additional rules are supported from WinOF2 version 2.70:
+
+ - IPv4/TCP with CVLAN filtering
+ - L4 steering rules for port RSS of UDP, TCP and IP
+
- For secondary process:
- Forked secondary process not supported.
size and ``txq_inline_min`` settings and may be from 2 (worst case forced by maximal
inline settings) to 58.
-- Flows with a VXLAN Network Identifier equal (or ends to be equal)
- to 0 are not supported.
+- Match on VXLAN supports the following fields only:
+
+ - VNI
+ - Last reserved 8-bits
+
+ Last reserved 8-bits matching is only supported when using DV flow
+ engine (``dv_flow_en`` = 1).
+ For ConnectX-5, the UDP destination port must be the standard one (4789).
+ Group zero's behavior may differ depending on FW.
+ Matching a value equal to 0 (value & mask) is not supported.
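+
+ As a sketch of a rule that stays within these limits, the testpmd command
+ below matches a non-zero VNI on the standard UDP port. The port ID, VNI value
+ and queue index are placeholders chosen for illustration::
+
+     flow create 0 ingress pattern eth / ipv4 / udp dst is 4789 / vxlan vni is 42 / end actions queue index 0 / end
+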
- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP.
- Hairpin between two ports could only manual binding and explicit Tx flow mode. For single port hairpin, all the combinations of auto/manual binding and explicit/implicit Tx flow mode could be supported.
- Hairpin in switchdev SR-IOV mode is not supported till now.
+- Meter:
+
+ - All the meter colors with drop action will be counted only by the global drop statistics.
+ - Yellow detection is only supported with ASO metering.
+ - Red color must be with drop action.
+ - Meter statistics are supported only for drop case.
+ - A meter action created with a pre-defined policy must be the last action in the flow, except in the single case where the policy actions are:
+ - green: NULL or END.
+ - yellow: NULL or END.
+ - red: DROP / END.
+ - The only supported meter policy actions:
+ - green: QUEUE, RSS, PORT_ID, REPRESENTED_PORT, JUMP, DROP, MARK and SET_TAG.
+ - yellow: QUEUE, RSS, PORT_ID, REPRESENTED_PORT, JUMP, DROP, MARK and SET_TAG.
+ - red: must be DROP.
+ - Policy actions of RSS for green and yellow should have the same configuration except queues.
+ - Meter profile packet mode is supported.
+ - Meter profiles of RFC2697, RFC2698 and RFC4115 are supported.
+
+- Integrity:
+
+ - Integrity offload is enabled for **ConnectX-6** family.
+ - Verification bits provided by the hardware are ``l3_ok``, ``ipv4_csum_ok``, ``l4_ok``, ``l4_csum_ok``.
+ - ``level`` value 0 references outer headers.
+ - Multiple integrity items are not supported in a single flow rule.
+ - Flow rule items supplied by the application must explicitly specify the network headers referred to by the integrity item.
+ For example, if the integrity item mask sets the ``l4_ok`` or ``l4_csum_ok`` bits, a reference to the L4 network header,
+ TCP or UDP, must be in the rule pattern as well::
+
+ flow create 0 ingress pattern integrity level is 0 value mask l3_ok value spec l3_ok / eth / ipv6 / end …
+ or
+ flow create 0 ingress pattern integrity level is 0 value mask l4_ok value spec 0 / eth / ipv4 proto is udp / end …
+
+- Connection tracking:
+
+ - Cannot co-exist with an ASO meter or ASO age action in a single flow rule.
+ - Flow rules insertion rate and memory consumption need more optimization.
+ - 256 ports maximum.
+ - 4M connections maximum.
+
+- Multi-thread flow insertion:
+
+ - To achieve the best insertion rate, the application should manage the flows per lcore.
+ - It is better to disable memory reclaim by setting ``reclaim_mem_mode`` to 0, which accelerates flow object allocation and release using the cache.
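+
+ A possible way to apply this setting is through the device arguments.
+ The PCI address is a placeholder, and the ``-a`` EAL option is assumed
+ to be available (DPDK 20.11 or later)::
+
+     dpdk-testpmd -a <PCI_BDF>,dv_flow_en=1,reclaim_mem_mode=0 -- -i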
+
Statistics
----------
and each stride receives one packet. MPRQ can improve throughput for
small-packet traffic.
- When MPRQ is enabled, max_rx_pkt_len can be larger than the size of
+ When MPRQ is enabled, MTU can be larger than the size of
user-provided mbuf even if DEV_RX_OFFLOAD_SCATTER isn't enabled. PMD will
- configure large stride size enough to accommodate max_rx_pkt_len as long as
+ configure large stride size enough to accommodate MTU as long as
device allows. Note that this can waste system memory compared to enabling Rx
scatter and multi-segment packet.
it is not recommended and may prevent NIC from sending packets over
some configurations.
+ For ConnectX-4 and ConnectX-4 Lx NICs, the automatically configured value
+ is insufficient for some traffic, because they require at least all L2 headers
+ to be inlined. For example, Q-in-Q adds 4 bytes to the default 18 bytes
+ of Ethernet and VLAN, thus ``txq_inline_min`` must be set to 22.
+ MPLS would add 4 bytes per label. The final value must account for all possible
+ L2 encapsulation headers used in the particular environment.
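+
+ As a sketch of the Q-in-Q case above (18 + 4 = 22 bytes), the value could be
+ passed as a device argument. The PCI address is a placeholder, and the ``-a``
+ EAL option is assumed to be available (DPDK 20.11 or later)::
+
+     dpdk-testpmd -a <PCI_BDF>,txq_inline_min=22 -- -i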
+
Please, note, this minimal data inlining disengages eMPW feature (Enhanced
Multi-Packet Write), because last one does not support partial packet inlining.
This is not very critical due to minimal data inlining is mostly required
Enabled by default.
+- ``mr_mempool_reg_en`` parameter [int]
+
+ A nonzero value enables implicit registration of DMA memory of all mempools
+ except those having ``RTE_MEMPOOL_F_NON_IO``. This flag is set automatically
+ for mempools populated with non-contiguous objects or those without IOVA.
+ The effect is that when a packet from a mempool is transmitted,
+ its memory is already registered for DMA in the PMD and no registration
+ will happen on the data path. The tradeoff is extra work on the creation
+ of each mempool and increased HW resource use if some mempools
+ are not used with MLX5 devices.
+
+ Enabled by default.
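+
+ A sketch of turning implicit registration off, for instance when many
+ mempools are never used with MLX5 devices. The PCI address is a placeholder,
+ and the ``-a`` EAL option is assumed to be available (DPDK 20.11 or later)::
+
+     dpdk-testpmd -a <PCI_BDF>,mr_mempool_reg_en=0 -- -i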
+
- ``representor`` parameter [list]
This parameter can be used to instantiate DPDK Ethernet devices from
By default, the PMD will set this value to 1.
+- ``allow_duplicate_pattern`` parameter [int]
+
+ There are two options to choose:
+
+ - 0. Prevent insertion of rules with the same pattern items on non-root table.
+ In this case, only the first rule is inserted and the following rules are
+ rejected and error code EEXIST is returned.
+
+ - 1. Allow insertion of rules with the same pattern items.
+ In this case, all rules are inserted but only the first rule takes effect;
+ the next rule takes effect only after the previous rules are deleted.
+
+ By default, the PMD will set this value to 1.
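+
+ As an illustration, duplicate patterns on non-root tables could be rejected
+ by overriding the default in the device arguments. The PCI address is
+ a placeholder, and the ``-a`` EAL option is assumed to be available
+ (DPDK 20.11 or later)::
+
+     dpdk-testpmd -a <PCI_BDF>,dv_flow_en=1,allow_duplicate_pattern=0 -- -i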
+
.. _mlx5_firmware_config:
Firmware configuration
echo -n "<device pci address" > /sys/bus/pci/drivers/mlx5_core/unbind
-5. Enbale switchdev mode::
+5. Enable switchdev mode::
echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
-SubFunction representor support
--------------------------------
-SubFunction is a portion of the PCI device, a SF netdev has its own
-dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
-offload similar to existing PF and VF representors. A SF shares PCI
-level resources with other SFs and/or with its parent PCI function.
+Sub-Function support
+--------------------
+
+Sub-Function is a portion of the PCI device, a SF netdev has its own
+dedicated queues (txq, rxq).
+A SF shares PCI level resources with other SFs and/or with its parent PCI function.
+
+0. Requirement::
+
+ OFED version >= 5.4-0.3.3.0
1. Configure SF feature::
- mlxconfig -d <mst device> set PF_BAR2_SIZE=<0/1/2/3> PF_BAR2_ENABLE=1
+ # Run mlxconfig on both PFs on host and ECPFs on BlueField.
+ mlxconfig -d <mst device> set PER_PF_NUM_SF=1 PF_TOTAL_SF=252 PF_SF_BAR_SIZE=12
- Value of PF_BAR2_SIZE:
+2. Enable switchdev mode::
- 0: 8 SFs
- 1: 16 SFs
- 2: 32 SFs
- 3: 64 SFs
+ mlxdevm dev eswitch set pci/<DBDF> mode switchdev
-2. Reset the FW::
+3. Add SF port::
- mlxfwreset -d <mst device> reset
+ mlxdevm port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>
-3. Enable switchdev mode::
+ Get SFID from output: pci/<DBDF>/<SFID>
- echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
+4. Modify MAC address::
+
+ mlxdevm port function set pci/<DBDF>/<SFID> hw_addr <MAC>
-4. Create SF::
+5. Activate SF port::
- mlnx-sf -d <PCI_BDF> -a create
+ mlxdevm port function set pci/<DBDF>/<SFID> state active
-5. Probe SF representor::
+6. Devargs to probe SF device::
- testpmd> port attach <PCI_BDF>,representor=sf0,dv_flow_en=1
+ auxiliary:mlx5_core.sf.<num>,dv_flow_en=1
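+
+ A sketch of passing these devargs to an application, here testpmd.
+ The ``-a`` EAL option is assumed to be available (DPDK 20.11 or later),
+ and ``<num>`` stays a placeholder for the auxiliary device number::
+
+     dpdk-testpmd -a auxiliary:mlx5_core.sf.<num>,dv_flow_en=1 -- -i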
+
+Sub-Function representor support
+--------------------------------
+
+A SF netdev supports E-Switch representation offload
+similar to PF and VF representors.
+Use <sfnum> to probe SF representor::
+
+ testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1
Performance tuning
------------------
for better performance. For VMs, verify that the right CPU
and NUMA node are pinned according to the above. Run::
- lstopo-no-graphics
+ lstopo-no-graphics --merge
to identify the NUMA node to which the PCIe adapter is connected.
| | | rdma-core 26 | | rdma-core 26 |
| | | ConnectX-5 | | ConnectX-5 |
+-----------------------+-----------------+-----------------+
+ | ASO Metering | | DPDK 21.05 | | DPDK 21.05 |
+ | | | OFED 5.3 | | OFED 5.3 |
+ | | | rdma-core 33 | | rdma-core 33 |
+ | | | ConnectX-6 Dx| | ConnectX-6 Dx |
+ +-----------------------+-----------------+-----------------+
+ | Metering Hierarchy | | DPDK 21.08 | | DPDK 21.08 |
+ | | | OFED 5.3 | | OFED 5.3 |
+ | | | N/A | | N/A |
+ | | | ConnectX-6 Dx| | ConnectX-6 Dx |
+ +-----------------------+-----------------+-----------------+
| Sampling | | DPDK 20.11 | | DPDK 20.11 |
| | | OFED 5.1-2 | | OFED 5.1-2 |
| | | rdma-core 32 | | N/A |
| | | rdma-core 35 | | rdma-core 35 |
| | | ConnectX-5 | | ConnectX-5 |
+-----------------------+-----------------+-----------------+
+ | Connection tracking | | | | DPDK 21.05 |
+ | | | N/A | | OFED 5.3 |
+ | | | | | rdma-core 35 |
+ | | | | | ConnectX-6 Dx |
+ +-----------------------+-----------------+-----------------+
.. table:: Minimal SW/HW versions for shared action offload
:name: sact
| | | | | rdma-core 33 |
| | | | | ConnectX-5 |
+-----------------------+-----------------+-----------------+
- | Age | | DPDK 20.11 | | DPDK 20.11 |
- | | | OFED 5.2 | | OFED 5.2 |
- | | | rdma-core 32 | | rdma-core 32 |
- | | | ConnectX-6 Dx| | ConnectX-6 Dx |
+ | Age | | DPDK 20.11 | | DPDK 20.11 |
+ | | | OFED 5.2 | | OFED 5.2 |
+ | | | rdma-core 32 | | rdma-core 32 |
+ | | | ConnectX-6 Dx | | ConnectX-6 Dx |
+ +-----------------------+-----------------+-----------------+
+ | Count | | DPDK 21.05 | | DPDK 21.05 |
+ | | | OFED 4.6 | | OFED 4.6 |
+ | | | rdma-core 24 | | rdma-core 23 |
+ | | | ConnectX-5 | | ConnectX-5 |
+-----------------------+-----------------+-----------------+
Notes for metadata
.. code-block:: console
mlx_steering_dump.py -f <output_file> -flowptr <flow_ptr>
+
+How to share a meter between ports in the same switch domain
+------------------------------------------------------------
+
+This section demonstrates how to use a shared meter. A meter M can be created
+on port X and shared with a port Y in the same switch domain as follows:
+
+.. code-block:: console
+
+ flow create X ingress transfer pattern eth / port_id id is Y / end actions meter mtr_id M / end
+
+How to use meter hierarchy
+--------------------------
+
+This section demonstrates how to create and use a meter hierarchy.
+A termination meter M can be the policy green action of another termination meter N.
+The two meters are chained together. Using meter N in a flow will apply
+both meters in the hierarchy on that flow.
+
+.. code-block:: console
+
+ add port meter policy 0 1 g_actions queue index 0 / end y_actions end r_actions drop / end
+ create port meter 0 M 1 1 yes 0xffff 1 0
+ add port meter policy 0 2 g_actions meter mtr_id M / end y_actions end r_actions drop / end
+ create port meter 0 N 2 2 yes 0xffff 1 0
+ flow create 0 ingress group 1 pattern eth / end actions meter mtr_id N / end