.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2017, Cisco Systems, Inc.
ENIC PMD is the DPDK poll-mode driver for the Cisco Systems, Inc. VIC Ethernet
NICs. These adapters are also referred to as vNICs below. If you are running,
or would like to run, DPDK software applications on Cisco UCS servers using
Cisco VIC adapters, the following documentation is relevant.
13 How to obtain ENIC PMD integrated DPDK
14 --------------------------------------
16 ENIC PMD support is integrated into the DPDK suite. dpdk-<version>.tar.gz
17 should be downloaded from http://core.dpdk.org/download/
20 Configuration information
21 -------------------------
23 - **DPDK Configuration Parameters**
25 The following configuration options are available for the ENIC PMD:
27 - **CONFIG_RTE_LIBRTE_ENIC_PMD** (default y): Enables or disables inclusion
28 of the ENIC PMD driver in the DPDK compilation.
30 - **vNIC Configuration Parameters**
32 - **Number of Queues**
The maximum numbers of receive queues (RQs), work queues (WQs), and
completion queues (CQs) are configurable on a per-vNIC basis
through the Cisco UCS Manager (CIMC or UCSM).
38 These values should be configured as follows:
- The number of WQs should be greater than or equal to the value of the
  expected nb_tx_q parameter in the call to
  rte_eth_dev_configure().

- The number of RQs configured in the vNIC should be greater than or
  equal to *twice* the value of the expected nb_rx_q parameter in
  the call to rte_eth_dev_configure(). With the addition of Rx
  scatter, a pair of RQs on the vNIC is needed for each receive
  queue used by DPDK, even if Rx scatter is not being used.
  Having a vNIC with only 1 RQ is not a valid configuration, and
  will fail with an error message.

- The number of CQs should be set so that there is one CQ for each
  WQ, and one CQ for each pair of RQs.

For example: If the application requires 3 Rx queues and 3 Tx
queues, the vNIC should be configured to have at least 3 WQs, 6
RQs (3 pairs), and 6 CQs (3 for use by WQs + 3 for use by the 3
pairs of RQs).
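The sizing rules above can be written as a small calculation. The following is
a minimal sketch in C; ``enic_min_resources()`` and its result struct are
hypothetical helpers written for illustration and are not part of DPDK or the
ENIC PMD.

```c
#include <stdint.h>

/* Minimum vNIC resources needed for a given DPDK queue configuration. */
struct enic_vnic_min_resources {
    uint16_t wqs; /* work (transmit) queues */
    uint16_t rqs; /* receive queues on the vNIC */
    uint16_t cqs; /* completion queues */
};

/*
 * Hypothetical helper (not a DPDK API): apply the rules from the text to
 * the nb_rx_q and nb_tx_q values later passed to rte_eth_dev_configure().
 */
static struct enic_vnic_min_resources
enic_min_resources(uint16_t nb_rx_q, uint16_t nb_tx_q)
{
    struct enic_vnic_min_resources r;

    r.wqs = nb_tx_q;                        /* one WQ per Tx queue */
    r.rqs = (uint16_t)(2 * nb_rx_q);        /* one pair of RQs per Rx queue */
    r.cqs = (uint16_t)(nb_tx_q + nb_rx_q);  /* one CQ per WQ, one per RQ pair */
    return r;
}
```

For the example of 3 Rx queues and 3 Tx queues, ``enic_min_resources(3, 3)``
yields 3 WQs, 6 RQs, and 6 CQs, matching the values given in the text.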
62 Likewise, the number of receive and transmit descriptors are configurable on
63 a per-vNIC basis via the UCS Manager and should be greater than or equal to
64 the nb_rx_desc and nb_tx_desc parameters expected to be used in the calls
65 to rte_eth_rx_queue_setup() and rte_eth_tx_queue_setup() respectively.
An application requesting more than the set size will be limited to that
size.
69 Unless there is a lack of resources due to creating many vNICs, it
70 is recommended that the WQ and RQ sizes be set to the maximum. This
71 gives the application the greatest amount of flexibility in its
74 - *Note*: Since the introduction of Rx scatter, for performance
75 reasons, this PMD uses two RQs on the vNIC per receive queue in
76 DPDK. One RQ holds descriptors for the start of a packet, and the
77 second RQ holds the descriptors for the rest of the fragments of
a packet. This means that the nb_rx_desc parameter to
rte_eth_rx_queue_setup() can be greater than 4096. The exact
amount will depend on the size of the mbufs being used for
receives, and the MTU size.
83 For example: If the mbuf size is 2048, and the MTU is 9000, then
84 receiving a full size packet will take 5 descriptors, 1 from the
85 start-of-packet queue, and 4 from the second queue. Assuming
86 that the RQ size was set to the maximum of 4096, then the
87 application can specify up to 1024 + 4096 as the nb_rx_desc
88 parameter to rte_eth_rx_queue_setup().
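The arithmetic in this example can be sketched as follows. This is a
simplified model of the relationship described above, not the driver's actual
code; ``descs_per_packet()`` and ``max_nb_rx_desc()`` are hypothetical names,
and the model assumes the second RQ is filled to its configured size.

```c
#include <stdint.h>

/*
 * Number of descriptors (mbufs) needed to receive one maximum-size packet.
 * mbuf_size must be non-zero.
 */
static uint32_t
descs_per_packet(uint32_t max_pkt_len, uint32_t mbuf_size)
{
    return (max_pkt_len + mbuf_size - 1) / mbuf_size; /* ceiling division */
}

/*
 * Upper bound on nb_rx_desc under the simplified model in the text:
 * each packet takes 1 descriptor from the start-of-packet RQ and
 * (per_pkt - 1) descriptors from the second RQ, which holds rq_size entries.
 */
static uint32_t
max_nb_rx_desc(uint32_t max_pkt_len, uint32_t mbuf_size, uint32_t rq_size)
{
    uint32_t per_pkt = descs_per_packet(max_pkt_len, mbuf_size);

    if (per_pkt <= 1)
        return rq_size; /* every packet fits in a single mbuf */
    return rq_size / (per_pkt - 1) + rq_size;
}
```

With an mbuf size of 2048, an MTU of 9000, and an RQ size of 4096,
``descs_per_packet()`` gives 5 and ``max_nb_rx_desc()`` gives 1024 + 4096.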
At least one interrupt per vNIC interface should be configured in the UCS
Manager regardless of the number of receive/transmit queues. The ENIC PMD
94 uses this interrupt to get information about link status and errors
97 In addition to the interrupt for link status and errors, when using Rx queue
98 interrupts, increase the number of configured interrupts so that there is at
99 least one interrupt for each Rx queue. For example, if the app uses 3 Rx
100 queues and wants to use per-queue interrupts, configure 4 (3 + 1) interrupts.
102 - **Receive Side Scaling**
104 In order to fully utilize RSS in DPDK, enable all RSS related settings in
105 CIMC or UCSM. These include the following items listed under
106 Receive Side Scaling:
107 TCP, IPv4, TCP-IPv4, IPv6, TCP-IPv6, IPv6 Extension, TCP-IPv6 Extension.
110 SR-IOV mode utilization
111 -----------------------
113 UCS blade servers configured with dynamic vNIC connection policies in UCSM
114 are capable of supporting SR-IOV. SR-IOV virtual functions (VFs) are
115 specialized vNICs, distinct from regular Ethernet vNICs. These VFs can be
116 directly assigned to virtual machines (VMs) as 'passthrough' devices.
118 In UCS, SR-IOV VFs require the use of the Cisco Virtual Machine Fabric Extender
119 (VM-FEX), which gives the VM a dedicated
120 interface on the Fabric Interconnect (FI). Layer 2 switching is done at
121 the FI. This may eliminate the requirement for software switching on the
122 host to route intra-host VM traffic.
124 Please refer to `Creating a Dynamic vNIC Connection Policy
125 <http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/vm_fex/vmware/gui/config_guide/b_GUI_VMware_VM-FEX_UCSM_Configuration_Guide/b_GUI_VMware_VM-FEX_UCSM_Configuration_Guide_chapter_010.html#task_433E01651F69464783A68E66DA8A47A5>`_
126 for information on configuring SR-IOV adapter policies and port profiles
Once the policies are in place and the host OS is rebooted, VFs should be
visible on the host. For example:
132 .. code-block:: console
134 # lspci | grep Cisco | grep Ethernet
135 0d:00.0 Ethernet controller: Cisco Systems Inc VIC Ethernet NIC (rev a2)
136 0d:00.1 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
137 0d:00.2 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
138 0d:00.3 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
139 0d:00.4 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
140 0d:00.5 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
141 0d:00.6 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
142 0d:00.7 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
Enable Intel IOMMU on the host, install KVM and libvirt, and reboot again as
required. Then, using libvirt, create a VM instance with an assigned device.
Below is an example ``interface`` block (part of the domain configuration XML)
that adds the host VF 0d:00.1 to the VM. ``profileid='pp-vlan-25'`` indicates
the port profile that has been configured in UCSM.
.. code-block:: xml

   <interface type='hostdev' managed='yes'>
     <mac address='52:54:00:ac:ff:b6'/>
     <driver name='vfio'/>
     <source>
       <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x1'/>
     </source>
     <virtualport type='802.1Qbh'>
       <parameters profileid='pp-vlan-25'/>
     </virtualport>
   </interface>

164 Alternatively, the configuration can be done in a separate file using the
165 ``network`` keyword. These methods are described in the libvirt documentation for
166 `Network XML format <https://libvirt.org/formatnetwork.html>`_.
168 When the VM instance is started, libvirt will bind the host VF to
169 vfio, complete provisioning on the FI and bring up the link.
It is not possible to use a VF directly from the host because it is not
fully provisioned until libvirt brings up the VM that it is assigned to.
In the VM instance, the VF will now be visible. For example, here the VF
00:04.0 is seen on the VM instance and should be available for binding to DPDK.
180 .. code-block:: console
183 00:04.0 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
185 Follow the normal DPDK install procedure, binding the VF to either ``igb_uio``
186 or ``vfio`` in non-IOMMU mode.
188 In the VM, the kernel enic driver may be automatically bound to the VF during
189 boot. Unbinding it currently hangs due to a known issue with the driver. To
190 work around the issue, blacklist the enic module as follows.
191 Please see :ref:`Limitations <enic_limitations>` for limitations in
194 .. code-block:: console
196 # cat /etc/modprobe.d/enic.conf
Passthrough does not require SR-IOV. If VM-FEX is not desired, the user
may create as many regular vNICs as necessary and assign them to VMs as
passthrough devices. Since these vNICs are not SR-IOV VFs, using them as
passthrough devices does not require libvirt, port profiles, or VM-FEX.
209 .. _enic-generic-flow-api:
211 Generic Flow API support
212 ------------------------
214 Generic Flow API (also called "rte_flow" API) is supported. More advanced
215 capabilities are available when "Advanced Filtering" is enabled on the adapter.
216 Advanced filtering was added to 1300 series VIC firmware starting with version
217 2.0.13 for C-series UCS servers and version 3.1.2 for UCSM managed blade
218 servers. Advanced filtering is available on 1400 series adapters and beyond.
219 To enable advanced filtering, the 'Advanced filter' radio button should be
220 selected via CIMC or UCSM followed by a reboot of the server.
222 - **1200 series VICs**
224 5-tuple exact flow support for 1200 series adapters. This allows:
226 - Attributes: ingress
227 - Items: ipv4, ipv6, udp, tcp (must exactly match src/dst IP
228 addresses and ports and all must be specified)
229 - Actions: queue and void
- **1300 and later series VICs with advanced filters disabled**
234 With advanced filters disabled, an IPv4 or IPv6 item must be specified
237 - Attributes: ingress
238 - Items: eth, vlan, ipv4, ipv6, udp, tcp, vxlan, inner eth, vlan, ipv4, ipv6, udp, tcp
239 - Actions: queue and void
240 - Selectors: 'is', 'spec' and 'mask'. 'last' is not supported
241 - In total, up to 64 bytes of mask is allowed across all headers
- **1300 and later series VICs with advanced filters enabled**
245 - Attributes: ingress
246 - Items: eth, vlan, ipv4, ipv6, udp, tcp, vxlan, raw, inner eth, vlan, ipv4, ipv6, udp, tcp
247 - Actions: queue, mark, drop, flag, rss, passthru, and void
248 - Selectors: 'is', 'spec' and 'mask'. 'last' is not supported
249 - In total, up to 64 bytes of mask is allowed across all headers
251 - **1400 and later series VICs with Flow Manager API enabled**
253 - Attributes: ingress, egress
254 - Items: eth, vlan, ipv4, ipv6, sctp, udp, tcp, vxlan, raw, inner eth, vlan, ipv4, ipv6, sctp, udp, tcp
255 - Ingress Actions: count, drop, flag, jump, mark, port_id, passthru, queue, rss, vxlan_decap, vxlan_encap, and void
256 - Egress Actions: count, drop, jump, passthru, vxlan_encap, and void
257 - Selectors: 'is', 'spec' and 'mask'. 'last' is not supported
258 - In total, up to 64 bytes of mask is allowed across all headers
260 The VIC performs packet matching after applying VLAN strip. If VLAN
261 stripping is enabled, EtherType in the ETH item corresponds to the
262 stripped VLAN header's EtherType. Stripping does not affect the VLAN
263 item. TCI and EtherType in the VLAN item are matched against those in
264 the (stripped) VLAN header whether stripping is enabled or disabled.
266 More features may be added in future firmware and new versions of the VIC.
267 Please refer to the release notes.
274 Recent hardware models support overlay offload. When enabled, the NIC performs
275 the following operations for VXLAN, NVGRE, and GENEVE packets. In all cases,
276 inner and outer packets can be IPv4 or IPv6.
278 - TSO for VXLAN and GENEVE packets.
280 Hardware supports NVGRE TSO, but DPDK currently has no NVGRE offload flags.
282 - Tx checksum offloads.
284 The NIC fills in IPv4/UDP/TCP checksums for both inner and outer packets.
286 - Rx checksum offloads.
288 The NIC validates IPv4/UDP/TCP checksums of both inner and outer packets.
289 Good checksum flags (e.g. ``PKT_RX_L4_CKSUM_GOOD``) indicate that the inner
290 packet has the correct checksum, and if applicable, the outer packet also
291 has the correct checksum. Bad checksum flags (e.g. ``PKT_RX_L4_CKSUM_BAD``)
292 indicate that the inner and/or outer packets have invalid checksum values.
294 - Inner Rx packet type classification
296 PMD sets inner L3/L4 packet types (e.g. ``RTE_PTYPE_INNER_L4_TCP``), and
297 ``RTE_PTYPE_TUNNEL_GRENAT`` to indicate that the packet is tunneled.
298 PMD does not set L3/L4 packet types for outer packets.
302 RSS hash calculation, therefore queue selection, is done on inner packets.
304 In order to enable overlay offload, the 'Enable VXLAN' box should be checked
305 via CIMC or UCSM followed by a reboot of the server. When PMD successfully
306 enables overlay offload, it prints the following message on the console.
308 .. code-block:: console
310 Overlay offload is enabled
312 By default, PMD enables overlay offload if hardware supports it. To disable
313 it, set ``devargs`` parameter ``disable-overlay=1``. For example::
315 -w 12:00.0,disable-overlay=1
317 By default, the NIC uses 4789 as the VXLAN port. The user may change
318 it through ``rte_eth_dev_udp_tunnel_port_{add,delete}``. However, as
319 the current NIC has a single VXLAN port number, the user cannot
320 configure multiple port numbers.
322 Geneve headers with non-zero options are not supported by default. To
323 use Geneve with options, update the VIC firmware to the latest version
324 and then set ``devargs`` parameter ``geneve-opt=1``. When Geneve with
325 options is enabled, flow API cannot be used as the features are
326 currently mutually exclusive. When this feature is successfully
327 enabled, PMD prints the following message.
329 .. code-block:: console
331 Geneve with options is enabled
337 VIC adapters can tag, untag, or modify the VLAN headers of ingress
338 packets. The ingress VLAN rewrite mode controls this behavior. By
339 default, it is set to pass-through, where the NIC does not modify the
340 VLAN header in any way so that the application can see the original
341 header. This mode is sufficient for many applications, but may not be
342 suitable for others. Such applications may change the mode by setting
343 ``devargs`` parameter ``ig-vlan-rewrite`` to one of the following.
345 - ``pass``: Pass-through mode. The NIC does not modify the VLAN
346 header. This is the default mode.
348 - ``priority``: Priority-tag default VLAN mode. If the ingress packet
349 is tagged with the default VLAN, the NIC replaces its VLAN header
350 with the priority tag (VLAN ID 0).
352 - ``trunk``: Default trunk mode. The NIC tags untagged ingress packets
353 with the default VLAN. Tagged ingress packets are not modified. To
354 the application, every packet appears as tagged.
356 - ``untag``: Untag default VLAN mode. If the ingress packet is tagged
357 with the default VLAN, the NIC removes or untags its VLAN header so
358 that the application sees an untagged packet. As a result, the
359 default VLAN becomes `untagged`. This mode can be useful for
360 applications such as OVS-DPDK performance benchmarks that utilize
361 only the default VLAN and want to see only untagged packets.
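The four modes can be summarized as a function from the VLAN state of an
ingress packet to the state the application sees. The sketch below is a model
of the behavior described in the list above, written for illustration only;
the enum, struct, and function names are hypothetical and do not exist in the
ENIC PMD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring the ig-vlan-rewrite devargs values. */
enum ig_vlan_rewrite {
    IG_VLAN_PASS,
    IG_VLAN_PRIORITY,
    IG_VLAN_TRUNK,
    IG_VLAN_UNTAG
};

struct vlan_view {
    bool tagged;      /* packet carries a VLAN header */
    uint16_t vlan_id; /* meaningful only when tagged is true */
};

/* Model of what the application sees for one ingress packet. */
static struct vlan_view
ig_vlan_rewrite_apply(enum ig_vlan_rewrite mode, struct vlan_view in,
                      uint16_t default_vlan)
{
    struct vlan_view out = in;
    bool is_default = in.tagged && in.vlan_id == default_vlan;

    switch (mode) {
    case IG_VLAN_PASS:       /* header is never modified */
        break;
    case IG_VLAN_PRIORITY:   /* default VLAN becomes priority tag (ID 0) */
        if (is_default)
            out.vlan_id = 0;
        break;
    case IG_VLAN_TRUNK:      /* untagged packets get the default VLAN */
        if (!in.tagged) {
            out.tagged = true;
            out.vlan_id = default_vlan;
        }
        break;
    case IG_VLAN_UNTAG:      /* default VLAN header is removed */
        if (is_default) {
            out.tagged = false;
            out.vlan_id = 0;
        }
        break;
    }
    return out;
}
```

In this model, a packet tagged with a VLAN other than the default is left
unchanged in every mode, and only ``trunk`` alters untagged packets.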
364 Vectorized Rx Handler
365 ---------------------
367 ENIC PMD includes a version of the receive handler that is vectorized using
368 AVX2 SIMD instructions. It is meant for bulk, throughput oriented workloads
369 where reducing cycles/packet in PMD is a priority. In order to use the
370 vectorized handler, take the following steps.
372 - Use a recent version of gcc, icc, or clang and build 64-bit DPDK. If
373 the compiler is known to support AVX2, DPDK build system
374 automatically compiles the vectorized handler. Otherwise, the
375 handler is not available.
377 - Set ``devargs`` parameter ``enable-avx2-rx=1`` to explicitly request that
378 PMD consider the vectorized handler when selecting the receive handler.
381 -w 12:00.0,enable-avx2-rx=1
383 As the current implementation is intended for field trials, by default, the
384 vectorized handler is not considered (``enable-avx2-rx=0``).
386 - Run on a UCS M4 or later server with CPUs that support AVX2.
388 PMD selects the vectorized handler when the handler is compiled into
389 the driver, the user requests its use via ``enable-avx2-rx=1``, CPU
390 supports AVX2, and scatter Rx is not used. To verify that the
391 vectorized handler is selected, enable debug logging
392 (``--log-level=pmd,debug``) and check the following message.
394 .. code-block:: console
396 enic_use_vector_rx_handler use the non-scatter avx2 Rx handler
398 .. _enic_limitations:
403 - **VLAN 0 Priority Tagging**
405 If a vNIC is configured in TRUNK mode by the UCS manager, the adapter will
406 priority tag egress packets according to 802.1Q if they were not already
407 VLAN tagged by software. If the adapter is connected to a properly configured
408 switch, there will be no unexpected behavior.
410 In test setups where an Ethernet port of a Cisco adapter in TRUNK mode is
connected point-to-point to another adapter port or connected through a router
412 instead of a switch, all ingress packets will be VLAN tagged. Programs such
413 as l3fwd may not account for VLAN tags in packets and may misbehave. One
414 solution is to enable VLAN stripping on ingress so the VLAN tag is removed
415 from the packet and put into the mbuf->vlan_tci field. Here is an example
416 of how to accomplish this:
.. code-block:: c

   /* Read the current VLAN offload mask and enable stripping on Rx. */
   int vlan_offload = rte_eth_dev_get_vlan_offload(port);

   vlan_offload |= ETH_VLAN_STRIP_OFFLOAD;
   rte_eth_dev_set_vlan_offload(port, vlan_offload);

Another alternative is to modify the adapter's ingress VLAN rewrite mode so that
425 packets with the default VLAN tag are stripped by the adapter and presented to
426 DPDK as untagged packets. In this case mbuf->vlan_tci and the PKT_RX_VLAN and
427 PKT_RX_VLAN_STRIPPED mbuf flags would not be set. This mode is enabled with the
428 ``devargs`` parameter ``ig-vlan-rewrite=untag``. For example::
430 -w 12:00.0,ig-vlan-rewrite=untag
434 - KVM hypervisor support only. VMware has not been tested.
- Requires VM-FEX, and so is only available on UCS managed servers connected
  to Fabric Interconnects. It is not available on standalone C-Series servers.
437 - VF devices are not usable directly from the host. They can only be used
438 as assigned devices on VM instances.
439 - Currently, unbind of the ENIC kernel mode driver 'enic.ko' on the VM
440 instance may hang. As a workaround, enic.ko should be blacklisted or removed
441 from the boot process.
442 - pci_generic cannot be used as the uio module in the VM. igb_uio or
443 vfio in non-IOMMU mode can be used.
444 - The number of RQs in UCSM dynamic vNIC configurations must be at least 2.
445 - The number of SR-IOV devices is limited to 256. Components on target system
446 might limit this number to fewer than 256.
450 - The number of filters that can be specified with the Generic Flow API is
451 dependent on how many header fields are being masked. Use 'flow create' in
452 a loop to determine how many filters your VIC will support (not more than
453 1000 for 1300 series VICs). Filters are checked for matching in the order they
454 were added. Since there currently is no grouping or priority support,
455 'catch-all' filters should be added last.
456 - The supported range of IDs for the 'MARK' action is 0 - 0xFFFD.
457 - RSS and PASSTHRU actions only support "receive normally". They are limited
458 to supporting MARK + RSS and PASSTHRU + MARK to allow the application to mark
459 packets and then receive them normally. These require 1400 series VIC adapters
461 - RAW items are limited to matching UDP tunnel headers like VXLAN.
465 - ``rx_good_bytes`` (ibytes) always includes VLAN header (4B) and CRC bytes (4B).
466 This behavior applies to 1300 and older series VIC adapters.
467 1400 series VICs do not count CRC bytes, and count VLAN header only when VLAN
468 stripping is disabled.
- When the NIC drops a packet because the Rx queue has no free buffers,
  ``rx_good_bytes`` still increments by 4B if the packet is not VLAN tagged or
  VLAN stripping is disabled, or by 8B if the packet is VLAN tagged and stripping
  is enabled.
  This behavior applies to 1300 and older series VIC adapters. 1400 series VICs
  do not increment this byte counter when packets are dropped.
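When comparing ``rx_good_bytes`` against a traffic generator's counters, the
rules above can be modeled as below. This is an illustrative sketch based only
on the statements in this section; the function names and the ``is_vic_1400``
flag are hypothetical, and ``wire_len`` is assumed to be the full frame length
including any VLAN header and the CRC.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_HDR_LEN 4u
#define CRC_LEN      4u

/* Bytes added to rx_good_bytes for one successfully received frame. */
static uint32_t
rx_good_bytes_for_frame(uint32_t wire_len, bool vlan_tagged, bool vlan_strip,
                        bool is_vic_1400)
{
    uint32_t bytes = wire_len;

    if (!is_vic_1400)
        return bytes; /* 1300 and older: VLAN header and CRC always counted */

    bytes -= CRC_LEN; /* 1400: CRC bytes are not counted */
    if (vlan_tagged && vlan_strip)
        bytes -= VLAN_HDR_LEN; /* 1400: stripped VLAN header is not counted */
    return bytes;
}

/* Bytes added to rx_good_bytes for a frame dropped for lack of Rx buffers. */
static uint32_t
rx_good_bytes_for_dropped_frame(bool vlan_tagged, bool vlan_strip,
                                bool is_vic_1400)
{
    if (is_vic_1400)
        return 0; /* 1400: dropped packets do not increment the counter */
    return (vlan_tagged && vlan_strip) ? 8u : 4u;
}
```

For instance, a 68-byte VLAN-tagged frame received with stripping enabled is
counted as 68 bytes on 1300 and older adapters and as 60 bytes on 1400 series
adapters under this model.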
478 - Hardware enables and disables UDP and TCP RSS hashing together. The driver
479 cannot control UDP and TCP hashing individually.
481 How to build the suite
482 ----------------------
484 The build instructions for the DPDK suite should be followed. By default
485 the ENIC PMD library will be built into the DPDK library.
487 Refer to the document :ref:`compiling and testing a PMD for a NIC
488 <pmd_build_and_test>` for details.
490 For configuring and using UIO and VFIO frameworks, please refer to the
491 documentation that comes with DPDK suite.
493 Supported Cisco VIC adapters
494 ----------------------------
496 ENIC PMD supports all recent generations of Cisco VIC adapters including:
502 Supported Operating Systems
503 ---------------------------
Any Linux distribution fulfilling the conditions described in the Dependencies
section of the DPDK documentation.
511 - Unicast, multicast and broadcast transmission and reception
512 - Receive queue polling
513 - Port Hardware Statistics
514 - Hardware VLAN acceleration
515 - IP checksum offload
516 - Receive side VLAN stripping
517 - Multiple receive and transmit queues
519 - Setting RX VLAN (supported via UCSM/CIMC only)
520 - VLAN filtering (supported via UCSM/CIMC only)
521 - Execution of application by unprivileged system users
- IPv4, IPv6 and TCP RSS hashing
523 - UDP RSS hashing (1400 series and later adapters)
526 - SR-IOV on UCS managed servers connected to Fabric Interconnects
530 - Rx/Tx checksum offloads for VXLAN, NVGRE, GENEVE
531 - TSO for VXLAN and GENEVE packets
534 Known bugs and unsupported features in this release
535 ---------------------------------------------------
537 - Signature or flex byte based flow direction
538 - Drop feature of flow direction
539 - VLAN based flow direction
- Non-IPv4 flow direction
541 - Setting of extended VLAN
542 - MTU update only works if Scattered Rx mode is disabled
543 - Maximum receive packet length is ignored if Scattered Rx mode is used
548 - Prepare the system as recommended by DPDK suite. This includes environment
549 variables, hugepages configuration, tool-chains and configuration.
550 - Insert vfio-pci kernel module using the command 'modprobe vfio-pci' if the
551 user wants to use VFIO framework.
552 - Insert uio kernel module using the command 'modprobe uio' if the user wants
553 to use UIO framework.
- DPDK suite should be configured based on the user's decision to use VFIO or
  UIO framework.
- If the vNIC device(s) to be used is bound to the kernel mode Ethernet driver,
  use 'ip' to bring the interface down. The dpdk-devbind.py tool can
  then be used to unbind the device's bus id from the ENIC kernel mode driver.
559 - Bind the intended vNIC to vfio-pci in case the user wants ENIC PMD to use
560 VFIO framework using dpdk-devbind.py.
561 - Bind the intended vNIC to igb_uio in case the user wants ENIC PMD to use
562 UIO framework using dpdk-devbind.py.
564 At this point the system should be ready to run DPDK applications. Once the
565 application runs to completion, the vNIC can be detached from vfio-pci or
566 igb_uio if necessary.
568 Root privilege is required to bind and unbind vNICs to/from VFIO/UIO.
569 VFIO framework helps an unprivileged user to run the applications.
570 For an unprivileged user to run the applications on DPDK and ENIC PMD,
571 it may be necessary to increase the maximum locked memory of the user.
572 The following command could be used to do this.
574 .. code-block:: console
576 sudo sh -c "ulimit -l <value in Kilo Bytes>"
578 The value depends on the memory configuration of the application, DPDK and
579 PMD. Typically, the limit has to be raised to higher than 2GB.
582 The compilation of any unused drivers can be disabled using the
583 configuration file in config/ directory (e.g., config/common_linux).
584 This would help in bringing down the time taken for building the
585 libraries and the initialization time of the application.
590 - https://www.cisco.com/c/en/us/products/servers-unified-computing/index.html
591 - https://www.cisco.com/c/en/us/products/interfaces-modules/unified-computing-system-adapters/index.html
596 Any questions or bugs should be reported to DPDK community and to the ENIC PMD
599 - John Daley <johndale@cisco.com>
600 - Hyong Youb Kim <hyonkim@cisco.com>