+.. _overlay_offload:
+
+Overlay Offload
+---------------
+
+Recent hardware models support overlay offload. When enabled, the NIC performs
+the following operations for VXLAN, NVGRE, and GENEVE packets. In all cases,
+inner and outer packets can be IPv4 or IPv6.
+
+- TSO for VXLAN and GENEVE packets.
+
+ Hardware supports NVGRE TSO, but DPDK currently has no NVGRE offload flags.
+
+- Tx checksum offloads.
+
+ The NIC fills in IPv4/UDP/TCP checksums for both inner and outer packets.
+
+- Rx checksum offloads.
+
+ The NIC validates IPv4/UDP/TCP checksums of both inner and outer packets.
+ Good checksum flags (e.g. ``RTE_MBUF_F_RX_L4_CKSUM_GOOD``) indicate that the inner
+ packet has the correct checksum, and if applicable, the outer packet also
+ has the correct checksum. Bad checksum flags (e.g. ``RTE_MBUF_F_RX_L4_CKSUM_BAD``)
+ indicate that the inner and/or outer packets have invalid checksum values.
+
+- Inner Rx packet type classification
+
+ PMD sets inner L3/L4 packet types (e.g. ``RTE_PTYPE_INNER_L4_TCP``), and
+ ``RTE_PTYPE_TUNNEL_GRENAT`` to indicate that the packet is tunneled.
+ PMD does not set L3/L4 packet types for outer packets.
+
+- Inner RSS
+
+ RSS hash calculation, therefore queue selection, is done on inner packets.
+
+In order to enable overlay offload, enable VXLAN and/or Geneve on vNIC
+via CIMC or UCSM followed by a reboot of the server. When PMD successfully
+enables overlay offload, it prints one of the following messages on the console.
+
+.. code-block:: console
+
+ Overlay offload is enabled (VxLAN)
+ Overlay offload is enabled (Geneve)
+ Overlay offload is enabled (VxLAN, Geneve)
+
+By default, PMD enables overlay offload if hardware supports it. To disable
+it, set ``devargs`` parameter ``disable-overlay=1``. For example::
+
+ -a 12:00.0,disable-overlay=1
+
+By default, the NIC uses 4789 and 6081 as the VXLAN and Geneve ports,
+respectively. The user may change them through
+``rte_eth_dev_udp_tunnel_port_{add,delete}``. However, as the current
+NIC has a single VXLAN port number and a single Geneve port number,
+the user cannot configure multiple port numbers for each tunnel type.
+
+Geneve offload support has evolved over VIC models. On older models,
+Geneve offload and advanced filters are mutually exclusive. This is
+enforced by UCSM and CIMC, which only allow one of the two features
+to be selected at one time. Newer VIC models do not have this restriction.
+
+Ingress VLAN Rewrite
+--------------------
+
+VIC adapters can tag, untag, or modify the VLAN headers of ingress
+packets. The ingress VLAN rewrite mode controls this behavior. By
+default, it is set to pass-through, where the NIC does not modify the
+VLAN header in any way so that the application can see the original
+header. This mode is sufficient for many applications, but may not be
+suitable for others. Such applications may change the mode by setting
+``devargs`` parameter ``ig-vlan-rewrite`` to one of the following.
+
+- ``pass``: Pass-through mode. The NIC does not modify the VLAN
+ header. This is the default mode.
+
+- ``priority``: Priority-tag default VLAN mode. If the ingress packet
+ is tagged with the default VLAN, the NIC replaces its VLAN header
+ with the priority tag (VLAN ID 0).
+
+- ``trunk``: Default trunk mode. The NIC tags untagged ingress packets
+ with the default VLAN. Tagged ingress packets are not modified. To
+ the application, every packet appears as tagged.
+
+- ``untag``: Untag default VLAN mode. If the ingress packet is tagged
+ with the default VLAN, the NIC removes or untags its VLAN header so
+ that the application sees an untagged packet. As a result, the
+ default VLAN becomes `untagged`. This mode can be useful for
+ applications such as OVS-DPDK performance benchmarks that utilize
+ only the default VLAN and want to see only untagged packets.
+
+
+Vectorized Rx Handler
+---------------------
+
+ENIC PMD includes a version of the receive handler that is vectorized using
+AVX2 SIMD instructions. It is meant for bulk, throughput oriented workloads
+where reducing cycles/packet in PMD is a priority. In order to use the
+vectorized handler, take the following steps.
+
+- Use a recent version of gcc, icc, or clang and build 64-bit DPDK. If
+ the compiler is known to support AVX2, DPDK build system
+ automatically compiles the vectorized handler. Otherwise, the
+ handler is not available.
+
+- Set ``devargs`` parameter ``enable-avx2-rx=1`` to explicitly request that
+ PMD consider the vectorized handler when selecting the receive handler.
+ For example::
+
+ -a 12:00.0,enable-avx2-rx=1
+
+ As the current implementation is intended for field trials, by default, the
+ vectorized handler is not considered (``enable-avx2-rx=0``).
+
+- Run on a UCS M4 or later server with CPUs that support AVX2.
+
+PMD selects the vectorized handler when the handler is compiled into
+the driver, the user requests its use via ``enable-avx2-rx=1``, CPU
+supports AVX2, and scatter Rx is not used. To verify that the
+vectorized handler is selected, enable debug logging
+(``--log-level=pmd,debug``) and check the following message.
+
+.. code-block:: console
+
+ enic_use_vector_rx_handler use the non-scatter avx2 Rx handler
+
+64B Completion Queue Entry
+--------------------------
+
+Recent VIC adapters support 64B completion queue entries, as well as
+16B entries that are available on all adapter models. ENIC PMD enables
+and uses 64B entries by default, if available. 64B entries generally
+lower CPU cycles per Rx packet, as they avoid partial DMA writes and
+reduce cache contention between DMA and polling CPU. The effect is
+most pronounced when multiple Rx queues are used on Intel platforms
+with Data Direct I/O Technology (DDIO).
+
+If 64B entries are not available, PMD uses 16B entries. The user may
+explicitly disable 64B entries and use 16B entries by setting
+``devarg`` parameter ``cq64=0``. For example::
+
+ -a 12:00.0,cq64=0
+
+To verify the selected entry size, enable debug logging
+(``--log-level=enic,debug``) and check the following messages.
+
+.. code-block:: console
+
+ PMD: rte_enic_pmd: Supported CQ entry sizes: 16 32
+ PMD: rte_enic_pmd: Using 16B CQ entry size
+