X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fnics%2Fi40e.rst;h=e1b8083c1e32ba18b9ebe19aeaa7c9973011a0c1;hb=68d35933a960442a373346b76f2ad6de0eae27f9;hp=4d3c7ca0529867e6d40ae5fb06c750a5bc617a78;hpb=cdc073298185cbde40b89de756013a2c175d38bf;p=dpdk.git diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index 4d3c7ca052..e1b8083c1e 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -1,32 +1,5 @@ -.. BSD LICENSE - Copyright(c) 2016 Intel Corporation. All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2016 Intel Corporation. I40E Poll Mode Driver ====================== @@ -65,7 +38,8 @@ Features of the I40E PMD are: - Hot plug - IEEE1588/802.1AS timestamping - VF Daemon (VFD) - EXPERIMENTAL - +- Dynamic Device Personalization (DDP) +- Queue region configuration Prerequisites ------------- @@ -78,6 +52,8 @@ Prerequisites - To get better performance on Intel platforms, please follow the "How to get best performance with NICs on Intel platforms" section of the :ref:`Getting Started Guide for Linux `. +- Upgrade the NVM/FW version following the `Intel® Ethernet NVM Update Tool Quick Usage Guide for Linux + `_ if needed. Pre-Installation Configuration ------------------------------ @@ -113,10 +89,6 @@ Please note that enabling debugging options may affect system performance. Number of queues reserved for PF. -- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF`` (default ``4``) - - Number of queues reserved for each SR-IOV VF. - - ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM`` (default ``4``) Number of queues reserved for each VMDQ Pool. @@ -126,6 +98,29 @@ Please note that enabling debugging options may affect system performance. Interrupt Throttling interval. +Runtime Config Options +~~~~~~~~~~~~~~~~~~~~~~ + +- ``Number of Queues per VF`` (default ``4``) + + The number of queue per VF is determined by its host PF. If the PCI address + of an i40e PF is aaaa:bb.cc, the number of queues per VF can be configured + with EAL parameter like -w aaaa:bb.cc,queue-num-per-vf=n. The value n can be + 1, 2, 4, 8 or 16. If no such parameter is configured, the number of queues + per VF is 4 by default. + +- ``Support multiple driver`` (default ``disable``) + + There was a multiple driver support issue during use of 700 series Ethernet + Adapter with both Linux kernel and DPDK PMD. To fix this issue, ``devargs`` + parameter ``support-multi-driver`` is introduced, for example:: + + -w 84:00.0,support-multi-driver=1 + + With the above configuration, DPDK PMD will not change global registers, and + will switch PF interrupt from IntN to Int0 to avoid interrupt conflict between + DPDK and Linux Kernel. + Driver compilation and testing ------------------------------ @@ -372,6 +367,75 @@ configuration passed on the EAL command line. The floating VEB functionality requires a NIC firmware version of 5.0 or greater. +Dynamic Device Personalization (DDP) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The Intel® Ethernet Controller X*710 support a feature called "Dynamic Device +Personalization (DDP)", which is used to configure hardware by downloading +a profile to support protocols/filters which are not supported by default. +The DDP functionality requires a NIC firmware version of 6.0 or greater. + +Current implementation supports MPLSoUDP/MPLSoGRE/GTP-C/GTP-U/PPPoE/PPPoL2TP, +steering can be used with rte_flow API. + +Load a profile which supports MPLSoUDP/MPLSoGRE and store backup profile: + +.. code-block:: console + + testpmd> ddp add 0 ./mpls.pkgo,./backup.pkgo + +Delete a MPLS profile and restore backup profile: + +.. code-block:: console + + testpmd> ddp del 0 ./backup.pkgo + +Get loaded DDP package info list: + +.. code-block:: console + + testpmd> ddp get list 0 + +Display information about a MPLS profile: + +.. code-block:: console + + testpmd> ddp get info ./mpls.pkgo + +Input set configuration +~~~~~~~~~~~~~~~~~~~~~~~ +Input set for any PCTYPE can be configured with user defined configuration, +For example, to use only 48bit prefix for IPv6 src address for IPv6 TCP RSS: + +.. code-block:: console + + testpmd> port config 0 pctype 43 hash_inset clear all + testpmd> port config 0 pctype 43 hash_inset set field 13 + testpmd> port config 0 pctype 43 hash_inset set field 14 + testpmd> port config 0 pctype 43 hash_inset set field 15 + +Queue region configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The Ethernet Controller X710/XL710 supports a feature of queue regions +configuration for RSS in the PF, so that different traffic classes or +different packet classification types can be separated to different +queues in different queue regions. There is an API for configuration +of queue regions in RSS with a command line. It can parse the parameters +of the region index, queue number, queue start index, user priority, traffic +classes and so on. Depending on commands from the command line, it will call +i40e private APIs and start the process of setting or flushing the queue +region configuration. As this feature is specific for i40e only private +APIs are used. These new ``test_pmd`` commands are as shown below. For +details please refer to :doc:`../testpmd_app_ug/index`. + +.. code-block:: console + + testpmd> set port (port_id) queue-region region_id (value) \ + queue_start_index (value) queue_num (value) + testpmd> set port (port_id) queue-region region_id (value) flowtype (value) + testpmd> set port (port_id) queue-region UP (value) region_id (value) + testpmd> set port (port_id) queue-region flush (on|off) + testpmd> show port (port_id) queue-region Limitations or Known issues --------------------------- @@ -396,23 +460,24 @@ used to classify MPLS packet by using a command in testpmd like: testpmd> ethertype_filter 0 add mac_ignr 00:00:00:00:00:00 ethertype \ 0x8847 fwd queue -16 Byte Descriptor cannot be used on DPDK VF -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If the Linux i40e kernel driver is used as host driver, while DPDK i40e PMD -is used as the VF driver, DPDK cannot choose 16 byte receive descriptor. That -is to say, user should keep ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n`` in -config file. - -Link down with i40e kernel driver after DPDK application exit -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -After DPDK application quit, and the device is bound back to Linux i40e -kernel driver, the link cannot be up after ``ifconfig up``. -To work around this issue, ``ethtool -s autoneg on`` should be -set first and then the link can be brought up through ``ifconfig up``. +16 Byte RX Descriptor setting on DPDK VF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -NOTE: requires Linux kernel i40e driver version >= 1.4.X +Currently the VF's RX descriptor mode is decided by PF. There's no PF-VF +interface for VF to request the RX descriptor mode, also no interface to notify +VF its own RX descriptor mode. +For all available versions of the i40e driver, these drivers don't support 16 +byte RX descriptor. If the Linux i40e kernel driver is used as host driver, +while DPDK i40e PMD is used as the VF driver, DPDK cannot choose 16 byte receive +descriptor. The reason is that the RX descriptor is already set to 32 byte by +the i40e kernel driver. That is to say, user should keep +``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n`` in config file. +In the future, if the Linux i40e driver supports 16 byte RX descriptor, user +should make sure the DPDK VF uses the same RX descriptor mode, 16 byte or 32 +byte, as the PF driver. + +The same rule for DPDK PF + DPDK VF. The PF and VF should use the same RX +descriptor mode. Or the VF RX will not work. Receive packets with Ethertype 0x88A8 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -447,3 +512,131 @@ It means if APP has set the max bandwidth for that TC, it comes to no effect. It's suggested to set the strict priority mode for a TC that is latency sensitive but no consuming much bandwidth. + +VF performance is impacted by PCI extended tag setting +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To reach maximum NIC performance in the VF the PCI extended tag must be +enabled. The DPDK I40E PF driver will set this feature during initialization, +but the kernel PF driver does not. So when running traffic on a VF which is +managed by the kernel PF driver, a significant NIC performance downgrade has +been observed (for 64 byte packets, there is about 25% linerate downgrade for +a 25G device and about 35% for a 40G device). + +For kernel version >= 4.11, the kernel's PCI driver will enable the extended +tag if it detects that the device supports it. So by default, this is not an +issue. For kernels <= 4.11 or when the PCI extended tag is disabled it can be +enabled using the steps below. + +#. Get the current value of the PCI configure register:: + + setpci -s a8.w + +#. Set bit 8:: + + value = value | 0x100 + +#. Set the PCI configure register with new value:: + + setpci -s a8.w= + +Vlan strip of VF +~~~~~~~~~~~~~~~~ + +The VF vlan strip function is only supported in the i40e kernel driver >= 2.1.26. + +DCB function +~~~~~~~~~~~~ + +DCB works only when RSS is enabled. + +Global configuration warning +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +I40E PMD will set some global registers to enable some function or set some +configure. Then when using different ports of the same NIC with Linux kernel +and DPDK, the port with Linux kernel will be impacted by the port with DPDK. +For example, register I40E_GL_SWT_L2TAGCTRL is used to control L2 tag, i40e +PMD uses I40E_GL_SWT_L2TAGCTRL to set vlan TPID. If setting TPID in port A +with DPDK, then the configuration will also impact port B in the NIC with +kernel driver, which don't want to use the TPID. +So PMD reports warning to clarify what is changed by writing global register. + +High Performance of Small Packets on 40G NIC +-------------------------------------------- + +As there might be firmware fixes for performance enhancement in latest version +of firmware image, the firmware update might be needed for getting high performance. +Check with the local Intel's Network Division application engineers for firmware updates. +Users should consult the release notes specific to a DPDK release to identify +the validated firmware version for a NIC using the i40e driver. + +Use 16 Bytes RX Descriptor Size +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As i40e PMD supports both 16 and 32 bytes RX descriptor sizes, and 16 bytes size can provide helps to high performance of small packets. +Configuration of ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` in config files can be changed to use 16 bytes size RX descriptors. + +High Performance and per Packet Latency Tradeoff +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to the hardware design, the interrupt signal inside NIC is needed for per +packet descriptor write-back. The minimum interval of interrupts could be set +at compile time by ``CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL`` in configuration files. +Though there is a default configuration, the interval could be tuned by the +users with that configuration item depends on what the user cares about more, +performance or per packet latency. + +Example of getting best performance with l3fwd example +------------------------------------------------------ + +The following is an example of running the DPDK ``l3fwd`` sample application to get high performance with an +Intel server platform and Intel XL710 NICs. + +The example scenario is to get best performance with two Intel XL710 40GbE ports. +See :numref:`figure_intel_perf_test_setup` for the performance test setup. + +.. _figure_intel_perf_test_setup: + +.. figure:: img/intel_perf_test_setup.* + + Performance Test Setup + + +1. Add two Intel XL710 NICs to the platform, and use one port per card to get best performance. + The reason for using two NICs is to overcome a PCIe Gen3's limitation since it cannot provide 80G bandwidth + for two 40G ports, but two different PCIe Gen3 x8 slot can. + Refer to the sample NICs output above, then we can select ``82:00.0`` and ``85:00.0`` as test ports:: + + 82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583] + 85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583] + +2. Connect the ports to the traffic generator. For high speed testing, it's best to use a hardware traffic generator. + +3. Check the PCI devices numa node (socket id) and get the cores number on the exact socket id. + In this case, ``82:00.0`` and ``85:00.0`` are both in socket 1, and the cores on socket 1 in the referenced platform + are 18-35 and 54-71. + Note: Don't use 2 logical cores on the same core (e.g core18 has 2 logical cores, core18 and core54), instead, use 2 logical + cores from different cores (e.g core18 and core19). + +4. Bind these two ports to igb_uio. + +5. As to XL710 40G port, we need at least two queue pairs to achieve best performance, then two queues per port + will be required, and each queue pair will need a dedicated CPU core for receiving/transmitting packets. + +6. The DPDK sample application ``l3fwd`` will be used for performance testing, with using two ports for bi-directional forwarding. + Compile the ``l3fwd sample`` with the default lpm mode. + +7. The command line of running l3fwd would be something like the following:: + + ./l3fwd -l 18-21 -n 4 -w 82:00.0 -w 85:00.0 \ + -- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)' + + This means that the application uses core 18 for port 0, queue pair 0 forwarding, core 19 for port 0, queue pair 1 forwarding, + core 20 for port 1, queue pair 0 forwarding, and core 21 for port 1, queue pair 1 forwarding. + +8. Configure the traffic at a traffic generator. + + * Start creating a stream on packet generator. + + * Set the Ethernet II type to 0x0800.