From: Thomas Monjalon Date: Sat, 31 Jan 2015 22:06:06 +0000 (+0100) Subject: doc: nics guide X-Git-Tag: spdx-start~9365 X-Git-Url: http://git.droids-corp.org/?a=commitdiff_plain;h=972e365bfe7a;p=dpdk.git doc: nics guide Create nics guide by moving chapters about Intel and Mellanox NICs. Signed-off-by: Thomas Monjalon Acked-by: Siobhan Butler --- diff --git a/MAINTAINERS b/MAINTAINERS index 7b67c5bdbf..d75944e747 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -213,15 +213,20 @@ F: lib/librte_pmd_enic/ Intel e1000 F: lib/librte_pmd_e1000/ +F: doc/guides/nics/e1000em.rst +F: doc/guides/nics/intel_vf.rst Intel ixgbe M: Helin Zhang M: Konstantin Ananyev F: lib/librte_pmd_ixgbe/ +F: doc/guides/nics/ixgbe.rst +F: doc/guides/nics/intel_vf.rst Intel i40e M: Helin Zhang F: lib/librte_pmd_i40e/ +F: doc/guides/nics/intel_vf.rst Intel fm10k M: Jing Chen @@ -230,12 +235,12 @@ F: lib/librte_pmd_fm10k/ Mellanox mlx4 M: Adrien Mazarguil F: lib/librte_pmd_mlx4/ -F: doc/guides/prog_guide/mlx4_poll_mode_drv.rst +F: doc/guides/nics/mlx4.rst RedHat virtio M: Changchun Ouyang F: lib/librte_pmd_virtio/ -F: doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst +F: doc/guides/nics/virtio.rst F: lib/librte_vhost/ F: doc/guides/prog_guide/vhost_lib.rst F: examples/vhost/ @@ -244,18 +249,18 @@ F: doc/guides/sample_app_ug/vhost.rst VMware vmxnet3 M: Yong Wang F: lib/librte_pmd_vmxnet3/ -F: doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst +F: doc/guides/nics/vmxnet3.rst PCAP PMD M: Nicolás Pernas Maradei M: John McNamara F: lib/librte_pmd_pcap/ -F: doc/guides/prog_guide/libpcap_ring_based_poll_mode_drv.rst +F: doc/guides/nics/pcap_ring.rst Ring PMD M: Bruce Richardson F: lib/librte_pmd_ring/ -F: doc/guides/prog_guide/ring_lib.rst +F: doc/guides/nics/pcap_ring.rst F: app/test/test_pmd_ring.c Null PMD diff --git a/doc/guides/index.rst b/doc/guides/index.rst index 53f1be1e38..44e8432b41 100644 --- a/doc/guides/index.rst +++ b/doc/guides/index.rst @@ -41,6 +41,7 @@ Contents: 
freebsd_gsg/index xen/index prog_guide/index + nics/index sample_app_ug/index testpmd_app_ug/index rel_notes/index diff --git a/doc/guides/nics/e1000em.rst b/doc/guides/nics/e1000em.rst new file mode 100644 index 0000000000..fedafad412 --- /dev/null +++ b/doc/guides/nics/e1000em.rst @@ -0,0 +1,178 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +Driver for VM Emulated Devices +============================== + +The DPDK EM poll mode driver supports the following emulated devices: + +* qemu-kvm emulated Intel® 82540EM Gigabit Ethernet Controller (qemu e1000 device) + +* VMware* emulated Intel® 82545EM Gigabit Ethernet Controller + +* VMware emulated Intel® 82574L Gigabit Ethernet Controller. + +Validated Hypervisors +--------------------- + +The validated hypervisors are: + +* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 + +* KVM (Kernel Virtual Machine) with Qemu, version 0.15.1 + +* VMware ESXi 5.0, Update 1 + +Recommended Guest Operating System in Virtual Machine +----------------------------------------------------- + +The recommended guest operating system in a virtualized environment is: + +* Fedora* 18 (64-bit) + +For supported kernel versions, refer to the *DPDK Release Notes*. + +Setting Up a KVM Virtual Machine +-------------------------------- + +The following describes a target environment: + +* Host Operating System: Fedora 14 + +* Hypervisor: KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 + +* Guest Operating System: Fedora 14 + +* Linux Kernel Version: Refer to the DPDK Getting Started Guide + +* Target Applications: testpmd + +The setup procedure is as follows: + +#. Download qemu-kvm-0.14.0 from + `http://sourceforge.net/projects/kvm/files/qemu-kvm/ `_ + and install it in the Host OS using the following steps: + + When using a recent kernel (2.6.25+) with kvm modules included: + + .. code-block:: console + + tar xzf qemu-kvm-release.tar.gz + cd qemu-kvm-release + ./configure --prefix=/usr/local/kvm + make + sudo make install + sudo /sbin/modprobe kvm-intel + + When using an older kernel or a kernel from a distribution without the kvm modules, + you must download (from the same link), compile and install the modules yourself: + + .. 
code-block:: console + + tar xjf kvm-kmod-release.tar.bz2 + cd kvm-kmod-release + ./configure + make + sudo make install + sudo /sbin/modprobe kvm-intel + + Note that qemu-kvm installs in the /usr/local/bin directory. + + For more details about KVM configuration and usage, please refer to: + `http://www.linux-kvm.org/page/HOWTO1 `_. + +#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine. + This is referred to as the Guest Operating System (Guest OS). + +#. Start the Virtual Machine with at least one emulated e1000 device. + + .. note:: + + The Qemu provides several choices for the emulated network device backend. + Most commonly used is a TAP networking backend that uses a TAP networking device in the host. + For more information about Qemu supported networking backends and different options for configuring networking at Qemu, + please refer to: + + — `http://www.linux-kvm.org/page/Networking `_ + + — `http://wiki.qemu.org/Documentation/Networking `_ + + — `http://qemu.weilnetz.de/qemu-doc.html `_ + + For example, to start a VM with two emulated e1000 devices, issue the following command: + + .. code-block:: console + + /usr/local/kvm/bin/qemu-system-x86_64 -cpu host -smp 4 -hda qemu1.raw -m 1024 + -net nic,model=e1000,vlan=1,macaddr=DE:AD:1E:00:00:01 + -net tap,vlan=1,ifname=tapvm01,script=no,downscript=no + -net nic,model=e1000,vlan=2,macaddr=DE:AD:1E:00:00:02 + -net tap,vlan=2,ifname=tapvm02,script=no,downscript=no + + where: + + — -m = memory to assign + + — -smp = number of smp cores + + — -hda = virtual disk image + + This command starts a new virtual machine with two emulated 82540EM devices, + backed up with two TAP networking host interfaces, tapvm01 and tapvm02. + + .. code-block:: console + + # ip tuntap show + tapvm01: tap + tapvm02: tap + +#. Configure your TAP networking interfaces using ip/ifconfig tools. + +#. Log in to the guest OS and check that the expected emulated devices exist: + + .. 
code-block:: console + + # lspci -d 8086:100e + 00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) + 00:05.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) + +#. Install the DPDK and run testpmd. + +Known Limitations of Emulated Devices +------------------------------------- + +The following are known limitations: + +#. The Qemu e1000 RX path does not support multiple descriptors/buffers per packet. + Therefore, rte_mbuf should be big enough to hold the whole packet. + For example, to allow testpmd to receive jumbo frames, use the following: + + testpmd [options] -- --mbuf-size= + +#. Qemu e1000 does not validate the checksum of incoming packets. diff --git a/doc/guides/nics/img/console.png b/doc/guides/nics/img/console.png new file mode 100644 index 0000000000..99423340d5 Binary files /dev/null and b/doc/guides/nics/img/console.png differ diff --git a/doc/guides/nics/img/fast_pkt_proc.png b/doc/guides/nics/img/fast_pkt_proc.png new file mode 100644 index 0000000000..48d57e5c40 Binary files /dev/null and b/doc/guides/nics/img/fast_pkt_proc.png differ diff --git a/doc/guides/nics/img/forward_stats.png b/doc/guides/nics/img/forward_stats.png new file mode 100644 index 0000000000..23e35325c1 Binary files /dev/null and b/doc/guides/nics/img/forward_stats.png differ diff --git a/doc/guides/nics/img/host_vm_comms.png b/doc/guides/nics/img/host_vm_comms.png new file mode 100644 index 0000000000..4e0b3c96c9 Binary files /dev/null and b/doc/guides/nics/img/host_vm_comms.png differ diff --git a/doc/guides/nics/img/host_vm_comms_qemu.png b/doc/guides/nics/img/host_vm_comms_qemu.png new file mode 100644 index 0000000000..391a4eac27 Binary files /dev/null and b/doc/guides/nics/img/host_vm_comms_qemu.png differ diff --git a/doc/guides/nics/img/inter_vm_comms.png b/doc/guides/nics/img/inter_vm_comms.png new file mode 100644 index 0000000000..6d85ece737 Binary files /dev/null and 
b/doc/guides/nics/img/inter_vm_comms.png differ diff --git a/doc/guides/nics/img/perf_benchmark.png b/doc/guides/nics/img/perf_benchmark.png new file mode 100644 index 0000000000..aba818c300 Binary files /dev/null and b/doc/guides/nics/img/perf_benchmark.png differ diff --git a/doc/guides/nics/img/single_port_nic.png b/doc/guides/nics/img/single_port_nic.png new file mode 100644 index 0000000000..8f39d73b25 Binary files /dev/null and b/doc/guides/nics/img/single_port_nic.png differ diff --git a/doc/guides/nics/img/vm_vm_comms.png b/doc/guides/nics/img/vm_vm_comms.png new file mode 100644 index 0000000000..2bf1cd2729 Binary files /dev/null and b/doc/guides/nics/img/vm_vm_comms.png differ diff --git a/doc/guides/nics/img/vmxnet3_int.png b/doc/guides/nics/img/vmxnet3_int.png new file mode 100644 index 0000000000..6541feba73 Binary files /dev/null and b/doc/guides/nics/img/vmxnet3_int.png differ diff --git a/doc/guides/nics/img/vswitch_vm.png b/doc/guides/nics/img/vswitch_vm.png new file mode 100644 index 0000000000..ac817aaaed Binary files /dev/null and b/doc/guides/nics/img/vswitch_vm.png differ diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst new file mode 100644 index 0000000000..aadbae36c8 --- /dev/null +++ b/doc/guides/nics/index.rst @@ -0,0 +1,63 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. 
+ * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Network Interface Controller Drivers +==================================== + +|today| + + +**Contents** + +.. toctree:: + :maxdepth: 3 + :numbered: + + e1000em + ixgbe + intel_vf + mlx4 + virtio + vmxnet3 + pcap_ring + +**Figures** + +:ref:`Figure 1. Virtualization for a Single Port NIC in SR-IOV Mode ` + +:ref:`Figure 2. SR-IOV Performance Benchmark Setup ` + +:ref:`Figure 3. Fast Host-based Packet Processing ` + +:ref:`Figure 4. SR-IOV Inter-VM Communication ` + +:ref:`Figure 5. Virtio Host2VM Communication Example Using KNI vhost Back End ` + +:ref:`Figure 6. Virtio Host2VM Communication Example Using Qemu vhost Back End ` diff --git a/doc/guides/nics/intel_vf.rst b/doc/guides/nics/intel_vf.rst new file mode 100644 index 0000000000..0c0b7b5ce1 --- /dev/null +++ b/doc/guides/nics/intel_vf.rst @@ -0,0 +1,567 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. 
+ + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +I40E/IXGBE/IGB Virtual Function Driver +====================================== + +Supported Intel® Ethernet Controllers (see the *DPDK Release Notes* for details) +support the following modes of operation in a virtualized environment: + +* **SR-IOV mode**: Involves direct assignment of part of the port resources to different guest operating systems + using the PCI-SIG Single Root I/O Virtualization (SR IOV) standard, + also known as "native mode" or "pass-through" mode. 
+ In this chapter, this mode is referred to as IOV mode. + +* **VMDq mode**: Involves central management of the networking resources by an IO Virtual Machine (IOVM) or + a Virtual Machine Monitor (VMM), also known as software switch acceleration mode. + In this chapter, this mode is referred to as the Next Generation VMDq mode. + +SR-IOV Mode Utilization in a DPDK Environment +--------------------------------------------- + +The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode. +Therefore, it is possible to partition SR-IOV capability on Ethernet controller NIC resources logically and +expose them to a virtual machine as a separate PCI function called a "Virtual Function". +Refer to Figure 1. + +Therefore, a NIC is logically distributed among multiple virtual machines (as shown in Figure 1), +while still having global data in common to share with the Physical Function and other Virtual Functions. +The DPDK fm10kvf, i40evf, igbvf or ixgbevf as a Poll Mode Driver (PMD) serves for the Intel® 82576 Gigabit Ethernet Controller, +Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller NIC, +Intel® Fortville 10/40 Gigabit Ethernet Controller NIC's virtual PCI function, or PCIE host-interface of the Intel Ethernet Switch +FM10000 Series. +Meanwhile, the DPDK Poll Mode Driver (PMD) also supports the "Physical Function" of such NICs on the host. + +The DPDK PF/VF Poll Mode Driver (PMD) supports the Layer 2 switch on Intel® 82576 Gigabit Ethernet Controller, +Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller, +and Intel® Fortville 10/40 Gigabit Ethernet Controller NICs so that a guest can choose it for inter virtual machine traffic in SR-IOV mode. + +For more detail on SR-IOV, please refer to the following documents: + +* `SR-IOV provides hardware based I/O sharing `_ + +* `PCI-SIG-Single Root I/O Virtualization Support on IA + `_ + +* `Scalable I/O Virtualized Servers `_ + +.. 
_nic_figure_1: + +**Figure 1. Virtualization for a Single Port NIC in SR-IOV Mode** + +.. image:: img/single_port_nic.* + +Physical and Virtual Function Infrastructure +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following describes the Physical Function and Virtual Functions infrastructure for the supported Ethernet Controller NICs. + +Virtual Functions operate under the respective Physical Function on the same NIC Port and therefore have no access +to the global NIC resources that are shared between other functions for the same NIC port. + +A Virtual Function has basic access to the queue resources and control structures of the queues assigned to it. +For global resource access, a Virtual Function has to send a request to the Physical Function for that port, +and the Physical Function operates on the global resources on behalf of the Virtual Function. +For this out-of-band communication, an SR-IOV enabled NIC provides a memory buffer for each Virtual Function, +which is called a "Mailbox". + +The PCIE host-interface of Intel Ethernet Switch FM10000 Series VF infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In a virtualized environment, the programmer can enable a maximum of *64 Virtual Functions (VF)* +globally per PCIE host-interface of the Intel Ethernet Switch FM10000 Series device. +Each VF can have a maximum of 16 queue pairs. +The Physical Function in the host can only be configured by the Linux* fm10k driver +(in the case of the Linux Kernel-based Virtual Machine [KVM]); the DPDK PMD PF driver does not support it yet. + +For example, + +* Using Linux* fm10k driver: + + .. code-block:: console + + rmmod fm10k (To remove the fm10k module) + insmod fm10k.ko max_vfs=2,2 (To enable two Virtual Functions per port) + +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. 
+When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. +However: + +* Virtual Functions 0 and 2 belong to Physical Function 0 + +* Virtual Functions 1 and 3 belong to Physical Function 1 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Intel® Fortville 10/40 Gigabit Ethernet Controller VF Infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In a virtualized environment, the programmer can enable a maximum of *128 Virtual Functions (VF)* +globally per Intel® Fortville 10/40 Gigabit Ethernet Controller NIC device. +Each VF can have a maximum of 16 queue pairs. +The Physical Function in host could be either configured by the Linux* i40e driver +(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by DPDK PMD PF driver. +When using both DPDK PMD PF/VF drivers, the whole NIC will be taken over by DPDK based application. + +For example, + +* Using Linux* i40e driver: + + .. code-block:: console + + rmmod i40e (To remove the i40e module) + insmod i40e.ko max_vfs=2,2 (To enable two Virtual Functions per port) + +* Using the DPDK PMD PF i40e driver: + + Kernel Params: iommu=pt, intel_iommu=on + + .. code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio bb:ss.f + echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) + + Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. + +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. +When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. 
+However: + +* Virtual Functions 0 and 2 belong to Physical Function 0 + +* Virtual Functions 1 and 3 belong to Physical Function 1 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Intel® 82599 10 Gigabit Ethernet Controller VF Infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The programmer can enable a maximum of *63 Virtual Functions* and there must be *one Physical Function* per Intel® 82599 +10 Gigabit Ethernet Controller NIC port. +The reason for this is that the device allows for a maximum of 128 queues per port and a virtual/physical function has to +have at least one queue pair (RX/TX). +The current implementation of the DPDK ixgbevf driver supports a single queue pair (RX/TX) per Virtual Function. +The Physical Function in host could be either configured by the Linux* ixgbe driver +(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by DPDK PMD PF driver. +When using both DPDK PMD PF/VF drivers, the whole NIC will be taken over by DPDK based application. + +For example, + +* Using Linux* ixgbe driver: + + .. code-block:: console + + rmmod ixgbe (To remove the ixgbe module) + insmod ixgbe max_vfs=2,2 (To enable two Virtual Functions per port) + +* Using the DPDK PMD PF ixgbe driver: + + Kernel Params: iommu=pt, intel_iommu=on + + .. code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio bb:ss.f + echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) + + Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. + +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. +When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. 
+However: + +* Virtual Functions 0 and 2 belong to Physical Function 0 + +* Virtual Functions 1 and 3 belong to Physical Function 1 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Intel® 82576 Gigabit Ethernet Controller and Intel® Ethernet Controller I350 Family VF Infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In a virtualized environment, an Intel® 82576 Gigabit Ethernet Controller serves up to eight virtual machines (VMs). +The controller has 16 TX and 16 RX queues. +They are generally referred to (or thought of) as queue pairs (one TX and one RX queue). +This gives the controller 16 queue pairs. + +A pool is a group of queue pairs for assignment to the same VF, used for transmit and receive operations. +The controller has eight pools, with each pool containing two queue pairs, that is, two TX and two RX queues assigned to each VF. + +In a virtualized environment, an Intel® Ethernet Controller I350 family device serves up to eight virtual machines (VMs) per port. +The eight queues can be accessed by eight different VMs if configured correctly (the i350 has 4x1GbE ports each with 8 TX and 8 RX queues), +that is, one transmit and one receive queue is assigned to each VF. + +For example, + +* Using Linux* igb driver: + + .. code-block:: console + + rmmod igb (To remove the igb module) + insmod igb max_vfs=2,2 (To enable two Virtual Functions per port) + +* Using the DPDK PMD PF igb driver: + + Kernel Params: iommu=pt, intel_iommu=on + + .. code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio bb:ss.f + echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) + + Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. 
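One way to check the result of writing to max_vfs is to read the virtfnN symbolic links that the kernel creates under the sysfs directory of the Physical Function. The following is a minimal sketch and not part of the original guide: the function name list_vfs and its optional second argument are hypothetical, and it assumes a POSIX shell with readlink available.

```shell
#!/bin/sh
# Hypothetical helper: print "virtfnN <VF PCI address>" for each VF of a PF.
# $1 = PF PCI address, for example 0000:02:00.0
# $2 = sysfs devices directory (assumed default: /sys/bus/pci/devices)
list_vfs() {
    pf="$1"
    root="${2:-/sys/bus/pci/devices}"
    for link in "$root/$pf"/virtfn*; do
        # Skip the unexpanded pattern when the PF has no VFs.
        [ -L "$link" ] || continue
        # Each virtfnN link points at ../<VF PCI address>.
        target=$(readlink "$link")
        echo "${link##*/} ${target##*/}"
    done
}
```

Calling list_vfs with the full address of the Physical Function (the 0000:bb:ss.f value used above) prints one line per Virtual Function, so the number of lines should match the value written to max_vfs.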
+ +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a four-port NIC. +When you enable two Virtual Functions on each of the four ports, the eight enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence, starting from 0 to 7. +However: + +* Virtual Functions 0 and 4 belong to Physical Function 0 + +* Virtual Functions 1 and 5 belong to Physical Function 1 + +* Virtual Functions 2 and 6 belong to Physical Function 2 + +* Virtual Functions 3 and 7 belong to Physical Function 3 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Validated Hypervisors +~~~~~~~~~~~~~~~~~~~~~ + +The validated hypervisor is: + +* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 + +However, because the hypervisor is bypassed when the Virtual Function devices are configured through the Mailbox interface, +the solution is hypervisor-agnostic. +Xen* and VMware* (when SR-IOV is supported) will also be able to support the DPDK with Virtual Function driver support. + +Expected Guest Operating System in Virtual Machine +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The expected guest operating systems in a virtualized environment are: + +* Fedora* 14 (64-bit) + +* Ubuntu* 10.04 (64-bit) + +For supported kernel versions, refer to the *DPDK Release Notes*. + +Setting Up a KVM Virtual Machine Monitor +---------------------------------------- + +The following describes a target environment: + +* Host Operating System: Fedora 14 + +* Hypervisor: KVM (Kernel Virtual Machine) with Qemu version 0.14.0 + +* Guest Operating System: Fedora 14 + +* Linux Kernel Version: Refer to the *DPDK Getting Started Guide* + +* Target Applications: l2fwd, l3fwd-vf + +The setup procedure is as follows: + +#. Before booting the Host OS, open **BIOS setup** and enable **Intel® VT features**. + +#. 
While booting the Host OS kernel, pass the intel_iommu=on kernel command line argument using GRUB. + When using DPDK PF driver on host, pass the iommu=pt kernel command line argument in GRUB. + +#. Download qemu-kvm-0.14.0 from + `http://sourceforge.net/projects/kvm/files/qemu-kvm/ `_ + and install it in the Host OS using the following steps: + + When using a recent kernel (2.6.25+) with kvm modules included: + + .. code-block:: console + + tar xzf qemu-kvm-release.tar.gz + cd qemu-kvm-release + ./configure --prefix=/usr/local/kvm + make + sudo make install + sudo /sbin/modprobe kvm-intel + + When using an older kernel, or a kernel from a distribution without the kvm modules, + you must download (from the same link), compile and install the modules yourself: + + .. code-block:: console + + tar xjf kvm-kmod-release.tar.bz2 + cd kvm-kmod-release + ./configure + make + sudo make install + sudo /sbin/modprobe kvm-intel + + qemu-kvm installs in the /usr/local/bin directory. + + For more details about KVM configuration and usage, please refer to: + + `http://www.linux-kvm.org/page/HOWTO1 `_. + +#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine. + This is referred to as the Guest Operating System (Guest OS). + +#. Download and install the latest ixgbe driver from: + + `http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687 `_ + +#. In the Host OS + + When using Linux kernel ixgbe driver, unload the Linux ixgbe driver and reload it with the max_vfs=2,2 argument: + + .. code-block:: console + + rmmod ixgbe + modprobe ixgbe max_vfs=2,2 + + When using DPDK PMD PF driver, insert DPDK kernel module igb_uio and set the number of VF by sysfs max_vfs: + + .. 
code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio 02:00.0 02:00.1 0e:00.0 0e:00.1 + echo 2 > /sys/bus/pci/devices/0000\:02\:00.0/max_vfs + echo 2 > /sys/bus/pci/devices/0000\:02\:00.1/max_vfs + echo 2 > /sys/bus/pci/devices/0000\:0e\:00.0/max_vfs + echo 2 > /sys/bus/pci/devices/0000\:0e\:00.1/max_vfs + + .. note:: + + You need to explicitly specify number of vfs for each port, for example, + in the command above, it creates two vfs for the first two ixgbe ports. + + Let say we have a machine with four physical ixgbe ports: + + + 0000:02:00.0 + + 0000:02:00.1 + + 0000:0e:00.0 + + 0000:0e:00.1 + + The command above creates two vfs for device 0000:02:00.0: + + .. code-block:: console + + ls -alrt /sys/bus/pci/devices/0000\:02\:00.0/virt* + lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn1 -> ../0000:02:10.2 + lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn0 -> ../0000:02:10.0 + + It also creates two vfs for device 0000:02:00.1: + + .. code-block:: console + + ls -alrt /sys/bus/pci/devices/0000\:02\:00.1/virt* + lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn1 -> ../0000:02:10.3 + lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn0 -> ../0000:02:10.1 + +#. List the PCI devices connected and notice that the Host OS shows two Physical Functions (traditional ports) + and four Virtual Functions (two for each port). + This is the result of the previous step. + +#. Insert the pci_stub module to hold the PCI devices that are freed from the default driver using the following command + (see http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Section 4 for more information): + + .. code-block:: console + + sudo /sbin/modprobe pci-stub + + Unbind the default driver from the PCI devices representing the Virtual Functions. + A script to perform this action is as follows: + + .. 
code-block:: console + + echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id + echo 0000:08:10.0 > /sys/bus/pci/devices/0000:08:10.0/driver/unbind + echo 0000:08:10.0 > /sys/bus/pci/drivers/pci-stub/bind + + where 0000:08:10.0 belongs to the Virtual Function visible in the Host OS. + +#. Now, start the Virtual Machine by running the following command: + + .. code-block:: console + + /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0 + + where: + + — -m = memory to assign + + — -smp = number of smp cores + + — -boot = boot option + + — -hda = virtual disk image + + — -device = device to attach + + .. note:: + + — The pci-assign,host=08:10.0 value indicates that you want to attach a PCI device + to a Virtual Machine and the respective (Bus:Device.Function) + numbers should be passed for the Virtual Function to be attached. + + — qemu-kvm-0.14.0 allows a maximum of four PCI devices assigned to a VM, + but this is qemu-kvm version dependent since qemu-kvm-0.14.1 allows a maximum of five PCI devices. + + — qemu-system-x86_64 also has a -cpu command line option that is used to select the cpu_model + to emulate in a Virtual Machine. Therefore, it can be used as: + + .. code-block:: console + + /usr/local/kvm/bin/qemu-system-x86_64 -cpu ? + + (to list all available cpu_models) + + /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -cpu host -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0 + + (to use the same cpu_model equivalent to the host cpu) + + For more information, please refer to: `http://wiki.qemu.org/Features/CPUModels `_. + +#. Install and run the DPDK host application to take over the Physical Function. For example: + + .. code-block:: console + + make install T=x86_64-native-linuxapp-gcc + ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 -- -i + +#. Finally, access the Guest OS using vncviewer with the localhost:5900 port and check the lspci command output in the Guest OS. 
+ The virtual functions will be listed as available for use. + +#. Configure and install the DPDK with an x86_64-native-linuxapp-gcc configuration on the Guest OS as normal, + that is, there is no change to the normal installation procedure. + + .. code-block:: console + + make config T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc + cd x86_64-native-linuxapp-gcc + make + +.. note:: + + If you are unable to compile the DPDK and you are getting "error: CPU you selected does not support x86-64 instruction set", + power off the Guest OS and start the virtual machine with the correct -cpu option in the qemu-system-x86_64 command as shown in step 9. + You must select the best x86_64 cpu_model to emulate or you can select the host option if available. + +.. note:: + + Run the DPDK l2fwd sample application in the Guest OS with Hugepages enabled. + For the expected benchmark performance, you must pin the cores from the Guest OS to the Host OS (taskset can be used to do this) and + you must also look at the PCI Bus layout on the board to ensure you are not running the traffic over the QPI Interface. + +.. note:: + + * The Virtual Machine Manager (the Fedora package name is virt-manager) is a utility for virtual machine management + that can also be used to create, start, stop and delete virtual machines. + If this option is used, steps 2 and 6 in the instructions provided will be different. + + * virsh, a command line utility for virtual machine management, + can also be used to bind and unbind devices to a virtual machine in Ubuntu. + If this option is used, step 6 in the instructions provided will be different. + + * The Virtual Machine Monitor (see Figure 2) is equivalent to a Host OS with KVM installed as described in the instructions. + +.. _nic_figure_2: + +**Figure 2. Performance Benchmark Setup** + +..
image:: img/perf_benchmark.* + +DPDK SR-IOV PMD PF/VF Driver Usage Model +---------------------------------------- + +Fast Host-based Packet Processing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Software Defined Network (SDN) trends are demanding fast host-based packet handling. +In a virtualization environment, +the DPDK VF PMD driver delivers the same throughput as a non-VT native environment. + +With such fast packet processing in the host instance, many services such as filtering, QoS and +DPI can be offloaded on the host fast path. + +Figure 3 shows the scenario where some VMs directly communicate externally via VFs, +while others connect to a virtual switch and share the same uplink bandwidth. + +.. _nic_figure_3: + +**Figure 3. Fast Host-based Packet Processing** + +.. image:: img/fast_pkt_proc.* + +SR-IOV (PF/VF) Approach for Inter-VM Communication +-------------------------------------------------- + +Inter-VM data communication is one of the traffic bottlenecks in virtualization platforms. +SR-IOV device assignment helps a VM to attach the real device, taking advantage of the bridge in the NIC. +So VF-to-VF traffic within the same physical port (VM0<->VM1) has hardware acceleration. +However, when VF traffic crosses physical ports (VM0<->VM2), there is no such hardware bridge. +In this case, the DPDK PMD PF driver provides host forwarding between such VMs. + +Figure 4 shows an example. +In this case an update of the MAC address lookup tables in both the NIC and host DPDK application is required. + +In the NIC, the destination MAC address that belongs to a VM on another device is written to the PF specific pool. +So when a packet comes in, its destination MAC address matches and the packet is forwarded to the host DPDK PMD application. + +In the host DPDK application, the behavior is similar to L2 forwarding, +that is, the packet is forwarded to the correct PF pool.
+The SR-IOV NIC switch forwards the packet to a specific VM according to the MAC destination address +which belongs to the destination VF on the VM. + +.. _nic_figure_4: + +**Figure 4. Inter-VM Communication** + +.. image:: img/inter_vm_comms.* diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst new file mode 100644 index 0000000000..d2c124ec48 --- /dev/null +++ b/doc/guides/nics/ixgbe.rst @@ -0,0 +1,184 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +IXGBE Driver +============ + +Vector PMD for IXGBE +-------------------- + +Vector PMD uses Intel® SIMD instructions to optimize packet I/O. +It improves load/store bandwidth efficiency of L1 data cache by using a wider SSE/AVX register (1). +The wider register gives space to hold multiple packet buffers so as to reduce the number of instructions when processing packets in bulk. + +There is no change to PMD API. The RX/TX handlers are the only two entry points for vPMD packet I/O. +They are transparently registered at runtime RX/TX execution if all condition checks pass. + +1. To date, only an SSE version of IXGBE vPMD is available. + To ensure that vPMD is in the binary code, ensure that the option CONFIG_RTE_IXGBE_INC_VECTOR=y is in the configure file. + +Some constraints apply as pre-conditions for specific optimizations on bulk packet transfers. +The following sections explain RX and TX constraints in the vPMD. + +RX Constraints +~~~~~~~~~~~~~~ + +Prerequisites and Pre-conditions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following prerequisites apply: + +* To enable vPMD to work for RX, bulk allocation for Rx must be allowed. + +* The RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y configuration MACRO must be set before compiling the code. + +Ensure that the following pre-conditions are satisfied: + +* rxq->rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST + +* rxq->rx_free_thresh < rxq->nb_rx_desc + +* (rxq->nb_rx_desc % rxq->rx_free_thresh) == 0 + +* rxq->nb_rx_desc < (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST) + +These conditions are checked in the code. + +Scattered packets are not supported in this mode. +If an incoming packet is greater than the maximum acceptable length of one "mbuf" data size (by default, the size is 2 KB), +vPMD for RX is disabled. + +By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
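
The four RX pre-conditions listed above can be sketched as a standalone check. This is only an illustration, not the driver's actual code: the helper name ``rx_vec_preconditions_ok`` is hypothetical, and the two constants are redefined locally with the default values quoted above so that the snippet builds outside the DPDK tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default values quoted in the text; redefined here (assumption) only so
 * that the sketch compiles on its own, without the ixgbe PMD headers. */
#define IXGBE_MAX_RING_DESC        4096
#define RTE_PMD_IXGBE_RX_MAX_BURST 32

/* Hypothetical helper, not part of DPDK: returns true when the RX queue
 * parameters satisfy the four pre-conditions listed for RX vPMD. */
static bool
rx_vec_preconditions_ok(uint16_t nb_rx_desc, uint16_t rx_free_thresh)
{
	/* rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST */
	if (rx_free_thresh < RTE_PMD_IXGBE_RX_MAX_BURST)
		return false;
	/* rx_free_thresh < nb_rx_desc */
	if (rx_free_thresh >= nb_rx_desc)
		return false;
	/* nb_rx_desc must be a multiple of rx_free_thresh
	 * (rx_free_thresh is known to be non-zero at this point) */
	if ((nb_rx_desc % rx_free_thresh) != 0)
		return false;
	/* nb_rx_desc < IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST */
	if (nb_rx_desc >= (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST))
		return false;
	return true;
}
```

With the default constants, a ring of 128 descriptors and a free threshold of 32 passes all four checks, whereas a ring of 4096 descriptors fails the last one because it is not below 4096 - 32 = 4064.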
+ +Feature not Supported by RX Vector PMD +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Some features are not supported when trying to increase the throughput in vPMD. +They are: + +* IEEE1588 + +* FDIR + +* Header split + +* RX checksum offload + +Other features are supported using optional MACRO configuration. They include: + +* HW VLAN strip + +* HW extend dual VLAN + +* Enabled by RX_OLFLAGS (RTE_IXGBE_RX_OLFLAGS_DISABLE=n) + + +To guarantee the constraint, configuration flags in dev_conf.rxmode will be checked: + +* hw_vlan_strip + +* hw_vlan_extend + +* hw_ip_checksum + +* header_split + +dev_conf.fdir_conf.mode will also be checked. + +RX Burst Size +^^^^^^^^^^^^^ + +As vPMD is focused on high throughput, it assumes that the RX burst size is equal to or greater than 32 per burst. +The receive handler returns zero if nb_pkt, the expected packet number, is less than 32. + +TX Constraint +~~~~~~~~~~~~~ + +Prerequisite +^^^^^^^^^^^^ + +The only prerequisite is related to tx_rs_thresh. +The tx_rs_thresh value must be greater than or equal to RTE_PMD_IXGBE_TX_MAX_BURST, +but less than or equal to RTE_IXGBE_TX_MAX_FREE_BUF_SZ. +Consequently, by default the tx_rs_thresh value is in the range 32 to 64. + +Feature not Supported by TX Vector PMD +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +TX vPMD only works when txq_flags is set to IXGBE_SIMPLE_FLAGS. + +This means that it does not support TX multi-segment, VLAN offload and TX csum offload. +The following MACROs are used for these three features: + +* ETH_TXQ_FLAGS_NOMULTSEGS + +* ETH_TXQ_FLAGS_NOVLANOFFL + +* ETH_TXQ_FLAGS_NOXSUMSCTP + +* ETH_TXQ_FLAGS_NOXSUMUDP + +* ETH_TXQ_FLAGS_NOXSUMTCP + + +Sample Application Notes +~~~~~~~~~~~~~~~~~~~~~~~~ + +testpmd +^^^^^^^ + +By default, using CONFIG_RTE_IXGBE_RX_OLFLAGS_DISABLE=n: + +..
code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 + +When CONFIG_RTE_IXGBE_RX_OLFLAGS_DISABLE=y, better performance can be achieved: + +.. code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 --disable-hw-vlan + +If scatter gather lists are not required, set CONFIG_RTE_MBUF_SCATTER_GATHER=n for better throughput. + +l3fwd +^^^^^ + +When running l3fwd with vPMD, there is one thing to note. +In the configuration, ensure that port_conf.rxmode.hw_ip_checksum=0. +Otherwise, by default, RX vPMD is disabled. + +load_balancer +^^^^^^^^^^^^^ + +As in the case of l3fwd, set port_conf.rxmode.hw_ip_checksum=0 to enable vPMD. +In addition, for improved performance, use --bsz "(32,32),(64,64),(32,32)" in load_balancer to avoid using the default burst size of 144. diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst new file mode 100644 index 0000000000..b26c2198f9 --- /dev/null +++ b/doc/guides/nics/mlx4.rst @@ -0,0 +1,315 @@ +.. BSD LICENSE + Copyright 2012-2015 6WIND S.A. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission.
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +MLX4 poll mode driver library +============================= + +The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support +for **Mellanox ConnectX-3** 10/40 Gbps adapters (EN 40, EN 10, Pro EN 40) as +well as their virtual functions (VF) in SR-IOV context. + +.. note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and + recompiling DPDK. + +Implementation details +---------------------- + +Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI +bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a +PCI driver that allocates one Ethernet device per detected port. + +For this reason, one cannot white/blacklist a single port without also +white/blacklisting the others on the same device. + +Besides its dependency on libibverbs (that implies libmlx4 and associated +kernel support), librte_pmd_mlx4 relies heavily on system calls for control +operations such as querying/updating the MTU and flow control parameters. + +For security reasons and robustness, this driver only deals with virtual +memory addresses. 
The way resources allocations are handled by the kernel +combined with hardware specifications that allow it to handle virtual memory +addresses directly ensure that DPDK applications cannot access random +physical memory (or memory that does not belong to the current process). + +This capability allows the PMD to coexist with kernel network interfaces +which remain functional, although they stop receiving unicast packets as +long as they share the same MAC address. + +Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs. + +Features and limitations +------------------------ + +- RSS, also known as RCA, is supported. In this mode the number of + configured RX queues must be a power of two. +- VLAN filtering is supported. +- Link state information is provided. +- Promiscuous mode is supported. +- All multicast mode is supported. +- Multiple MAC addresses (unicast, multicast) can be configured. +- Scattered packets are supported for TX and RX. + +.. + +- RSS hash key cannot be modified. +- Hardware counters are not implemented (they are software counters). +- Checksum offloads are not supported yet. + +Configuration +------------- + +Compilation options +~~~~~~~~~~~~~~~~~~~ + +- ``CONFIG_RTE_LIBRTE_MLX4_PMD`` (default **n**) + + Toggle compilation of librte_pmd_mlx4 itself. + +- ``CONFIG_RTE_LIBRTE_MLX4_DEBUG`` (default **n**) + + Toggle debugging code and stricter compilation flags. Enabling this option + adds additional run-time checks and debugging messages at the cost of + lower performance. + +- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**) + + Number of scatter/gather elements (SGEs) per work request (WR). Lowering + this number improves performance but also limits the ability to receive + scattered packets (packets that do not fit a single mbuf). The default + value is a safe tradeoff. + +- ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**) + + Amount of data to be inlined during TX operations. 
Improves latency but + lowers throughput. + +- ``CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE`` (default **8**) + + Maximum number of cached memory pools (MPs) per TX queue. Each MP from + which buffers are to be transmitted must be associated to memory regions + (MRs). This is a slow operation that must be cached. + + This value is always 1 for RX queues since they use a single MP. + +- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**) + + Toggle software counters. No counters are available if this option is + disabled since hardware counters are not supported. + +Environment variables +~~~~~~~~~~~~~~~~~~~~~ + +- ``MLX4_INLINE_RECV_SIZE`` + + A nonzero value enables inline receive for packets up to that size. May + significantly improve performance in some cases but lower it in + others. Requires careful testing. + +Run-time configuration +~~~~~~~~~~~~~~~~~~~~~~ + +- The only constraint when RSS mode is requested is to make sure the number + of RX queues is a power of two. This is a hardware requirement. + +- librte_pmd_mlx4 brings kernel network interfaces up during initialization + because it is affected by their state. Forcing them down prevents packets + reception. + +- **ethtool** operations on related kernel interfaces also affect the PMD. + +Prerequisites +------------- + +This driver relies on external libraries and kernel drivers for resources +allocations and initialization. The following dependencies are not part of +DPDK and must be installed separately: + +- **libibverbs** + + User space verbs framework used by librte_pmd_mlx4. This library provides + a generic interface between the kernel and low-level user space drivers + such as libmlx4. + + It allows slow and privileged operations (context initialization, hardware + resources allocations) to be managed by the kernel and fast operations to + never leave user space. + +- **libmlx4** + + Low-level user space driver library for Mellanox ConnectX-3 devices, + it is automatically loaded by libibverbs. 
+ + This library basically implements send/receive calls to the hardware + queues. + +- **Kernel modules** (mlnx-ofed-kernel) + + They provide the kernel-side verbs API and low level device drivers that + manage actual hardware initialization and resources sharing with user + space processes. + + Unlike most other PMDs, these modules must remain loaded and bound to + their devices: + + - mlx4_core: hardware driver managing Mellanox ConnectX-3 devices. + - mlx4_en: Ethernet device driver that provides kernel network interfaces. + - mlx4_ib: InfiniBand device driver. + - ib_uverbs: user space driver for verbs (entry point for libibverbs). + +While these libraries and kernel modules are available on OpenFabrics +Alliance's website and provided by package +managers on most distributions, this PMD requires Ethernet extensions that +may not be supported at the moment (this is a work in progress). + +Mellanox OFED +includes the necessary support and should be used in the meantime. For DPDK, +only libibverbs, libmlx4 and mlnx-ofed-kernel packages are required from +that distribution. + +.. note:: + + Both libraries are BSD and GPL licensed. Linux kernel modules are GPL + licensed. + +Usage example +------------- + +This section demonstrates how to launch **testpmd** with Mellanox ConnectX-3 +devices managed by librte_pmd_mlx4. + +#. Load the kernel modules: + + .. code-block:: console + + modprobe -a ib_uverbs mlx4_en mlx4_core mlx4_ib + + .. note:: + + User space I/O kernel modules (uio and igb_uio) are not used and do + not have to be loaded. + +#. Make sure Ethernet interfaces are in working order and linked to kernel + verbs. Related sysfs entries should be present: + + .. code-block:: console + + ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5 + + Example output: + + .. code-block:: console + + eth2 + eth3 + eth4 + eth5 + +#. Optionally, retrieve their PCI bus addresses for whitelisting: + + ..
code-block:: console + + { + for intf in eth2 eth3 eth4 eth5; + do + (cd "/sys/class/net/${intf}/device/" && pwd -P); + done; + } | + sed -n 's,.*/\(.*\),-w \1,p' + + Example output: + + .. code-block:: console + + -w 0000:83:00.0 + -w 0000:83:00.0 + -w 0000:84:00.0 + -w 0000:84:00.0 + + .. note:: + + There are only two distinct PCI bus addresses because the Mellanox + ConnectX-3 adapters installed on this system are dual port. + +#. Request huge pages: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + +#. Start testpmd with basic parameters: + + .. code-block:: console + + testpmd -c 0xff00 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:83:00.0 on NUMA socket 1 + EAL: probe driver: 15b3:1007 librte_pmd_mlx4 + PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false) + PMD: librte_pmd_mlx4: 2 port(s) detected + PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:b7:50 + PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:b7:51 + EAL: PCI device 0000:84:00.0 on NUMA socket 1 + EAL: probe driver: 15b3:1007 librte_pmd_mlx4 + PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false) + PMD: librte_pmd_mlx4: 2 port(s) detected + PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:ba:b0 + PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:ba:b1 + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: librte_pmd_mlx4: 0x867d60: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867d60: RX queues number update: 0 -> 2 + Port 0: 00:02:C9:B5:B7:50 + Configuring Port 1 (socket 0) + PMD: librte_pmd_mlx4: 0x867da0: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867da0: RX queues number update: 0 -> 2 + Port 1: 00:02:C9:B5:B7:51 + Configuring Port 2 (socket 0) + PMD: librte_pmd_mlx4: 0x867de0: TX queues number update: 0 -> 2 + PMD:
librte_pmd_mlx4: 0x867de0: RX queues number update: 0 -> 2 + Port 2: 00:02:C9:B5:BA:B0 + Configuring Port 3 (socket 0) + PMD: librte_pmd_mlx4: 0x867e20: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867e20: RX queues number update: 0 -> 2 + Port 3: 00:02:C9:B5:BA:B1 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 40000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + Port 3 Link Up - speed 40000 Mbps - full-duplex + Done + testpmd> diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst new file mode 100644 index 0000000000..702da89af8 --- /dev/null +++ b/doc/guides/nics/pcap_ring.rst @@ -0,0 +1,267 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Libpcap and Ring Based Poll Mode Drivers +======================================== + +In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware, +the DPDK also includes two pure-software PMDs. These two drivers are: + +* A libpcap-based PMD (librte_pmd_pcap) that reads and writes packets using libpcap, + both from files on disk, as well as from physical NIC devices using standard Linux kernel drivers. + +* A ring-based PMD (librte_pmd_ring) that allows a set of software FIFOs (that is, rte_ring) + to be accessed using the PMD APIs, as though they were physical NICs. + +.. note:: + + The libpcap-based PMD is disabled by default in the build configuration files, + owing to an external dependency on the libpcap development files which must be installed on the board. + Once the libpcap development files are installed, + the library can be enabled by setting CONFIG_RTE_LIBRTE_PMD_PCAP=y and recompiling the Intel® DPDK. + +Using the Drivers from the EAL Command Line +------------------------------------------- + +For ease of use, the DPDK EAL has also been extended to allow pseudo-ethernet devices, +using one or more of these drivers, +to be created at application startup time during EAL initialization. + +To do so, the --vdev= parameter must be passed to the EAL. +This takes options to allow ring and pcap-based Ethernet to be allocated and used transparently by the application.
+This can be used, for example, for testing on a virtual machine where there are no Ethernet ports. + +Libpcap-based PMD +~~~~~~~~~~~~~~~~~ + +Pcap-based devices can be created using the virtual device --vdev option. +The device name must start with the eth_pcap prefix followed by numbers or letters. +The name is unique for each device. Each device can have multiple stream options and multiple devices can be used. +Multiple device definitions can be arranged using multiple --vdev. +Device name and stream options must be separated by commas as shown below: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c f -n 4 --vdev 'eth_pcap0,stream_opt0=..,stream_opt1=..' --vdev='eth_pcap1,stream_opt0=..' + +Device Streams +^^^^^^^^^^^^^^ + +Stream definitions can be combined in multiple ways as long as the following two rules are respected: + +* A device is provided with two different streams - reception and transmission. + +* A device is provided with one network interface name used for reading and writing packets. + +The different stream types are: + +* rx_pcap: Defines a reception stream based on a pcap file. + The driver reads each packet within the given pcap file as if it were receiving it from the wire. + The value is a path to a valid pcap file. + + rx_pcap=/path/to/file.pcap + +* tx_pcap: Defines a transmission stream based on a pcap file. + The driver writes each received packet to the given pcap file. + The value is a path to a pcap file. + The file is overwritten if it already exists and it is created if it does not. + + tx_pcap=/path/to/file.pcap + +* rx_iface: Defines a reception stream based on a network interface name. + The driver reads packets coming from the given interface using the Linux kernel driver for that interface. + The value is an interface name. + + rx_iface=eth0 + +* tx_iface: Defines a transmission stream based on a network interface name.
+ The driver sends packets to the given interface using the Linux kernel driver for that interface. + The value is an interface name. + + tx_iface=eth0 + +* iface: Defines a device mapping a network interface. + The driver both reads and writes packets from and to the given interface. + The value is an interface name. + + iface=eth0 + +Examples of Usage +^^^^^^^^^^^^^^^^^ + +Read packets from one pcap file and write them to another: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained + +Read packets from a network interface and write them to a pcap file: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_iface=eth0,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained + +Read packets from a pcap file and write them to a network interface: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_iface=eth1' -- --port-topology=chained + +Forward packets through two network interfaces: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,iface=eth0' --vdev='eth_pcap1,iface=eth1' + +Using libpcap-based PMD with the testpmd Application +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +One of the first things that testpmd does before starting to forward packets is to flush the RX streams +by reading the first 512 packets on every RX stream and discarding them. +When using a libpcap-based PMD this behavior can be turned off using the following command line option: + +.. code-block:: console + + --no-flush-rx + +It is also available in the runtime command line: + +.. code-block:: console + + set flush_rx on/off + +It is useful for the case where the rx_pcap is being used and no packets are meant to be discarded.
+Otherwise, the first 512 packets from the input pcap file will be discarded by the RX flushing operation. + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained --no-flush-rx + + +Rings-based PMD +~~~~~~~~~~~~~~~ + +To run a DPDK application on a machine without any Ethernet devices, a pair of ring-based rte_ethdevs can be used as below. +The device names passed to the --vdev option must start with eth_ring and take no additional parameters. +Multiple devices may be specified, separated by commas. + +.. code-block:: console + + ./testpmd -c E -n 4 --vdev=eth_ring0 --vdev=eth_ring1 -- -i + EAL: Detected lcore 1 as core 1 on socket 0 + ... + + Interactive-mode selected + Configuring Port 0 (socket 0) + Configuring Port 1 (socket 0) + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + + testpmd> start tx_first + io packet forwarding - CRC stripping disabled - packets/burst=16 + nb forwarding cores=1 - nb forwarding ports=2 + RX queues=1 - RX desc=128 - RX free threshold=0 + RX threshold registers: pthresh=8 hthresh=8 wthresh=4 + TX queues=1 - TX desc=512 - TX free threshold=0 + TX threshold registers: pthresh=36 hthresh=0 wthresh=0 + TX RS bit threshold=0 - TXQ flags=0x0 + + testpmd> stop + Telling cores to stop... + Waiting for lcores to finish... + +.. image:: img/forward_stats.* + +.. code-block:: console + + +++++++++++++++ Accumulated forward statistics for allports++++++++++ + RX-packets: 462384736 RX-dropped: 0 RX-total: 462384736 + TX-packets: 462384768 TX-dropped: 0 TX-total: 462384768 + +++++++++++++++++++++++++++++++++++++++++++++++++++++ + + Done.
+ + +Using the Poll Mode Driver from an Application +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Both drivers provide similar APIs that allow the user to create PMD instances, that is, +rte_ethdev structures, at run-time in the end-application, +for example, using rte_eth_from_rings() or rte_eth_from_pcaps() APIs. +For the rings-based PMD, this functionality could be used, for example, +to allow data exchange between cores using rings to be done in exactly the +same way as sending or receiving packets from an Ethernet device. +For the libpcap-based PMD, it allows an application to open one or more pcap files +and use these as a source of packet input to the application. + +Usage Examples +^^^^^^^^^^^^^^ + +To create two pseudo-ethernet ports where all traffic sent to a port is looped back +for reception on the same port (error handling omitted for clarity): + +.. code-block:: c + + struct rte_ring *r1, *r2; + int port1, port2; + + r1 = rte_ring_create("R1", 256, SOCKET0, RING_F_SP_ENQ|RING_F_SC_DEQ); + r2 = rte_ring_create("R2", 256, SOCKET0, RING_F_SP_ENQ|RING_F_SC_DEQ); + + /* create an ethdev where RX and TX are done to/from r1, and another from r2 */ + + port1 = rte_eth_from_rings(r1, 1, r1, 1, SOCKET0); + port2 = rte_eth_from_rings(r2, 1, r2, 1, SOCKET0); + + +To create two pseudo-Ethernet ports where the traffic is switched between them, +that is, traffic sent to port 1 is read back from port 2 and vice-versa, +the final two lines could be changed as below: + +.. code-block:: c + + port1 = rte_eth_from_rings(r1, 1, r2, 1, SOCKET0); + port2 = rte_eth_from_rings(r2, 1, r1, 1, SOCKET0); + +This type of configuration could be useful in a pipeline model, for example, +where one may want to have inter-core communication using pseudo Ethernet devices rather than raw rings, +for reasons of API consistency. + +Enqueuing and dequeuing items from an rte_ring using the rings-based PMD may be slower than using the native rings API.
+This is because DPDK Ethernet drivers make use of function pointers to call the appropriate enqueue or dequeue functions, +while the rte_ring specific functions are direct function calls in the code and are often inlined by the compiler. + + Once an ethdev has been created, for either a ring or a pcap-based PMD, + it should be configured and started in the same way as a regular Ethernet device, that is, + by calling rte_eth_dev_configure() to set the number of receive and transmit queues, + then calling rte_eth_rx_queue_setup() / tx_queue_setup() for each of those queues and + finally calling rte_eth_dev_start() to allow transmission and reception of packets to begin. diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst new file mode 100644 index 0000000000..073d980b9d --- /dev/null +++ b/doc/guides/nics/virtio.rst @@ -0,0 +1,209 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Poll Mode Driver for Emulated Virtio NIC +======================================== + +Virtio is a para-virtualization framework initiated by IBM and supported by the KVM hypervisor. +In the Data Plane Development Kit (DPDK), +we provide a virtio Poll Mode Driver (PMD) as a software solution, in contrast to the SR-IOV hardware solution, +for fast guest VM to guest VM communication and guest VM to host communication. + +Vhost is a kernel acceleration module for the virtio qemu backend. +The DPDK extends kni to support the vhost raw socket interface, +which enables vhost to directly read/write packets from/to a physical port. +With this enhancement, virtio could achieve quite promising performance. + +In a future release, we will also enhance the vhost backend +to release the peak performance of the virtio PMD driver. + +For basic qemu-KVM installation and the Intel EM poll mode driver in a guest VM, +please refer to Chapter "Driver for VM Emulated Devices". + +In this chapter, we demonstrate usage of the virtio PMD driver with two backends: +the standard qemu vhost back end and the vhost kni back end. + +Virtio Implementation in DPDK +----------------------------- + +For details about the virtio spec, refer to the Virtio PCI Card Specification written by Rusty Russell. + +As a PMD, virtio provides the packet reception and transmission callbacks virtio_recv_pkts and virtio_xmit_pkts.
+ +In virtio_recv_pkts, the index range [vq->vq_used_cons_idx, vq->vq_ring.used->idx) in the vring is available for virtio to burst out. + +In virtio_xmit_pkts, the same index range in the vring is available for virtio to clean. +Virtio enqueues the packets to be transmitted into the vring, advances vq->vq_ring.avail->idx, +and then notifies the host back end if necessary. + +Features and Limitations of virtio PMD +-------------------------------------- + +In this release, the virtio PMD driver provides the basic functionality of packet reception and transmission. + +* It supports mergeable buffers per packet when receiving packets and scattered buffers per packet + when transmitting packets. The supported packet size is from 64 to 1518 bytes. + +* It supports multicast packets and promiscuous mode. + +* The descriptor number for the RX/TX queue is hard-coded to be 256 by qemu. + If given a different descriptor number by the upper application, + the virtio PMD generates a warning and falls back to the hard-coded value. + +* The mac/vlan filter features are supported, but they must be negotiated with the vhost back end. + When the back end can't support the vlan filter, the virtio application on the guest should disable the vlan filter to make sure + the virtio port is configured correctly, for example, by specifying '--disable-hw-vlan' on the testpmd command line. + +* RTE_PKTMBUF_HEADROOM should be defined larger than sizeof(struct virtio_net_hdr), which is 10 bytes. + +* Virtio does not support runtime configuration. + +* Virtio supports Link State interrupt. + +* Virtio supports software vlan stripping and inserting. + +* Virtio supports using port IO to get PCI resource when uio/igb_uio module is not available. + +Prerequisites +------------- + +The following prerequisites apply: + +* In the BIOS, turn VT-x and VT-d on. + +* Linux kernel with KVM module; vhost module loaded and ioeventfd supported. + The qemu standard back end without vhost support isn't tested, and probably isn't supported.
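The used-ring window described under "Virtio Implementation in DPDK" can be sketched in a few lines of C. This is a minimal illustration, not the driver's code: ``toy_vq``, ``toy_used_count`` and ``toy_consume`` are hypothetical names, and only the two indices named in the text are modeled. It assumes both indices are free-running 16-bit counters, so the number of entries in [vq_used_cons_idx, used->idx) is their difference modulo 65536.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two indices named in the text. */
struct toy_vq {
    uint16_t used_idx;      /* models vq->vq_ring.used->idx, advanced by the host */
    uint16_t used_cons_idx; /* models vq->vq_used_cons_idx, advanced by the guest */
};

/* Entries in [used_cons_idx, used_idx); the cast keeps the result
 * correct when used_idx has wrapped past 65535 and used_cons_idx has not. */
static uint16_t toy_used_count(const struct toy_vq *vq)
{
    assert(vq != NULL);
    return (uint16_t)(vq->used_idx - vq->used_cons_idx);
}

/* Consume up to 'burst' entries and return how many were taken. */
static uint16_t toy_consume(struct toy_vq *vq, uint16_t burst)
{
    uint16_t n = toy_used_count(vq);

    if (n > burst)
        n = burst;
    vq->used_cons_idx = (uint16_t)(vq->used_cons_idx + n);
    return n;
}
```

For example, with used_idx = 2 and used_cons_idx = 65534 the window holds four entries, because the subtraction wraps modulo 65536.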
+ +Virtio with kni vhost Back End +------------------------------ + +This section demonstrates an example setup of the kni vhost back end for Phy-VM communication. + +.. _nic_figure_5: + +**Figure 5. Host2VM Communication Example Using kni vhost Back End** + +.. image:: img/host_vm_comms.* + +Host2VM communication example + +#. Load the kni kernel module: + + .. code-block:: console + + insmod rte_kni.ko + + Other basic DPDK preparations, such as enabling hugepages and binding ports to uio, are not listed here. + Please refer to the *DPDK Getting Started Guide* for detailed instructions. + +#. Launch the kni user application: + + .. code-block:: console + + examples/kni/build/app/kni -c 0xf -n 4 -- -p 0x1 -i 0x1 -o 0x2 + + This command generates one network device, vEth0, for the physical port. + If more physical ports are specified, the generated network devices will be vEth1, vEth2, and so on. + + For each physical port, kni creates two user threads. + One thread loops to fetch packets from the physical NIC port into the kni receive queue. + The other user thread loops to send packets in the kni transmit queue. + + For each physical port, kni also creates a kernel thread that retrieves packets from the kni receive queue, + places them onto kni's raw socket's queue and wakes up the vhost kernel thread to exchange packets with the virtio virt queue. + + For more details about kni, please refer to Chapter 24 "Kernel NIC Interface". + +#. Enable the kni raw socket functionality for the specified physical NIC port, + get the generated file descriptor and set it in the qemu command line parameter. + Always remember to set ioeventfd=on and vhost=on. + + Example: + + ..
code-block:: console + + echo 1 > /sys/class/net/vEth0/sock_en + fd=`cat /sys/class/net/vEth0/sock_fd` + exec qemu-system-x86_64 -enable-kvm -cpu host \ + -m 2048 -smp 4 -name dpdk-test1-vm1 \ + -drive file=/data/DPDKVMS/dpdk-vm.img \ + -netdev tap,fd=$fd,id=mynet_kni,script=no,vhost=on \ + -device virtio-net-pci,netdev=mynet_kni,bus=pci.0,addr=0x3,ioeventfd=on \ + -vnc :1 -daemonize + + In the above example, virtio port 0 in the guest VM is associated with vEth0, which in turn corresponds to a physical port. + This means that received packets come from vEth0, and transmitted packets are sent to vEth0. + +#. In the guest, bind the virtio device to the uio_pci_generic kernel module and start the forwarding application. + When the virtio port in the guest bursts rx, it gets packets from the raw socket's receive queue. + When the virtio port bursts tx, it sends packets to the tx_q. + + .. code-block:: console + + modprobe uio + echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages + modprobe uio_pci_generic + python tools/dpdk_nic_bind.py -b uio_pci_generic 00:03.0 + + We use testpmd as the forwarding application in this example. + + .. image:: img/console.* + +#. Use the IXIA packet generator to inject a packet stream into the KNI physical port. + + The packet reception and transmission flow path is: + + IXIA packet generator -> 82599 PF -> KNI rx queue -> KNI raw socket queue -> Guest VM virtio port 0 rx burst -> Guest VM virtio port 0 tx burst -> KNI tx queue -> 82599 PF -> IXIA packet generator + +Virtio with qemu virtio Back End +-------------------------------- + +.. _nic_figure_6: + +**Figure 6. Host2VM Communication Example Using qemu vhost Back End** + +.. image:: img/host_vm_comms_qemu.* + +..
code-block:: console + + qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -smp 2 -mem-path /dev/hugepages -mem-prealloc \ + -drive file=/data/DPDKVMS/dpdk-vm1 \ + -netdev tap,id=vm1_p1,ifname=tap0,script=no,vhost=on \ + -device virtio-net-pci,netdev=vm1_p1,bus=pci.0,addr=0x3,ioeventfd=on \ + -device pci-assign,host=04:10.1 + +In this example, the packet reception flow path is: + + IXIA packet generator -> 82599 PF -> Linux Bridge -> TAP0's socket queue -> Guest VM virtio port 0 rx burst -> Guest VM 82599 VF port1 tx burst -> IXIA packet generator + +The packet transmission flow is: + + IXIA packet generator -> Guest VM 82599 VF port1 rx burst -> Guest VM virtio port 0 tx burst -> tap -> Linux Bridge -> 82599 PF -> IXIA packet generator diff --git a/doc/guides/nics/vmxnet3.rst b/doc/guides/nics/vmxnet3.rst new file mode 100644 index 0000000000..3aa5b40081 --- /dev/null +++ b/doc/guides/nics/vmxnet3.rst @@ -0,0 +1,172 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Poll Mode Driver for Paravirtual VMXNET3 NIC +============================================ + +The VMXNET3 adapter is the next generation of a paravirtualized NIC, introduced by VMware* ESXi. +It is designed for performance and is not related to VMXNET or VMXNET2. +It offers all the features available in VMXNET2, and adds several new features such as +multi-queue support (also known as Receive Side Scaling, RSS), +IPv6 offloads, and MSI/MSI-X interrupt delivery. +Because operating system vendors do not provide built-in drivers for this card, +VMware Tools must be installed to have a driver for the VMXNET3 network adapter available. +The same device can be used in a DPDK application with the VMXNET3 PMD introduced in the DPDK API. + +Currently, the driver provides basic support for using the device in a DPDK application running on a guest OS. +Optimization is needed on the backend, that is, the VMware* ESXi vmkernel switch, to achieve optimal performance end-to-end. + +In this chapter, two setups with the use of the VMXNET3 PMD are demonstrated: + +#. Vmxnet3 with a native NIC connected to a vSwitch + +#. Vmxnet3 chaining VMs connected to a vSwitch + +VMXNET3 Implementation in the DPDK +---------------------------------- + +For details on the VMXNET3 device, refer to the VMXNET3 driver's vmxnet3 directory and support manual from VMware*.
+ +For performance details, refer to the following link from VMware: + +http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf + +As a PMD, the VMXNET3 driver provides the packet reception and transmission callbacks, vmxnet3_recv_pkts and vmxnet3_xmit_pkts. +It does not support scattered packets in vmxnet3_recv_pkts or vmxnet3_xmit_pkts, +nor as part of the supported device operations. + +The VMXNET3 PMD handles all the packet buffer memory allocation, which resides in guest address space, +and it is solely responsible for freeing that memory when it is no longer needed. +The packet buffers and features to be supported are made available to the hypervisor via VMXNET3 PCI configuration space BARs. +During RX/TX, the packet buffers are exchanged by their guest physical addresses (GPAs), +and the hypervisor loads the buffers with packets in the RX case and sends packets to the vSwitch in the TX case. + +The VMXNET3 PMD is compiled with vmxnet3 device headers. +The interface is similar to that of the other PMDs available in the DPDK API. +The driver pre-allocates the packet buffers and loads the command ring descriptors in advance. +The hypervisor fills those packet buffers on packet arrival and writes completion ring descriptors, +which are eventually pulled by the PMD. +After reception, the DPDK application frees the descriptors and loads new packet buffers for the coming packets. +The interrupts are disabled and there is no notification required. +This keeps performance up on the RX side, even though the device provides a notification feature. + +In the transmit routine, the DPDK application fills packet buffer pointers in the descriptors of the command ring +and notifies the hypervisor. +In response, the hypervisor takes the packets, passes them to the vSwitch and writes into the completion descriptors ring. +The rings are read by the PMD in the next transmit routine call and the buffers and descriptors are freed from memory.
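The receive-side hand-off described above (the driver posts buffers, the hypervisor completes them, the PMD polls without interrupts) can be modeled with three free-running counters. This is a toy model under stated assumptions: it is not the VMXNET3 descriptor layout, and every name in it (``toy_rx_ring``, ``toy_refill``, ``toy_hv_receive``, ``toy_poll``, ``TOY_RING_SIZE``) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 4u /* hypothetical ring size, chosen only for the example */

/* Toy model only: this is NOT the VMXNET3 descriptor layout. */
struct toy_rx_ring {
    uint32_t posted;    /* buffers the driver has loaded into the command ring */
    uint32_t completed; /* buffers the hypervisor has filled and completed */
    uint32_t reclaimed; /* completions the driver has already polled */
};

/* Driver: pre-allocate buffers until the ring is full; returns how many were posted. */
static uint32_t toy_refill(struct toy_rx_ring *r)
{
    uint32_t n = 0;

    assert(r != NULL);
    while (r->posted - r->reclaimed < TOY_RING_SIZE) {
        r->posted++;
        n++;
    }
    return n;
}

/* Hypervisor: fill up to 'pkts' posted buffers; packets beyond that are not accepted. */
static uint32_t toy_hv_receive(struct toy_rx_ring *r, uint32_t pkts)
{
    uint32_t avail;

    assert(r != NULL);
    avail = r->posted - r->completed;
    if (pkts > avail)
        pkts = avail;
    r->completed += pkts;
    return pkts;
}

/* Driver: poll the completion ring (no interrupt); returns packets received. */
static uint32_t toy_poll(struct toy_rx_ring *r, uint32_t burst)
{
    uint32_t ready;

    assert(r != NULL);
    ready = r->completed - r->reclaimed;
    if (burst > ready)
        burst = ready;
    r->reclaimed += burst;
    return burst;
}
```

A caller would typically alternate ``toy_poll()`` and ``toy_refill()``, mirroring the poll-then-reload cycle described in the text.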
+ +Features and Limitations of VMXNET3 PMD +--------------------------------------- + +In release 1.6.0, the VMXNET3 PMD provides the basic functionality of packet reception and transmission. +There are several options available for filtering packets at VMXNET3 device level including: + +#. MAC Address based filtering: + + * Unicast, Broadcast, All Multicast modes - SUPPORTED BY DEFAULT + + * Multicast with Multicast Filter table - NOT SUPPORTED + + * Promiscuous mode - SUPPORTED + + * RSS based load balancing between queues - SUPPORTED + +#. VLAN filtering: + + * VLAN tag based filtering without load balancing - SUPPORTED + +.. note:: + + + * Release 1.6.0 does not support separate header and body receive cmd_rings; hence, + multiple segment buffers are not supported. + Only cmd_ring_0 is used for packet buffers, one for each descriptor. + + * Receive and transmit of scattered packets is not supported. + + * Multicast with Multicast Filter table is not supported. + +Prerequisites +------------- + +The following prerequisites apply: + +* Before starting a VM, a VMXNET3 interface must be assigned to the VM through the VMware vSphere Client. + This is shown in the figure below. + +.. image:: img/vmxnet3_int.* + +.. note:: + + Depending on the Virtual Machine type, the VMware vSphere Client shows Ethernet adaptors while adding an Ethernet device. + Ensure that the VM type used offers a VMXNET3 device. Refer to the VMware documentation for a list of VMs. + +.. note:: + + Follow the *DPDK Getting Started Guide* to set up the basic DPDK environment. + +.. note:: + + Follow the *DPDK Sample Application's User Guide*, L2 Forwarding/L3 Forwarding and + TestPMD for instructions on how to run a DPDK application using an assigned VMXNET3 device. + +VMXNET3 with a Native NIC Connected to a vSwitch +------------------------------------------------ + +This section describes an example setup for Phy-vSwitch-VM-Phy communication. + +.. image:: img/vswitch_vm.* + +..
note:: + + Other instructions on preparing to use DPDK such as, hugepage enabling, uio port binding are not listed here. + Please refer to *DPDK Getting Started Guide and DPDK Sample Application's User Guide* for detailed instructions. + +The packet reception and transmission flow path is: + + Packet generator -> 82576 -> VMware ESXi vSwitch -> VMXNET3 device -> Guest VM VMXNET3 port 0 rx burst -> Guest + VM 82599 VF port 0 tx burst -> 82599 VF -> Packet generator + +VMXNET3 Chaining VMs Connected to a vSwitch +------------------------------------------- + +The following figure shows an example VM-to-VM communication over a Phy-VM-vSwitch-VM-Phy communication channel. + +.. image:: img/vm_vm_comms.* + +.. note:: + + When using the L2 Forwarding or L3 Forwarding applications, + a destination MAC address needs to be written in packets to hit the other VM's VMXNET3 interface. + +In this example, the packet flow path is: + + Packet generator -> 82599 VF -> Guest VM 82599 port 0 rx burst -> Guest VM VMXNET3 port 1 tx burst -> VMXNET3 + device -> VMware ESXi vSwitch -> VMXNET3 device -> Guest VM VMXNET3 port 0 rx burst -> Guest VM 82599 VF port 1 tx burst -> 82599 VF -> Packet generator diff --git a/doc/guides/prog_guide/driver_vm_emul_dev.rst b/doc/guides/prog_guide/driver_vm_emul_dev.rst deleted file mode 100644 index fedafad412..0000000000 --- a/doc/guides/prog_guide/driver_vm_emul_dev.rst +++ /dev/null @@ -1,178 +0,0 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -Driver for VM Emulated Devices -============================== - -The DPDK EM poll mode driver supports the following emulated devices: - -* qemu-kvm emulated Intel® 82540EM Gigabit Ethernet Controller (qemu e1000 device) - -* VMware* emulated Intel® 82545EM Gigabit Ethernet Controller - -* VMware emulated Intel® 8274L Gigabit Ethernet Controller. 
- -Validated Hypervisors ---------------------- - -The validated hypervisors are: - -* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 - -* KVM (Kernel Virtual Machine) with Qemu, version 0.15.1 - -* VMware ESXi 5.0, Update 1 - -Recommended Guest Operating System in Virtual Machine ------------------------------------------------------ - -The recommended guest operating system in a virtualized environment is: - -* Fedora* 18 (64-bit) - -For supported kernel versions, refer to the *DPDK Release Notes*. - -Setting Up a KVM Virtual Machine --------------------------------- - -The following describes a target environment: - -* Host Operating System: Fedora 14 - -* Hypervisor: KVM (Kernel Virtual Machine) with Qemu version, 0.14.0 - -* Guest Operating System: Fedora 14 - -* Linux Kernel Version: Refer to the DPDK Getting Started Guide - -* Target Applications: testpmd - -The setup procedure is as follows: - -#. Download qemu-kvm-0.14.0 from - `http://sourceforge.net/projects/kvm/files/qemu-kvm/ `_ - and install it in the Host OS using the following steps: - - When using a recent kernel (2.6.25+) with kvm modules included: - - .. code-block:: console - - tar xzf qemu-kvm-release.tar.gz cd qemu-kvm-release - ./configure --prefix=/usr/local/kvm - make - sudo make install - sudo /sbin/modprobe kvm-intel - - When using an older kernel or a kernel from a distribution without the kvm modules, - you must download (from the same link), compile and install the modules yourself: - - .. code-block:: console - - tar xjf kvm-kmod-release.tar.bz2 - cd kvm-kmod-release - ./configure - make - sudo make install - sudo /sbin/modprobe kvm-intel - - Note that qemu-kvm installs in the /usr/local/bin directory. - - For more details about KVM configuration and usage, please refer to: - `http://www.linux-kvm.org/page/HOWTO1 `_. - -#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine. - This is referred to as the Guest Operating System (Guest OS). - -#. 
Start the Virtual Machine with at least one emulated e1000 device. - - .. note:: - - The Qemu provides several choices for the emulated network device backend. - Most commonly used is a TAP networking backend that uses a TAP networking device in the host. - For more information about Qemu supported networking backends and different options for configuring networking at Qemu, - please refer to: - - — `http://www.linux-kvm.org/page/Networking `_ - - — `http://wiki.qemu.org/Documentation/Networking `_ - - — `http://qemu.weilnetz.de/qemu-doc.html `_ - - For example, to start a VM with two emulated e1000 devices, issue the following command: - - .. code-block:: console - - /usr/local/kvm/bin/qemu-system-x86_64 -cpu host -smp 4 -hda qemu1.raw -m 1024 - -net nic,model=e1000,vlan=1,macaddr=DE:AD:1E:00:00:01 - -net tap,vlan=1,ifname=tapvm01,script=no,downscript=no - -net nic,model=e1000,vlan=2,macaddr=DE:AD:1E:00:00:02 - -net tap,vlan=2,ifname=tapvm02,script=no,downscript=no - - where: - - — -m = memory to assign - - — -smp = number of smp cores - - — -hda = virtual disk image - - This command starts a new virtual machine with two emulated 82540EM devices, - backed up with two TAP networking host interfaces, tapvm01 and tapvm02. - - .. code-block:: console - - # ip tuntap show - tapvm01: tap - tapvm02: tap - -#. Configure your TAP networking interfaces using ip/ifconfig tools. - -#. Log in to the guest OS and check that the expected emulated devices exist: - - .. code-block:: console - - # lspci -d 8086:100e - 00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) - 00:05.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) - -#. Install the DPDK and run testpmd. - -Known Limitations of Emulated Devices -------------------------------------- - -The following are known limitations: - -#. The Qemu e1000 RX path does not support multiple descriptors/buffers per packet. 
- Therefore, rte_mbuf should be big enough to hold the whole packet. - For example, to allow testpmd to receive jumbo frames, use the following: - - testpmd [options] -- --mbuf-size= - -#. Qemu e1000 does not validate the checksum of incoming packets. diff --git a/doc/guides/prog_guide/i40e_ixgbe_igb_virt_func_drv.rst b/doc/guides/prog_guide/i40e_ixgbe_igb_virt_func_drv.rst deleted file mode 100644 index 68d7107e1d..0000000000 --- a/doc/guides/prog_guide/i40e_ixgbe_igb_virt_func_drv.rst +++ /dev/null @@ -1,583 +0,0 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -I40E/IXGBE/IGB Virtual Function Driver -====================================== - -Supported Intel® Ethernet Controllers (see the *DPDK Release Notes* for details) -support the following modes of operation in a virtualized environment: - -* **SR-IOV mode**: Involves direct assignment of part of the port resources to different guest operating systems - using the PCI-SIG Single Root I/O Virtualization (SR IOV) standard, - also known as "native mode" or "pass-through" mode. - In this chapter, this mode is referred to as IOV mode. - -* **VMDq mode**: Involves central management of the networking resources by an IO Virtual Machine (IOVM) or - a Virtual Machine Monitor (VMM), also known as software switch acceleration mode. - In this chapter, this mode is referred to as the Next Generation VMDq mode. - -SR-IOV Mode Utilization in a DPDK Environment ---------------------------------------------- - -The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode. -Therefore, it is possible to partition SR-IOV capability on Ethernet controller NIC resources logically and -expose them to a virtual machine as a separate PCI function called a "Virtual Function". -Refer to Figure 10. - -Therefore, a NIC is logically distributed among multiple virtual machines (as shown in Figure 10), -while still having global data in common to share with the Physical Function and other Virtual Functions. 
-The DPDK fm10kvf, i40evf, igbvf or ixgbevf as a Poll Mode Driver (PMD) serves for the Intel® 82576 Gigabit Ethernet Controller, -Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller NIC, -Intel® Fortville 10/40 Gigabit Ethernet Controller NIC's virtual PCI function,or PCIE host-interface of the Intel Ethernet Switch -FM10000 Series. -Meanwhile the DPDK Poll Mode Driver (PMD) also supports "Physical Function" of such NIC's on the host. - -The DPDK PF/VF Poll Mode Driver (PMD) supports the Layer 2 switch on Intel® 82576 Gigabit Ethernet Controller, -Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller, -and Intel® Fortville 10/40 Gigabit Ethernet Controller NICs so that guest can choose it for inter virtual machine traffic in SR-IOV mode. - -For more detail on SR-IOV, please refer to the following documents: - -* `SR-IOV provides hardware based I/O sharing `_ - -* `PCI-SIG-Single Root I/O Virtualization Support on IA - `_ - -* `Scalable I/O Virtualized Servers `_ - -.. _pg_figure_10: - -**Figure 10. Virtualization for a Single Port NIC in SR-IOV Mode** - -.. image24_png has been renamed - -|single_port_nic| - -Physical and Virtual Function Infrastructure -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The following describes the Physical Function and Virtual Functions infrastructure for the supported Ethernet Controller NICs. - -Virtual Functions operate under the respective Physical Function on the same NIC Port and therefore have no access -to the global NIC resources that are shared between other functions for the same NIC port. - -A Virtual Function has basic access to the queue resources and control structures of the queues assigned to it. -For global resource access, a Virtual Function has to send a request to the Physical Function for that port, -and the Physical Function operates on the global resources on behalf of the Virtual Function. 
-For this out-of-band communication, an SR-IOV enabled NIC provides a memory buffer for each Virtual Function, -which is called a "Mailbox". - -The PCIE host-interface of Intel Ethernet Switch FM10000 Series VF infrastructure -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -In a virtualized environment, the programmer can enable a maximum of *64 Virtual Functions (VF)* -globally per PCIE host-interface of the Intel Ethernet Switch FM10000 Series device. -Each VF can have a maximum of 16 queue pairs. -The Physical Function in host could be only configured by the Linux* fm10k driver -(in the case of the Linux Kernel-based Virtual Machine [KVM]), DPDK PMD PF driver doesn't support it yet. - -For example, - -* Using Linux* fm10k driver: - - .. code-block:: console - - rmmod fm10k (To remove the fm10k module) - insmod fm0k.ko max_vfs=2,2 (To enable two Virtual Functions per port) - -Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. -When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# -represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. -However: - -* Virtual Functions 0 and 2 belong to Physical Function 0 - -* Virtual Functions 1 and 3 belong to Physical Function 1 - -.. note:: - - The above is an important consideration to take into account when targeting specific packets to a selected port. - -Intel® Fortville 10/40 Gigabit Ethernet Controller VF Infrastructure -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -In a virtualized environment, the programmer can enable a maximum of *128 Virtual Functions (VF)* -globally per Intel® Fortville 10/40 Gigabit Ethernet Controller NIC device. -Each VF can have a maximum of 16 queue pairs. 
-The Physical Function in host could be either configured by the Linux* i40e driver -(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by DPDK PMD PF driver. -When using both DPDK PMD PF/VF drivers, the whole NIC will be taken over by DPDK based application. - -For example, - -* Using Linux* i40e driver: - - .. code-block:: console - - rmmod i40e (To remove the i40e module) - insmod i40e.ko max_vfs=2,2 (To enable two Virtual Functions per port) - -* Using the DPDK PMD PF i40e driver: - - Kernel Params: iommu=pt, intel_iommu=on - - .. code-block:: console - - modprobe uio - insmod igb_uio - ./dpdk_nic_bind.py -b igb_uio bb:ss.f - echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) - - Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. - -Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. -When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# -represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. -However: - -* Virtual Functions 0 and 2 belong to Physical Function 0 - -* Virtual Functions 1 and 3 belong to Physical Function 1 - -.. note:: - - The above is an important consideration to take into account when targeting specific packets to a selected port. - -Intel® 82599 10 Gigabit Ethernet Controller VF Infrastructure -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The programmer can enable a maximum of *63 Virtual Functions* and there must be *one Physical Function* per Intel® 82599 -10 Gigabit Ethernet Controller NIC port. -The reason for this is that the device allows for a maximum of 128 queues per port and a virtual/physical function has to -have at least one queue pair (RX/TX). -The current implementation of the DPDK ixgbevf driver supports a single queue pair (RX/TX) per Virtual Function. 
-The Physical Function in host could be either configured by the Linux* ixgbe driver -(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by DPDK PMD PF driver. -When using both DPDK PMD PF/VF drivers, the whole NIC will be taken over by DPDK based application. - -For example, - -* Using Linux* ixgbe driver: - - .. code-block:: console - - rmmod ixgbe (To remove the ixgbe module) - insmod ixgbe max_vfs=2,2 (To enable two Virtual Functions per port) - -* Using the DPDK PMD PF ixgbe driver: - - Kernel Params: iommu=pt, intel_iommu=on - - .. code-block:: console - - modprobe uio - insmod igb_uio - ./dpdk_nic_bind.py -b igb_uio bb:ss.f - echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) - - Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. - -Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. -When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# -represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. -However: - -* Virtual Functions 0 and 2 belong to Physical Function 0 - -* Virtual Functions 1 and 3 belong to Physical Function 1 - -.. note:: - - The above is an important consideration to take into account when targeting specific packets to a selected port. - -Intel® 82576 Gigabit Ethernet Controller and Intel® Ethernet Controller I350 Family VF Infrastructure -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -In a virtualized environment, an Intel® 82576 Gigabit Ethernet Controller serves up to eight virtual machines (VMs). -The controller has 16 TX and 16 RX queues. -They are generally referred to (or thought of) as queue pairs (one TX and one RX queue). -This gives the controller 16 queue pairs. 
- -A pool is a group of queue pairs for assignment to the same VF, used for transmit and receive operations. -The controller has eight pools, with each pool containing two queue pairs, that is, two TX and two RX queues assigned to each VF. - -In a virtualized environment, an Intel® Ethernet Controller I350 family device serves up to eight virtual machines (VMs) per port. -The eight queues can be accessed by eight different VMs if configured correctly (the i350 has 4x1GbE ports each with 8 TX and 8 RX queues), -which means one Transmit and one Receive queue assigned to each VF. - -For example, - -* Using Linux* igb driver: - - .. code-block:: console - - rmmod igb (To remove the igb module) - insmod igb max_vfs=2,2 (To enable two Virtual Functions per port) - -* Using Intel® DPDK PMD PF igb driver: - - Kernel Params: iommu=pt, intel_iommu=on - - .. code-block:: console - - modprobe uio - insmod igb_uio - ./dpdk_nic_bind.py -b igb_uio bb:ss.f - echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) - - Launch DPDK testpmd/example or your own host daemon application using the DPDK PMD library. - -Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a four-port NIC. -When you enable two Virtual Functions on each of the four ports, the eight enabled functions have a Function# -represented by (Bus#, Device#, Function#) in sequence, starting from 0 to 7. -However: - -* Virtual Functions 0 and 4 belong to Physical Function 0 - -* Virtual Functions 1 and 5 belong to Physical Function 1 - -* Virtual Functions 2 and 6 belong to Physical Function 2 - -* Virtual Functions 3 and 7 belong to Physical Function 3 - -.. note:: - - The above is an important consideration to take into account when targeting specific packets to a selected port. 
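The interleaved ownership listed in the sections above (dual-port NIC: VFs 0 and 2 on PF 0, VFs 1 and 3 on PF 1; four-port NIC: VFs 0 and 4 on PF 0, and so on) is consistent with a simple modulo rule. The following is a minimal sketch in C that assumes this rule holds for any number of ports and VFs. `vf_owner_pf` is a hypothetical helper written for illustration only; it is not part of DPDK or of the Linux drivers.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a DPDK or kernel API).
 *
 * Returns the index of the Physical Function that owns the Virtual
 * Function with enumeration index vf_index, assuming the interleaved
 * enumeration order listed in the text: consecutive VF indexes rotate
 * through the ports of the NIC.
 */
unsigned int vf_owner_pf(unsigned int vf_index, unsigned int num_ports)
{
    assert(num_ports > 0); /* a NIC has at least one port */
    return vf_index % num_ports;
}
```

For a dual-port NIC, `vf_owner_pf(2, 2)` gives 0 and `vf_owner_pf(3, 2)` gives 1, matching the lists above. On a real system, the `virtfn*` links under the PF's sysfs directory remain the authoritative source for this mapping.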
- -Validated Hypervisors -~~~~~~~~~~~~~~~~~~~~~ - -The validated hypervisor is: - -* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 - -However, because the hypervisor is bypassed when the Virtual Function devices are configured through the Mailbox interface, -the solution is hypervisor-agnostic. -Xen* and VMware* (when SR-IOV is supported) will also be able to support the DPDK with Virtual Function driver support. - -Expected Guest Operating System in Virtual Machine -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The expected guest operating systems in a virtualized environment are: - -* Fedora* 14 (64-bit) - -* Ubuntu* 10.04 (64-bit) - -For supported kernel versions, refer to the *DPDK Release Notes*. - -Setting Up a KVM Virtual Machine Monitor ----------------------------------------- - -The following describes a target environment: - -* Host Operating System: Fedora 14 - -* Hypervisor: KVM (Kernel Virtual Machine) with Qemu version 0.14.0 - -* Guest Operating System: Fedora 14 - -* Linux Kernel Version: Refer to the *DPDK Getting Started Guide* - -* Target Applications: l2fwd, l3fwd-vf - -The setup procedure is as follows: - -#. Before booting the Host OS, open **BIOS setup** and enable **Intel® VT features**. - -#. While booting the Host OS kernel, pass the intel_iommu=on kernel command line argument using GRUB. - When using the DPDK PF driver on the host, also pass the iommu=pt kernel command line argument in GRUB. - -#. Download qemu-kvm-0.14.0 from - http://sourceforge.net/projects/kvm/files/qemu-kvm/ - and install it in the Host OS using the following steps: - - When using a recent kernel (2.6.25+) with kvm modules included: - - .. 
code-block:: console - - tar xzf qemu-kvm-release.tar.gz - cd qemu-kvm-release - ./configure --prefix=/usr/local/kvm - make - sudo make install - sudo /sbin/modprobe kvm-intel - - When using an older kernel, or a kernel from a distribution without the kvm modules, - you must download (from the same link), compile and install the modules yourself: - - .. code-block:: console - - tar xjf kvm-kmod-release.tar.bz2 - cd kvm-kmod-release - ./configure - make - sudo make install - sudo /sbin/modprobe kvm-intel - - qemu-kvm installs in the /usr/local/bin directory. - - For more details about KVM configuration and usage, please refer to: - - `http://www.linux-kvm.org/page/HOWTO1 `_. - -#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine. - This is referred to as the Guest Operating System (Guest OS). - -#. Download and install the latest ixgbe driver from: - - `http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687 `_ - -#. In the Host OS - - When using Linux kernel ixgbe driver, unload the Linux ixgbe driver and reload it with the max_vfs=2,2 argument: - - .. code-block:: console - - rmmod ixgbe - modprobe ixgbe max_vfs=2,2 - - When using DPDK PMD PF driver, insert DPDK kernel module igb_uio and set the number of VF by sysfs max_vfs: - - .. code-block:: console - - modprobe uio - insmod igb_uio - ./dpdk_nic_bind.py -b igb_uio 02:00.0 02:00.1 0e:00.0 0e:00.1 - echo 2 > /sys/bus/pci/devices/0000\:02\:00.0/max_vfs - echo 2 > /sys/bus/pci/devices/0000\:02\:00.1/max_vfs - echo 2 > /sys/bus/pci/devices/0000\:0e\:00.0/max_vfs - echo 2 > /sys/bus/pci/devices/0000\:0e\:00.1/max_vfs - - .. note:: - - You need to explicitly specify number of vfs for each port, for example, - in the command above, it creates two vfs for the first two ixgbe ports. - - Let say we have a machine with four physical ixgbe ports: - - - 0000:02:00.0 - - 0000:02:00.1 - - 0000:0e:00.0 - - 0000:0e:00.1 - - The command above creates two vfs for device 0000:02:00.0: - - .. 
code-block:: console - - ls -alrt /sys/bus/pci/devices/0000\:02\:00.0/virt* - lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn1 -> ../0000:02:10.2 - lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn0 -> ../0000:02:10.0 - - It also creates two vfs for device 0000:02:00.1: - - .. code-block:: console - - ls -alrt /sys/bus/pci/devices/0000\:02\:00.1/virt* - lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn1 -> ../0000:02:10.3 - lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn0 -> ../0000:02:10.1 - -#. List the PCI devices connected and notice that the Host OS shows two Physical Functions (traditional ports) - and four Virtual Functions (two for each port). - This is the result of the previous step. - -#. Insert the pci_stub module to hold the PCI devices that are freed from the default driver using the following command - (see http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Section 4 for more information): - - .. code-block:: console - - sudo /sbin/modprobe pci-stub - - Unbind the default driver from the PCI devices representing the Virtual Functions. - A script to perform this action is as follows: - - .. code-block:: console - - echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id - echo 0000:08:10.0 > /sys/bus/pci/devices/0000:08:10.0/driver/unbind - echo 0000:08:10.0 > /sys/bus/pci/drivers/pci-stub/bind - - where, 0000:08:10.0 belongs to the Virtual Function visible in the Host OS. - -#. Now, start the Virtual Machine by running the following command: - - .. code-block:: console - - /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0 - - where: - - — -m = memory to assign - - — -smp = number of smp cores - - — -boot = boot option - - — -hda = virtual disk image - - — -device = device to attach - - .. 
note:: - - * The pci-assign,host=08:10.0 value indicates that you want to attach a PCI device - to a Virtual Machine and the respective (Bus:Device.Function) - numbers should be passed for the Virtual Function to be attached. - - * qemu-kvm-0.14.0 allows a maximum of four PCI devices assigned to a VM, - but this is qemu-kvm version dependent since qemu-kvm-0.14.1 allows a maximum of five PCI devices. - - * qemu-system-x86_64 also has a -cpu command line option that is used to select the cpu_model - to emulate in a Virtual Machine. Therefore, it can be used as: - - .. code-block:: console - - /usr/local/kvm/bin/qemu-system-x86_64 -cpu ? - - (to list all available cpu_models) - - /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -cpu host -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0 - - (to use the same cpu_model equivalent to the host cpu) - - For more information, please refer to: http://wiki.qemu.org/Features/CPUModels. - -#. Install and run the DPDK host application to take over the Physical Function. For example: - - .. code-block:: console - - make install T=x86_64-native-linuxapp-gcc - ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 -- -i - -#. Finally, access the Guest OS using vncviewer with the localhost:5900 port and check the lspci command output in the Guest OS. - The virtual functions will be listed as available for use. - -#. Configure and install the DPDK with an x86_64-native-linuxapp-gcc configuration on the Guest OS as normal, - that is, there is no change to the normal installation procedure. - - .. code-block:: console - - make config T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc - cd x86_64-native-linuxapp-gcc - make - -.. note:: - - If you are unable to compile the DPDK and you are getting "error: CPU you selected does not support x86-64 instruction set", - power off the Guest OS and start the virtual machine with the correct -cpu option in the qemu-system-x86_64 command as shown in step 9. 
- You must select the best x86_64 cpu_model to emulate or you can select the host option if available. - -.. note:: - - Run the DPDK l2fwd sample application in the Guest OS with Hugepages enabled. - For the expected benchmark performance, you must pin the cores from the Guest OS to the Host OS (taskset can be used to do this) and - you must also look at the PCI Bus layout on the board to ensure you are not running the traffic over the QPI Interface. - -.. note:: - - * The Virtual Machine Manager (the Fedora package name is virt-manager) is a utility for virtual machine management - that can also be used to create, start, stop and delete virtual machines. - If this option is used, steps 2 and 6 in the instructions provided will be different. - - * virsh, a command line utility for virtual machine management, - can also be used to bind and unbind devices to a virtual machine in Ubuntu. - If this option is used, step 6 in the instructions provided will be different. - - * The Virtual Machine Monitor (see Figure 11) is equivalent to a Host OS with KVM installed as described in the instructions. - -.. _pg_figure_11: - -**Figure 11. Performance Benchmark Setup** - -.. image25_png has been renamed - -|perf_benchmark| - -DPDK SR-IOV PMD PF/VF Driver Usage Model ----------------------------------------- - -Fast Host-based Packet Processing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Software Defined Network (SDN) trends are demanding fast host-based packet handling. -In a virtualization environment, -the DPDK VF PMD driver achieves the same throughput as in a non-VT native environment. - -With such fast packet processing on the host, many services, such as filtering, QoS and -DPI, can be offloaded to the host fast path. - -Figure 12 shows the scenario where some VMs directly communicate externally via VFs, -while others connect to a virtual switch and share the same uplink bandwidth. - -.. _pg_figure_12: - -**Figure 12. Fast Host-based Packet Processing** - -.. 
image26_png has been renamed - -|fast_pkt_proc| - -SR-IOV (PF/VF) Approach for Inter-VM Communication --------------------------------------------------- - -Inter-VM data communication is one of the traffic bottlenecks in virtualization platforms. -SR-IOV device assignment helps a VM to attach the real device, taking advantage of the bridge in the NIC. -So VF-to-VF traffic within the same physical port (VM0<->VM1) has hardware acceleration. -However, when VF-to-VF traffic crosses physical ports (VM0<->VM2), there is no such hardware bridge. -In this case, the DPDK PMD PF driver provides host forwarding between such VMs. - -Figure 13 shows an example. -In this case an update of the MAC address lookup tables in both the NIC and host DPDK application is required. - -In the NIC, the destination MAC address that belongs to a VM on another device is written to the PF-specific pool. -So when a packet comes in, its destination MAC address matches and the packet is forwarded to the host DPDK PMD application. - -In the host DPDK application, the behavior is similar to L2 forwarding, -that is, the packet is forwarded to the correct PF pool. -The SR-IOV NIC switch forwards the packet to a specific VM according to the MAC destination address -which belongs to the destination VF on the VM. - -.. _pg_figure_13: - -**Figure 13. Inter-VM Communication** - -.. image27_png has been renamed - -|inter_vm_comms| - -.. |perf_benchmark| image:: img/perf_benchmark.* - -.. |single_port_nic| image:: img/single_port_nic.* - -.. |inter_vm_comms| image:: img/inter_vm_comms.* - -.. 
|fast_pkt_proc| image:: img/fast_pkt_proc.* diff --git a/doc/guides/prog_guide/img/console.png b/doc/guides/prog_guide/img/console.png deleted file mode 100644 index 99423340d5..0000000000 Binary files a/doc/guides/prog_guide/img/console.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/fast_pkt_proc.png b/doc/guides/prog_guide/img/fast_pkt_proc.png deleted file mode 100644 index 48d57e5c40..0000000000 Binary files a/doc/guides/prog_guide/img/fast_pkt_proc.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/forward_stats.png b/doc/guides/prog_guide/img/forward_stats.png deleted file mode 100644 index 23e35325c1..0000000000 Binary files a/doc/guides/prog_guide/img/forward_stats.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/host_vm_comms.png b/doc/guides/prog_guide/img/host_vm_comms.png deleted file mode 100644 index 4e0b3c96c9..0000000000 Binary files a/doc/guides/prog_guide/img/host_vm_comms.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/host_vm_comms_qemu.png b/doc/guides/prog_guide/img/host_vm_comms_qemu.png deleted file mode 100644 index 391a4eac27..0000000000 Binary files a/doc/guides/prog_guide/img/host_vm_comms_qemu.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/inter_vm_comms.png b/doc/guides/prog_guide/img/inter_vm_comms.png deleted file mode 100644 index 6d85ece737..0000000000 Binary files a/doc/guides/prog_guide/img/inter_vm_comms.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/perf_benchmark.png b/doc/guides/prog_guide/img/perf_benchmark.png deleted file mode 100644 index aba818c300..0000000000 Binary files a/doc/guides/prog_guide/img/perf_benchmark.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/single_port_nic.png b/doc/guides/prog_guide/img/single_port_nic.png deleted file mode 100644 index 8f39d73b25..0000000000 Binary files a/doc/guides/prog_guide/img/single_port_nic.png and /dev/null differ diff --git 
a/doc/guides/prog_guide/img/vm_vm_comms.png b/doc/guides/prog_guide/img/vm_vm_comms.png deleted file mode 100644 index 2bf1cd2729..0000000000 Binary files a/doc/guides/prog_guide/img/vm_vm_comms.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/vmxnet3_int.png b/doc/guides/prog_guide/img/vmxnet3_int.png deleted file mode 100644 index 6541feba73..0000000000 Binary files a/doc/guides/prog_guide/img/vmxnet3_int.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/vswitch_vm.png b/doc/guides/prog_guide/img/vswitch_vm.png deleted file mode 100644 index ac817aaaed..0000000000 Binary files a/doc/guides/prog_guide/img/vswitch_vm.png and /dev/null differ diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst index b263d318e0..a9966a08b4 100644 --- a/doc/guides/prog_guide/index.rst +++ b/doc/guides/prog_guide/index.rst @@ -48,14 +48,8 @@ Programmer's Guide mempool_lib mbuf_lib poll_mode_drv - i40e_ixgbe_igb_virt_func_drv - driver_vm_emul_dev ivshmem_lib - poll_mode_drv_emulated_virtio_nic - poll_mode_drv_paravirtual_vmxnets_nic - libpcap_ring_based_poll_mode_drv link_bonding_poll_mode_drv_lib - mlx4_poll_mode_drv timer_lib hash_lib lpm_lib @@ -104,18 +98,6 @@ Programmer's Guide :ref:`Figure 9. An mbuf with Three Segments ` -:ref:`Figure 10. Virtualization for a Single Port NIC in SR-IOV Mode ` - -:ref:`Figure 11. Performance Benchmark Setup ` - -:ref:`Figure 12. Fast Host-based Packet Processing ` - -:ref:`Figure 13. Inter-VM Communication ` - -:ref:`Figure 14. Host2VM Communication Example Using kni vhost Back End ` - -:ref:`Figure 15. Host2VM Communication Example Using qemu vhost Back End ` - :ref:`Figure 16. Memory Sharing inthe Intel® DPDK Multi-process Sample Application ` :ref:`Figure 17. 
Components of an Intel® DPDK KNI Application ` diff --git a/doc/guides/prog_guide/libpcap_ring_based_poll_mode_drv.rst b/doc/guides/prog_guide/libpcap_ring_based_poll_mode_drv.rst deleted file mode 100644 index 6e14fd199a..0000000000 --- a/doc/guides/prog_guide/libpcap_ring_based_poll_mode_drv.rst +++ /dev/null @@ -1,271 +0,0 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -Libpcap and Ring Based Poll Mode Drivers -======================================== - -In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware, -the DPDK also includes two pure-software PMDs. These two drivers are: - -* A libpcap-based PMD (librte_pmd_pcap) that reads and writes packets using libpcap, - both from files on disk and from physical NIC devices using standard Linux kernel drivers. - -* A ring-based PMD (librte_pmd_ring) that allows a set of software FIFOs (that is, rte_ring) - to be accessed using the PMD APIs, as though they were physical NICs. - -.. note:: - - The libpcap-based PMD is disabled by default in the build configuration files, - owing to an external dependency on the libpcap development files which must be installed on the board. - Once the libpcap development files are installed, - the library can be enabled by setting CONFIG_RTE_LIBRTE_PMD_PCAP=y and recompiling the Intel® DPDK. - -Using the Drivers from the EAL Command Line -------------------------------------------- - -For ease of use, the DPDK EAL has also been extended to allow pseudo-Ethernet devices, -using one or more of these drivers, -to be created at application startup time during EAL initialization. - -To do so, the --vdev= parameter must be passed to the EAL. -This takes options to allow ring and pcap-based Ethernet devices to be allocated and used transparently by the application. -This can be used, for example, for testing on a virtual machine where there are no Ethernet ports. - -Libpcap-based PMD -~~~~~~~~~~~~~~~~~ - -Pcap-based devices can be created using the virtual device --vdev option. -The device name must start with the eth_pcap prefix followed by numbers or letters. -The name is unique for each device. Each device can have multiple stream options and multiple devices can be used. -Multiple device definitions can be arranged using multiple --vdev. -Device name and stream options must be separated by commas as shown below: - -.. 
code-block:: console - - $RTE_TARGET/app/testpmd -c f -n 4 --vdev 'eth_pcap0,stream_opt0=..,stream_opt1=..' --vdev='eth_pcap1,stream_opt0=..' - -Device Streams -^^^^^^^^^^^^^^ - -Multiple ways of stream definitions can be assessed and combined as long as the following two rules are respected: - -* A device is provided with two different streams - reception and transmission. - -* A device is provided with one network interface name used for reading and writing packets. - -The different stream types are: - -* rx_pcap: Defines a reception stream based on a pcap file. - The driver reads each packet within the given pcap file as if it was receiving it from the wire. - The value is a path to a valid pcap file. - - rx_pcap=/path/to/file.pcap - -* tx_pcap: Defines a transmission stream based on a pcap file. - The driver writes each received packet to the given pcap file. - The value is a path to a pcap file. - The file is overwritten if it already exists and it is created if it does not. - - tx_pcap=/path/to/file.pcap - -* rx_iface: Defines a reception stream based on a network interface name. - The driver reads packets coming from the given interface using the Linux kernel driver for that interface. - The value is an interface name. - - rx_iface=eth0 - -* tx_iface: Defines a transmission stream based on a network interface name. - The driver sends packets to the given interface using the Linux kernel driver for that interface. - The value is an interface name. - - tx_iface=eth0 - -* iface: Defines a device mapping a network interface. - The driver both reads and writes packets from and to the given interface. - The value is an interface name. - - iface=eth0 - -Examples of Usage -^^^^^^^^^^^^^^^^^ - -Read packets from one pcap file and write them to another: - -.. 
code-block:: console - - $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained - -Read packets from a network interface and write them to a pcap file: - -.. code-block:: console - - $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_iface=eth0,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained - -Read packets from a pcap file and write them to a network interface: - -.. code-block:: console - - $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_iface=eth1' -- --port-topology=chained - -Forward packets through two network interfaces: - -.. code-block:: console - - $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,iface=eth0' --vdev='eth_pcap1,iface=eth1' - -Using libpcap-based PMD with the testpmd Application -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -One of the first things that testpmd does before starting to forward packets is to flush the RX streams -by reading the first 512 packets on every RX stream and discarding them. -When using a libpcap-based PMD this behavior can be turned off using the following command line option: - -.. code-block:: console - - --no-flush-rx - -It is also available in the runtime command line: - -.. code-block:: console - - set flush_rx on/off - -This is useful when rx_pcap is being used and no packets are meant to be discarded. -Otherwise, the first 512 packets from the input pcap file will be discarded by the RX flushing operation. - -.. code-block:: console - - $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained --no-flush-rx - - -Rings-based PMD -~~~~~~~~~~~~~~~ - -To run a DPDK application on a machine without any Ethernet devices, a pair of ring-based rte_ethdevs can be used as below. 
-The device names passed to the --vdev option must start with eth_ring and take no additional parameters. -Multiple devices may be specified, separated by commas. - -.. code-block:: console - - ./testpmd -c E -n 4 --vdev=eth_ring0 --vdev=eth_ring1 -- -i - EAL: Detected lcore 1 as core 1 on socket 0 - ... - - Interactive-mode selected - Configuring Port 0 (socket 0) - Configuring Port 1 (socket 0) - Checking link statuses... - Port 0 Link Up - speed 10000 Mbps - full-duplex - Port 1 Link Up - speed 10000 Mbps - full-duplex - Done - - testpmd> start tx_first - io packet forwarding - CRC stripping disabled - packets/burst=16 - nb forwarding cores=1 - nb forwarding ports=2 - RX queues=1 - RX desc=128 - RX free threshold=0 - RX threshold registers: pthresh=8 hthresh=8 wthresh=4 - TX queues=1 - TX desc=512 - TX free threshold=0 - TX threshold registers: pthresh=36 hthresh=0 wthresh=0 - TX RS bit threshold=0 - TXQ flags=0x0 - - testpmd> stop - Telling cores to stop... - Waiting for lcores to finish... - -.. image38_png has been renamed - -|forward_stats| - -.. code-block:: console - - +++++++++++++++ Accumulated forward statistics for allports++++++++++ - RX-packets: 462384736 RX-dropped: 0 RX-total: 462384736 - TX-packets: 462384768 TX-dropped: 0 TX-total: 462384768 - +++++++++++++++++++++++++++++++++++++++++++++++++++++ - - Done. - - -Using the Poll Mode Driver from an Application -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Both drivers can provide similar APIs to allow the user to create a PMD, that is, -rte_ethdev structure, instances at run-time in the end-application, -for example, using rte_eth_from_rings() or rte_eth_from_pcaps() APIs. -For the rings- based PMD, this functionality could be used, for example, -to allow data exchange between cores using rings to be done in exactly the -same way as sending or receiving packets from an Ethernet device. 
-For the libpcap-based PMD, it allows an application to open one or more pcap files -and use these as a source of packet input to the application. - -Usage Examples -^^^^^^^^^^^^^^ - -To create two pseudo-ethernet ports where all traffic sent to a port is looped back -for reception on the same port (error handling omitted for clarity): - -.. code-block:: c - - struct rte_ring *r1, *r2; - int port1, port2; - - r1 = rte_ring_create("R1", 256, SOCKET0,RING_F_SP_ENQ|RING_F_SC_DEQ); - r2 = rte_ring_create("R2", 256, SOCKET0, RING_F_SP_ENQ|RING_F_SC_DEQ); - - /* create an ethdev where RX and TX are done to/from r1, and * another from r2 */ - - port1 = rte_eth_from_rings(r1, 1, r1, 1, SOCKET0); - port2 = rte_eth_from_rings(r2, 1, r2, 1, SOCKET0); - - -To create two pseudo-Ethernet ports where the traffic is switched between them, -that is, traffic sent to port 1 is read back from port 2 and vice-versa, -the final two lines could be changed as below: - -.. code-block:: c - - port1 = rte_eth_from_rings(r1, 1, r2, 1, SOCKET0); - port2 = rte_eth_from_rings(r2, 1, r1, 1, SOCKET0); - -This type of configuration could be useful in a pipeline model, for example, -where one may want to have inter-core communication using pseudo Ethernet devices rather than raw rings, -for reasons of API consistency. - -Enqueuing and dequeuing items from an rte_ring using the rings-based PMD may be slower than using the native rings API. -This is because DPDK Ethernet drivers make use of function pointers to call the appropriate enqueue or dequeue functions, -while the rte_ring specific functions are direct function calls in the code and are often inlined by the compiler. 
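The cost difference described above can be illustrated without any DPDK code. The sketch below is a simplified model and not the actual rte_ring or ethdev implementation: every name in it (`struct fifo`, `struct pseudo_dev`, `fifo_enqueue`, `pseudo_dev_from_fifo` and so on) is hypothetical. It only shows the general shape of the two call paths: a direct call that the compiler can inline, and a call dispatched through function pointers stored in a device structure.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 8 /* illustrative capacity, a power of two */

/* Hypothetical stand-in for a software FIFO; not the rte_ring layout. */
struct fifo {
    void *slots[FIFO_SIZE];
    size_t head; /* free-running count of dequeued items */
    size_t tail; /* free-running count of enqueued items */
};

/* Direct path: the callee is known at compile time and can be inlined. */
static inline int fifo_enqueue(struct fifo *f, void *obj)
{
    if (f->tail - f->head == FIFO_SIZE)
        return -1; /* FIFO is full */
    f->slots[f->tail++ % FIFO_SIZE] = obj;
    return 0;
}

static inline int fifo_dequeue(struct fifo *f, void **obj)
{
    if (f->tail == f->head)
        return -1; /* FIFO is empty */
    *obj = f->slots[f->head++ % FIFO_SIZE];
    return 0;
}

/* Indirect path: a driver-style device that dispatches through pointers. */
struct pseudo_dev {
    int (*tx)(void *queue, void *obj);
    int (*rx)(void *queue, void **obj);
    void *queue;
};

/* Wrappers with the generic signature expected by struct pseudo_dev. */
static int fifo_tx(void *queue, void *obj)
{
    return fifo_enqueue(queue, obj);
}

static int fifo_rx(void *queue, void **obj)
{
    return fifo_dequeue(queue, obj);
}

/* Build a pseudo device whose RX and TX both use the same FIFO (loopback). */
struct pseudo_dev pseudo_dev_from_fifo(struct fifo *f)
{
    struct pseudo_dev dev = { .tx = fifo_tx, .rx = fifo_rx, .queue = f };
    return dev;
}
```

A caller that uses `dev.tx(dev.queue, obj)` pays for one indirect call per operation, whereas a caller that uses `fifo_enqueue(&f, obj)` directly may have the whole body inlined. Whether the difference matters in practice depends on the workload and would need to be measured.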
- - Once an ethdev has been created, for either a ring or a pcap-based PMD, - it should be configured and started in the same way as a regular Ethernet device, that is, - by calling rte_eth_dev_configure() to set the number of receive and transmit queues, - then calling rte_eth_rx_queue_setup() / tx_queue_setup() for each of those queues and - finally calling rte_eth_dev_start() to allow transmission and reception of packets to begin. - -.. |forward_stats| image:: img/forward_stats.* diff --git a/doc/guides/prog_guide/mlx4_poll_mode_drv.rst b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst deleted file mode 100644 index b26c2198f9..0000000000 --- a/doc/guides/prog_guide/mlx4_poll_mode_drv.rst +++ /dev/null @@ -1,315 +0,0 @@ -.. BSD LICENSE - Copyright 2012-2015 6WIND S.A. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of 6WIND S.A. nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -MLX4 poll mode driver library -============================= - -The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support -for **Mellanox ConnectX-3** 10/40 Gbps adapters (EN 40, EN 10, Pro EN 40) as -well as their virtual functions (VF) in SR-IOV context. - -.. note:: - - Due to external dependencies, this driver is disabled by default. It must - be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and - recompiling DPDK. - -Implementation details ----------------------- - -Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI -bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a -PCI driver that allocates one Ethernet device per detected port. - -For this reason, one cannot white/blacklist a single port without also -white/blacklisting the others on the same device. - -Besides its dependency on libibverbs (that implies libmlx4 and associated -kernel support), librte_pmd_mlx4 relies heavily on system calls for control -operations such as querying/updating the MTU and flow control parameters. - -For security reasons and robustness, this driver only deals with virtual -memory addresses. The way resources allocations are handled by the kernel -combined with hardware specifications that allow it to handle virtual memory -addresses directly ensure that DPDK applications cannot access random -physical memory (or memory that does not belong to the current process). 
- -This capability allows the PMD to coexist with kernel network interfaces -which remain functional, although they stop receiving unicast packets as -long as they share the same MAC address. - -Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs. - -Features and limitations ------------------------- - -- RSS, also known as RCA, is supported. In this mode the number of - configured RX queues must be a power of two. -- VLAN filtering is supported. -- Link state information is provided. -- Promiscuous mode is supported. -- All multicast mode is supported. -- Multiple MAC addresses (unicast, multicast) can be configured. -- Scattered packets are supported for TX and RX. - -.. - -- RSS hash key cannot be modified. -- Hardware counters are not implemented (they are software counters). -- Checksum offloads are not supported yet. - -Configuration -------------- - -Compilation options -~~~~~~~~~~~~~~~~~~~ - -- ``CONFIG_RTE_LIBRTE_MLX4_PMD`` (default **n**) - - Toggle compilation of librte_pmd_mlx4 itself. - -- ``CONFIG_RTE_LIBRTE_MLX4_DEBUG`` (default **n**) - - Toggle debugging code and stricter compilation flags. Enabling this option - adds additional run-time checks and debugging messages at the cost of - lower performance. - -- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**) - - Number of scatter/gather elements (SGEs) per work request (WR). Lowering - this number improves performance but also limits the ability to receive - scattered packets (packets that do not fit a single mbuf). The default - value is a safe tradeoff. - -- ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**) - - Amount of data to be inlined during TX operations. Improves latency but - lowers throughput. - -- ``CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE`` (default **8**) - - Maximum number of cached memory pools (MPs) per TX queue. Each MP from - which buffers are to be transmitted must be associated to memory regions - (MRs). This is a slow operation that must be cached. 
- - This value is always 1 for RX queues since they use a single MP. - -- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**) - - Toggle software counters. No counters are available if this option is - disabled since hardware counters are not supported. - -Environment variables -~~~~~~~~~~~~~~~~~~~~~ - -- ``MLX4_INLINE_RECV_SIZE`` - - A nonzero value enables inline receive for packets up to that size. May - significantly improve performance in some cases but lower it in - others. Requires careful testing. - -Run-time configuration -~~~~~~~~~~~~~~~~~~~~~~ - -- The only constraint when RSS mode is requested is to make sure the number - of RX queues is a power of two. This is a hardware requirement. - -- librte_pmd_mlx4 brings kernel network interfaces up during initialization - because it is affected by their state. Forcing them down prevents packets - reception. - -- **ethtool** operations on related kernel interfaces also affect the PMD. - -Prerequisites -------------- - -This driver relies on external libraries and kernel drivers for resources -allocations and initialization. The following dependencies are not part of -DPDK and must be installed separately: - -- **libibverbs** - - User space verbs framework used by librte_pmd_mlx4. This library provides - a generic interface between the kernel and low-level user space drivers - such as libmlx4. - - It allows slow and privileged operations (context initialization, hardware - resources allocations) to be managed by the kernel and fast operations to - never leave user space. - -- **libmlx4** - - Low-level user space driver library for Mellanox ConnectX-3 devices, - it is automatically loaded by libibverbs. - - This library basically implements send/receive calls to the hardware - queues. - -- **Kernel modules** (mlnx-ofed-kernel) - - They provide the kernel-side verbs API and low level device drivers that - manage actual hardware initialization and resources sharing with user - space processes. 
- - Unlike most other PMDs, these modules must remain loaded and bound to - their devices: - - - mlx4_core: hardware driver managing Mellanox ConnectX-3 devices. - - mlx4_en: Ethernet device driver that provides kernel network interfaces. - - mlx4_ib: InfiniBand device driver. - - ib_uverbs: user space driver for verbs (entry point for libibverbs). - -While these libraries and kernel modules are available on OpenFabrics -Alliance's `website `_ and provided by package -managers on most distributions, this PMD requires Ethernet extensions that -may not be supported at the moment (this is a work in progress). - -`Mellanox OFED -`_ -includes the necessary support and should be used in the meantime. For DPDK, -only libibverbs, libmlx4 and mlnx-ofed-kernel packages are required from -that distribution. - -.. note:: - - Both libraries are BSD and GPL licensed. Linux kernel modules are GPL - licensed. - -Usage example ------------- - -This section demonstrates how to launch **testpmd** with Mellanox ConnectX-3 -devices managed by librte_pmd_mlx4. - -#. Load the kernel modules: - - .. code-block:: console - - modprobe -a ib_uverbs mlx4_en mlx4_core mlx4_ib - - .. note:: - - User space I/O kernel modules (uio and igb_uio) are not used and do - not have to be loaded. - -#. Make sure Ethernet interfaces are in working order and linked to kernel - verbs. Related sysfs entries should be present: - - .. code-block:: console - - ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5 - - Example output: - - .. code-block:: console - - eth2 - eth3 - eth4 - eth5 - -#. Optionally, retrieve their PCI bus addresses for whitelisting: - - .. code-block:: console - - { - for intf in eth2 eth3 eth4 eth5; - do - (cd "/sys/class/net/${intf}/device/" && pwd -P); - done; - } | - sed -n 's,.*/\(.*\),-w \1,p' - - Example output: - - .. code-block:: console - - -w 0000:83:00.0 - -w 0000:83:00.0 - -w 0000:84:00.0 - -w 0000:84:00.0 - - .. 
note:: - - There are only two distinct PCI bus addresses because the Mellanox - ConnectX-3 adapters installed on this system are dual port. - -#. Request huge pages: - - .. code-block:: console - - echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages - -#. Start testpmd with basic parameters: - - .. code-block:: console - - testpmd -c 0xff00 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i - - Example output: - - .. code-block:: console - - [...] - EAL: PCI device 0000:83:00.0 on NUMA socket 1 - EAL: probe driver: 15b3:1007 librte_pmd_mlx4 - PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false) - PMD: librte_pmd_mlx4: 2 port(s) detected - PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:b7:50 - PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:b7:51 - EAL: PCI device 0000:84:00.0 on NUMA socket 1 - EAL: probe driver: 15b3:1007 librte_pmd_mlx4 - PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false) - PMD: librte_pmd_mlx4: 2 port(s) detected - PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:ba:b0 - PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:ba:b1 - Interactive-mode selected - Configuring Port 0 (socket 0) - PMD: librte_pmd_mlx4: 0x867d60: TX queues number update: 0 -> 2 - PMD: librte_pmd_mlx4: 0x867d60: RX queues number update: 0 -> 2 - Port 0: 00:02:C9:B5:B7:50 - Configuring Port 1 (socket 0) - PMD: librte_pmd_mlx4: 0x867da0: TX queues number update: 0 -> 2 - PMD: librte_pmd_mlx4: 0x867da0: RX queues number update: 0 -> 2 - Port 1: 00:02:C9:B5:B7:51 - Configuring Port 2 (socket 0) - PMD: librte_pmd_mlx4: 0x867de0: TX queues number update: 0 -> 2 - PMD: librte_pmd_mlx4: 0x867de0: RX queues number update: 0 -> 2 - Port 2: 00:02:C9:B5:BA:B0 - Configuring Port 3 (socket 0) - PMD: librte_pmd_mlx4: 0x867e20: TX queues number update: 0 -> 2 - PMD: librte_pmd_mlx4: 0x867e20: RX queues number update: 0 -> 2 - Port 3: 00:02:C9:B5:BA:B1 - Checking link 
statuses... - Port 0 Link Up - speed 10000 Mbps - full-duplex - Port 1 Link Up - speed 40000 Mbps - full-duplex - Port 2 Link Up - speed 10000 Mbps - full-duplex - Port 3 Link Up - speed 40000 Mbps - full-duplex - Done - testpmd> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst index e9e1132fc1..b8b2d1aafa 100644 --- a/doc/guides/prog_guide/poll_mode_drv.rst +++ b/doc/guides/prog_guide/poll_mode_drv.rst @@ -288,155 +288,3 @@ Ethernet Device API ~~~~~~~~~~~~~~~~~~~ The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK API Reference*. - -Vector PMD for IXGBE --------------------- - -Vector PMD uses Intel® SIMD instructions to optimize packet I/O. -It improves load/store bandwidth efficiency of L1 data cache by using a wider SSE/AVX register (1). -The wider register gives space to hold multiple packet buffers, which reduces the number of instructions needed when processing packets in bulk. - -There is no change to PMD API. The RX/TX handlers are the only two entries for vPMD packet I/O. -They are transparently registered at runtime RX/TX execution if all condition checks pass. - -1. To date, only an SSE version of IXGBE vPMD is available. - To ensure that vPMD is in the binary code, ensure that the option CONFIG_RTE_IXGBE_INC_VECTOR=y is in the configuration file. - -Some constraints apply as pre-conditions for specific optimizations on bulk packet transfers. -The following sections explain RX and TX constraints in the vPMD. - -RX Constraints -~~~~~~~~~~~~~~ - -Prerequisites and Pre-conditions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The following prerequisites apply: - -* To enable vPMD to work for RX, bulk allocation for Rx must be allowed. - -* The RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y configuration MACRO must be set before compiling the code. 
- -Ensure that the following pre-conditions are satisfied: - -* rxq->rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST - -* rxq->rx_free_thresh < rxq->nb_rx_desc - -* (rxq->nb_rx_desc % rxq->rx_free_thresh) == 0 - -* rxq->nb_rx_desc < (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST) - -These conditions are checked in the code. - -Scattered packets are not supported in this mode. -If an incoming packet is larger than the data size of one "mbuf" (by default, the size is 2 KB), -vPMD for RX is disabled. - -By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32. - -Feature not Supported by RX Vector PMD -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Some features are not supported when trying to increase the throughput in vPMD. -They are: - -* IEEE1588 - -* FDIR - -* Header split - -* RX checksum offload - -Other features are supported using optional MACRO configuration. They include: - -* HW VLAN strip - -* HW extend dual VLAN - -* Enabled by RX_OLFLAGS (RTE_IXGBE_RX_OLFLAGS_DISABLE=n) - - -To guarantee the constraint, configuration flags in dev_conf.rxmode will be checked: - -* hw_vlan_strip - -* hw_vlan_extend - -* hw_ip_checksum - -* header_split - -* dev_conf - -fdir_conf->mode will also be checked. - -RX Burst Size -^^^^^^^^^^^^^ - -As vPMD is focused on high throughput, it assumes that the RX burst size is equal to or greater than 32 per burst. -It returns zero if using nb_pkt < 32 as the expected packet number in the receive handler. - -TX Constraint -~~~~~~~~~~~~~ - -Prerequisite -^^^^^^^^^^^^ - -The only prerequisite is related to tx_rs_thresh. -The tx_rs_thresh value must be greater than or equal to RTE_PMD_IXGBE_TX_MAX_BURST, -but less than or equal to RTE_IXGBE_TX_MAX_FREE_BUF_SZ. -Consequently, by default the tx_rs_thresh value is in the range 32 to 64. 
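The RX pre-conditions and the ``tx_rs_thresh`` range described for the ixgbe vPMD can be sketched as a standalone check. This is only an illustration under stated assumptions, not the driver's own code: the helper names ``vpmd_rx_conf_ok`` and ``vpmd_tx_rs_thresh_ok`` are hypothetical, and the constants are local copies of the default values quoted in the text (4096, 32, and the 32 to 64 range) rather than definitions pulled from the PMD headers.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Local copies of the defaults quoted in the text (assumptions for this
     * sketch; a real build would take them from the ixgbe PMD headers). */
    #define IXGBE_MAX_RING_DESC          4096
    #define RTE_PMD_IXGBE_RX_MAX_BURST   32
    #define RTE_PMD_IXGBE_TX_MAX_BURST   32
    #define RTE_IXGBE_TX_MAX_FREE_BUF_SZ 64

    /* Hypothetical helper: returns true when all four RX pre-conditions hold. */
    static bool
    vpmd_rx_conf_ok(uint16_t nb_rx_desc, uint16_t rx_free_thresh)
    {
        /* rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST (also rules out zero) */
        if (rx_free_thresh < RTE_PMD_IXGBE_RX_MAX_BURST)
            return false;
        /* rx_free_thresh < nb_rx_desc */
        if (rx_free_thresh >= nb_rx_desc)
            return false;
        /* nb_rx_desc must be a multiple of rx_free_thresh */
        if ((nb_rx_desc % rx_free_thresh) != 0)
            return false;
        /* nb_rx_desc < IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST */
        if (nb_rx_desc >= (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST))
            return false;
        return true;
    }

    /* Hypothetical helper: returns true when tx_rs_thresh is in the allowed range. */
    static bool
    vpmd_tx_rs_thresh_ok(uint16_t tx_rs_thresh)
    {
        return tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST &&
               tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ;
    }

Under these assumptions, a ring of 128 descriptors with ``rx_free_thresh`` set to 32 passes the RX check, while a ring of 100 descriptors does not, because 100 is not a multiple of 32.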
- -Feature not Supported by TX Vector PMD -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -TX vPMD only works when txq_flags is set to IXGBE_SIMPLE_FLAGS. - -This means that it does not support TX multi-segment, VLAN offload and TX csum offload. -The following MACROs are used for these three features: - -* ETH_TXQ_FLAGS_NOMULTSEGS - -* ETH_TXQ_FLAGS_NOVLANOFFL - -* ETH_TXQ_FLAGS_NOXSUMSCTP - -* ETH_TXQ_FLAGS_NOXSUMUDP - -* ETH_TXQ_FLAGS_NOXSUMTCP - - -Sample Application Notes -~~~~~~~~~~~~~~~~~~~~~~~~ - -testpmd -^^^^^^^ - -By default, using CONFIG_RTE_IXGBE_RX_OLFLAGS_DISABLE=n: - -.. code-block:: console - - ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 - -When CONFIG_RTE_IXGBE_RX_OLFLAGS_DISABLE=y, better performance can be achieved: - -.. code-block:: console - - ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 --disable-hw-vlan - -If scatter gather lists are not required, set CONFIG_RTE_MBUF_SCATTER_GATHER=n for better throughput. - -l3fwd -^^^^^ - -When running l3fwd with vPMD, there is one thing to note. -In the configuration, ensure that port_conf.rxmode.hw_ip_checksum=0. -Otherwise, by default, RX vPMD is disabled. - -load_balancer -^^^^^^^^^^^^^ - -As in the case of l3fwd, configure port_conf.rxmode.hw_ip_checksum=0 to enable vPMD. -In addition, for improved performance, use -bsz "(32,32),(64,64),(32,32)" in load_balancer to avoid using the default burst size of 144. diff --git a/doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst b/doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst deleted file mode 100644 index 113a0cba56..0000000000 --- a/doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst +++ /dev/null @@ -1,221 +0,0 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. 
All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -Poll Mode Driver for Emulated Virtio NIC -======================================== - -Virtio is a para-virtualization framework initiated by IBM, and supported by KVM hypervisor. -In the Data Plane Development Kit (DPDK), -we provide a virtio Poll Mode Driver (PMD) as a software solution, comparing to SRIOV hardware solution, -for fast guest VM to guest VM communication and guest VM to host communication. - -Vhost is a kernel acceleration module for virtio qemu backend. 
-The DPDK extends kni to support vhost raw socket interface, -which enables vhost to directly read/write packets from/to a physical port. -With this enhancement, virtio could achieve quite promising performance. - -In a future release, we will also enhance the vhost backend, -to release the peak performance of the virtio PMD driver. - -For basic qemu-KVM installation and other Intel EM poll mode driver in guest VM, -please refer to Chapter "Driver for VM Emulated Devices". - -In this chapter, we will demonstrate usage of virtio PMD driver with two backends, -standard qemu vhost back end and vhost kni back end. - -Virtio Implementation in DPDK ------------------------------ - -For details about the virtio spec, refer to Virtio PCI Card Specification written by Rusty Russell. - -As a PMD, virtio provides packet reception and transmission callbacks virtio_recv_pkts and virtio_xmit_pkts. - -In virtio_recv_pkts, index in range [vq->vq_used_cons_idx , vq->vq_ring.used->idx) in vring is available for virtio to burst out. - -In virtio_xmit_pkts, same index range in vring is available for virtio to clean. -Virtio enqueues the packets to be transmitted into the vring, advances the vq->vq_ring.avail->idx, -and then notifies the host back end if necessary. - -Features and Limitations of virtio PMD --------------------------------------- - -In this release, the virtio PMD driver provides the basic functionality of packet reception and transmission. - -* It supports merge-able buffers per packet when receiving packets and scattered buffer per packet - when transmitting packets. The packet size supported is from 64 to 1518. - -* It supports multicast packets and promiscuous mode. - -* The descriptor number for the RX/TX queue is hard-coded to be 256 by qemu. - If given a different descriptor number by the upper application, - the virtio PMD generates a warning and falls back to the hard-coded value. 
- -* MAC/VLAN filter features are supported, but negotiation with the vhost backend is needed to support them. - When the backend can't support the VLAN filter, the virtio app on the guest should disable the VLAN filter to make sure - the virtio port is configured correctly. E.g. specify '--disable-hw-vlan' in testpmd command line. - -* RTE_PKTMBUF_HEADROOM should be defined larger than sizeof(struct virtio_net_hdr), which is 10 bytes. - -* Virtio does not support runtime configuration. - -* Virtio supports Link State interrupt. - -* Virtio supports software vlan stripping and inserting. - -* Virtio supports using port IO to get PCI resource when uio/igb_uio module is not available. - -Prerequisites -------------- - -The following prerequisites apply: - -* In the BIOS, turn VT-x and VT-d on - -* Linux kernel with KVM module; vhost module loaded and ioeventfd supported. - Qemu standard backend without vhost support isn't tested, and probably isn't supported. - -Virtio with kni vhost Back End ------------------------------- - -This section demonstrates kni vhost back end example setup for Phy-VM Communication. - -.. _pg_figure_14: - -**Figure 14. Host2VM Communication Example Using kni vhost Back End** - -.. image29_png has been renamed - -|host_vm_comms| - -Host2VM communication example - -#. Load the kni kernel module: - - .. code-block:: console - - insmod rte_kni.ko - - Other basic DPDK preparations like hugepage enabling, uio port binding are not listed here. - Please refer to the *DPDK Getting Started Guide* for detailed instructions. - -#. Launch the kni user application: - - .. code-block:: console - - examples/kni/build/app/kni -c 0xf -n 4 -- -p 0x1 -i 0x1 -o 0x2 - - This command generates one network device vEth0 for the physical port. - If more physical ports are specified, the generated network devices will be vEth1, vEth2, and so on. - - For each physical port, kni creates two user threads. - One thread loops to fetch packets from the physical NIC port into the kni receive queue. 
- The other user thread loops to send packets in the kni transmit queue. - - For each physical port, kni also creates a kernel thread that retrieves packets from the kni receive queue, - places them onto kni's raw socket's queue and wakes up the vhost kernel thread to exchange packets with the virtio virt queue. - - For more details about kni, please refer to Chapter 24 "Kernel NIC Interface". - -#. Enable the kni raw socket functionality for the specified physical NIC port, - get the generated file descriptor and set it in the qemu command line parameter. - Always remember to set ioeventfd_on and vhost_on. - - Example: - - .. code-block:: console - - echo 1 > /sys/class/net/vEth0/sock_en - fd=`cat /sys/class/net/vEth0/sock_fd` - exec qemu-system-x86_64 -enable-kvm -cpu host \ - -m 2048 -smp 4 -name dpdk-test1-vm1 \ - -drive file=/data/DPDKVMS/dpdk-vm.img \ - -netdev tap,fd=$fd,id=mynet_kni,script=no,vhost=on \ - -device virtio-net-pci,netdev=mynet_kni,bus=pci.0,addr=0x3,ioeventfd=on \ - -vnc :1 -daemonize - - In the above example, virtio port 0 in the guest VM will be associated with vEth0, which in turn corresponds to a physical port, - which means received packets come from vEth0, and transmitted packets are sent to vEth0. - -#. In the guest, bind the virtio device to the uio_pci_generic kernel module and start the forwarding application. - When the virtio port in guest bursts rx, it is getting packets from the raw socket's receive queue. - When the virtio port bursts tx, it is sending packets to the tx_q. - - .. code-block:: console - - modprobe uio - echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages - modprobe uio_pci_generic - python tools/dpdk_nic_bind.py -b uio_pci_generic 00:03.0 - - We use testpmd as the forwarding application in this example. - - .. image30_png has been renamed - - |console| - -#. Use IXIA packet generator to inject a packet stream into the KNI physical port. 
- - The packet reception and transmission flow path is: - - IXIA packet generator->82599 PF->KNI rx queue->KNI raw socket queue->Guest VM virtio port 0 rx burst->Guest VM virtio port 0 tx burst-> KNI tx queue->82599 PF-> IXIA packet generator - -Virtio with qemu virtio Back End --------------------------------- - -.. _pg_figure_15: - -**Figure 15. Host2VM Communication Example Using qemu vhost Back End** - - .. image31_png has been renamed - - |host_vm_comms_qemu| - -.. code-block:: console - - qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -smp 2 -mem-path /dev/ - hugepages -mem-prealloc - -drive file=/data/DPDKVMS/dpdk-vm1 - -netdev tap,id=vm1_p1,ifname=tap0,script=no,vhost=on - -device virtio-net-pci,netdev=vm1_p1,bus=pci.0,addr=0x3,ioeventfd=on - -device pci-assign,host=04:10.1 \ - -In this example, the packet reception flow path is: - - IXIA packet generator->82599 PF->Linux Bridge->TAP0's socket queue-> Guest VM virtio port 0 rx burst-> Guest VM 82599 VF port1 tx burst-> IXIA packet generator - -The packet transmission flow is: - - IXIA packet generator-> Guest VM 82599 VF port1 rx burst-> Guest VM virtio port 0 tx burst-> tap -> Linux Bridge->82599 PF-> IXIA packet generator - -.. |host_vm_comms| image:: img/host_vm_comms.* - -.. |console| image:: img/console.* - -.. |host_vm_comms_qemu| image:: img/host_vm_comms_qemu.* diff --git a/doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst b/doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst deleted file mode 100644 index 04842c9646..0000000000 --- a/doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst +++ /dev/null @@ -1,184 +0,0 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - All rights reserved. 
- - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -Poll Mode Driver for Paravirtual VMXNET3 NIC -============================================ - -The VMXNET3 adapter is the next generation of a paravirtualized NIC, introduced by VMware* ESXi. -It is designed for performance and is not related to VMXNET or VMXNET2. -It offers all the features available in VMXNET2, and adds several new features such as -multi-queue support (also known as Receive Side Scaling, RSS), -IPv6 offloads, and MSI/MSI-X interrupt delivery. 
-Because operating system vendors do not provide built-in drivers for this card, -VMware Tools must be installed to have a driver for the VMXNET3 network adapter available. -One can use the same device in a DPDK application with VMXNET3 PMD introduced in DPDK API. - -Currently, the driver provides basic support for using the device in a DPDK application running on a guest OS. -Optimization is needed on the backend, that is, the VMware* ESXi vmkernel switch, to achieve optimal performance end-to-end. - -In this chapter, two setups with the use of the VMXNET3 PMD are demonstrated: - -#. Vmxnet3 with a native NIC connected to a vSwitch - -#. Vmxnet3 chaining VMs connected to a vSwitch - -VMXNET3 Implementation in the DPDK ----------------------------------- - -For details on the VMXNET3 device, refer to the VMXNET3 driver's vmxnet3 directory and support manual from VMware*. - -For performance details, refer to the following link from VMware: - -`http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf `_ - -As a PMD, the VMXNET3 driver provides the packet reception and transmission callbacks, vmxnet3_recv_pkts and vmxnet3_xmit_pkts. -It does not support scattered packet reception as part of vmxnet3_recv_pkts and vmxnet3_xmit_pkts. -Also, it does not support scattered packet reception as part of the device operations supported. - -The VMXNET3 PMD handles all the packet buffer memory allocation and resides in guest address space -and it is solely responsible to free that memory when not needed. -The packet buffers and features to be supported are made available to hypervisor via VMXNET3 PCI configuration space BARs. -During RX/TX, the packet buffers are exchanged by their GPAs, -and the hypervisor loads the buffers with packets in the RX case and sends packets to vSwitch in the TX case. - -The VMXNET3 PMD is compiled with vmxnet3 device headers. -The interface is similar to that of the other PMDs available in the DPDK API. 
-The driver pre-allocates the packet buffers and loads the command ring descriptors in advance. -The hypervisor fills those packet buffers on packet arrival and writes completion ring descriptors, -which are eventually pulled by the PMD. -After reception, the DPDK application frees the descriptors and loads new packet buffers for the coming packets. -The interrupts are disabled and there is no notification required. -This keeps performance up on the RX side, even though the device provides a notification feature. - -In the transmit routine, the DPDK application fills packet buffer pointers in the descriptors of the command ring -and notifies the hypervisor. -In response the hypervisor takes packets and passes them to the vSwitch. It writes into the completion descriptors ring. -The rings are read by the PMD in the next transmit routine call and the buffers and descriptors are freed from memory. - -Features and Limitations of VMXNET3 PMD ---------------------------------------- - -In release 1.6.0, the VMXNET3 PMD provides the basic functionality of packet reception and transmission. -There are several options available for filtering packets at VMXNET3 device level including: - -#. MAC Address based filtering: - - * Unicast, Broadcast, All Multicast modes - SUPPORTED BY DEFAULT - - * Multicast with Multicast Filter table - NOT SUPPORTED - - * Promiscuous mode - SUPPORTED - - * RSS based load balancing between queues - SUPPORTED - -#. VLAN filtering: - - * VLAN tag based filtering without load balancing - SUPPORTED - -.. note:: - - - * Release 1.6.0 does not support separate headers and body receive cmd_ring and hence, - multiple segment buffers are not supported. - Only cmd_ring_0 is used for packet buffers, one for each descriptor. - - * Receive and transmit of scattered packets is not supported. - - * Multicast with Multicast Filter table is not supported. 
- -Prerequisites ------------- - -The following prerequisites apply: - -* Before starting a VM, a VMXNET3 interface must be assigned to the VM through the VMware vSphere Client. - This is shown in the figure below. - -.. image32_png has been renamed - -|vmxnet3_int| - -.. note:: - - Depending on the Virtual Machine type, the VMware vSphere Client shows Ethernet adaptors while adding an Ethernet device. - Ensure that the VM type used offers a VMXNET3 device. Refer to the VMware documentation for a list of VMs. - -.. note:: - - Follow the *DPDK Getting Started Guide* to set up the basic DPDK environment. - -.. note:: - - Follow the *DPDK Sample Application's User Guide*, L2 Forwarding/L3 Forwarding and - TestPMD for instructions on how to run a DPDK application using an assigned VMXNET3 device. - -VMXNET3 with a Native NIC Connected to a vSwitch ------------------------------------------------- - -This section describes an example setup for Phy-vSwitch-VM-Phy communication. - -.. image33_png has been renamed - -|vswitch_vm| - -.. note:: - - Other instructions on preparing to use DPDK, such as hugepage enabling and uio port binding, are not listed here. - Please refer to *DPDK Getting Started Guide and DPDK Sample Application's User Guide* for detailed instructions. - -The packet reception and transmission flow path is: - - Packet generator -> 82576 -> VMware ESXi vSwitch -> VMXNET3 device -> Guest VM VMXNET3 port 0 rx burst -> Guest - VM 82599 VF port 0 tx burst -> 82599 VF -> Packet generator - -VMXNET3 Chaining VMs Connected to a vSwitch -------------------------------------------- - -The following figure shows an example VM-to-VM communication over a Phy-VM-vSwitch-VM-Phy communication channel. - -.. image34_png has been renamed - -|vm_vm_comms| - -.. note:: - - When using the L2 Forwarding or L3 Forwarding applications, - a destination MAC address needs to be written in packets to hit the other VM's VMXNET3 interface. 
- -In this example, the packet flow path is: - - Packet generator -> 82599 VF -> Guest VM 82599 port 0 rx burst -> Guest VM VMXNET3 port 1 tx burst -> VMXNET3 - device -> VMware ESXi vSwitch -> VMXNET3 device -> Guest VM VMXNET3 port 0 rx burst -> Guest VM 82599 VF port 1 tx burst -> 82599 VF -> Packet generator - -.. |vm_vm_comms| image:: img/vm_vm_comms.* - -.. |vmxnet3_int| image:: img/vmxnet3_int.* - -.. |vswitch_vm| image:: img/vswitch_vm.*