I40E/IXGBE/IGB Virtual Function Driver
======================================

Supported Intel® Ethernet Controllers (see the *DPDK Release Notes* for details)
support the following modes of operation in a virtualized environment:
* **SR-IOV mode**: Involves direct assignment of part of the port resources to different guest operating systems
  using the PCI-SIG Single Root I/O Virtualization (SR-IOV) standard,
  also known as "native mode" or "pass-through" mode.
  In this chapter, this mode is referred to as IOV mode.

* **VMDq mode**: Involves central management of the networking resources by an IO Virtual Machine (IOVM) or
  a Virtual Machine Monitor (VMM), also known as software switch acceleration mode.
  In this chapter, this mode is referred to as the Next Generation VMDq mode.
SR-IOV Mode Utilization in a DPDK Environment
---------------------------------------------
The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode.
Therefore, it is possible to logically partition the NIC resources of an SR-IOV capable Ethernet controller and
expose them to a virtual machine as a separate PCI function called a "Virtual Function".

As a result, a NIC is logically distributed among multiple virtual machines (as shown in Figure 10),
while still having global data in common to share with the Physical Function and other Virtual Functions.
The DPDK i40evf, igbvf and ixgbevf Poll Mode Drivers (PMDs) serve the virtual PCI functions of the
Intel® 82576 Gigabit Ethernet Controller, the Intel® Ethernet Controller I350 family,
the Intel® 82599 10 Gigabit Ethernet Controller and the Intel® Fortville 10/40 Gigabit Ethernet Controller.
The DPDK Poll Mode Driver (PMD) also supports the "Physical Function" of such NICs on the host.
The DPDK PF/VF Poll Mode Driver (PMD) supports the Layer 2 switch on the Intel® 82576 Gigabit Ethernet Controller,
the Intel® Ethernet Controller I350 family, the Intel® 82599 10 Gigabit Ethernet Controller
and the Intel® Fortville 10/40 Gigabit Ethernet Controller NICs, so that a guest can use it for inter-virtual-machine traffic in SR-IOV mode.
For more detail on SR-IOV, please refer to the following documents:

* `SR-IOV provides hardware based I/O sharing <http://www.intel.com/network/connectivity/solutions/vmdc.htm>`_

* `PCI-SIG-Single Root I/O Virtualization Support on IA
  <http://www.intel.com/content/www/us/en/pci-express/pci-sig-single-root-io-virtualization-support-in-virtualization-technology-for-connectivity-paper.html>`_

* `Scalable I/O Virtualized Servers <http://www.intel.com/content/www/us/en/virtualization/server-virtualization/scalable-i-o-virtualized-servers-paper.html>`_
**Figure 10. Virtualization for a Single Port NIC in SR-IOV Mode**

|single_port_nic|
Physical and Virtual Function Infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following describes the Physical Function and Virtual Functions infrastructure for the supported Ethernet Controller NICs.

Virtual Functions operate under the respective Physical Function on the same NIC port and therefore have no access
to the global NIC resources that are shared between other functions for the same NIC port.
A Virtual Function has basic access to the queue resources and control structures of the queues assigned to it.
For global resource access, a Virtual Function has to send a request to the Physical Function for that port,
and the Physical Function operates on the global resources on behalf of the Virtual Function.
For this out-of-band communication, an SR-IOV enabled NIC provides a memory buffer for each Virtual Function,
which is called a "Mailbox".
Intel® Fortville 10/40 Gigabit Ethernet Controller VF Infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In a virtualized environment, the programmer can enable a maximum of *128 Virtual Functions (VF)*
globally per Intel® Fortville 10/40 Gigabit Ethernet Controller NIC device.
Each VF can have a maximum of 16 queue pairs.
The Physical Function in the host can be configured either by the Linux* i40e driver
(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by the DPDK PMD PF driver.
When both the DPDK PMD PF and VF drivers are used, the whole NIC is taken over by the DPDK-based application.
* Using the Linux* i40e driver:

  .. code-block:: console

      rmmod i40e (To remove the i40e module)
      insmod i40e.ko max_vfs=2,2 (To enable two Virtual Functions per port)
* Using the DPDK PMD PF i40e driver:

  Kernel Params: iommu=pt, intel_iommu=on

  .. code-block:: console

      ./dpdk_nic_bind.py -b igb_uio bb:ss.f
      echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
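For example, a minimal sketch of starting testpmd as the host PF application (the core mask and number of memory channels are illustrative and should be adapted to the target system):

.. code-block:: console

    ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 -- -i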
Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC.
When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.

* Virtual Functions 0 and 2 belong to Physical Function 0

* Virtual Functions 1 and 3 belong to Physical Function 1

The above is an important consideration to take into account when targeting specific packets to a selected port.
Intel® 82599 10 Gigabit Ethernet Controller VF Infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The programmer can enable a maximum of *63 Virtual Functions* and there must be *one Physical Function* per Intel® 82599
10 Gigabit Ethernet Controller NIC port.
The reason for this is that the device allows for a maximum of 128 queues per port and a virtual/physical function has to
have at least one queue pair (RX/TX).
The current implementation of the DPDK ixgbevf driver supports a single queue pair (RX/TX) per Virtual Function.
The Physical Function in the host can be configured either by the Linux* ixgbe driver
(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by the DPDK PMD PF driver.
When both the DPDK PMD PF and VF drivers are used, the whole NIC is taken over by the DPDK-based application.
* Using the Linux* ixgbe driver:

  .. code-block:: console

      rmmod ixgbe (To remove the ixgbe module)
      insmod ixgbe max_vfs=2,2 (To enable two Virtual Functions per port)
* Using the DPDK PMD PF ixgbe driver:

  Kernel Params: iommu=pt, intel_iommu=on

  .. code-block:: console

      ./dpdk_nic_bind.py -b igb_uio bb:ss.f
      echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.

Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC.
When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.

* Virtual Functions 0 and 2 belong to Physical Function 0

* Virtual Functions 1 and 3 belong to Physical Function 1
The above is an important consideration to take into account when targeting specific packets to a selected port.
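As a quick check, the enabled 82599 Virtual Functions can be listed on the host by their PCI device ID (8086:10ed, the same ID used for the pci-stub binding later in this chapter); this is a minimal sketch:

.. code-block:: console

    lspci -d 8086:10ed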
Intel® 82576 Gigabit Ethernet Controller and Intel® Ethernet Controller I350 Family VF Infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In a virtualized environment, an Intel® 82576 Gigabit Ethernet Controller serves up to eight virtual machines (VMs).
The controller has 16 TX and 16 RX queues.
They are generally referred to (or thought of) as queue pairs (one TX and one RX queue).
This gives the controller 16 queue pairs.

A pool is a group of queue pairs for assignment to the same VF, used for transmit and receive operations.
The controller has eight pools, with each pool containing two queue pairs, that is, two TX and two RX queues assigned to each VF.
In a virtualized environment, an Intel® Ethernet Controller I350 family device serves up to eight virtual machines (VMs) per port.
The eight queues can be accessed by eight different VMs if configured correctly
(the I350 has 4x 1GbE ports, each with 8 TX and 8 RX queues); that is, one Transmit queue and one Receive queue are assigned to each VF.
* Using the Linux* igb driver:

  .. code-block:: console

      rmmod igb (To remove the igb module)
      insmod igb max_vfs=2,2 (To enable two Virtual Functions per port)
* Using the DPDK PMD PF igb driver:

  Kernel Params: iommu=pt, intel_iommu=on

  .. code-block:: console

      modprobe uio
      ./dpdk_nic_bind.py -b igb_uio bb:ss.f
      echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.

Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a four-port NIC.
When you enable the eight Virtual Functions (two per port) with the above command, the eight enabled functions have a Function#
represented by (Bus#, Device#, Function#) in sequence, starting from 0 to 7.
* Virtual Functions 0 and 4 belong to Physical Function 0

* Virtual Functions 1 and 5 belong to Physical Function 1

* Virtual Functions 2 and 6 belong to Physical Function 2

* Virtual Functions 3 and 7 belong to Physical Function 3
The above is an important consideration to take into account when targeting specific packets to a selected port.
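The mapping of Virtual Functions to Physical Functions can be confirmed through the virtfn symlinks that the kernel exposes in sysfs; bb:ss.f below is a placeholder for the PCI address of a Physical Function:

.. code-block:: console

    ls -l /sys/bus/pci/devices/0000\:bb\:ss.f/virtfn*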
Validated Hypervisors
~~~~~~~~~~~~~~~~~~~~~

The validated hypervisor is:

* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0
However, since the hypervisor is bypassed and the Virtual Function devices are configured using the Mailbox interface,
the solution is hypervisor-agnostic.
Xen* and VMware* (when SR-IOV is supported) will also be able to support the DPDK with Virtual Function driver support.
Expected Guest Operating System in Virtual Machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The expected guest operating systems in a virtualized environment are:

* Fedora* 14 (64-bit)

* Ubuntu* 10.04 (64-bit)

For supported kernel versions, refer to the *DPDK Release Notes*.
Setting Up a KVM Virtual Machine Monitor
----------------------------------------

The following describes a target environment:

* Host Operating System: Fedora 14

* Hypervisor: KVM (Kernel Virtual Machine) with Qemu version 0.14.0

* Guest Operating System: Fedora 14

* Linux Kernel Version: Refer to the *DPDK Getting Started Guide*

* Target Applications: l2fwd, l3fwd-vf

The setup procedure is as follows:

#. Before booting the Host OS, open **BIOS setup** and enable **Intel® VT features**.
#. While booting the Host OS kernel, pass the intel_iommu=on kernel command line argument using GRUB.
   When using the DPDK PF driver on the host, also pass the iommu=pt kernel command line argument in GRUB, for example as shown below.
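   The following is a minimal sketch assuming a GRUB 2 based system; on older GRUB legacy setups, the arguments are appended directly to the kernel line in the GRUB configuration file instead:

   .. code-block:: console

       # In /etc/default/grub, append the arguments to the existing kernel command line
       GRUB_CMDLINE_LINUX="... intel_iommu=on iommu=pt"

       # Regenerate the GRUB configuration (the output path varies by distribution)
       grub2-mkconfig -o /boot/grub2/grub.cfg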
#. Download qemu-kvm-0.14.0 from
   `http://sourceforge.net/projects/kvm/files/qemu-kvm/ <http://sourceforge.net/projects/kvm/files/qemu-kvm/>`_
   and install it in the Host OS using the following steps:
   When using a recent kernel (2.6.25+) with kvm modules included:

   .. code-block:: console

       tar xzf qemu-kvm-release.tar.gz
       cd qemu-kvm-release
       ./configure --prefix=/usr/local/kvm
       make
       sudo make install
       sudo /sbin/modprobe kvm-intel
   When using an older kernel, or a kernel from a distribution without the kvm modules,
   you must download (from the same link), compile and install the modules yourself:

   .. code-block:: console

       tar xjf kvm-kmod-release.tar.bz2
       cd kvm-kmod-release
       ./configure
       make
       sudo make install
       sudo /sbin/modprobe kvm-intel
   With the prefix given above, qemu-kvm installs in the /usr/local/kvm/bin directory.

   For more details about KVM configuration and usage, please refer to:

   `http://www.linux-kvm.org/page/HOWTO1 <http://www.linux-kvm.org/page/HOWTO1>`_.
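   A quick check (a minimal sketch) to confirm that the kvm and kvm-intel modules are loaded:

   .. code-block:: console

       lsmod | grep kvm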
#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine.
   This is referred to as the Guest Operating System (Guest OS).
#. Download and install the latest ixgbe driver from:

   `http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687 <http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687>`_
   When using the Linux kernel ixgbe driver, unload the Linux ixgbe driver and reload it with the max_vfs=2,2 argument:

   .. code-block:: console

       rmmod ixgbe
       modprobe ixgbe max_vfs=2,2
   When using the DPDK PMD PF driver, insert the DPDK kernel module igb_uio and set the number of VFs via the sysfs max_vfs entry:

   .. code-block:: console

       modprobe uio
       insmod igb_uio
       ./dpdk_nic_bind.py -b igb_uio 02:00.0 02:00.1 0e:00.0 0e:00.1
       echo 2 > /sys/bus/pci/devices/0000\:02\:00.0/max_vfs
       echo 2 > /sys/bus/pci/devices/0000\:02\:00.1/max_vfs
       echo 2 > /sys/bus/pci/devices/0000\:0e\:00.0/max_vfs
       echo 2 > /sys/bus/pci/devices/0000\:0e\:00.1/max_vfs
   You need to explicitly specify the number of VFs for each port; for example,
   the command above creates two VFs for the first two ixgbe ports.

   Let's say we have a machine with four physical ixgbe ports:

       0000:02:00.0

       0000:02:00.1

       0000:0e:00.0

       0000:0e:00.1
   The command above creates two VFs for device 0000:02:00.0:

   .. code-block:: console

       ls -alrt /sys/bus/pci/devices/0000\:02\:00.0/virt*
       lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn1 -> ../0000:02:10.2
       lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn0 -> ../0000:02:10.0
   It also creates two VFs for device 0000:02:00.1:

   .. code-block:: console

       ls -alrt /sys/bus/pci/devices/0000\:02\:00.1/virt*
       lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn1 -> ../0000:02:10.3
       lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn0 -> ../0000:02:10.1
#. List the PCI devices connected and notice that the Host OS shows two Physical Functions (traditional ports)
   and four Virtual Functions (two for each port).
   This is the result of the previous step.
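   For example (the exact device names depend on the controller and driver version):

   .. code-block:: console

       lspci | grep Ethernet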
#. Insert the pci_stub module to hold the PCI devices that are freed from the default driver using the following command
   (see http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Section 4 for more information):

   .. code-block:: console

       sudo /sbin/modprobe pci-stub
   Unbind the default driver from the PCI devices representing the Virtual Functions.
   A script to perform this action is as follows:

   .. code-block:: console

       echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id
       echo 0000:08:10.0 > /sys/bus/pci/devices/0000:08:10.0/driver/unbind
       echo 0000:08:10.0 > /sys/bus/pci/drivers/pci-stub/bind
   where 0000:08:10.0 is the PCI address of a Virtual Function visible in the Host OS.
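   If several Virtual Functions are to be assigned, the same unbind/bind sequence can be wrapped in a small loop;
   the PCI addresses below are illustrative only:

   .. code-block:: console

       # illustrative VF addresses; substitute the VFs reported on your system
       for dev in 0000:08:10.0 0000:08:10.2; do
           echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
           echo $dev > /sys/bus/pci/drivers/pci-stub/bind
       done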
#. Now, start the Virtual Machine by running the following command:

   .. code-block:: console

       /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0
   where:

   * -m = memory to assign

   * -smp = number of smp cores

   * -boot = boot option

   * -hda = virtual disk image

   * -device = device to attach
   * The pci-assign,host=08:10.0 value indicates that you want to attach a PCI device
     to a Virtual Machine and the respective (Bus:Device.Function)
     numbers should be passed for the Virtual Function to be attached.
   * qemu-kvm-0.14.0 allows a maximum of four PCI devices assigned to a VM,
     but this is qemu-kvm version dependent since qemu-kvm-0.14.1 allows a maximum of five PCI devices.
   * qemu-system-x86_64 also has a -cpu command line option that is used to select the cpu_model
     to emulate in a Virtual Machine. Therefore, it can be used as:

     .. code-block:: console

         /usr/local/kvm/bin/qemu-system-x86_64 -cpu ? (to list all available cpu_models)

         /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -cpu host -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0 (to use the same cpu_model equivalent to the host cpu)

     For more information, please refer to: `http://wiki.qemu.org/Features/CPUModels <http://wiki.qemu.org/Features/CPUModels>`_.
#. Install and run the DPDK host application to take over the Physical Function, for example:

   .. code-block:: console

       make install T=x86_64-native-linuxapp-gcc
       ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 -- -i
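   Once testpmd is running on the host, its interactive prompt can be used to verify the ports and start forwarding, for example:

   .. code-block:: console

       testpmd> show port info all
       testpmd> set fwd mac
       testpmd> start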
#. Finally, access the Guest OS using vncviewer with the localhost:5900 port and check the lspci command output in the Guest OS.
   The virtual functions will be listed as available for use.
#. Configure and install the DPDK with an x86_64-native-linuxapp-gcc configuration on the Guest OS as normal,
   that is, there is no change to the normal installation procedure.

   .. code-block:: console

       make config T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc
       cd x86_64-native-linuxapp-gcc
       make
If you are unable to compile the DPDK and you are getting "error: CPU you selected does not support x86-64 instruction set",
power off the Guest OS and start the virtual machine with the correct -cpu option in the qemu-system-x86_64 command as shown in step 9.
You must select the best x86_64 cpu_model to emulate, or you can select the host option if available.
Run the DPDK l2fwd sample application in the Guest OS with Hugepages enabled.
For the expected benchmark performance, you must pin the cores from the Guest OS to the Host OS (taskset can be used to do this) and
you must also look at the PCI Bus layout on the board to ensure you are not running the traffic over the QPI Interface.
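A minimal sketch of such a run inside the guest, assuming 2 MB hugepages and two VF ports (the core mask, port mask and binary path are illustrative):

.. code-block:: console

    echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages (reserve hugepages; amount is illustrative)
    mkdir -p /mnt/huge
    mount -t hugetlbfs nodev /mnt/huge
    ./examples/l2fwd/build/l2fwd -c 0x3 -n 4 -- -p 0x3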
* The Virtual Machine Manager (the Fedora package name is virt-manager) is a utility for virtual machine management
  that can also be used to create, start, stop and delete virtual machines.
  If this option is used, steps 2 and 6 in the instructions provided will be different.
* virsh, a command line utility for virtual machine management,
  can also be used to bind and unbind devices to a virtual machine in Ubuntu.
  If this option is used, step 6 in the instructions provided will be different.

* The Virtual Machine Monitor (see Figure 11) is equivalent to a Host OS with KVM installed as described in the instructions.
**Figure 11. Performance Benchmark Setup**

|perf_benchmark|
DPDK SR-IOV PMD PF/VF Driver Usage Model
----------------------------------------

Fast Host-based Packet Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Software Defined Network (SDN) trends are demanding fast host-based packet handling.
In a virtualization environment,
the DPDK VF PMD driver achieves the same throughput as in a non-VT native environment.

With such fast packet processing in the host instance, services such as filtering, QoS and
DPI can be offloaded onto the host fast path.

Figure 12 shows the scenario where some VMs directly communicate externally via VFs,
while others connect to a virtual switch and share the same uplink bandwidth.
**Figure 12. Fast Host-based Packet Processing**

|fast_pkt_proc|
SR-IOV (PF/VF) Approach for Inter-VM Communication
--------------------------------------------------
Inter-VM data communication is one of the traffic bottlenecks in virtualization platforms.
SR-IOV device assignment allows a VM to attach to the real device, taking advantage of the bridge in the NIC,
so VF-to-VF traffic within the same physical port (VM0<->VM1) has hardware acceleration.
However, when traffic crosses physical ports (VM0<->VM2), there is no such hardware bridge.
In this case, the DPDK PMD PF driver provides host forwarding between such VMs.
Figure 13 shows an example.
In this case, an update of the MAC address lookup tables in both the NIC and the host DPDK application is required.

In the NIC, the destination MAC address of a VM that sits behind a different physical port is written to the PF-specific pool.
So when a packet comes in, its destination MAC address matches and the packet is forwarded to the host DPDK PMD application.

In the host DPDK application, the behavior is similar to L2 forwarding,
that is, the packet is forwarded to the correct PF pool.
The SR-IOV NIC switch forwards the packet to a specific VM according to the destination MAC address,
which belongs to the destination VF on the VM.
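As an illustrative sketch only (the exact commands depend on the host application and the DPDK version), testpmd running on the host could register the hypothetical MAC address of the remote VM on the PF so that matching packets are delivered to the host application and forwarded to the other port:

.. code-block:: console

    testpmd> mac_addr add 0 00:1E:67:00:00:02 (hypothetical MAC of the destination VF)
    testpmd> set fwd io
    testpmd> start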
**Figure 13. Inter-VM Communication**

|inter_vm_comms|
.. |perf_benchmark| image:: img/perf_benchmark.png

.. |single_port_nic| image:: img/single_port_nic.png

.. |inter_vm_comms| image:: img/inter_vm_comms.png

.. |fast_pkt_proc| image:: img/fast_pkt_proc.png