Copyright(c) 2010-2014 Intel Corporation. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

* Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in
  the documentation and/or other materials provided with the
  distribution.
* Neither the name of Intel Corporation nor the names of its
  contributors may be used to endorse or promote products derived
  from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Vhost Sample Application
========================

The vhost sample application demonstrates integration of the Data Plane Development Kit (DPDK)
with the Linux* KVM hypervisor by implementing the vhost-net offload API.
The sample application performs simple packet switching between virtual machines based on Media Access Control
(MAC) address or Virtual Local Area Network (VLAN) tag.
The splitting of Ethernet traffic from an external switch is performed in hardware by the Virtual Machine Device Queues
(VMDQ) and Data Center Bridging (DCB) features of the Intel® 82599 10 Gigabit Ethernet Controller.
Background
----------

Virtio networking (virtio-net) was developed as the Linux* KVM para-virtualized method for communicating network packets
between host and guest.
It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
The following figure shows the system architecture for virtio-based networking (virtio-net).

**Figure 16. QEMU Virtio-net (prior to vhost-net)**

.. image19_png has been renamed

|qemu_virtio_net|
The Linux* Kernel vhost-net module was developed as an offload mechanism for virtio-net.
The vhost-net module enables KVM (QEMU) to offload the servicing of virtio-net devices to the vhost-net kernel module,
reducing the context switching and packet copies in the virtual dataplane.

This is achieved by QEMU sharing the following information with the vhost-net module through the vhost-net API:

* The layout of the guest memory space, to enable the vhost-net module to translate addresses.

* The locations of virtual queues in QEMU virtual address space,
  to enable the vhost module to read/write directly to and from the virtqueues.

* An event file descriptor (eventfd) configured in KVM to send interrupts to the virtio-net device driver in the guest.
  This enables the vhost-net module to notify (call) the guest.

* An eventfd configured in KVM to be triggered on writes to the virtio-net device's
  Peripheral Component Interconnect (PCI) config space.
  This enables the vhost-net module to receive notifications (kicks) from the guest.
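The first item above, the guest memory layout, is what allows a vhost back end to dereference guest buffer addresses directly. The following is a minimal sketch of that translation, using a hypothetical region table (the real vhost structures differ; this only illustrates the idea):

```python
# Sketch: translating a guest physical address (GPA) to a host virtual
# address (HVA) using a region table like the one QEMU shares over the
# vhost API. The region values here are illustrative, not real layouts.

REGIONS = [
    # (guest_phys_addr, size, host_virt_addr)
    (0x00000000, 0x000A0000, 0x7F0000000000),
    (0x00100000, 0x3FF00000, 0x7F0000100000),
]

def gpa_to_hva(gpa):
    """Return the host virtual address backing a guest physical address."""
    for gpa_base, size, hva_base in REGIONS:
        if gpa_base <= gpa < gpa_base + size:
            return hva_base + (gpa - gpa_base)
    raise ValueError(f"guest address {gpa:#x} not in any shared region")
```

With such a table the back end can read virtqueue entries and packet buffers in place, which is why no packet copy between QEMU and the vhost process is needed for lookup.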
The following figure shows the system architecture for virtio-net networking with vhost-net offload.

**Figure 17. Virtio with Linux* Kernel Vhost**

.. image20_png has been renamed

|virtio_linux_vhost|
Sample Code Overview
--------------------

The DPDK vhost-net sample code demonstrates KVM (QEMU) offloading the servicing of a Virtual Machine's (VM's)
virtio-net devices to a DPDK-based application in place of the kernel's vhost-net module.

The DPDK vhost-net sample code is based on the vhost library,
which was developed so that a user space Ethernet switch can easily integrate vhost functionality.

The vhost library implements the following features:

* Management of virtio-net device creation/destruction events.

* Mapping of the VM's physical memory into the DPDK vhost-net's address space.

* Triggering/receiving notifications to/from VMs via eventfds.

* A virtio-net back-end implementation providing a subset of virtio-net features.

There are two vhost implementations in the vhost library: vhost cuse and vhost user.
In vhost cuse, a character device driver is implemented to receive and process vhost requests through ioctl messages.
In vhost user, a socket server is created to receive vhost requests through socket messages.
Most of the messages share the same handler routine.
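The shared-handler arrangement can be pictured as a single dispatch table that both transports feed; only the message framing (ioctl vs. socket) differs. A sketch with hypothetical handler names (the real vhost library routines are not shown here):

```python
# Sketch: both vhost transports (cuse ioctls, user socket messages) map
# request types onto one set of handler routines. Names are illustrative.

def set_mem_table(dev, payload):
    dev["mem_table"] = payload
    return 0

def set_vring_addr(dev, payload):
    dev.setdefault("vrings", {}).update(payload)
    return 0

# One table serves both transports.
HANDLERS = {
    "VHOST_SET_MEM_TABLE": set_mem_table,
    "VHOST_SET_VRING_ADDR": set_vring_addr,
}

def dispatch(dev, msg_type, payload):
    """Route a decoded vhost request to its shared handler."""
    return HANDLERS[msg_type](dev, payload)
```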
**Any vhost cuse specific requirements in the following sections will be emphasized.**

The two implementations are enabled and disabled statically through the configuration file;
only one implementation can be enabled at a time, as they do not co-exist in the current implementation.
The vhost sample code application is a simple packet switching application with the following feature:

* Packet switching between virtio-net devices and the network interface card,
  including using VMDQs to reduce the switching that needs to be performed in software.

The following figure shows the architecture of the vhost sample application based on vhost-cuse.

**Figure 18. Vhost-net Architectural Overview**

.. image21_png has been renamed

|vhost_net_arch|

The following figure shows the flow of packets through the vhost-net sample application.

**Figure 19. Packet Flow Through the vhost-net Sample Application**

.. image22_png has been renamed

|vhost_net_sample_app|
Supported Distributions
-----------------------

The example in this section has been validated with the following distributions:

Prerequisites
-------------

This section lists prerequisite packages that must be installed.

Installing Packages on the Host (vhost cuse required)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost cuse code uses the following packages: fuse, fuse-devel, and kernel-modules-extra.
The vhost user code does not rely on those modules, as the eventfds are already installed into the vhost process through
the Unix domain socket messages.
#. Install Fuse Development Libraries and headers:

   .. code-block:: console

       yum -y install fuse fuse-devel

#. Install the Cuse Kernel Module:

   .. code-block:: console

       yum -y install kernel-modules-extra
For vhost user, QEMU 2.2 is required.
Setting up the Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost sample code requires that QEMU allocates a VM's memory on the hugetlbfs file system.
As the vhost sample code requires hugepages,
the best practice is to partition the system into separate hugepage mount points for the VMs and the vhost sample code.

.. note::

    This is best-practice only and is not mandatory.
    For systems that only support 2 MB page sizes,
    both QEMU and vhost sample code can use the same hugetlbfs mount point without issue.

**QEMU**

VMs with gigabytes of memory can benefit from having QEMU allocate their memory from 1 GB huge pages.
1 GB huge pages must be allocated at boot time by passing kernel parameters through the grub boot loader.

#. Calculate the maximum memory usage of all VMs to be run on the system.
   Then, round this value up to the nearest Gigabyte; this is the amount of memory the execution environment will require.

#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

   .. code-block:: console

       GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages=<Number of hugepages required> default_hugepagesz=1G"

#. Update the grub boot loader:

   .. code-block:: console

       grub2-mkconfig -o /boot/grub2/grub.cfg

#. Reboot the system.

#. The hugetlbfs mount point (/dev/hugepages) should now default to allocating gigabyte pages.

.. note::

    Making the above modification will change the system default hugepage size to 1 GB for all applications.
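The sizing step above, rounding each VM's footprint up to whole gigabytes before summing, can be sketched as a small helper (illustrative only; the function name and MB-based units are assumptions of this sketch):

```python
# Sketch: number of 1 GB hugepages needed for a set of VMs, rounding
# each VM's memory up to a whole gigabyte as the text recommends.

GB_IN_MB = 1024  # work in MB for readability

def gigabyte_pages_needed(vm_memory_mb):
    """Round each VM's memory up to whole GB and sum the 1 GB pages."""
    return sum((mb + GB_IN_MB - 1) // GB_IN_MB for mb in vm_memory_mb)
```

For example, one 4 GB VM and one 1.5 GB VM would need 4 + 2 = 6 gigabyte pages, since each VM is rounded up individually.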
**Vhost Sample Code**

In this section, we create a second hugetlbfs mount point to allocate hugepages for the DPDK vhost sample code.

#. Allocate sufficient 2 MB pages for the DPDK vhost sample code:

   .. code-block:: console

       echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#. Mount hugetlbfs at a separate mount point for 2 MB pages:

   .. code-block:: console

       mount -t hugetlbfs nodev /mnt/huge -o pagesize=2M

The above steps can be automated by doing the following:

#. Edit /etc/fstab to add an entry to automatically mount the second hugetlbfs mount point:

   .. code-block:: console

       hugetlbfs <tab> /mnt/huge <tab> hugetlbfs defaults,pagesize=2M 0 0

#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

   .. code-block:: console

       GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=256 ... default_hugepagesz=1G"

#. Update the grub bootloader:

   .. code-block:: console

       grub2-mkconfig -o /boot/grub2/grub.cfg

#. Reboot the system.

.. note::

    Ensure that the default hugepage size after this setup is 1 GB.
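One way to check the resulting default size is to read the ``Hugepagesize`` field of ``/proc/meminfo``. A small parsing sketch, run here against sample text rather than the live file:

```python
# Sketch: confirm the default hugepage size by parsing the Hugepagesize
# field of /proc/meminfo (sample text is used here for illustration).

def default_hugepage_kb(meminfo_text):
    """Return the default hugepage size in kB from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])  # field is reported in kB
    raise RuntimeError("Hugepagesize not found")

sample = "MemTotal:       65842176 kB\nHugepagesize:    1048576 kB\n"
size_kb = default_hugepage_kb(sample)  # 1048576 kB, i.e. 1 GB
```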
Setting up the Guest Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is recommended for testing purposes that the DPDK testpmd sample application is used in the guest to forward packets;
the reasons for this are discussed in Section 22.7, "Running the Virtual Machine (QEMU)".

The testpmd application forwards packets between pairs of Ethernet devices,
so it requires an even number of Ethernet devices (virtio or otherwise) to execute.
It is therefore recommended to create multiples of two virtio-net devices for each Virtual Machine either through libvirt or
at the command line as follows.

.. note::

    Observe that in the example, "-device" and "-netdev" are repeated for two virtio-net devices.

For vhost cuse:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... \
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -netdev tap,id=hostnet2,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet2,id=net2

For vhost user:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... \
    -chardev socket,id=char1,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet1,chardev=char1 \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -chardev socket,id=char2,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet2,chardev=char2 \
    -device virtio-net-pci,netdev=hostnet2,id=net2

sock_path is the path for the socket file created by vhost.
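Because the ``-chardev``/``-netdev``/``-device`` triplet repeats once per virtio-net device, it can be generated rather than typed. A hypothetical helper (the id naming scheme simply mirrors the example above):

```python
# Sketch: build the repeated vhost-user QEMU arguments for N virtio-net
# devices, one socket per device. The id naming is illustrative.

def vhost_user_args(sock_paths):
    """Return QEMU argument list for one vhost-user device per socket."""
    args = []
    for i, path in enumerate(sock_paths, start=1):
        args += [
            "-chardev", f"socket,id=char{i},path={path}",
            "-netdev", f"type=vhost-user,id=hostnet{i},chardev=char{i}",
            "-device", f"virtio-net-pci,netdev=hostnet{i},id=net{i}",
        ]
    return args
```

Remember that testpmd needs an even number of devices, so the list of socket paths should have even length.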
Compiling the Sample Code
-------------------------

#. Compile the vhost library:

   To enable vhost, turn on the vhost library in the configuration file config/common_linuxapp:

   .. code-block:: console

       CONFIG_RTE_LIBRTE_VHOST=y

   vhost user is turned on by default in lib/librte_vhost/Makefile.
   To enable vhost cuse, uncomment the vhost cuse lines and comment the vhost user lines manually.
   In the future, a configuration option will be created to switch between the two implementations:

   .. code-block:: console

       SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
       #SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c

   After vhost is enabled and the implementation is selected, build the vhost library.

#. Go to the examples directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/vhost

#. Set the target (a default target is used if not specified). For example:

   .. code-block:: console

       export RTE_TARGET=x86_64-native-linuxapp-gcc

   See the DPDK Getting Started Guide for possible RTE_TARGET values.

#. Build the application:

   .. code-block:: console

       cd ${RTE_SDK}
       make config ${RTE_TARGET}
       make install ${RTE_TARGET}
       cd ${RTE_SDK}/examples/vhost
       make

#. Go to the eventfd_link directory (vhost cuse required):

   .. code-block:: console

       cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#. Build the eventfd_link kernel module (vhost cuse required):

   .. code-block:: console

       make
Running the Sample Code
-----------------------

#. Install the cuse kernel module (vhost cuse required):

   .. code-block:: console

       modprobe cuse

#. Go to the eventfd_link directory (vhost cuse required):

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#. Install the eventfd_link module (vhost cuse required):

   .. code-block:: console

       insmod ./eventfd_link.ko

#. Go to the examples directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/vhost

#. Run the vhost-switch sample code:

   vhost cuse:

   .. code-block:: console

       user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1

   vhost user: a socket file named usvhost will be created under the current directory.
   Use its path as the socket path in the guest's QEMU command line:

   .. code-block:: console

       user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost

.. note::

    Please note the huge-dir parameter instructs the DPDK to allocate its memory from the 2 MB page hugetlbfs.
**Basename and Index.**
vhost cuse uses a Linux* character device to communicate with QEMU.
The basename and the index are used to generate the character device's name:

    /dev/<basename>-<index>

The index parameter is provided for situations where multiple instances of the virtual switch are required.

For compatibility with the QEMU wrapper script, a base name of "usvhost" and an index of "1" should be used:

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1

**vm2vm.**
The vm2vm parameter sets the mode of packet switching between guests in the host.
A value of "0" disables vm2vm, meaning that packets transmitted by a virtual machine will always go to the Ethernet port.
A value of "1" selects software-mode packet forwarding between guests;
this requires a packet copy in vhost, so it is valid only in the one-copy implementation and invalid for the zero copy implementation.
A value of "2" selects hardware-mode packet forwarding between guests:
packets go to the Ethernet port, and the hardware L2 switch determines, based on the packet's destination MAC address and VLAN tag,
which guest the packet should be forwarded to, or whether it should be sent externally.
.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vm2vm [0,1,2]
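The three vm2vm modes can be summarized as a small decision function. This is a sketch of the behaviour described above, not the sample code's actual routing logic; the return labels are illustrative:

```python
# Sketch of the three vm2vm modes. "local_macs" stands in for the set of
# MAC addresses learned from co-resident guests; labels are illustrative.

def route_guest_tx(vm2vm_mode, dst_mac, local_macs):
    """Decide where a guest-transmitted packet goes for each vm2vm mode."""
    if vm2vm_mode == 0:
        return "ethernet_port"            # never switch between guests
    if vm2vm_mode == 1:                   # software switching (one-copy only)
        return "guest" if dst_mac in local_macs else "ethernet_port"
    if vm2vm_mode == 2:
        return "ethernet_port"            # hardware L2 switch decides further
    raise ValueError("vm2vm must be 0, 1 or 2")
```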
**Mergeable Buffers.**
The mergeable buffers parameter controls how virtio-net descriptors are used for virtio-net headers.
In a disabled state, one virtio-net header is used per packet buffer;
in an enabled state, one virtio-net header is used for multiple packets.
The default value is 0 (disabled), since recent kernels' virtio-net drivers show performance degradation when this feature is enabled.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --mergeable [0,1]

**Stats.**
The stats parameter controls the printing of virtio-net device statistics.
The parameter specifies an interval (in seconds) at which to print statistics; an interval of 0 seconds disables statistics.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --stats [0,n]
**RX Retry.**
The rx-retry option enables/disables enqueue retries when the guest's RX queue is full.
This feature resolves a packet loss that is observed at high data rates,
by allowing the receive path to delay and retry.
This option is enabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry [0,1]

**RX Retry Number.**
The rx-retry-num option specifies the number of retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 4.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-num 5

**RX Retry Delay Time.**
The rx-retry-delay option specifies the timeout (in microseconds) between retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 15.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-delay 20
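Taken together, the three rx-retry options describe a "wait and retry" loop like the following sketch. The enqueue callable is a stand-in for the real virtqueue enqueue, and ``time.sleep`` models the retry delay; this is an illustration of the control flow, not the sample code itself:

```python
# Sketch: the "wait and retry" receive path controlled by --rx-retry,
# --rx-retry-num and --rx-retry-delay. try_enqueue is a stand-in for the
# real virtqueue enqueue; it returns how many packets it accepted.

import time

def enqueue_with_retry(try_enqueue, burst, retry_num=4, retry_delay_us=15):
    """Retry a full-ring enqueue up to retry_num times, pausing between tries."""
    sent = try_enqueue(burst)
    retries = 0
    while sent < len(burst) and retries < retry_num:
        time.sleep(retry_delay_us / 1_000_000)   # wait before retrying
        sent += try_enqueue(burst[sent:])        # enqueue the remainder
        retries += 1
    return sent  # packets beyond this count are dropped
```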
**Zero copy.**
The zero copy option enables/disables zero copy mode for RX/TX packets.
In zero copy mode, the packet buffer address from the guest is translated into a host physical address
and then set directly as the DMA address.
If zero copy mode is disabled, then one-copy mode is utilized in the sample.
This option is disabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy [0,1]

**RX descriptor number.**
The RX descriptor number option specifies the Ethernet RX descriptor number.
The Linux legacy virtio-net driver uses the vring descriptors differently from the DPDK-based virtio-net PMD:
the former typically allocates half for virtio headers and the other half for frame buffers,
while the latter allocates all of them for frame buffers.
This leads to a different number of available frame buffers in the vring,
and in turn to a different number of Ethernet RX descriptors that can be used in zero copy mode.
The option is therefore valid only when zero copy mode is enabled. The value is 32 by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --rx-desc-num [0, n]

**TX descriptor number.**
The TX descriptor number option specifies the Ethernet TX descriptor number; it is valid only when zero copy mode is enabled.
The value is 64 by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --tx-desc-num [0, n]

**VLAN strip.**
The VLAN strip option enables/disables VLAN stripping on the host; if disabled, the guest will receive the packets with the VLAN tag.
It is enabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vlan-strip [0, 1]
Running the Virtual Machine (QEMU)
----------------------------------

QEMU must be executed with specific parameters to:

* Ensure the guest is configured to use virtio-net network adapters.

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1 ...

* Ensure the guest's virtio-net network adapter is configured with offloads disabled.

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off

* Redirect QEMU to communicate with the DPDK vhost-net sample code in place of the vhost-net kernel module (vhost cuse).

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> ...

* Enable the vhost-net sample code to map the VM's memory into its own process address space.

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

    The QEMU wrapper (qemu-wrap.py) is a Python script designed to automate the QEMU configuration described above.
    It also facilitates integration with libvirt, although the script may also be used standalone without libvirt.
Redirecting QEMU to vhost-net Sample Code (vhost cuse)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To redirect QEMU to the vhost-net sample code implementation of the vhost-net API,
an open file descriptor must be passed to QEMU running as a child process.

.. code-block:: python

    # Open the vhost cuse character device and pass the descriptor number
    # on QEMU's command line (the integer fd must be converted to a string).
    fd = os.open("/dev/usvhost-1", os.O_RDWR)
    subprocess.call("qemu-system-x86_64 ... -netdev tap,id=vhostnet0,vhost=on,vhostfd=" + str(fd) + " ...", shell=True)

.. note::

    This process is automated in the QEMU wrapper script discussed in Section 24.7.3.
Mapping the Virtual Machine's Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the DPDK vhost-net sample code to run correctly, QEMU must allocate the VM's memory on hugetlbfs.
This is done by specifying mem-prealloc and mem-path when executing QEMU.
The vhost-net sample code accesses the virtio-net device's virtual rings and packet buffers
by finding and mapping the VM's physical memory on hugetlbfs.
In this case, the path passed to the guest should be that of the 1 GB page hugetlbfs:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

    This process is automated in the QEMU wrapper script discussed in Section 24.7.3.
    The following two sections apply only to vhost cuse.
    For vhost-user, please make the corresponding changes to the qemu-wrapper script and the guest XML file.
QEMU Wrapper Script
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script automatically detects and calls QEMU with the necessary parameters required
to integrate with the vhost sample code.
It performs the following actions:

* Automatically detects the location of the hugetlbfs and inserts this into the command line parameters.

* Automatically opens file descriptors for each virtio-net device and inserts these into the command line parameters.

* Disables offloads on each virtio-net device.

* Calls QEMU passing both the command line parameters passed to the script itself and those it has auto-detected.

The QEMU wrapper script will automatically configure calls to QEMU:

.. code-block:: console

    user@target:~$ qemu-wrap.py -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
    -netdev tap,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1 -hda <disk img> -m 4096

which will become the following call to QEMU:

.. code-block:: console

    /usr/local/bin/qemu-system-x86_64 -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> -device virtio-net-pci,netdev=hostnet1,id=net1,
    csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off -hda <disk img> -m 4096 -mem-path /dev/hugepages -mem-prealloc
Libvirt Integration
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script (qemu-wrap.py) "wraps" libvirt calls to QEMU,
such that QEMU is called with the correct parameters described above.
To call the QEMU wrapper automatically from libvirt, the following configuration changes must be made:

* Place the QEMU wrapper script in libvirt's binary search PATH ($PATH).
  A good location is in the directory that contains the QEMU binary.

* Ensure that the script has the same owner/group and file permissions as the QEMU binary.

* Update the VM xml file using virsh edit <vm name>:

  * Set the VM to use the launch script.

    Set the emulator path contained in the <emulator></emulator> tags.
    For example, replace <emulator>/usr/bin/qemu-kvm</emulator> with <emulator>/usr/bin/qemu-wrap.py</emulator>.

  * Set the VM's virtio-net devices to use vhost-net offload:

    .. code-block:: xml

        <interface type="network">
            <model type="virtio"/>
            <driver name="vhost"/>
        </interface>

* Enable libvirt to access the DPDK vhost sample code's character device file by adding it
  to the controllers cgroup for libvirtd using the following steps:

  .. code-block:: console

      cgroup_controllers = [ ... "devices", ... ]
      clear_emulator_capabilities = 0
      user = "root"
      group = "root"
      cgroup_device_acl = [
          "/dev/null", "/dev/full", "/dev/zero",
          "/dev/random", "/dev/urandom",
          "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
          "/dev/rtc", "/dev/hpet", "/dev/net/tun",
          "/dev/<devbase-name>-<index>",
      ]

* Disable SELinux or set it to permissive mode.

* Mount the cgroup device controller:

  .. code-block:: console

      user@target:~$ mkdir /dev/cgroup
      user@target:~$ mount -t cgroup none /dev/cgroup -o devices

* Restart the libvirtd system process.

  For example, on Fedora*: "systemctl restart libvirtd.service"

* Edit the configuration parameters section of the script:

  * Configure the "emul_path" variable to point to the QEMU emulator:

    .. code-block:: python

        emul_path = "/usr/local/bin/qemu-system-x86_64"

  * Configure the "us_vhost_path" variable to point to the DPDK vhost-net sample code's character device name.
    The DPDK vhost-net sample code's character device will be in the format "/dev/<basename>-<index>":

    .. code-block:: python

        us_vhost_path = "/dev/usvhost-1"
Common Issues
~~~~~~~~~~~~~

* QEMU failing to allocate memory on hugetlbfs, with an error like the following::

      file_ram_alloc: can't mmap RAM pages: Cannot allocate memory

  When running QEMU, the above error indicates that it has failed to allocate memory for the Virtual Machine on
  the hugetlbfs. This is typically due to insufficient hugepages being free to support the allocation request.
  The number of free hugepages can be checked as follows:

  .. code-block:: console

      cat /sys/kernel/mm/hugepages/hugepages-<pagesize>/free_hugepages

  The command above indicates how many hugepages are free to support QEMU's allocation request.

* User space VHOST when the guest has 2MB sized huge pages:

  The guest may have 2MB or 1GB sized huge pages. The user space VHOST should work properly in both cases.

* User space VHOST will not work with QEMU without the ``-mem-prealloc`` option:

  The current implementation works properly only when the guest memory is pre-allocated, so it is required to
  use a QEMU version (e.g. 1.6) which supports ``-mem-prealloc``. The ``-mem-prealloc`` option must be
  specified explicitly in the QEMU command line.

* User space VHOST will not work with a QEMU version without shared memory mapping:

  As shared memory mapping is mandatory for user space VHOST to work properly with the guest, user space VHOST
  needs access to the shared memory from the guest to receive and transmit packets. It is important to make sure
  the QEMU version supports shared memory mapping.

* Issues with ``virsh destroy`` not destroying the VM:

  Using libvirt ``virsh create``, the ``qemu-wrap.py`` script spawns a new process to run ``qemu-kvm``. This impacts the behavior
  of ``virsh destroy``, which kills the process running ``qemu-wrap.py`` without actually destroying the VM (it leaves
  the ``qemu-kvm`` process running).

  The following patch should fix this issue:
  http://dpdk.org/ml/archives/dev/2014-June/003607.html

* In an Ubuntu environment, QEMU fails to start a new guest normally with user space VHOST due to not being able
  to allocate huge pages for the new guest:

  The solution for this issue is to add ``-boot c`` into the QEMU command line to make sure the huge pages are
  allocated properly and then the guest should start normally.

  Use ``cat /proc/meminfo`` to check if there are any changes in the values of ``HugePages_Total`` and ``HugePages_Free``
  after the guest startup.

* Log message: ``eventfd_link: module verification failed: signature and/or required key missing - tainting kernel``:

  This log message may be ignored. The message occurs due to the kernel module ``eventfd_link``, which is not a standard
  Linux module but which is necessary for the user space VHOST current implementation (CUSE-based) to communicate with
  the guest.
Running DPDK in the Virtual Machine
-----------------------------------

For the DPDK vhost-net sample code to switch packets into the VM,
the sample code must first learn the MAC address of the VM's virtio-net device.
The sample code detects the address from packets being transmitted from the VM, similar to a learning switch.

This behavior requires no special action or configuration with the Linux* virtio-net driver in the VM
as the Linux* Kernel will automatically transmit packets during device initialization.
However, DPDK-based applications must be modified to automatically transmit packets during initialization
to facilitate the DPDK vhost-net sample code's MAC learning.

The DPDK testpmd application can be configured to automatically transmit packets during initialization
and to act as an L2 forwarding switch.

Testpmd MAC Forwarding
~~~~~~~~~~~~~~~~~~~~~~

At high packet rates, a minor packet loss may be observed.
To resolve this issue, a "wait and retry" mode is implemented in the testpmd and vhost sample code.
In the "wait and retry" mode, if the virtqueue is found to be full, then testpmd waits for a period of time before retrying to enqueue packets.

The "wait and retry" algorithm is implemented in DPDK testpmd as a forwarding method called "mac_retry".
The following sequence diagram describes the algorithm in detail.

**Figure 20. Packet Flow on TX in DPDK-testpmd**

.. image23_png has been renamed

|tx_dpdk_testpmd|
Running Testpmd
~~~~~~~~~~~~~~~

The testpmd application is automatically built when DPDK is installed.
Run the testpmd application as follows:

.. code-block:: console

    user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i

The destination MAC address for packets transmitted on each port can be set at the command line:

.. code-block:: console

    user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i --eth-peer=0,aa:bb:cc:dd:ee:ff --eth-peer=1,ff:ee:dd:cc:bb:aa

* Packets received on port 1 will be forwarded on port 0 to MAC address aa:bb:cc:dd:ee:ff.

* Packets received on port 0 will be forwarded on port 1 to MAC address ff:ee:dd:cc:bb:aa.

The testpmd application can then be configured to act as an L2 forwarding application:

.. code-block:: console

    testpmd> set fwd mac_retry

The testpmd application can then be configured to start processing packets,
transmitting packets first so the DPDK vhost sample code on the host can learn the MAC address:

.. code-block:: console

    testpmd> start tx_first

.. note::

    Please note "set fwd mac_retry" is used in place of "set fwd mac_fwd" to ensure the retry feature is activated.
Passing Traffic to the Virtual Machine Device
---------------------------------------------

For a virtio-net device to receive traffic,
the traffic's Layer 2 header must include both the virtio-net device's MAC address and VLAN tag.
The DPDK sample code behaves in a similar manner to a learning switch in that
it learns the MAC address of the virtio-net devices from the first transmitted packet.
On learning the MAC address,
the DPDK vhost sample code prints a message with the MAC address and VLAN tag of the virtio-net device.
For example:

.. code-block:: console

    DATA: (0) MAC_ADDRESS cc:bb:bb:bb:bb:bb and VLAN_TAG 1000 registered

The above message indicates that device 0 has been registered with MAC address cc:bb:bb:bb:bb:bb and VLAN tag 1000.
Any packets received on the NIC with these values are placed on the device's receive queue.
When a virtio-net device transmits packets, the VLAN tag is added to the packet by the DPDK vhost sample code.
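The learning behaviour described above can be pictured as a table keyed on the (MAC, VLAN) pair. The following is an illustrative sketch, not the sample code's actual data structure:

```python
# Sketch: learning-switch behaviour of the vhost sample code. The first
# packet transmitted by a device registers its (MAC, VLAN) pair; frames
# received from the NIC are then steered by looking that pair up.

class LearningTable:
    def __init__(self):
        self.table = {}  # (mac, vlan) -> device id

    def learn(self, dev_id, mac, vlan):
        """Register a device the first time it transmits."""
        self.table[(mac, vlan)] = dev_id
        print(f"DATA: ({dev_id}) MAC_ADDRESS {mac} and VLAN_TAG {vlan} registered")

    def lookup(self, mac, vlan):
        """Return the device id for a frame, or None if it has no local owner."""
        return self.table.get((mac, vlan))
```

A frame arriving with an unregistered (MAC, VLAN) pair has no local owner and would not be placed on any guest's receive queue.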
.. |vhost_net_arch| image:: img/vhost_net_arch.*

.. |qemu_virtio_net| image:: img/qemu_virtio_net.*

.. |tx_dpdk_testpmd| image:: img/tx_dpdk_testpmd.*

.. |vhost_net_sample_app| image:: img/vhost_net_sample_app.*

.. |virtio_linux_vhost| image:: img/virtio_linux_vhost.*