diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 8a7eb3b840..5e7f24c4e8 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -1,851 +1,183 @@
-
-.. BSD LICENSE
-    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-    All rights reserved.
-
-    Redistribution and use in source and binary forms, with or without
-    modification, are permitted provided that the following conditions
-    are met:
-
-    * Redistributions of source code must retain the above copyright
-      notice, this list of conditions and the following disclaimer.
-    * Redistributions in binary form must reproduce the above copyright
-      notice, this list of conditions and the following disclaimer in
-      the documentation and/or other materials provided with the
-      distribution.
-    * Neither the name of Intel Corporation nor the names of its
-      contributors may be used to endorse or promote products derived
-      from this software without specific prior written permission.
-
-    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright(c) 2010-2016 Intel Corporation.

 Vhost Sample Application
 ========================

-The vhost sample application demonstrates integration of the Data Plane Development Kit (DPDK)
-with the Linux* KVM hypervisor by implementing the vhost-net offload API.
-The sample application performs simple packet switching between virtual machines based on Media Access Control
-(MAC) address or Virtual Local Area Network (VLAN) tag.
-The splitting of ethernet traffic from an external switch is performed in hardware by the Virtual Machine Device Queues
-(VMDQ) and Data Center Bridging (DCB) features of the Intel® 82599 10 Gigabit Ethernet Controller.
-
-Background
-----------
-
-Virtio networking (virtio-net) was developed as the Linux* KVM para-virtualized method for communicating network packets
-between host and guest.
-It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
-The following figure shows the system architecture for a virtio-based networking (virtio-net).
-
-.. _figure_16:
-
-**Figure16. QEMU Virtio-net (prior to vhost-net)**
-
-.. image19_png has been renamed
-
-|qemu_virtio_net|
-
-The Linux* Kernel vhost-net module was developed as an offload mechanism for virtio-net.
-The vhost-net module enables KVM (QEMU) to offload the servicing of virtio-net devices to the vhost-net kernel module,
-reducing the context switching and packet copies in the virtual dataplane.
-
-This is achieved by QEMU sharing the following information with the vhost-net module through the vhost-net API:
-
-* The layout of the guest memory space, to enable the vhost-net module to translate addresses.
-
-* The locations of virtual queues in QEMU virtual address space,
-  to enable the vhost module to read/write directly to and from the virtqueues.
-
-* An event file descriptor (eventfd) configured in KVM to send interrupts to the virtio-net device driver in the guest.
-  This enables the vhost-net module to notify (call) the guest.
-
-* An eventfd configured in KVM to be triggered on writes to the virtio-net device's
-  Peripheral Component Interconnect (PCI) config space.
-  This enables the vhost-net module to receive notifications (kicks) from the guest.
-
-The following figure shows the system architecture for virtio-net networking with vhost-net offload.
-
-.. _figure_17:
-
-**Figure 17. Virtio with Linux* Kernel Vhost**
-
-.. image20_png has been renamed
-
-|virtio_linux_vhost|
-
-Sample Code Overview
---------------------
-
-The DPDK vhost-net sample code demonstrates KVM (QEMU) offloading the servicing of a Virtual Machine's (VM's)
-virtio-net devices to a DPDK-based application in place of the kernel's vhost-net module.
-
-The DPDK vhost-net sample code is based on vhost library. Vhost library is developed for user space ethernet switch to
-easily integrate with vhost functionality.
-
-The vhost library implements the following features:
-
-* Management of virtio-net device creation/destruction events.
-
-* Mapping of the VM's physical memory into the DPDK vhost-net's address space.
-
-* Triggering/receiving notifications to/from VMs via eventfds.
-
-* A virtio-net back-end implementation providing a subset of virtio-net features.
-
-There are two vhost implementations in vhost library, vhost cuse and vhost user. In vhost cuse, a character device driver is implemented to
-receive and process vhost requests through ioctl messages. In vhost user, a socket server is created to received vhost requests through
-socket messages. Most of the messages share the same handler routine.
-
-.. note::
-   **Any vhost cuse specific requirement in the following sections will be emphasized**.
-
-Two impelmentations are turned on and off statically through configure file. Only one implementation could be turned on. They don't co-exist in current implementation.
-
-The vhost sample code application is a simple packet switching application with the following feature:
-
-* Packet switching between virtio-net devices and the network interface card,
-  including using VMDQs to reduce the switching that needs to be performed in software.
-
-The following figure shows the architecture of the Vhost sample application based on vhost-cuse.
-
-.. _figure_18:
-
-**Figure 18. Vhost-net Architectural Overview**
-
-.. image21_png has been renamed
-
-|vhost_net_arch|
+The vhost sample application demonstrates integration of the Data Plane
+Development Kit (DPDK) with the Linux* KVM hypervisor by implementing the
+vhost-net offload API. The sample application performs simple packet
+switching between virtual machines based on Media Access Control (MAC)
+address or Virtual Local Area Network (VLAN) tag. The splitting of Ethernet
+traffic from an external switch is performed in hardware by the Virtual
+Machine Device Queues (VMDQ) and Data Center Bridging (DCB) features of
+the Intel® 82599 10 Gigabit Ethernet Controller.

-The following figure shows the flow of packets through the vhost-net sample application.
-
-.. _figure_19:
-
-**Figure 19. Packet Flow Through the vhost-net Sample Application**
-
-.. image22_png has been renamed
-
-|vhost_net_sample_app|
-
-Supported Distributions
------------------------
-
-The example in this section have been validated with the following distributions:
-
-* Fedora* 18
-
-* Fedora* 19
-
-* Fedora* 20
-
-Prerequisites
+Testing steps
 -------------

-This section lists prerequisite packages that must be installed.
-
-Installing Packages on the Host(vhost cuse required)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The vhost cuse code uses the following packages; fuse, fuse-devel, and kernel-modules-extra.
-The vhost user code don't rely on those modules as eventfds are already installed into vhost process through
-unix domain socket.
-
-#. Install Fuse Development Libraries and headers:
-
-   .. code-block:: console
-
-      yum -y install fuse fuse-devel
-
-#. Install the Cuse Kernel Module:
-
-   .. code-block:: console
-
-      yum -y install kernel-modules-extra
-
-QEMU simulator
-~~~~~~~~~~~~~~
-
-For vhost user, qemu 2.2 is required.
-
-Setting up the Execution Environment
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The vhost sample code requires that QEMU allocates a VM's memory on the hugetlbfs file system.
-As the vhost sample code requires hugepages,
-the best practice is to partition the system into separate hugepage mount points for the VMs and the vhost sample code.
-
-.. note::
-
-   This is best-practice only and is not mandatory.
-   For systems that only support 2 MB page sizes,
-   both QEMU and vhost sample code can use the same hugetlbfs mount point without issue.
-
-**QEMU**
-
-VMs with gigabytes of memory can benefit from having QEMU allocate their memory from 1 GB huge pages.
-1 GB huge pages must be allocated at boot time by passing kernel parameters through the grub boot loader.
-
-#. Calculate the maximum memory usage of all VMs to be run on the system.
-   Then, round this value up to the nearest Gigabyte the execution environment will require.
-
-#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:
-
-   .. code-block:: console
-
-      GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages= default_hugepagesz=1G"
-
-#. Update the grub boot loader:
-
-   .. code-block:: console
-
-      grub2-mkconfig -o /boot/grub2/grub.cfg
-
-#. Reboot the system.
-
-#. The hugetlbfs mount point (/dev/hugepages) should now default to allocating gigabyte pages.
-
-.. note::
-
-   Making the above modification will change the system default hugepage size to 1 GB for all applications.
-
-**Vhost Sample Code**
-
-In this section, we create a second hugetlbs mount point to allocate hugepages for the DPDK vhost sample code.
-
-#. Allocate sufficient 2 MB pages for the DPDK vhost sample code:
-
-   .. code-block:: console
-
-      echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
-
-#. Mount hugetlbs at a separate mount point for 2 MB pages:
-
-   .. code-block:: console
-
-      mount -t hugetlbfs nodev /mnt/huge -o pagesize=2M
-
-The above steps can be automated by doing the following:
-
-#. Edit /etc/fstab to add an entry to automatically mount the second hugetlbfs mount point:
-
-   ::
-
-      hugetlbfs /mnt/huge hugetlbfs defaults,pagesize=1G 0 0
-
-#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:
-
-   ::
-
-      GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=256 ... default_hugepagesz=1G"
-
-#. Update the grub bootloader:
-
-   .. code-block:: console
-
-      grub2-mkconfig -o /boot/grub2/grub.cfg
-
-#. Reboot the system.
-
-.. note::
-
-   Ensure that the default hugepage size after this setup is 1 GB.
-
-Setting up the Guest Execution Environment
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-It is recommended for testing purposes that the DPDK testpmd sample application is used in the guest to forward packets,
-the reasons for this are discussed in Section 22.7, "Running the Virtual Machine (QEMU)".
-
-The testpmd application forwards packets between pairs of Ethernet devices,
-it requires an even number of Ethernet devices (virtio or otherwise) to execute.
-It is therefore recommended to create multiples of two virtio-net devices for each Virtual Machine either through libvirt or
-at the command line as follows.
-
-.. note::
-
-   Observe that in the example, "-device" and "-netdev" are repeated for two virtio-net devices.
-
-For vhost cuse:
-
-.. code-block:: console
-
-   user@target:~$ qemu-system-x86_64 ... \
-   -netdev tap,id=hostnet1,vhost=on,vhostfd= \
-   -device virtio-net-pci, netdev=hostnet1,id=net1 \
-   -netdev tap,id=hostnet2,vhost=on,vhostfd= \
-   -device virtio-net-pci, netdev=hostnet2,id=net1
-
-For vhost user:
-
-.. code-block:: console
-
-   user@target:~$ qemu-system-x86_64 ... \
-   -chardev socket,id=char1,path= \
-   -netdev type=vhost-user,id=hostnet1,chardev=char1 \
-   -device virtio-net-pci,netdev=hostnet1,id=net1 \
-   -chardev socket,id=char2,path= \
-   -netdev type=vhost-user,id=hostnet2,chardev=char2 \
-   -device virtio-net-pci,netdev=hostnet2,id=net2
-
-sock_path is the path for the socket file created by vhost.
-
-Compiling the Sample Code
--------------------------
-#. Compile vhost lib:
-
-   To enable vhost, turn on vhost library in the configure file config/common_linuxapp.
-
-   .. code-block:: console
-
-      CONFIG_RTE_LIBRTE_VHOST=n
-
-   vhost user is turned on by default in the lib/librte_vhost/Makefile.
-   To enable vhost cuse, uncomment vhost cuse and comment vhost user manually. In future, a configure will be created for switch between two implementations.
-
-   .. code-block:: console
-
-      SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
-      #SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c
-
-   After vhost is enabled and the implementation is selected, build the vhost library.
-
-#. Go to the examples directory:
-
-   .. code-block:: console
-
-      export RTE_SDK=/path/to/rte_sdk
-      cd ${RTE_SDK}/examples/vhost
-
-#. Set the target (a default target is used if not specified). For example:
-
-   .. code-block:: console
-
-      export RTE_TARGET=x86_64-native-linuxapp-gcc
-
-   See the DPDK Getting Started Guide for possible RTE_TARGET values.
-
-#. Build the application:
-
-   .. code-block:: console
-
-      cd ${RTE_SDK}
-      make config ${RTE_TARGET}
-      make install ${RTE_TARGET}
-      cd ${RTE_SDK}/examples/vhost
-      make
-
-#. Go to the eventfd_link directory(vhost cuse required):
-
-   .. code-block:: console
-
-      cd ${RTE_SDK}/lib/librte_vhost/eventfd_link
-
-#. Build the eventfd_link kernel module(vhost cuse required):
-
-   .. code-block:: console
-
-      make
-
-Running the Sample Code
------------------------
-
-#. Install the cuse kernel module(vhost cuse required):
-
-   .. code-block:: console
-
-      modprobe cuse
-
-#. Go to the eventfd_link directory(vhost cuse required):
-
-   .. code-block:: console
-
-      export RTE_SDK=/path/to/rte_sdk
-      cd ${RTE_SDK}/lib/librte_vhost/eventfd_link
-
-#. Install the eventfd_link module(vhost cuse required):
-
-   .. code-block:: console
-
-      insmod ./eventfd_link.ko
-
-#. Go to the examples directory:
-
-   .. code-block:: console
-
-      export RTE_SDK=/path/to/rte_sdk
-      cd ${RTE_SDK}/examples/vhost
-
-#. Run the vhost-switch sample code:
-
-   vhost cuse:
+This section shows how to test a typical PVP case with this vhost-switch
+sample: packets are received from the physical NIC port first and enqueued
+to the VM's Rx queue. Through the guest testpmd's default forwarding mode
+(io forward), those packets are put into the Tx queue. The vhost-switch
+example, in turn, gets the packets and puts them back to the same physical
+NIC port.

-   .. code-block:: console
+Build
+~~~~~

-      user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1
+To compile the sample application see :doc:`compiling`.

-   vhost user: a socket file named usvhost will be created under current directory. Use its path as the socket path in guest's qemu commandline.
-
-   .. code-block:: console
-
-      user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost
+The application is located in the ``vhost`` sub-directory.

 .. note::
+   In this example, you need to build DPDK both on the host and inside the
+   guest.
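+
+As a hedged illustration only: with the older make-based flow used by this
+guide's previous revision, a host-side build of the example typically looked
+like the following (``/path/to/rte_sdk`` and the target are placeholders,
+not part of the sample):
+
+.. code-block:: console
+
+   export RTE_SDK=/path/to/rte_sdk
+   export RTE_TARGET=x86_64-native-linuxapp-gcc
+   cd ${RTE_SDK}/examples/vhost
+   make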
-
-   Please note the huge-dir parameter instructs the DPDK to allocate its memory from the 2 MB page hugetlbfs.
-
-Parameters
-~~~~~~~~~~
-
-**Basename and Index.**
-vhost cuse uses a Linux* character device to communicate with QEMU.
-The basename and the index are used to generate the character devices name.
-
-   /dev/-
-
-The index parameter is provided for a situation where multiple instances of the virtual switch is required.
-
-For compatibility with the QEMU wrapper script, a base name of "usvhost" and an index of "1" should be used:
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1
-
-**vm2vm.**
-The vm2vm parameter disable/set mode of packet switching between guests in the host.
-Value of "0" means disabling vm2vm implies that on virtual machine packet transmission will always go to the Ethernet port;
-Value of "1" means software mode packet forwarding between guests, it needs packets copy in vHOST,
-so valid only in one-copy implementation, and invalid for zero copy implementation;
-value of "2" means hardware mode packet forwarding between guests, it allows packets go to the Ethernet port,
-hardware L2 switch will determine which guest the packet should forward to or need send to external,
-which bases on the packet destination MAC address and VLAN tag.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vm2vm [0,1,2]
-
-**Mergeable Buffers.**
-The mergeable buffers parameter controls how virtio-net descriptors are used for virtio-net headers.
-In a disabled state, one virtio-net header is used per packet buffer;
-in an enabled state one virtio-net header is used for multiple packets.
-The default value is 0 or disabled since recent kernels virtio-net drivers show performance degradation with this feature is enabled.
+Start the vswitch example
+~~~~~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: console

-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --mergeable [0,1]
-
-**Stats.**
-The stats parameter controls the printing of virtio-net device statistics.
-The parameter specifies an interval second to print statistics, with an interval of 0 seconds disabling statistics.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --stats [0,n]
+   ./vhost-switch -l 0-3 -n 4 --socket-mem 1024 \
+       -- --socket-file /tmp/sock0 --client \
+       ...
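+
+For illustration only, and assuming the port mask (``-p``) and the option
+defaults carried over from this guide's earlier revision, a more explicit
+run line might look as below; the values are examples, not recommendations:
+
+.. code-block:: console
+
+   ./vhost-switch -l 0-3 -n 4 --socket-mem 1024 \
+       -- -p 0x1 --socket-file /tmp/sock0 --client \
+       --mergeable 0 --stats 1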
-
-**RX Retry.**
-The rx-retry option enables/disables enqueue retries when the guests RX queue is full.
-This feature resolves a packet loss that is observed at high data-rates,
-by allowing it to delay and retry in the receive path.
-This option is enabled by default.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry [0,1]
-
-**RX Retry Number.**
-The rx-retry-num option specifies the number of retries on an RX burst,
-it takes effect only when rx retry is enabled.
-The default value is 4.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-num 5
-
-**RX Retry Delay Time.**
-The rx-retry-delay option specifies the timeout (in micro seconds) between retries on an RX burst,
-it takes effect only when rx retry is enabled.
-The default value is 15.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-delay 20
-
-**Zero copy.**
-The zero copy option enables/disables the zero copy mode for RX/TX packet,
-in the zero copy mode the packet buffer address from guest translate into host physical address
-and then set directly as DMA address.
-If the zero copy mode is disabled, then one copy mode is utilized in the sample.
-This option is disabled by default.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy [0,1]
-
-**RX descriptor number.**
-The RX descriptor number option specify the Ethernet RX descriptor number,
-Linux legacy virtio-net has different behaviour in how to use the vring descriptor from DPDK based virtio-net PMD,
-the former likely allocate half for virtio header, another half for frame buffer,
-while the latter allocate all for frame buffer,
-this lead to different number for available frame buffer in vring,
-and then lead to different Ethernet RX descriptor number could be used in zero copy mode.
-So it is valid only in zero copy mode is enabled. The value is 32 by default.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --rx-desc-num [0, n]
-
-**TX descriptornumber.**
-The TX descriptor number option specify the Ethernet TX descriptor number, it is valid only in zero copy mode is enabled.
-The value is 64 by default.
+Check the `Parameters`_ section for an explanation of what those
+parameters mean.
+
+.. _vhost_app_run_vm:
+
+Start the VM
+~~~~~~~~~~~~

 .. code-block:: console

-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --tx-desc-num [0, n]
-
-**VLAN strip.**
-The VLAN strip option enable/disable the VLAN strip on host, if disabled, the guest will receive the packets with VLAN tag.
-It is enabled by default.
-
-.. code-block:: console
-
-   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vlan-strip [0, 1]
-
-Running the Virtual Machine (QEMU)
-----------------------------------
-
-QEMU must be executed with specific parameters to:
-
-* Ensure the guest is configured to use virtio-net network adapters.
-
-  .. code-block:: console
-
-     user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1 ...
-
-* Ensure the guest's virtio-net network adapter is configured with offloads disabled.
-
-  .. code-block:: console
-
-     user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
-
-* Redirect QEMU to communicate with the DPDK vhost-net sample code in place of the vhost-net kernel module(vhost cuse).
-
-  .. code-block:: console
-
-     user@target:~$ qemu-system-x86_64 ... -netdev tap,id=hostnet1,vhost=on,vhostfd= ...
-
-* Enable the vhost-net sample code to map the VM's memory into its own process address space.
-
-  .. code-block:: console
-
-     user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...
-
-.. note::
-
-   The QEMU wrapper (qemu-wrap.py) is a Python script designed to automate the QEMU configuration described above.
-   It also facilitates integration with libvirt, although the script may also be used standalone without libvirt.
-
-Redirecting QEMU to vhost-net Sample Code(vhost cuse)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-To redirect QEMU to the vhost-net sample code implementation of the vhost-net API,
-an open file descriptor must be passed to QEMU running as a child process.
-
-.. code-block:: python
-
-   #!/usr/bin/python
-   fd = os.open("/dev/usvhost-1", os.O_RDWR)
-   subprocess.call("qemu-system-x86_64 ... . -netdev tap,id=vhostnet0,vhost=on,vhostfd=" + fd +"...", shell=True)
+   qemu-system-x86_64 -machine accel=kvm -cpu host \
+   -m $mem -object memory-backend-file,id=mem,size=$mem,mem-path=/dev/hugepages,share=on \
+   -mem-prealloc -numa node,memdev=mem \
+   \
+   -chardev socket,id=char1,path=/tmp/sock0,server \
+   -netdev type=vhost-user,id=hostnet1,chardev=char1 \
+   -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:00:00:14 \
+   ...

 .. note::
+   For basic vhost-user support, QEMU 2.2 (or above) is required. Some
+   specific features may need a higher version, such as QEMU 2.7 (or
+   above) for the reconnect feature.

-   This process is automated in the QEMU wrapper script discussed in Section 24.7.3.
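+
+If the guest needs a second virtio-net device (for example, to forward
+between a pair of ports as the earlier testpmd-based setup did), the
+``-chardev``/``-netdev``/``-device`` triple can be repeated. A hedged
+sketch only: the second socket path and MAC below are assumptions, and the
+extra socket would have to be served on the vhost-switch side (for
+instance via an additional ``--socket-file``, also an assumption here):
+
+.. code-block:: console
+
+   qemu-system-x86_64 ... \
+       -chardev socket,id=char2,path=/tmp/sock1,server \
+       -netdev type=vhost-user,id=hostnet2,chardev=char2 \
+       -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:00:00:15 \
+       ...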
-
-Mapping the Virtual Machine's Memory
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-For the DPDK vhost-net sample code to be run correctly, QEMU must allocate the VM's memory on hugetlbfs.
-This is done by specifying mem-prealloc and mem-path when executing QEMU.
-The vhost-net sample code accesses the virtio-net device's virtual rings and packet buffers
-by finding and mapping the VM's physical memory on hugetlbfs.
-In this case, the path passed to the guest should be that of the 1 GB page hugetlbfs:
-
-.. code-block:: console
-
-   user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...
-
-.. note::
-
-   This process is automated in the QEMU wrapper script discussed in Section 24.7.3.
-   The following two sections only applies to vhost cuse. For vhost-user, please make corresponding changes to qemu-wrapper script and guest XML file.
-
-QEMU Wrapper Script
-~~~~~~~~~~~~~~~~~~~
-
-The QEMU wrapper script automatically detects and calls QEMU with the necessary parameters required
-to integrate with the vhost sample code.
-It performs the following actions:
-
-* Automatically detects the location of the hugetlbfs and inserts this into the command line parameters.
-
-* Automatically open file descriptors for each virtio-net device and inserts this into the command line parameters.
-
-* Disables offloads on each virtio-net device.
+.. _vhost_app_run_dpdk_inside_guest:

-* Calls Qemu passing both the command line parameters passed to the script itself and those it has auto-detected.
+Run testpmd inside guest
+~~~~~~~~~~~~~~~~~~~~~~~~

-The QEMU wrapper script will automatically configure calls to QEMU:
+Make sure you have DPDK built inside the guest. Also make sure the
+corresponding virtio-net PCI device is bound to a uio driver, which
+can be done as follows:

 .. code-block:: console

-   user@target:~$ qemu-wrap.py -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
-   -netdev tap,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1 -hda  -m 4096
+   modprobe uio_pci_generic
+   $RTE_SDK/usertools/dpdk-devbind.py -b=uio_pci_generic 0000:00:04.0
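+
+To double-check that the binding took effect, the same script can list the
+device status:
+
+.. code-block:: console
+
+   $RTE_SDK/usertools/dpdk-devbind.py --status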

-which will become the following call to QEMU:
+Then start testpmd for packet forwarding testing:

 .. code-block:: console

-   /usr/local/bin/qemu-system-x86_64 -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
-   -netdev tap,id=hostnet1,vhost=on,vhostfd= -device virtio-net-pci,netdev=hostnet1,id=net1,
-   csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off -hda  -m 4096 -mem-path /dev/hugepages -mem-prealloc
+   ./x86_64-native-gcc/app/testpmd -l 0-1 -- -i
+   > start tx_first
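+
+Once forwarding has started, testpmd's usual interactive commands can be
+used to watch it; for example:
+
+.. code-block:: console
+
+   > show port stats all
+   > stop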

-Libvirt Integration
-~~~~~~~~~~~~~~~~~~~
+Inject packets
+--------------

-The QEMU wrapper script (qemu-wrap.py) "wraps" libvirt calls to QEMU,
-such that QEMU is called with the correct parameters described above.
-To call the QEMU wrapper automatically from libvirt, the following configuration changes must be made:
+When a virtio-net device is connected to vhost-switch, a VLAN tag starting
+at 1000 is assigned to it. Configure your packet generator with the right
+MAC and VLAN tag, and you should then see the following log on the
+vhost-switch console, which means it is working::

-* Place the QEMU wrapper script in libvirt's binary search PATH ($PATH).
-  A good location is in the directory that contains the QEMU binary.
+   VHOST_DATA: (0) mac 52:54:00:00:00:14 and vlan 1000 registered
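+
+If no hardware traffic generator is at hand, one matching packet can be
+crafted with Scapy from the machine attached to the NIC. This is a hedged
+sketch only: it assumes Scapy is installed and that ``eth0`` is the
+interface cabled to the NIC under test (both are assumptions, not part of
+the sample):
+
+.. code-block:: console
+
+   # send one UDP packet tagged VLAN 1000 to the registered MAC
+   python -c 'from scapy.all import Ether, Dot1Q, IP, UDP, sendp; sendp(Ether(dst="52:54:00:00:00:14")/Dot1Q(vlan=1000)/IP()/UDP(), iface="eth0")'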

-* Ensure that the script has the same owner/group and file permissions as the QEMU binary.

-* Update the VM xml file using virsh edit :
+.. _vhost_app_parameters:

-  * Set the VM to use the launch script
-
-  * Set the emulator path contained in the # tags For example,
-    replace /usr/bin/qemu-kvm with /usr/bin/qemu-wrap.py
-
-  * Set the VM's virtio-net device's to use vhost-net offload:
-
-    .. code-block:: xml
-
-
-
-
-
-
-  * Enable libvirt to access the DPDK Vhost sample code's character device file by adding it
-    to controllers cgroup for libvirtd using the following steps:
-
-    .. code-block:: xml
-
-      cgroup_controllers = [ ... "devices", ... ] clear_emulator_capabilities = 0
-      user = "root" group = "root"
-      cgroup_device_acl = [
-         "/dev/null", "/dev/full", "/dev/zero",
-         "/dev/random", "/dev/urandom",
-         "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
-         "/dev/rtc", "/dev/hpet", "/dev/net/tun",
-         "/dev/-",
-      ]
-
-* Disable SELinux or set to permissive mode.
-
-
-* Mount cgroup device controller:
-
-  .. code-block:: console
-
-     user@target:~$ mkdir /dev/cgroup
-     user@target:~$ mount -t cgroup none /dev/cgroup -o devices
-
-* Restart the libvirtd system process
-
-  For example, on Fedora* "systemctl restart libvirtd.service"
+Parameters
+----------

-* Edit the configuration parameters section of the script:
+**--socket-file path**
+Specifies the vhost-user socket file path.

-  * Configure the "emul_path" variable to point to the QEMU emulator.
+**--client**
+DPDK vhost-user will act in client mode when this option is given.
+In client mode, QEMU creates the socket file; otherwise, DPDK creates it.
+Put simply, whichever side is the server creates the socket file.
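+
+To make the pairing concrete, here is a hedged sketch of the two possible
+setups, with the socket path assumed as above:
+
+.. code-block:: console
+
+   # vhost-switch as server (default): DPDK creates the socket file,
+   # so the QEMU chardev is given no ",server" suffix.
+   ./vhost-switch ... -- --socket-file /tmp/sock0
+
+   # vhost-switch as client (--client): QEMU creates the socket file,
+   # so the QEMU chardev needs the ",server" suffix, as shown earlier.
+   ./vhost-switch ... -- --socket-file /tmp/sock0 --client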

-    .. code-block:: xml
-
-      emul_path = "/usr/local/bin/qemu-system-x86_64"
+**--vm2vm mode**
+The vm2vm parameter sets the mode of packet switching between guests in
+the host.

-  * Configure the "us_vhost_path" variable to point to the DPDK vhost-net sample code's character devices name.
-    DPDK vhost-net sample code's character device will be in the format "/dev/-".
+- 0 disables vm2vm, implying that a VM's packets will always go to the NIC port.
+- 1 means normal MAC-lookup packet routing between guests.
+- 2 means hardware-mode packet forwarding between guests: packets go to the
+  NIC port, and the hardware L2 switch determines, based on the packet's
+  destination MAC address and VLAN tag, which guest the packet should be
+  forwarded to or whether it needs to be sent out externally.

-    .. code-block:: xml
+**--mergeable 0|1**
+Set 0/1 to disable/enable the mergeable Rx feature. It's disabled by default.

-      us_vhost_path = "/dev/usvhost-1"
+**--stats interval**
+The stats parameter controls the printing of virtio-net device statistics.
+The parameter specifies an interval (in seconds) at which to print
+statistics; an interval of 0 seconds disables statistics.
+
+**--rx-retry 0|1**
+The rx-retry option enables/disables enqueue retries when the guest's Rx
+queue is full. This feature resolves packet loss observed at high data
+rates, by allowing it to delay and retry in the receive path. This option
+is enabled by default.
+
+**--rx-retry-num num**
+The rx-retry-num option specifies the number of retries on an Rx burst;
+it takes effect only when rx retry is enabled. The default value is 4.
+
+**--rx-retry-delay msec**
+The rx-retry-delay option specifies the timeout (in microseconds) between
+retries on an Rx burst; it takes effect only when rx retry is enabled. The
+default value is 15.
+
+**--dequeue-zero-copy**
+Dequeue zero copy will be enabled when this option is given. It is worth
+noting that if the NIC is bound to a driver with IOMMU enabled, dequeue
+zero copy cannot work in VM2NIC mode (vm2vm=0), because the IOMMU DMA
+mapping for guest memory is currently not set up.
+
+**--vlan-strip 0|1**
+The VLAN strip option has been removed, because different NICs have
+different behaviors when disabling VLAN strip. Such a feature, which
+heavily depends on hardware, should be removed from this example to reduce
+confusion. VLAN strip is now always enabled and cannot be disabled.

 Common Issues
-~~~~~~~~~~~~~
-
-* QEMU failing to allocate memory on hugetlbfs, with an error like the following::
-
-     file_ram_alloc: can't mmap RAM pages: Cannot allocate memory
-
-  When running QEMU the above error indicates that it has failed to allocate memory for the Virtual Machine on
-  the hugetlbfs. This is typically due to insufficient hugepages being free to support the allocation request.
-  The number of free hugepages can be checked as follows:
-
-  .. code-block:: console
-
-     cat /sys/kernel/mm/hugepages/hugepages-/nr_hugepages
-
-  The command above indicates how many hugepages are free to support QEMU's allocation request.
-
-* User space VHOST when the guest has 2MB sized huge pages:
-
-  The guest may have 2MB or 1GB sized huge pages. The user space VHOST should work properly in both cases.
-
-* User space VHOST will not work with QEMU without the ``-mem-prealloc`` option:
-
-  The current implementation works properly only when the guest memory is pre-allocated, so it is required to
-  use a QEMU version (e.g. 1.6) which supports ``-mem-prealloc``. The ``-mem-prealloc`` option must be
-  specified explicitly in the QEMU command line.
-
-* User space VHOST will not work with a QEMU version without shared memory mapping:
-
-  As shared memory mapping is mandatory for user space VHOST to work properly with the guest, user space VHOST
-  needs access to the shared memory from the guest to receive and transmit packets. It is important to make sure
-  the QEMU version supports shared memory mapping.
-
-* Issues with ``virsh destroy`` not destroying the VM:
-
-  Using libvirt ``virsh create`` the ``qemu-wrap.py`` spawns a new process to run ``qemu-kvm``. This impacts the behavior
-  of ``virsh destroy`` which kills the process running ``qemu-wrap.py`` without actually destroying the VM (it leaves
-  the ``qemu-kvm`` process running):
-
-  This following patch should fix this issue:
-      http://dpdk.org/ml/archives/dev/2014-June/003607.html
-
-* In an Ubuntu environment, QEMU fails to start a new guest normally with user space VHOST due to not being able
-  to allocate huge pages for the new guest:
-
-  The solution for this issue is to add ``-boot c`` into the QEMU command line to make sure the huge pages are
-  allocated properly and then the guest should start normally.
-
-  Use ``cat /proc/meminfo`` to check if there is any changes in the value of ``HugePages_Total`` and ``HugePages_Free``
-  after the guest startup.
-
-* Log message: ``eventfd_link: module verification failed: signature and/or required key missing - tainting kernel``:
-
-  This log message may be ignored. The message occurs due to the kernel module ``eventfd_link``, which is not a standard
-  Linux module but which is necessary for the user space VHOST current implementation (CUSE-based) to communicate with
-  the guest.
-
-Running DPDK in the Virtual Machine
------------------------------------
-
-For the DPDK vhost-net sample code to switch packets into the VM,
-the sample code must first learn the MAC address of the VM's virtio-net device.
-The sample code detects the address from packets being transmitted from the VM, similar to a learning switch.
-
-This behavior requires no special action or configuration with the Linux* virtio-net driver in the VM
-as the Linux* Kernel will automatically transmit packets during device initialization.
-However, DPDK-based applications must be modified to automatically transmit packets during initialization
-to facilitate the DPDK vhost-net sample code's MAC learning.
-
-The DPDK testpmd application can be configured to automatically transmit packets during initialization
-and to act as an L2 forwarding switch.
-
-Testpmd MAC Forwarding
-~~~~~~~~~~~~~~~~~~~~~~
-
-At high packet rates, a minor packet loss may be observed.
-To resolve this issue, a "wait and retry" mode is implemented in the testpmd and vhost sample code.
-In the "wait and retry" mode if the virtqueue is found to be full, then testpmd waits for a period of time before retrying to enqueue packets.
-
-The "wait and retry" algorithm is implemented in DPDK testpmd as a forwarding method call "mac_retry".
-The following sequence diagram describes the algorithm in detail.
-
-.. _figure_20:
-
-**Figure 20. Packet Flow on TX in DPDK-testpmd**
-
-.. image23_png has been renamed
-
-|tx_dpdk_testpmd|
-
-Running Testpmd
-~~~~~~~~~~~~~~~
-
-The testpmd application is automatically built when DPDK is installed.
-Run the testpmd application as follows:
-
-.. code-block:: console
-
-   user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -- n 4 -socket-mem 128 -- --burst=64 -i
-
-The destination MAC address for packets transmitted on each port can be set at the command line:
-
-.. code-block:: console
-
-   user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -- n 4 -socket-mem 128 -- --burst=64 -i --eth-peer=0,aa:bb:cc:dd:ee:ff --eth-peer=1,ff,ee,dd,cc,bb,aa
-
-* Packets received on port 1 will be forwarded on port 0 to MAC address
-
-  aa:bb:cc:dd:ee:ff.
-
-* Packets received on port 0 will be forwarded on port 1 to MAC address
-
-  ff,ee,dd,cc,bb,aa.
-
-The testpmd application can then be configured to act as an L2 forwarding application:
-
-.. code-block:: console
-
-   testpmd> set fwd mac_retry
-
-The testpmd can then be configured to start processing packets,
-transmitting packets first so the DPDK vhost sample code on the host can learn the MAC address:
-
-.. code-block:: console
-
-   testpmd> start tx_first
-
-.. note::
-
-   Please note "set fwd mac_retry" is used in place of "set fwd mac_fwd" to ensure the retry feature is activated.
-
-Passing Traffic to the Virtual Machine Device
----------------------------------------------
-
-For a virtio-net device to receive traffic,
-the traffic's Layer 2 header must include both the virtio-net device's MAC address and VLAN tag.
-The DPDK sample code behaves in a similar manner to a learning switch in that
-it learns the MAC address of the virtio-net devices from the first transmitted packet.
-On learning the MAC address,
-the DPDK vhost sample code prints a message with the MAC address and VLAN tag virtio-net device.
-For example:
-
-.. code-block:: console
-
-   DATA: (0) MAC_ADDRESS cc:bb:bb:bb:bb:bb and VLAN_TAG 1000 registered
-
-The above message indicates that device 0 has been registered with MAC address cc:bb:bb:bb:bb:bb and VLAN tag 1000.
-Any packets received on the NIC with these values is placed on the devices receive queue.
-When a virtio-net device transmits packets, the VLAN tag is added to the packet by the DPDK vhost sample code.
-
-.. |vhost_net_arch| image:: img/vhost_net_arch.*
-
-.. |qemu_virtio_net| image:: img/qemu_virtio_net.*
-
-.. |tx_dpdk_testpmd| image:: img/tx_dpdk_testpmd.*
-
-.. |vhost_net_sample_app| image:: img/vhost_net_sample_app.*
-
-.. |virtio_linux_vhost| image:: img/virtio_linux_vhost.*
+-------------
+
+* QEMU fails to allocate memory on hugetlbfs, with an error like the
+  following::
+
+      file_ram_alloc: can't mmap RAM pages: Cannot allocate memory
+
+  When running QEMU, the above error indicates that it has failed to
+  allocate memory for the Virtual Machine on the hugetlbfs. This is
+  typically due to insufficient hugepages being free to support the
+  allocation request. The number of free hugepages can be checked as
+  follows:
+
+  .. code-block:: console
+
+      cat /sys/kernel/mm/hugepages/hugepages-<pagesize>/nr_hugepages
+
+  The command above indicates how many hugepages are free to support QEMU's
+  allocation request.
+
+* Failed to build DPDK in the VM.
+
+  Make sure the ``-cpu host`` QEMU option is given.
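+
+As a final hedged tip, related to the first issue above: if the check shows
+no free hugepages, more can be reserved at run time (2 MB pages shown; the
+count is only an example, echoing the old setup notes):
+
+.. code-block:: console
+
+   echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages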