+
.. BSD LICENSE
Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
Virtio networking (virtio-net) was developed as the Linux* KVM para-virtualized method for communicating network packets
between host and guest.
It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
-The following figure shows the system architecture for a virtio- based networking (virtio-net).
+The following figure shows the system architecture for virtio-based networking (virtio-net).
.. _figure_16:
The DPDK vhost-net sample code demonstrates KVM (QEMU) offloading the servicing of a Virtual Machine's (VM's)
virtio-net devices to a DPDK-based application in place of the kernel's vhost-net module.
-The DPDK vhost-net sample code is a simple packet switching application with the following features:
+The DPDK vhost-net sample code is based on the vhost library. The vhost library is developed so that a user space Ethernet switch can
+easily integrate with vhost functionality.
+
+The vhost library implements the following features:
* Management of virtio-net device creation/destruction events.
-* Mapping of the VM's physical memory into the DPDK vhost-net sample code's address space.
+* Mapping of the VM's physical memory into the DPDK vhost-net's address space.
* Triggering/receiving notifications to/from VMs via eventfds.
* A virtio-net back-end implementation providing a subset of virtio-net features.
+There are two vhost implementations in the vhost library, vhost cuse and vhost user. In vhost cuse, a character device driver is implemented to
+receive and process vhost requests through ioctl messages. In vhost user, a socket server is created to receive vhost requests through
+socket messages. Most of the messages share the same handler routine.
+
+.. note::
+ **Any vhost cuse specific requirement in the following sections will be emphasized**.
+
+The two implementations are turned on and off statically through the configuration file. Only one implementation can be turned on at a time; they do not co-exist in the current implementation.
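+
+For example, to select vhost cuse instead of the default vhost user, the two build flags in config/common_linuxapp would be set as follows (a sketch, using the flag names referenced later in this document):
+
+.. code-block:: console
+
+    CONFIG_RTE_LIBRTE_VHOST=y
+    CONFIG_RTE_LIBRTE_VHOST_USER=n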
+
+The vhost sample application is a simple packet switching application with the following feature:
+
* Packet switching between virtio-net devices and the network interface card,
including using VMDQs to reduce the switching that needs to be performed in software.
-The following figure shows the architecture of the Vhost sample application.
+The following figure shows the architecture of the Vhost sample application based on vhost-cuse.
.. _figure_18:
* Fedora* 19
+* Fedora* 20
+
Prerequisites
-------------
This section lists prerequisite packages that must be installed.
-Installing Packages on the Host
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Installing Packages on the Host (vhost cuse required)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The vhost sample code uses the following packages; fuse, fuse-devel, and kernel- modules-extra.
+The vhost cuse code uses the following packages: fuse, fuse-devel, and kernel-modules-extra.
+The vhost user code does not rely on those modules, as eventfds are passed to the vhost process through a
+Unix domain socket.
#. Install the Fuse development libraries and headers, and the extra kernel modules:

   .. code-block:: console

       yum -y install fuse fuse-devel kernel-modules-extra
+QEMU simulator
+~~~~~~~~~~~~~~
+
+For vhost user, QEMU 2.2 is required.
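+
+The installed QEMU version can be checked with QEMU's standard version flag:
+
+.. code-block:: console
+
+    user@target:~$ qemu-system-x86_64 --version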
+
Setting up the Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: console
- echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/ nr_hugepages
+ echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
#. Mount hugetlbs at a separate mount point for 2 MB pages:
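+
+   A typical sequence for this step (assuming /mnt/huge as the mount point, matching the --huge-dir parameter used later) is:
+
+   .. code-block:: console
+
+       mkdir -p /mnt/huge
+       mount -t hugetlbfs nodev /mnt/huge
+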
Observe that in the example, "-device" and "-netdev" are repeated for two virtio-net devices.
+For vhost cuse:
+
.. code-block:: console
user@target:~$ qemu-system-x86_64 ... \
-netdev tap,id=hostnet2,vhost=on,vhostfd=<open fd> \
   -device virtio-net-pci,netdev=hostnet2,id=net1
+For vhost user:
+
+.. code-block:: console
+
+ user@target:~$ qemu-system-x86_64 ... \
+ -chardev socket,id=char1,path=<sock_path> \
+ -netdev type=vhost-user,id=hostnet1,chardev=char1 \
+ -device virtio-net-pci,netdev=hostnet1,id=net1 \
+ -chardev socket,id=char2,path=<sock_path> \
+ -netdev type=vhost-user,id=hostnet2,chardev=char2 \
+ -device virtio-net-pci,netdev=hostnet2,id=net2
+
+sock_path is the path for the socket file created by vhost.
Compiling the Sample Code
-------------------------
+#. Compile vhost lib:
-#. Go to the examples directory:
+ To enable vhost, turn on vhost library in the configure file config/common_linuxapp.
.. code-block:: console
- export RTE_SDK=/path/to/rte_sdk cd ${RTE_SDK}/examples/vhost-net
+ CONFIG_RTE_LIBRTE_VHOST=n
-#. Set the target (a default target is used if not specified). For example:
+ vhost user is turned on by default in the configure file config/common_linuxapp.
+ To enable vhost cuse, disable vhost user.
.. code-block:: console
- export RTE_TARGET=x86_64-native-linuxapp-gcc
+ CONFIG_RTE_LIBRTE_VHOST_USER=y
- See the DPDK Getting Started Guide for possible RTE_TARGET values.
+ After vhost is enabled and the implementation is selected, build the vhost library.
-#. Build the application:
+#. Go to the examples directory:
.. code-block:: console
- make
-
- .. note::
+ export RTE_SDK=/path/to/rte_sdk
+ cd ${RTE_SDK}/examples/vhost
- Note For zero copy, need firstly disable CONFIG_RTE_MBUF_SCATTER_GATHER,
- CONFIG_RTE_LIBRTE_IP_FRAG and CONFIG_RTE_LIBRTE_DISTRIBUTOR
- in the config file and then re-configure and compile the core lib, and then build the application:
+#. Set the target (a default target is used if not specified). For example:
.. code-block:: console
- vi ${RTE_SDK}/config/common_linuxapp
-
- change it as follows:
+ export RTE_TARGET=x86_64-native-linuxapp-gcc
- ::
+ See the DPDK Getting Started Guide for possible RTE_TARGET values.
- CONFIG_RTE_MBUF_SCATTER_GATHER=n
- CONFIG_RTE_LIBRTE_IP_FRAG=n
- CONFIG_RTE_LIBRTE_DISTRIBUTOR=n
+#. Build the application:
.. code-block:: console
cd ${RTE_SDK}/examples/vhost
make
-#. Go to the eventfd_link directory:
+#. Go to the eventfd_link directory (vhost cuse required):
.. code-block:: console
- cd ${RTE_SDK}/examples/vhost-net/eventfd_link
+ cd ${RTE_SDK}/lib/librte_vhost/eventfd_link
-#. Build the eventfd_link kernel module:
+#. Build the eventfd_link kernel module (vhost cuse required):
.. code-block:: console
Running the Sample Code
-----------------------
-#. Install the cuse kernel module:
+#. Install the cuse kernel module (vhost cuse required):
.. code-block:: console
modprobe cuse
-#. Go to the eventfd_link directory:
+#. Go to the eventfd_link directory (vhost cuse required):
.. code-block:: console
export RTE_SDK=/path/to/rte_sdk
- cd ${RTE_SDK}/examples/vhost-net/eventfd_link
+ cd ${RTE_SDK}/lib/librte_vhost/eventfd_link
-#. Install the eventfd_link module:
+#. Install the eventfd_link module (vhost cuse required):
.. code-block:: console
.. code-block:: console
export RTE_SDK=/path/to/rte_sdk
- cd ${RTE_SDK}/examples/vhost-net
+ cd ${RTE_SDK}/examples/vhost
#. Run the vhost-switch sample code:
+ vhost cuse:
+
.. code-block:: console
    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1
+   vhost user: a socket file named usvhost will be created under the current directory. Use its path as the socket path in the guest's QEMU command line.
+
+ .. code-block:: console
+
+   user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost
+
.. note::
Please note the huge-dir parameter instructs the DPDK to allocate its memory from the 2 MB page hugetlbfs.
~~~~~~~~~~
**Basename and Index.**
-The DPDK vhost-net sample code uses a Linux* character device to communicate with QEMU.
+vhost cuse uses a Linux* character device to communicate with QEMU.
The basename and the index are used to generate the character device's name.
/dev/<basename>-<index>
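+
+For example, with the parameters ``--dev-basename usvhost --dev-index 1`` shown above, the character device created is:
+
+.. code-block:: console
+
+    /dev/usvhost-1
+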
user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --tx-desc-num [0, n]
+**VLAN strip.**
+The VLAN strip option enables/disables VLAN stripping on the host. If disabled, the guest will receive packets with the VLAN tag.
+It is enabled by default.
+
+.. code-block:: console
+
+ user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vlan-strip [0, 1]
+
Running the Virtual Machine (QEMU)
----------------------------------
.. code-block:: console
- user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci, netdev=hostnet1,id=net1 ...
+ user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1 ...
* Ensure the guest's virtio-net network adapter is configured with offloads disabled.
.. code-block:: console
- user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci, netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_ tso6=off,guest_ecn=off
+ user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
-* Redirect QEMU to communicate with the DPDK vhost-net sample code in place of the vhost-net kernel module.
+* Redirect QEMU to communicate with the DPDK vhost-net sample code in place of the vhost-net kernel module(vhost cuse).
.. code-block:: console
The QEMU wrapper (qemu-wrap.py) is a Python script designed to automate the QEMU configuration described above.
It also facilitates integration with libvirt, although the script may also be used standalone without libvirt.
-Redirecting QEMU to vhost-net Sample Code
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Redirecting QEMU to vhost-net Sample Code(vhost cuse)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To redirect QEMU to the vhost-net sample code implementation of the vhost-net API,
an open file descriptor must be passed to QEMU running as a child process.
.. note::
- This process is automated in the QEMU wrapper script discussed in Section 22.7.3.
+ This process is automated in the QEMU wrapper script discussed in Section 24.7.3.
Mapping the Virtual Machine's Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. note::
- This process is automated in the QEMU wrapper script discussed in Section 22.7.3.
+ This process is automated in the QEMU wrapper script discussed in Section 24.7.3.
+   The following two sections only apply to vhost cuse. For vhost user, please make the corresponding changes to the qemu-wrap.py script and the guest XML file.
QEMU Wrapper Script
~~~~~~~~~~~~~~~~~~~
.. code-block:: console
/usr/local/bin/qemu-system-x86_64 -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
- -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> -device virtio-net- pci,netdev=hostnet1,id=net1,
- csum=off,gso=off,guest_tso4=off,gu est_tso6=off,guest_ecn=off -hda <disk img> -m 4096 -mem-path /dev/hugepages -mem-prealloc
+ -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> -device virtio-net-pci,netdev=hostnet1,id=net1,
+ csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off -hda <disk img> -m 4096 -mem-path /dev/hugepages -mem-prealloc
Libvirt Integration
~~~~~~~~~~~~~~~~~~~
emul_path = "/usr/local/bin/qemu-system-x86_64"
- * Configure the "us_vhost_path" variable to point to the DPDK vhost- net sample code's character devices name.
+ * Configure the "us_vhost_path" variable to point to the DPDK vhost-net sample code's character devices name.
DPDK vhost-net sample code's character device will be in the format "/dev/<basename>-<index>".
.. code-block:: xml
Common Issues
~~~~~~~~~~~~~
-**QEMU failing to allocate memory on hugetlbfs.**
+* QEMU failing to allocate memory on hugetlbfs, with an error like the following::
-file_ram_alloc: can't mmap RAM pages: Cannot allocate memory
+ file_ram_alloc: can't mmap RAM pages: Cannot allocate memory
-When running QEMU the above error implies that it has failed to allocate memory for the Virtual Machine on the hugetlbfs.
-This is typically due to insufficient hugepages being free to support the allocation request.
-The number of free hugepages can be checked as follows:
+ When running QEMU the above error indicates that it has failed to allocate memory for the Virtual Machine on
+ the hugetlbfs. This is typically due to insufficient hugepages being free to support the allocation request.
+ The number of free hugepages can be checked as follows:
-.. code-block:: console
+ .. code-block:: console
+
+ cat /sys/kernel/mm/hugepages/hugepages-<pagesize>/nr_hugepages
+
+ The command above indicates how many hugepages are free to support QEMU's allocation request.
+
+* User space VHOST when the guest has 2MB sized huge pages:
+
+ The guest may have 2MB or 1GB sized huge pages. The user space VHOST should work properly in both cases.
+
+* User space VHOST will not work with QEMU without the ``-mem-prealloc`` option:
+
+ The current implementation works properly only when the guest memory is pre-allocated, so it is required to
+ use a QEMU version (e.g. 1.6) which supports ``-mem-prealloc``. The ``-mem-prealloc`` option must be
+ specified explicitly in the QEMU command line.
+
+* User space VHOST will not work with a QEMU version without shared memory mapping:
+
+ As shared memory mapping is mandatory for user space VHOST to work properly with the guest, user space VHOST
+ needs access to the shared memory from the guest to receive and transmit packets. It is important to make sure
+ the QEMU version supports shared memory mapping.
+
+* Issues with ``virsh destroy`` not destroying the VM:
+
+  When using libvirt, ``virsh create`` runs ``qemu-wrap.py``, which spawns a new process to run ``qemu-kvm``. This impacts the behavior
+  of ``virsh destroy``, which kills the process running ``qemu-wrap.py`` without actually destroying the VM (it leaves
+  the ``qemu-kvm`` process running).
+
+  The following patch should fix this issue:
+ http://dpdk.org/ml/archives/dev/2014-June/003607.html
+
+* In an Ubuntu environment, QEMU fails to start a new guest normally with user space VHOST due to not being able
+ to allocate huge pages for the new guest:
+
+ The solution for this issue is to add ``-boot c`` into the QEMU command line to make sure the huge pages are
+ allocated properly and then the guest should start normally.
+
+ Use ``cat /proc/meminfo`` to check if there is any changes in the value of ``HugePages_Total`` and ``HugePages_Free``
+ after the guest startup.
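+
+  For example, the hugepage counters can be inspected with:
+
+  .. code-block:: console
+
+      grep -i huge /proc/meminfo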
+
+* Log message: ``eventfd_link: module verification failed: signature and/or required key missing - tainting kernel``:
- user@target:cat /sys/kernel/mm/hugepages/hugepages-<pagesize> / nr_hugepages
+ This log message may be ignored. The message occurs due to the kernel module ``eventfd_link``, which is not a standard
+ Linux module but which is necessary for the user space VHOST current implementation (CUSE-based) to communicate with
+ the guest.
-The command above indicates how many hugepages are free to support QEMU's allocation request.
Running DPDK in the Virtual Machine
-----------------------------------