* Know all the necessary information about the vring:
Information such as where the available ring is stored. Vhost defines some
- messages to tell the backend all the information it needs to know how to
- manipulate the vring.
-
-Currently, there are two ways to pass these messages and as a result there are
-two Vhost implementations in DPDK: *vhost-cuse* (where the character devices
-are in user space) and *vhost-user*.
-
-Vhost-cuse creates a user space character device and hook to a function ioctl,
-so that all ioctl commands that are sent from the frontend (QEMU) will be
-captured and handled.
-
-Vhost-user creates a Unix domain socket file through which messages are
-passed.
-
-.. Note::
-
- Since DPDK v2.2, the majority of the development effort has gone into
- enhancing vhost-user, such as multiple queue, live migration, and
- reconnect. Thus, it is strongly advised to use vhost-user instead of
- vhost-cuse.
+ messages (passed through a Unix domain socket file) to tell the backend all
+ the information it needs to know how to manipulate the vring.
Vhost API Overview
------------------
-The following is an overview of the Vhost API functions:
+The following is an overview of some key Vhost API functions:
* ``rte_vhost_driver_register(path, flags)``
- This function registers a vhost driver into the system. For vhost-cuse, a
- ``/dev/path`` character device file will be created. For vhost-user server
- mode, a Unix domain socket file ``path`` will be created.
+ This function registers a vhost driver into the system. ``path`` specifies
+ the Unix domain socket file path.
- Currently two flags are supported (these are valid for vhost-user only):
+ Currently supported flags are:
- ``RTE_VHOST_USER_CLIENT``
This reconnect option is enabled by default. However, it can be turned off
by setting this flag.
+ - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``
+
+ Dequeue zero copy will be enabled when this flag is set. It is disabled by
+ default.
+
+ There are some truths (including limitations) you might want to know while
+ setting this flag:
+
+ * zero copy is not good for small packets (typically for packet size below
+ 512).
+
+ * zero copy is really good for VM2VM case. For iperf between two VMs, the
+ boost could be above 70% (when TSO is enableld).
+
+ * for VM2NIC case, the ``nb_tx_desc`` has to be small enough: <= 64 if virtio
+ indirect feature is not enabled and <= 128 if it is enabled.
+
+ This is because when dequeue zero copy is enabled, guest Tx used vring will
+ be updated only when corresponding mbuf is freed. Thus, the nb_tx_desc
+ has to be small enough so that the PMD driver will run out of available
+ Tx descriptors and free mbufs timely. Otherwise, guest Tx vring would be
+ starved.
+
+ * Guest memory should be backended with huge pages to achieve better
+ performance. Using 1G page size is the best.
+
+ When dequeue zero copy is enabled, the guest phys address and host phys
+ address mapping has to be established. Using non-huge pages means far
+ more page segments. To make it simple, DPDK vhost does a linear search
+ of those segments, thus the fewer the segments, the quicker we will get
+ the mapping. NOTE: we may speed it by using tree searching in future.
+
+* ``rte_vhost_driver_set_features(path, features)``
+
+ This function sets the feature bits the vhost-user driver supports. The
+ vhost-user driver could be vhost-user net, yet it could be something else,
+ say, vhost-user SCSI.
+
* ``rte_vhost_driver_session_start()``
This function starts the vhost session loop to handle vhost messages. It
starts an infinite loop, therefore it should be called in a dedicated
thread.
-* ``rte_vhost_driver_callback_register(virtio_net_device_ops)``
+* ``rte_vhost_driver_callback_register(path, vhost_device_ops)``
This function registers a set of callbacks, to let DPDK applications take
the appropriate action when some events happen. The following events are
* ``new_device(int vid)``
- This callback is invoked when a virtio net device becomes ready. ``vid``
- is the virtio net device ID.
+ This callback is invoked when a virtio device becomes ready. ``vid``
+ is the vhost device ID.
* ``destroy_device(int vid)``
- This callback is invoked when a virtio net device shuts down (or when the
+ This callback is invoked when a virtio device shuts down (or when the
vhost connection is broken).
* ``vring_state_changed(int vid, uint16_t queue_id, int enable)``
Receives (dequeues) ``count`` packets from guest, and stored them at ``pkts``.
-* ``rte_vhost_feature_disable/rte_vhost_feature_enable(feature_mask)``
+* ``rte_vhost_driver_disable/enable_features(path, features))``
This function disables/enables some features. For example, it can be used to
disable mergeable buffers and TSO features, which both are enabled by
default.
-Vhost Implementations
----------------------
-
-Vhost-cuse implementation
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When vSwitch registers the vhost driver, it will register a cuse device driver
-into the system and creates a character device file. This cuse driver will
-receive vhost open/release/IOCTL messages from the QEMU simulator.
-
-When the open call is received, the vhost driver will create a vhost device
-for the virtio device in the guest.
-
-When the ``VHOST_SET_MEM_TABLE`` ioctl is received, vhost searches the memory
-region to find the starting user space virtual address that maps the memory of
-the guest virtual machine. Through this virtual address and the QEMU pid,
-vhost can find the file QEMU uses to map the guest memory. Vhost maps this
-file into its address space, in this way vhost can fully access the guest
-physical memory, which means vhost could access the shared virtio ring and the
-guest physical address specified in the entry of the ring.
-
-The guest virtual machine tells the vhost whether the virtio device is ready
-for processing or is de-activated through the ``VHOST_NET_SET_BACKEND``
-message. The registered callback from vSwitch will be called.
-
-When the release call is made, vhost will destroy the device.
-
-Vhost-user implementation
-~~~~~~~~~~~~~~~~~~~~~~~~~
+Vhost-user Implementations
+--------------------------
Vhost-user uses Unix domain sockets for passing messages. This means the DPDK
vhost-user implementation has two options:
memory region and its file descriptor in the ancillary data of the message.
The file descriptor is used to map that region.
-There is no ``VHOST_NET_SET_BACKEND`` message as in vhost-cuse to signal
-whether the virtio device is ready or stopped. Instead,
``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into
the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove
the vhost device from the data plane.