This reconnect option is enabled by default. However, it can be turned off
by setting this flag.
- ``RTE_VHOST_USER_IOMMU_SUPPORT``

  IOMMU support will be enabled when this flag is set. It is disabled by
  default.

  Enabling this flag makes it possible to use the guest vIOMMU to protect
  vhost from accessing memory the virtio device isn't allowed to, when the
  feature is negotiated and an IOMMU device is declared.
- ``RTE_VHOST_USER_POSTCOPY_SUPPORT``

  Postcopy live-migration support will be enabled when this flag is set.
  It is disabled by default.

  Enabling this flag should only be done when the calling application does
  not pre-fault the guest shared memory, otherwise migration would fail.
- ``RTE_VHOST_USER_LINEARBUF_SUPPORT``

  Enabling this flag forces the vhost dequeue function to only provide
  linear pktmbufs (no multi-segmented pktmbufs).

  The vhost library by default provides a single pktmbuf for a given
  packet, but if for some reason the data doesn't fit into a single
  pktmbuf (e.g., TSO is enabled), the library will allocate additional
  pktmbufs from the same mempool and chain them together to create a
  multi-segmented pktmbuf.

  However, the vhost application needs to support the multi-segmented
  format. If the vhost application does not support that format and
  requires large buffers to be dequeued, this flag should be enabled to
  force only linear buffers (see ``RTE_VHOST_USER_EXTBUF_SUPPORT``) or
  drop the packet.

  It is disabled by default.
- ``RTE_VHOST_USER_EXTBUF_SUPPORT``

  Enabling this flag allows the vhost dequeue function to allocate and
  attach an external buffer to a pktmbuf if the pktmbuf doesn't provide
  enough space to store all the data.

  This is useful when the vhost application wants to support large packets
  but doesn't want to increase the default mempool object size nor to
  support multi-segmented mbufs (non-linear). In this case, a fresh buffer
  is allocated using rte_malloc() which gets attached to a pktmbuf using
  rte_pktmbuf_attach_extbuf().

  See ``RTE_VHOST_USER_LINEARBUF_SUPPORT`` as well to disable
  multi-segmented mbufs for applications that don't support chained mbufs.

  It is disabled by default.

- ``RTE_VHOST_USER_ASYNC_COPY``

  Asynchronous data path will be enabled when this flag is set. The async
  data path allows applications to register async copy devices (typically
  hardware DMA channels) to the vhost queues. Vhost leverages the
  registered copy device to free the CPU from memory copy operations. A
  set of async data path APIs is defined for DPDK applications to make use
  of the async capability. Only packets enqueued/dequeued by the async
  APIs are processed through the async data path.

  Currently this feature is only implemented on the split ring enqueue
  data path.

  It is disabled by default.
* ``rte_vhost_driver_set_features(path, features)``

  This function sets the feature bits the vhost-user driver supports.

* ``rte_vhost_crypto_set_zero_copy(vid, option)``

  Enable or disable zero copy feature of the vhost crypto backend.
* ``rte_vhost_async_channel_register(vid, queue_id, features, ops)``

  Register a vhost queue with an async copy device channel. The following
  device ``features`` must be specified together with the registration:

  * ``async_inorder``

    The async copy device can guarantee the ordering of the copy completion
    sequence: copies are completed in the same order they were submitted.

    Currently, only ``async_inorder`` capable devices are supported by
    vhost.

  * ``async_threshold``

    The copy length (in bytes) below which a CPU copy will be used even if
    the application calls async vhost APIs to enqueue/dequeue data.

    A typical value is 512~1024, depending on the async device capability.

  The application must provide the following ``ops`` callbacks for the
  vhost library to work with the async copy devices:

  * ``transfer_data(vid, queue_id, descs, opaque_data, count)``

    vhost invokes this function to submit copy data to the async devices.
    For non-async_inorder capable devices, ``opaque_data`` could be used
    for identifying the completed packets.

  * ``check_completed_copies(vid, queue_id, opaque_data, max_packets)``

    vhost invokes this function to get the copy data completed by the
    async devices.

* ``rte_vhost_async_channel_unregister(vid, queue_id)``

  Unregister the async copy device channel from a vhost queue.

* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, comp_pkts, comp_count)``

  Submit an enqueue request to transmit ``count`` packets from host to
  guest through the async data path. When this API call returns,
  successfully enqueued packets may already be transfer completed or may
  still be occupied by the DMA engines; the transfer completed packets are
  returned in ``comp_pkts``, while the others are not guaranteed to have
  finished.

  Applications must not free the packets submitted for enqueue until the
  packets are completed.

* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count)``

  Poll the enqueue completion status from the async data path. Completed
  packets are returned to the application through ``pkts``.

Vhost-user Implementations
--------------------------

* Memory pre-allocation

  For the non-async data path, guest memory pre-allocation is not a must.
  This can help save memory. If users really want the guest memory to be
  pre-allocated (e.g., for performance reasons), the ``-mem-prealloc``
  option can be added when starting QEMU. Alternatively, all memory can be
  locked at the vhost side, which forces memory to be allocated when it is
  mmapped at the vhost side; the ``--mlockall`` option in ovs-dpdk is an
  example in hand.

  For the async data path, the VM memory is forced to be pre-allocated at
  the vhost lib when mapping the guest memory; and the memory also needs
  to be locked to prevent pages from being swapped out to disk.
* Memory sharing

For more vhost details and how to support vhost in vSwitch, please refer to
the vhost example in the DPDK Sample Applications Guide.

Vhost data path acceleration (vDPA)
-----------------------------------

vDPA supports a selective datapath in the vhost-user lib by enabling virtio
ring compatible devices to serve the virtio driver directly for datapath
acceleration.

``rte_vhost_driver_attach_vdpa_device`` is used to configure the vhost
device with an accelerated backend.

The vhost device capabilities are also made configurable to adapt to
various devices. Such capabilities include the supported features, protocol
features, and queue number.

Finally, a set of device ops is defined for device specific operations:

* ``get_queue_num``

  Called to get the supported queue number of the device.

* ``get_features``

  Called to get the supported features of the device.

* ``get_protocol_features``

  Called to get the supported protocol features of the device.

* ``dev_conf``

  Called to configure the actual device when the virtio device becomes
  ready.

* ``dev_close``

  Called to close the actual device when the virtio device is stopped.

* ``set_vring_state``

  Called to change the state of the vring in the actual device when the
  vring state changes.

* ``set_features``

  Called to set the negotiated features to the device.

* ``migration_done``

  Called to allow the device to respond to RARP sending.

* ``get_vfio_group_fd``

  Called to get the VFIO group fd of the device.

* ``get_vfio_device_fd``

  Called to get the VFIO device fd of the device.

* ``get_notify_area``

  Called to get the notify area info of the queue.