diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 597929072e..ba4c62aeb8 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -1,32 +1,5 @@
-.. BSD LICENSE
-    Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
-    All rights reserved.
-
-    Redistribution and use in source and binary forms, with or without
-    modification, are permitted provided that the following conditions
-    are met:
-
-    * Redistributions of source code must retain the above copyright
-      notice, this list of conditions and the following disclaimer.
-    * Redistributions in binary form must reproduce the above copyright
-      notice, this list of conditions and the following disclaimer in
-      the documentation and/or other materials provided with the
-      distribution.
-    * Neither the name of Intel Corporation nor the names of its
-      contributors may be used to endorse or promote products derived
-      from this software without specific prior written permission.
-
-    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright(c) 2010-2016 Intel Corporation.
 
 Vhost Library
 =============
@@ -78,37 +51,72 @@ The following is an overview of some key Vhost API functions:
     This reconnect option is enabled by default. However, it can be turned off
     by setting this flag.
 
-  - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``
+  - ``RTE_VHOST_USER_IOMMU_SUPPORT``
 
-    Dequeue zero copy will be enabled when this flag is set. It is disabled by
+    IOMMU support will be enabled when this flag is set. It is disabled by
     default.
 
-    There are some truths (including limitations) you might want to know while
-    setting this flag:
+    Enabling this flag makes it possible to use the guest vIOMMU to protect
+    vhost from accessing memory the virtio device isn't allowed to, when the
+    feature is negotiated and an IOMMU device is declared.
 
-    * zero copy is not good for small packets (typically for packet size below
-      512).
+  - ``RTE_VHOST_USER_POSTCOPY_SUPPORT``
 
-    * zero copy is really good for VM2VM case. For iperf between two VMs, the
-      boost could be above 70% (when TSO is enableld).
+    Postcopy live-migration support will be enabled when this flag is set.
+    It is disabled by default.
 
-    * for VM2NIC case, the ``nb_tx_desc`` has to be small enough: <= 64 if virtio
-      indirect feature is not enabled and <= 128 if it is enabled.
+    Enabling this flag should only be done when the calling application does
+    not pre-fault the guest shared memory; otherwise, migration would fail.
 
-    This is because when dequeue zero copy is enabled, guest Tx used vring will
-    be updated only when corresponding mbuf is freed. Thus, the nb_tx_desc
-    has to be small enough so that the PMD driver will run out of available
-    Tx descriptors and free mbufs timely. Otherwise, guest Tx vring would be
-    starved.
+  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``
 
-    * Guest memory should be backended with huge pages to achieve better
-      performance. Using 1G page size is the best.
+    Enabling this flag forces the vhost dequeue function to only provide
+    linear pktmbufs (no multi-segmented pktmbufs).
 
-      When dequeue zero copy is enabled, the guest phys address and host phys
-      address mapping has to be established. Using non-huge pages means far
-      more page segments. To make it simple, DPDK vhost does a linear search
-      of those segments, thus the fewer the segments, the quicker we will get
-      the mapping. NOTE: we may speed it by using tree searching in future.
+    The vhost library by default provides a single pktmbuf for a given
+    packet, but if for some reason the data doesn't fit into a single
+    pktmbuf (e.g., TSO is enabled), the library will allocate additional
+    pktmbufs from the same mempool and chain them together to create a
+    multi-segmented pktmbuf.
+
+    However, the vhost application needs to support the multi-segmented
+    format. If the vhost application does not support that format and
+    requires large buffers to be dequeued, this flag should be enabled to
+    force only linear buffers (see RTE_VHOST_USER_EXTBUF_SUPPORT) or drop
+    the packet.
+
+    It is disabled by default.
+
+  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``
+
+    Enabling this flag allows the vhost dequeue function to allocate and
+    attach an external buffer to a pktmbuf if the pktmbuf doesn't provide
+    enough space to store all data.
+
+    This is useful when the vhost application wants to support large packets
+    but doesn't want to increase the default mempool object size nor to
+    support multi-segmented mbufs (non-linear). In this case, a fresh buffer
+    is allocated using rte_malloc() which gets attached to a pktmbuf using
+    rte_pktmbuf_attach_extbuf().
+
+    See RTE_VHOST_USER_LINEARBUF_SUPPORT as well to disable multi-segmented
+    mbufs for applications that don't support chained mbufs.
+
+    It is disabled by default.
+
+  - ``RTE_VHOST_USER_ASYNC_COPY``
+
+    Asynchronous data path will be enabled when this flag is set. The async
+    data path allows applications to register async copy devices (typically
+    hardware DMA channels) to the vhost queues. Vhost leverages the
+    registered copy device to free the CPU from memory copy operations. A
+    set of async data path APIs is defined for DPDK applications to make
+    use of the async capability. Only packets enqueued/dequeued by the
+    async APIs are processed through the async data path.
+
+    Currently this feature is only implemented on the split ring enqueue
+    data path.
+
+    It is disabled by default.
 
 * ``rte_vhost_driver_set_features(path, features)``
 
@@ -129,8 +137,7 @@ The following is an overview of some key Vhost API functions:
 
   * ``destroy_device(int vid)``
 
-    This callback is invoked when a virtio device shuts down (or when the
-    vhost connection is broken).
+    This callback is invoked when a virtio device is paused or shut down.
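
Putting the registration flags and the callbacks above together, the following
is a minimal, illustrative sketch of how a backend might register a vhost-user
socket. ``rte_vhost_driver_register()``, ``rte_vhost_driver_callback_register()``,
``rte_vhost_driver_start()`` and ``struct vhost_device_ops`` are the library
APIs being described here; the socket path, the flag choice and the helper name
are placeholders, and the remaining callbacks described below can be added to
the same structure as needed:

.. code-block:: c

   #include <rte_vhost.h>

   /* Invoked when the virtio device becomes ready. */
   static int
   new_device(int vid)
   {
       /* Start using the queues of device "vid" in the data path. */
       return 0;
   }

   /* Invoked when the virtio device is paused or shut down. */
   static void
   destroy_device(int vid)
   {
       /* Stop polling the queues of device "vid" before returning. */
   }

   static const struct vhost_device_ops vhost_ops = {
       .new_device     = new_device,
       .destroy_device = destroy_device,
   };

   static int
   register_vhost_port(const char *path)
   {
       uint64_t flags = RTE_VHOST_USER_CLIENT; /* illustrative flag choice */

       if (rte_vhost_driver_register(path, flags) != 0)
           return -1;
       if (rte_vhost_driver_callback_register(path, &vhost_ops) != 0)
           return -1;
       return rte_vhost_driver_start(path);
   }
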
   * ``vring_state_changed(int vid, uint16_t queue_id, int enable)``
 
@@ -143,6 +150,18 @@ The following is an overview of some key Vhost API functions:
   ``VHOST_F_LOG_ALL`` will be set/cleared at the start/end of live migration,
   respectively.
 
+  * ``new_connection(int vid)``
+
+    This callback is invoked on a new vhost-user socket connection. If DPDK
+    acts as the server, the device should not be deleted before the
+    ``destroy_connection`` callback is received.
+
+  * ``destroy_connection(int vid)``
+
+    This callback is invoked when the vhost-user socket connection is closed.
+    It indicates that the device with ID ``vid`` is no longer in use and can
+    be safely deleted.
+
 * ``rte_vhost_driver_disable/enable_features(path, features)``
 
   This function disables/enables some features. For example, it can be used to
@@ -162,6 +181,84 @@ The following is an overview of some key Vhost API functions:
   Receives (dequeues) ``count`` packets from guest, and stores them at
   ``pkts`` (see the data path sketch at the end of this overview).
 
+* ``rte_vhost_crypto_create(vid, cryptodev_id, sess_mempool, socket_id)``
+
+  As an extension of new_device(), this function adds virtio-crypto workload
+  acceleration capability to the device. All crypto workload is processed by
+  the DPDK cryptodev with the device ID ``cryptodev_id``.
+
+* ``rte_vhost_crypto_free(vid)``
+
+  Frees the memory and vhost-user message handlers created in
+  rte_vhost_crypto_create().
+
+* ``rte_vhost_crypto_fetch_requests(vid, queue_id, ops, nb_ops)``
+
+  Receives (dequeues) ``nb_ops`` virtio-crypto requests from the guest, parses
+  them into DPDK crypto operations, and fills ``ops`` with the parsing results.
+
+* ``rte_vhost_crypto_finalize_requests(queue_id, ops, nb_ops)``
+
+  After the ``ops`` are dequeued from the cryptodev, finalizes the jobs and
+  notifies the guest(s).
+
+* ``rte_vhost_crypto_set_zero_copy(vid, option)``
+
+  Enables or disables the zero copy feature of the vhost crypto backend.
+
+* ``rte_vhost_async_channel_register(vid, queue_id, features, ops)``
+
+  Register a vhost queue with an async copy device channel.
+  The following device ``features`` must be specified together with the
+  registration:
+
+  * ``async_inorder``
+
+    The async copy device can guarantee the ordering of the copy completion
+    sequence: copies are completed in the same order in which they were
+    submitted.
+
+    Currently, only ``async_inorder`` capable devices are supported by vhost.
+
+  * ``async_threshold``
+
+    The copy length (in bytes) below which a CPU copy will be used even if
+    applications call async vhost APIs to enqueue/dequeue data.
+
+    A typical value is 512 to 1024, depending on the async device capability.
+
+  Applications must provide the following ``ops`` callbacks for the vhost lib
+  to work with the async copy devices:
+
+  * ``transfer_data(vid, queue_id, descs, opaque_data, count)``
+
+    Vhost invokes this function to submit copy data to the async devices.
+    For non-``async_inorder`` capable devices, ``opaque_data`` can be used
+    to identify the completed packets.
+
+  * ``check_completed_copies(vid, queue_id, opaque_data, max_packets)``
+
+    Vhost invokes this function to get the copy data completed by the async
+    devices.
+
+* ``rte_vhost_async_channel_unregister(vid, queue_id)``
+
+  Unregister the async copy device channel from a vhost queue.
+
+* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count)``
+
+  Submit an enqueue request to transmit ``count`` packets from host to guest
+  via the async data path. The enqueue is not guaranteed to have finished upon
+  the return of this API call.
+
+  Applications must not free the packets submitted for enqueue until the
+  packets are completed.
+
+* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count)``
+
+  Poll enqueue completion status from the async data path. Completed packets
+  are returned to applications through ``pkts``.
+
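
As a summary of the data path pieces above, here is a minimal, illustrative
sketch of a polling-loop body that echoes traffic back to the guest over the
first queue pair using the synchronous burst APIs. The ``VIRTIO_RXQ``/
``VIRTIO_TXQ`` numbering follows the convention of the DPDK vhost sample
applications (queue 0 is the guest RX queue, queue 1 the guest TX queue);
``BURST_SIZE``, ``forward_burst`` and the mbuf pool are application-defined
placeholders:

.. code-block:: c

   #include <rte_mbuf.h>
   #include <rte_vhost.h>

   /* First queue pair of the device, from the guest's point of view. */
   #define VIRTIO_RXQ 0  /* host enqueues into this queue */
   #define VIRTIO_TXQ 1  /* host dequeues from this queue */

   #define BURST_SIZE 32

   static void
   forward_burst(int vid, struct rte_mempool *mbuf_pool)
   {
       struct rte_mbuf *pkts[BURST_SIZE];
       uint16_t nb_rx, nb_tx, i;

       /* Packets sent by the guest. */
       nb_rx = rte_vhost_dequeue_burst(vid, VIRTIO_TXQ, mbuf_pool,
                                       pkts, BURST_SIZE);

       /* ... process the packets, then hand them back to the guest. */
       nb_tx = rte_vhost_enqueue_burst(vid, VIRTIO_RXQ, pkts, nb_rx);

       /* Free whatever the guest RX queue could not accept. */
       for (i = nb_tx; i < nb_rx; i++)
           rte_pktmbuf_free(pkts[i]);
   }
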
 Vhost-user Implementations
 --------------------------
 
@@ -216,8 +313,88 @@ the vhost device from the data plane.
 
 When the socket connection is closed, vhost will destroy the device.
 
+Guest memory requirement
+------------------------
+
+* Memory pre-allocation
+
+  For the non-async data path, guest memory pre-allocation is not required;
+  skipping it can save memory. If users really want the guest memory to be
+  pre-allocated (e.g., for performance reasons), the ``-mem-prealloc`` option
+  can be added when starting QEMU. Alternatively, all memory can be locked at
+  the vhost side, which forces memory to be allocated when it is mmapped at
+  the vhost side; the ``--mlockall`` option in OVS-DPDK is one example.
+
+  For the async data path, the vhost lib forces the VM memory to be
+  pre-allocated when mapping the guest memory, and also locks the memory to
+  prevent pages from being swapped out to disk.
+
+* Memory sharing
+
+  Make sure the ``share=on`` QEMU option is given. vhost-user will not work
+  with a QEMU version without shared memory mapping.
+
 Vhost supported vSwitch reference
 ---------------------------------
 
 For more vhost details and how to support vhost in vSwitch, please refer to
 the vhost example in the DPDK Sample Applications Guide.
+
+Vhost data path acceleration (vDPA)
+-----------------------------------
+
+vDPA supports a selective datapath in the vhost-user lib by enabling virtio
+ring compatible devices to serve the virtio driver directly for datapath
+acceleration.
+
+``rte_vhost_driver_attach_vdpa_device`` is used to configure the vhost device
+with an accelerated backend.
+
+Vhost device capabilities are also made configurable to accommodate various
+devices. Such capabilities include supported features, protocol features, and
+queue number.
+
+Finally, a set of device ops is defined for device specific operations:
+
+* ``get_queue_num``
+
+  Called to get the supported queue number of the device.
+
+* ``get_features``
+
+  Called to get the supported features of the device.
+
+* ``get_protocol_features``
+
+  Called to get the supported protocol features of the device.
+
+* ``dev_conf``
+
+  Called to configure the actual device when the virtio device becomes ready.
+
+* ``dev_close``
+
+  Called to close the actual device when the virtio device is stopped.
+
+* ``set_vring_state``
+
+  Called to change the state of the vring in the actual device when the vring
+  state changes.
+
+* ``set_features``
+
+  Called to set the negotiated features to the device.
+
+* ``migration_done``
+
+  Called to allow the device to respond to RARP sending.
+
+* ``get_vfio_group_fd``
+
+  Called to get the VFIO group fd of the device.
+
+* ``get_vfio_device_fd``
+
+  Called to get the VFIO device fd of the device.
+
+* ``get_notify_area``
+
+  Called to get the notify area info of the queue.
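
To make the attach step above concrete, here is a minimal sketch of how an
application might select the accelerated datapath for a socket. It assumes the
device-pointer based vDPA lookup helper ``rte_vdpa_find_device_by_name()``
available in recent DPDK releases; the socket path, device name and helper name
are placeholders, and the attach call must happen after
``rte_vhost_driver_register()`` and before ``rte_vhost_driver_start()``:

.. code-block:: c

   #include <rte_vdpa.h>
   #include <rte_vhost.h>

   /* Bind a vDPA device (already probed by its vDPA driver and identified
    * here by name, e.g. a PCI address) to a registered vhost-user socket. */
   static int
   attach_vdpa_backend(const char *path, const char *vdpa_name)
   {
       struct rte_vdpa_device *dev;

       dev = rte_vdpa_find_device_by_name(vdpa_name);
       if (dev == NULL)
           return -1;

       return rte_vhost_driver_attach_vdpa_device(path, dev);
   }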