X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fprog_guide%2Fvhost_lib.rst;h=6a4d206f6e795579a6993433c49cc92d37a44528;hb=5fbb3941da9f;hp=ba6065da0cc8ec9cfbf13e09e3daba98cdee6fe5;hpb=45db8927a832fc3ad717b81387ac765f22d43996;p=dpdk.git

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index ba6065da0c..6a4d206f6e 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -1,5 +1,5 @@
 .. BSD LICENSE
-    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+    Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
     All rights reserved.
 
     Redistribution and use in source and binary forms, with or without
@@ -31,72 +31,189 @@
 Vhost Library
 =============
 
-The vhost cuse (cuse: user space character device driver) library implements a
-vhost cuse driver. It also creates, manages and destroys vhost devices for
-corresponding virtio devices in the guest. Vhost supported vSwitch could register
-callbacks to this library, which will be called when a vhost device is activated
-or deactivated by guest virtual machine.
+The vhost library implements a user space virtio net server allowing the user
+to manipulate the virtio ring directly. In other words, it allows the user
+to fetch/put packets from/to the VM virtio net device. To achieve this, a
+vhost library should be able to:
+
+* Access the guest memory:
+
+  For QEMU, this is done by using the ``-object memory-backend-file,share=on,...``
+  option, which means QEMU will create a file to serve as the guest RAM.
+  The ``share=on`` option allows another process to map that file, which
+  means it can access the guest RAM.
+
+* Know all the necessary information about the vring:
+
+  Information such as where the available ring is stored. Vhost defines some
+  messages (passed through a Unix domain socket file) to tell the backend all
+  the information it needs to manipulate the vring.
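The shared guest-memory setup described above corresponds to QEMU options along these lines. This is an illustrative command-line fragment, not taken from the patch; the object id, memory size, hugepage path, and socket path are arbitrary examples:

```shell
# Back the guest RAM with a shareable (hugepage-backed) file so the
# vhost backend can map it, then attach a vhost-user netdev that talks
# to the backend over a Unix domain socket.
qemu-system-x86_64 \
    -m 1G \
    -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem0 \
    -chardev socket,id=char0,path=/tmp/vhost-user.sock \
    -netdev type=vhost-user,id=net0,chardev=char0 \
    -device virtio-net-pci,netdev=net0 \
    ...
```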
+
 Vhost API Overview
 ------------------
 
-* Vhost driver registration
+The following is an overview of some key Vhost API functions:
+
+* ``rte_vhost_driver_register(path, flags)``
+
+  This function registers a vhost driver into the system. ``path`` specifies
+  the Unix domain socket file path.
+
+  Currently supported flags are:
+
+  - ``RTE_VHOST_USER_CLIENT``
+
+    DPDK vhost-user will act as the client when this flag is given. See below
+    for an explanation.
+
+  - ``RTE_VHOST_USER_NO_RECONNECT``
+
+    When DPDK vhost-user acts as the client, it will keep trying to reconnect
+    to the server (QEMU) until it succeeds. This is useful in two cases:
+
+    * When QEMU is not started yet.
+    * When QEMU restarts (for example due to a guest OS reboot).
+
+    This reconnect option is enabled by default. However, it can be turned off
+    by setting this flag.
+
+  - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``
+
+    Dequeue zero copy will be enabled when this flag is set. It is disabled by
+    default.
+
+    There are some caveats (including limitations) you should know when
+    setting this flag:
+
+    * Zero copy is not good for small packets (typically, packet sizes below
+      512 bytes).
+
+    * Zero copy is really good for the VM2VM case. For iperf between two VMs,
+      the boost could be above 70% (when TSO is enabled).
+
+    * For the VM2NIC case, ``nb_tx_desc`` has to be small enough: <= 64 if the
+      virtio indirect feature is not enabled, and <= 128 if it is enabled.
+
+      This is because when dequeue zero copy is enabled, the guest Tx used
+      vring will be updated only when the corresponding mbuf is freed. Thus,
+      ``nb_tx_desc`` has to be small enough that the PMD driver will run out
+      of available Tx descriptors and free mbufs in a timely manner.
+      Otherwise, the guest Tx vring would be starved.
+
+    * Guest memory should be backed by huge pages to achieve better
+      performance. Using a 1 GB page size is best.
+
+      When dequeue zero copy is enabled, the mapping between the guest
+      physical address and the host physical address has to be established.
+      Using non-huge pages means far more page segments. To keep it simple,
+      DPDK vhost does a linear search over those segments; thus, the fewer
+      the segments, the quicker the mapping is found. NOTE: this may be sped
+      up with a tree search in the future.
+
+* ``rte_vhost_driver_set_features(path, features)``
+
+  This function sets the feature bits the vhost-user driver supports. The
+  vhost-user driver could be vhost-user net, but it could also be something
+  else, say, vhost-user SCSI.
+
+* ``rte_vhost_driver_session_start()``
+
+  This function starts the vhost session loop to handle vhost messages. It
+  starts an infinite loop, therefore it should be called in a dedicated
+  thread.
+
+* ``rte_vhost_driver_callback_register(virtio_net_device_ops)``
+
+  This function registers a set of callbacks, to let DPDK applications take
+  the appropriate action when some events happen. The following events are
+  currently supported:
+
+  * ``new_device(int vid)``
+
+    This callback is invoked when a virtio net device becomes ready. ``vid``
+    is the virtio net device ID.
+
+  * ``destroy_device(int vid)``
+
+    This callback is invoked when a virtio net device shuts down (or when the
+    vhost connection is broken).
+
+  * ``vring_state_changed(int vid, uint16_t queue_id, int enable)``
+
+    This callback is invoked when a specific queue's state changes, for
+    example when it is enabled or disabled.
+
+* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``
+
+  Transmits (enqueues) ``count`` packets from the host to the guest.
+
+* ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)``
+
+  Receives (dequeues) ``count`` packets from the guest, and stores them at
+  ``pkts``.
+
+* ``rte_vhost_driver_disable/enable_features(path, features)``
+
+  This function disables/enables some features. For example, it can be used
+  to disable mergeable buffers and TSO features, both of which are enabled
+  by default.
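Taken together, the functions above are typically used in the sequence sketched below. This is an illustrative sketch only (not compilable as-is): it assumes the DPDK vhost headers and the signatures listed above, and omits all error handling. The socket path, the ops-struct name ``sw_ops``, and the queue indices are examples, not DPDK-mandated names:

```c
/* Callbacks the application registers with the vhost library. */
static int new_device(int vid)
{
    /* Device is ready: add it to the application's datapath here. */
    return 0;
}

static void destroy_device(int vid)
{
    /* Device gone (or connection broken): remove it from the datapath. */
}

static const struct virtio_net_device_ops sw_ops = {
    .new_device     = new_device,
    .destroy_device = destroy_device,
};

static void vhost_setup(void)
{
    rte_vhost_driver_register("/tmp/vhost-user.sock", RTE_VHOST_USER_CLIENT);
    rte_vhost_driver_callback_register(&sw_ops);

    /* Infinite message loop: run this in a dedicated thread. */
    rte_vhost_driver_session_start();
}

/* Datapath, once new_device(vid) has been invoked: dequeue packets the
 * guest transmitted, then (for a trivial loopback) enqueue them back. */
static void vhost_datapath(int vid, struct rte_mempool *mbuf_pool)
{
    struct rte_mbuf *pkts[32];
    uint16_t n;

    n = rte_vhost_dequeue_burst(vid, 1 /* guest Tx queue */, mbuf_pool,
                                pkts, 32);
    rte_vhost_enqueue_burst(vid, 0 /* guest Rx queue */, pkts, n);
}
```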
+
+
+Vhost-user Implementations
+--------------------------
+
+Vhost-user uses Unix domain sockets for passing messages. This means the DPDK
+vhost-user implementation has two options:
+
+* DPDK vhost-user acts as the server.
 
-  rte_vhost_driver_register registers the vhost cuse driver into the system.
-  Character device file will be created in the /dev directory.
-  Character device name is specified as the parameter.
+  DPDK will create a Unix domain socket server file and listen for
+  connections from the frontend.
 
-* Vhost session start
+  Note, this is the default mode, and the only mode before DPDK v16.07.
 
-  rte_vhost_driver_session_start starts the vhost session loop.
-  Vhost cuse session is an infinite blocking loop.
-  Put the session in a dedicate DPDK thread.
 
-* Callback register
+* DPDK vhost-user acts as the client.
 
-  Vhost supported vSwitch could call rte_vhost_driver_callback_register to
-  register two callbacks, new_destory and destroy_device.
-  When virtio device is activated or deactivated by guest virtual machine,
-  the callback will be called, then vSwitch could put the device onto data
-  core or remove the device from data core by setting or unsetting
-  VIRTIO_DEV_RUNNING on the device flags.
+  Unlike the server mode, this mode doesn't create the socket file;
+  it just tries to connect to the server (which is then responsible for
+  creating the file).
 
-* Read/write packets from/to guest virtual machine
+  When the DPDK vhost-user application restarts, DPDK vhost-user will try to
+  connect to the server again. This is how the "reconnect" feature works.
 
-  rte_vhost_enqueue_burst transmit host packets to guest.
-  rte_vhost_dequeue_burst receives packets from guest.
+  .. Note::
+     * The "reconnect" feature requires **QEMU v2.7** (or above).
 
-* Feature enable/disable
+     * The vhost supported features must be exactly the same before and
+       after the restart. For example, if TSO is disabled and then enabled,
+       nothing will work and undefined issues might happen.
-  Now one negotiate-able feature in vhost is merge-able.
-  vSwitch could enable/disable this feature for performance consideration.
+No matter which mode is used, once a connection is established, DPDK
+vhost-user will start receiving and processing vhost messages from QEMU.
 
-Vhost Implementation
---------------------
+For messages with a file descriptor, the file descriptor can be used directly
+in the vhost process, as it is already installed when received over the Unix
+domain socket.
 
-When vSwitch registers the vhost driver, it will register a cuse device driver
-into the system and creates a character device file. This cuse driver will
-receive vhost open/release/IOCTL message from QEMU simulator.
+The supported vhost messages are:
 
-When the open call is received, vhost driver will create a vhost device for the
-virtio device in the guest.
+* ``VHOST_SET_MEM_TABLE``
+* ``VHOST_SET_VRING_KICK``
+* ``VHOST_SET_VRING_CALL``
+* ``VHOST_SET_LOG_FD``
+* ``VHOST_SET_VRING_ERR``
 
-When VHOST_SET_MEM_TABLE IOCTL is received, vhost searches the memory region
-to find the starting user space virtual address that maps the memory of guest
-virtual machine. Through this virtual address and the QEMU pid, vhost could
-find the file QEMU uses to map the guest memory. Vhost maps this file into its
-address space, in this way vhost could fully access the guest physical memory,
-which means vhost could access the shared virtio ring and the guest physical
-address specified in the entry of the ring.
+For the ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each
+memory region and its file descriptor in the ancillary data of the message.
+The file descriptor is used to map that region.
 
-The guest virtual machine tells the vhost whether the virtio device is ready
-for processing or is de-activated through VHOST_SET_BACKEND message.
-The registered callback from vSwitch will be called.
+``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into
+the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove
+the vhost device from the data plane.
 
-When the release call is released, vhost will destroy the device.
+When the socket connection is closed, vhost will destroy the device.
 
 Vhost supported vSwitch reference
 ---------------------------------
 
-For how to support vhost in vSwitch, please refer to vhost example in the
-DPDK Sample Applications Guide.
+For more vhost details and how to support vhost in a vSwitch, please refer to
+the vhost example in the DPDK Sample Applications Guide.