X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fprog_guide%2Fpoll_mode_drv.rst;h=d1d4b1cb743b69a664b4679f6df1e5926c0c0256;hb=7db274b9ada2221acb7110204a3b2c6a37d2614a;hp=bf3ea9fde25aa1706db398d23e3e22bd988e08ee;hpb=d085232a14bc40451aecdcf490abedcb85b438b7;p=dpdk.git

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index bf3ea9fde2..d1d4b1cb74 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -84,7 +84,7 @@ Whenever needed and appropriate, asynchronous communication should be introduced
 
 Avoiding lock contention is a key issue in a multi-core environment.
 To address this issue, PMDs are designed to work with per-core private resources as much as possible.
-For example, a PMD maintains a separate transmit queue per-core, per-port.
+For example, a PMD maintains a separate transmit queue per-core, per-port, if the PMD is not ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capable.
 In the same way, every receive queue of a port is assigned to and polled by a single logical core (lcore).
 
 To comply with Non-Uniform Memory Access (NUMA), memory management is designed to assign to each logical core
@@ -146,6 +146,16 @@ This is also true for the pipe-line model provided all logical cores used are lo
 
 Multiple logical cores should never share receive or transmit queues for interfaces since this would require global locks and hinder performance.
 
+If the PMD is ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capable, multiple threads can invoke ``rte_eth_tx_burst()``
+concurrently on the same Tx queue without a software lock. This PMD feature is found in some NICs and is useful in the following use cases:
+
+*  Remove the explicit spinlock in applications where lcores are not mapped to Tx queues with a 1:1 relation.
+
+*  In the eventdev use case, avoid dedicating a separate Tx core for transmitting, which
+   enables more scaling as all workers can send the packets.
+
+See `Hardware Offload`_ for ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capability probing details.
+
 Device Identification and Configuration
 ---------------------------------------
 
@@ -200,8 +210,8 @@ Ethernet* flow control (pause frame) can be configured on the individual port.
 Refer to the testpmd source code for details.
 Also, L4 (UDP/TCP/ SCTP) checksum offload by the NIC can be enabled for an individual packet as long as the packet mbuf is set up correctly. See `Hardware Offload`_ for details.
 
-Configuration of Transmit and Receive Queues
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Configuration of Transmit Queues
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Each transmit queue is independently configured with the following information:
 
@@ -249,12 +259,49 @@ One descriptor in the TX ring is used as a sentinel to avoid a hardware race con
 
     When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128.
 
+Free Tx mbuf on Demand
+~~~~~~~~~~~~~~~~~~~~~~
+
+Many of the drivers do not release the mbuf back to the mempool, or local cache,
+immediately after the packet has been transmitted.
+Instead, they leave the mbuf in their Tx ring and
+either perform a bulk release when the ``tx_rs_thresh`` has been crossed
+or free the mbuf when a slot in the Tx ring is needed.
+
+An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
+This API requests the driver to release mbufs that are no longer in use,
+independent of whether or not the ``tx_rs_thresh`` has been crossed.
+There are two scenarios when an application may want the mbuf released immediately:
+
+* When a given packet needs to be sent to multiple destination interfaces
+  (either for Layer 2 flooding or Layer 3 multi-cast).
+  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
+  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API
+  until the reference count on the packet is decremented.
+  Then the same packet can be transmitted to the next destination interface.
+  The application is still responsible for managing any packet manipulations needed
+  between the different destination interfaces, but a packet copy can be avoided.
+  This API does not depend on whether the packet was transmitted or dropped,
+  only on whether the mbuf is no longer in use by the interface.
+
+* Some applications are designed to make multiple runs, like a packet generator.
+  For performance reasons and consistency between runs,
+  the application may want to reset back to an initial state
+  between each run, where all mbufs are returned to the mempool.
+  In this case, it can call the ``rte_eth_tx_done_cleanup()`` API
+  for each destination interface it has been using
+  to request that it release all of its used mbufs.
+
+To determine if a driver supports this API, check for the *Free Tx mbuf on demand* feature
+in the *Network Interface Controller Drivers* document.
+
 Hardware Offload
 ~~~~~~~~~~~~~~~~
 
 Depending on driver capabilities advertised by
 ``rte_eth_dev_info_get()``, the PMD may support hardware offloading
-feature like checksumming, TCP segmentation or VLAN insertion.
+features such as checksumming, TCP segmentation, VLAN insertion or
+lock-free multithreaded Tx burst on the same Tx queue.
 
 The support of these offload features implies the addition of dedicated
 status bit(s) and value field(s) into the rte_mbuf data structure, along
@@ -263,6 +310,26 @@ exported by each PMD. The list of flags and their precise meaning is
 described in the mbuf API documentation and in the in :ref:`Mbuf Library
 <Mbuf_Library>`, section "Meta Information".
 
+Per-Port and Per-Queue Offloads
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the DPDK offload API, offloads are divided into per-port and per-queue offloads.
+The different offload capabilities can be queried using ``rte_eth_dev_info_get()``.
+Supported offloads can be either per-port or per-queue.
+
+Offloads are enabled using the existing ``DEV_TX_OFFLOAD_*`` or ``DEV_RX_OFFLOAD_*`` flags.
+Per-port offload configuration is set using ``rte_eth_dev_configure()``.
+Per-queue offload configuration is set using ``rte_eth_rx_queue_setup()`` and ``rte_eth_tx_queue_setup()``.
+To enable a per-port offload, the offload should be set in both the device configuration and the queue setup.
+In case of a mixed configuration, the queue setup shall return with an error.
+To enable a per-queue offload, the offload needs to be set only in the queue setup.
+Offloads which are not enabled are disabled by default.
+
+For an application to use the Tx offloads API, it should set the ``ETH_TXQ_FLAGS_IGNORE`` flag in the ``txq_flags`` field located in the ``rte_eth_txconf`` struct.
+In such cases, it is not required to set other flags in ``txq_flags``.
+For an application to use the Rx offloads API, it should set the ``ignore_offload_bitfield`` bit in the ``rte_eth_rxmode`` struct.
+In such cases, it is not required to set other bitfield offloads in the ``rxmode`` struct.
+
 Poll Mode Driver API
 --------------------
 
@@ -298,24 +365,21 @@ The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK
 Extended Statistics API
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-The extended statistics API allows each individual PMD to expose a unique set
-of statistics. Accessing these from application programs is done via two
-functions:
-
-* ``rte_eth_xstats_get``: Fills in an array of ``struct rte_eth_xstat``
-  with extended statistics.
-* ``rte_eth_xstats_get_names``: Fills in an array of
-  ``struct rte_eth_xstat_name`` with extended statistic name lookup
-  information.
-
-Each ``struct rte_eth_xstat`` contains an identifier and value pair, and
-each ``struct rte_eth_xstat_name`` contains a string. Each identifier
-within the ``struct rte_eth_xstat`` lookup array must have a corresponding
-entry in the ``struct rte_eth_xstat_name`` lookup array. Within the latter
-the index of the entry is the identifier the string is associated with.
-These identifiers, as well as the number of extended statistic exposed, must
-remain constant during runtime. Note that extended statistic identifiers are
+The extended statistics API allows a PMD to expose all statistics that are
+available to it, including statistics that are unique to the device.
+Each statistic has three properties, ``name``, ``id`` and ``value``:
+
+* ``name``: A human readable string formatted by the scheme detailed below.
+* ``id``: An integer that uniquely identifies that statistic.
+* ``value``: An unsigned 64-bit integer that is the value of the statistic.
+
+Note that extended statistic identifiers are
 driver-specific, and hence might not be the same for different ports.
+The API consists of various ``rte_eth_xstats_*()`` functions, and allows an
+application to be flexible in how it retrieves statistics.
+
+Scheme for Human Readable Names
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 A naming scheme exists for the strings exposed to clients of the API. This is
 to allow scraping of the API for statistics of interest. The naming scheme uses
@@ -356,3 +420,179 @@ Some additions in the metadata scheme are as follows:
 An example where queue numbers are used is as follows: ``tx_q7_bytes`` which
 indicates this statistic applies to queue number 7, and represents the number
 of transmitted bytes on that queue.
+
+API Design
+^^^^^^^^^^
+
+The xstats API uses the ``name``, ``id``, and ``value`` to allow performant
+lookup of specific statistics. Performant lookup means two things:
+
+* No string comparisons with the ``name`` of the statistic in the fast-path
+* Allow requesting only the statistics of interest
+
+The API ensures these requirements are met by mapping the ``name`` of the
+statistic to a unique ``id``, which is used as a key for lookup in the fast-path.
+The API allows applications to request an array of ``id`` values, so that the
+PMD only performs the required calculations. Expected usage is that the
+application scans the ``name`` of each statistic, and caches the ``id``
+if it has an interest in that statistic. On the fast-path, the integer can be used
+to retrieve the actual ``value`` of the statistic that the ``id`` represents.
+
+API Functions
+^^^^^^^^^^^^^
+
+The API is built out of a small number of functions, which can be used to
+retrieve the number of statistics and the names, IDs and values of those
+statistics.
+
+* ``rte_eth_xstats_get_names_by_id()``: Returns the names of the statistics. When given a
+  ``NULL`` parameter, the function returns the number of statistics that are available.
+
+* ``rte_eth_xstats_get_id_by_name()``: Searches for the statistic ID that matches
+  ``xstat_name``. If found, the ``id`` integer is set.
+
+* ``rte_eth_xstats_get_by_id()``: Fills in an array of ``uint64_t`` values
+  matching the provided ``ids`` array. If the ``ids`` array is ``NULL``, it
+  returns all statistics that are available.
+
+
+Application Usage
+^^^^^^^^^^^^^^^^^
+
+Imagine an application that wants to view the dropped packet count. If no
+packets are dropped, the application does not read any other metrics for
+performance reasons. If packets are dropped, the application has a particular
+set of statistics that it requests. This "set" of statistics allows the app to
+decide what next steps to perform. The following code-snippets show how the
+xstats API can be used to achieve this goal.
+
+The first step is to get all statistic names and list them:
+
+.. code-block:: c
+
+    struct rte_eth_xstat_name *xstats_names;
+    uint64_t *values;
+    int len, i;
+
+    /* Get number of stats */
+    len = rte_eth_xstats_get_names_by_id(port_id, NULL, 0, NULL);
+    if (len < 0) {
+        printf("Cannot get xstats count\n");
+        goto err;
+    }
+
+    xstats_names = malloc(sizeof(struct rte_eth_xstat_name) * len);
+    if (xstats_names == NULL) {
+        printf("Cannot allocate memory for xstat names\n");
+        goto err;
+    }
+
+    /* Retrieve xstats names, passing NULL for IDs to return all statistics */
+    if (len != rte_eth_xstats_get_names_by_id(port_id, xstats_names, len, NULL)) {
+        printf("Cannot get xstat names\n");
+        goto err;
+    }
+
+    values = malloc(sizeof(*values) * len);
+    if (values == NULL) {
+        printf("Cannot allocate memory for xstats\n");
+        goto err;
+    }
+
+    /* Getting xstats values */
+    if (len != rte_eth_xstats_get_by_id(port_id, NULL, values, len)) {
+        printf("Cannot get xstat values\n");
+        goto err;
+    }
+
+    /* Print all xstats names and values */
+    for (i = 0; i < len; i++) {
+        printf("%s: %"PRIu64"\n", xstats_names[i].name, values[i]);
+    }
+
+The application has access to the names of all of the statistics that the PMD
+exposes. The application can decide which statistics are of interest, cache the
+ids of those statistics by looking up the name as follows:
+
+.. code-block:: c
+
+    uint64_t id;
+    uint64_t value;
+    const char *xstat_name = "rx_errors";
+
+    if (!rte_eth_xstats_get_id_by_name(port_id, xstat_name, &id)) {
+        rte_eth_xstats_get_by_id(port_id, &id, &value, 1);
+        printf("%s: %"PRIu64"\n", xstat_name, value);
+    }
+    else {
+        printf("Cannot find xstats with a given name\n");
+        goto err;
+    }
+
+The API provides flexibility to the application so that it can look up multiple
+statistics using an array containing multiple ``id`` numbers. This reduces the
+function call overhead of retrieving statistics, and makes lookup of multiple
+statistics simpler for the application.
+
+.. code-block:: c
+
+    #define APP_NUM_STATS 4
+    /* application cached these ids previously; see above */
+    uint64_t ids_array[APP_NUM_STATS] = {3,4,7,21};
+    uint64_t value_array[APP_NUM_STATS];
+
+    /* Getting multiple xstats values from array of IDs */
+    rte_eth_xstats_get_by_id(port_id, ids_array, value_array, APP_NUM_STATS);
+
+    uint32_t i;
+    for (i = 0; i < APP_NUM_STATS; i++) {
+        printf("%"PRIu64": %"PRIu64"\n", ids_array[i], value_array[i]);
+    }
+
+
+This array lookup API for xstats allows the application to create multiple
+"groups" of statistics, and look up the values of those IDs using a single API
+call. As an end result, the application is able to achieve its goal of
+monitoring a single statistic ("rx_errors" in this case), and if that shows
+packets being dropped, it can easily retrieve a "set" of statistics using the
+IDs array parameter to the ``rte_eth_xstats_get_by_id()`` function.
+
+NIC Reset API
+~~~~~~~~~~~~~
+
+.. code-block:: c
+
+    int rte_eth_dev_reset(uint16_t port_id);
+
+Sometimes a port has to be reset passively. For example, when a PF is
+reset, all its VFs should also be reset by the application to make them
+consistent with the PF. A DPDK application can also call this function
+to trigger a port reset. Normally, a DPDK application would invoke this
+function when an RTE_ETH_EVENT_INTR_RESET event is detected.
+
+It is the duty of the PMD to trigger RTE_ETH_EVENT_INTR_RESET events and
+the application should register a callback function to handle these
+events. When a PMD needs to trigger a reset, it can trigger an
+RTE_ETH_EVENT_INTR_RESET event. On receiving an RTE_ETH_EVENT_INTR_RESET
+event, applications can handle it as follows: Stop working queues, stop
+calling Rx and Tx functions, and then call rte_eth_dev_reset(). For
+thread safety, all these operations should be called from the same thread.
+
+For example, when a PF is reset, the PF sends a message to notify VFs of
+this event and also triggers an interrupt to VFs. Then, in the interrupt
+service routine, the VF detects this notification message and calls
+_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, NULL).
+This means that a PF reset triggers an RTE_ETH_EVENT_INTR_RESET
+event within VFs. The function _rte_eth_dev_callback_process() will
+call the registered callback function. The callback function can trigger
+the application to handle all operations the VF reset requires, including
+stopping Rx/Tx queues and calling rte_eth_dev_reset().
+
+The rte_eth_dev_reset() function itself is generic and only performs
+some hardware reset operations by calling dev_uninit() and
+dev_init(); it does not handle synchronization, which is handled
+by the application.
+
+The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
+the application to handle the reset event. It is the duty of the application to
+handle all synchronization before it calls rte_eth_dev_reset().