Avoiding lock contention is a key issue in a multi-core environment.
To address this issue, PMDs are designed to work with per-core private resources as much as possible.
-For example, a PMD maintains a separate transmit queue per-core, per-port.
+For example, a PMD maintains a separate transmit queue per-core, per-port, if the PMD is not ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capable.
In the same way, every receive queue of a port is assigned to and polled by a single logical core (lcore).
To comply with Non-Uniform Memory Access (NUMA), memory management is designed to assign to each logical core
Multiple logical cores should never share receive or transmit queues for interfaces since this would require global locks and hinder performance.
+If the PMD is ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capable, multiple threads can invoke ``rte_eth_tx_burst()``
+concurrently on the same Tx queue without a software lock. This PMD feature is found in some NICs and is useful in the following use cases:
+
+* Remove the explicit spinlock in some applications where lcores are not mapped to Tx queues with a 1:1 relation.
+
+* In the eventdev use case, avoid dedicating a separate TX core for transmitting and thus
+ enable more scaling as all workers can send the packets.
+
+See `Hardware Offload`_ for ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capability probing details.
+
Device Identification and Configuration
---------------------------------------
Depending on driver capabilities advertised by
``rte_eth_dev_info_get()``, the PMD may support hardware offloading
-feature like checksumming, TCP segmentation or VLAN insertion.
+feature like checksumming, TCP segmentation, VLAN insertion or
+lockfree multithreaded TX burst on the same TX queue.
The support of these offload features implies the addition of dedicated
status bit(s) and value field(s) into the rte_mbuf data structure, along
described in the mbuf API documentation and in the :ref:`Mbuf Library
<Mbuf_Library>`, section "Meta Information".
+Per-Port and Per-Queue Offloads
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the DPDK offload API, offloads are divided into per-port and per-queue offloads.
+The different offload capabilities can be queried using ``rte_eth_dev_info_get()``.
+Supported offloads can be either per-port or per-queue.
+
+Offloads are enabled using the existing ``DEV_TX_OFFLOAD_*`` or ``DEV_RX_OFFLOAD_*`` flags.
+Per-port offload configuration is set using ``rte_eth_dev_configure``.
+Per-queue offload configuration is set using ``rte_eth_rx_queue_setup`` and ``rte_eth_tx_queue_setup``.
+To enable per-port offload, the offload should be set on both device configuration and queue setup.
+In case of a mixed configuration the queue setup shall return with an error.
+To enable per-queue offload, the offload can be set only on the queue setup.
+Offloads which are not enabled are disabled by default.
+
+For an application to use the Tx offloads API it should set the ``ETH_TXQ_FLAGS_IGNORE`` flag in the ``txq_flags`` field located in ``rte_eth_txconf`` struct.
+In such cases it is not required to set other flags in ``txq_flags``.
+For an application to use the Rx offloads API it should set the ``ignore_offload_bitfield`` bit in the ``rte_eth_rxmode`` struct.
+In such cases it is not required to set other bitfield offloads in the ``rxmode`` struct.
+
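The consistency rule between device configuration and queue setup can be modelled in a few lines. This is a sketch of the rule as stated above, not the code of any particular PMD: ``model_check_queue_offloads`` and the ``MODEL_OFFLOAD_*`` bits are hypothetical names, and ``queue_capa`` stands for the set of offloads a device reports as configurable per queue.

```c
#include <stdint.h>

/* Hypothetical offload bits standing in for DEV_RX_OFFLOAD_* and
 * DEV_TX_OFFLOAD_* flags. */
#define MODEL_OFFLOAD_A (UINT64_C(1) << 0) /* treated as per-port only */
#define MODEL_OFFLOAD_B (UINT64_C(1) << 1) /* treated as per-queue capable */

/*
 * port_offloads:  offloads requested at device configuration
 * queue_offloads: offloads requested at queue setup
 * queue_capa:     offloads the device can toggle per queue
 *
 * Returns 0 when the two levels are consistent, -1 on a mixed
 * configuration of a per-port offload.
 */
static int
model_check_queue_offloads(uint64_t port_offloads, uint64_t queue_offloads,
			   uint64_t queue_capa)
{
	/* Bits that differ between the two configuration levels. */
	uint64_t mismatch = port_offloads ^ queue_offloads;

	/* Per-port offloads must match; per-queue offloads may differ. */
	uint64_t per_port_mismatch = mismatch & ~queue_capa;

	return per_port_mismatch == 0 ? 0 : -1;
}
```

With ``MODEL_OFFLOAD_B`` as the only per-queue capability, requesting ``MODEL_OFFLOAD_A`` at device configuration but not at queue setup is rejected, while requesting ``MODEL_OFFLOAD_B`` only at queue setup is accepted.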
Poll Mode Driver API
--------------------
* detail n
* unit
-Examples of common statistics xstats strings, formatted to comply to the
-above scheme:
+Examples of common statistics xstats strings, formatted to comply with the scheme
+proposed above:
* ``rx_bytes``
* ``rx_crc_errors``
indicates that the unit of measure is packets.
A more complicated example: ``tx_size_128_to_255_packets``. In this example,
-``tx`` indicates transmission, ``size`` is the first detail, ``128`` etc. are
+``tx`` indicates transmission, ``size`` is the first detail, ``128`` etc. are
more details, and ``packets`` indicates that this is a packet counter.
Some additions in the metadata scheme are as follows:
retrieve the number of statistics and the names, IDs and values of those
statistics.
-* ``rte_eth_xstats_get_names()``: returns the names of the statistics. When given a
+* ``rte_eth_xstats_get_names_by_id()``: returns the names of the statistics. When given a
``NULL`` parameter the function returns the number of statistics that are available.
* ``rte_eth_xstats_get_id_by_name()``: Searches for the statistic ID that matches
``xstat_name``. If found, the ``id`` integer is set.
-* ``rte_eth_xstats_get()``: Fills in an array of ``uint64_t`` values
+* ``rte_eth_xstats_get_by_id()``: Fills in an array of ``uint64_t`` values
matching the provided ``ids`` array. If the ``ids`` array is NULL, it
returns all statistics that are available.
int len, i;
/* Get number of stats */
- len = rte_eth_xstats_get_names(port_id, NULL, NULL, 0);
+ len = rte_eth_xstats_get_names_by_id(port_id, NULL, NULL, 0);
if (len < 0) {
printf("Cannot get xstats count\n");
goto err;
}
/* Retrieve xstats names, passing NULL for IDs to return all statistics */
- if (len != rte_eth_xstats_get_names(port_id, xstats_names, NULL, len)) {
+ if (len != rte_eth_xstats_get_names_by_id(port_id, xstats_names, NULL, len)) {
printf("Cannot get xstat names\n");
goto err;
}
}
/* Getting xstats values */
- if (len != rte_eth_xstats_get(port_id, NULL, values, len)) {
+ if (len != rte_eth_xstats_get_by_id(port_id, NULL, values, len)) {
printf("Cannot get xstat values\n");
goto err;
}
const char *xstat_name = "rx_errors";
if(!rte_eth_xstats_get_id_by_name(port_id, xstat_name, &id)) {
- rte_eth_xstats_get(port_id, &id, &value, 1);
+ rte_eth_xstats_get_by_id(port_id, &id, &value, 1);
printf("%s: %"PRIu64"\n", xstat_name, value);
}
else {
uint64_t value_array[APP_NUM_STATS];
/* Getting multiple xstats values from array of IDs */
- rte_eth_xstats_get(port_id, ids_array, value_array, APP_NUM_STATS);
+ rte_eth_xstats_get_by_id(port_id, ids_array, value_array, APP_NUM_STATS);
uint32_t i;
for(i = 0; i < APP_NUM_STATS; i++) {
call. As an end result, the application is able to achieve its goal of
monitoring a single statistic ("rx_errors" in this case), and if that shows
packets being dropped, it can easily retrieve a "set" of statistics using the
-IDs array parameter to ``rte_eth_xstats_get`` function.
+IDs array parameter to ``rte_eth_xstats_get_by_id`` function.
+
+NIC Reset API
+~~~~~~~~~~~~~
+
+.. code-block:: c
+
+ int rte_eth_dev_reset(uint16_t port_id);
+
+Sometimes a port has to be reset passively. For example, when a PF is
+reset, all its VFs should also be reset by the application to make them
+consistent with the PF. A DPDK application can also call this function
+to trigger a port reset. Normally, a DPDK application would invoke this
+function when an RTE_ETH_EVENT_INTR_RESET event is detected.
+
+It is the duty of the PMD to trigger RTE_ETH_EVENT_INTR_RESET events and
+the application should register a callback function to handle these
+events. When a PMD needs to trigger a reset, it can trigger an
+RTE_ETH_EVENT_INTR_RESET event. On receiving an RTE_ETH_EVENT_INTR_RESET
+event, applications can handle it as follows: Stop working queues, stop
+calling Rx and Tx functions, and then call rte_eth_dev_reset(). For
+thread safety all these operations should be called from the same thread.
+
+For example, when a PF is reset, the PF sends a message to notify VFs of
+this event and also triggers an interrupt to the VFs. Then, in the interrupt
+service routine, the VF detects this notification message and calls
+_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, NULL,
+NULL). This means that a PF reset triggers an RTE_ETH_EVENT_INTR_RESET
+event within VFs. The function _rte_eth_dev_callback_process() will
+call the registered callback function. The callback function can trigger
+the application to handle all operations the VF reset requires including
+stopping Rx/Tx queues and calling rte_eth_dev_reset().
+
+The rte_eth_dev_reset() itself is a generic function which only does
+some hardware reset operations through calling dev_uninit() and
+dev_init(), and itself does not handle synchronization, which is handled
+by the application.
+
+The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
+the application to handle the reset event. It is the duty of the application to
+handle all synchronization before it calls rte_eth_dev_reset().
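
One way an application might meet the same-thread requirement is to let the event callback only record the request and have the polling thread perform the stop and reset. The sketch below is a self-contained model under that assumption: ``model_stop_queues`` and ``model_dev_reset`` are hypothetical stand-ins for stopping the queues and for ``rte_eth_dev_reset()``, and they only count calls.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Set from the event callback, consumed by the thread that owns the port. */
static atomic_bool reset_pending;

/* Hypothetical stand-ins for the real stop and reset calls. */
static int model_stop_calls;
static int model_reset_calls;

static void
model_stop_queues(uint16_t port_id)
{
	(void)port_id;
	model_stop_calls++;
}

static int
model_dev_reset(uint16_t port_id)
{
	(void)port_id;
	model_reset_calls++;
	return 0;
}

/* Body of a callback registered for RTE_ETH_EVENT_INTR_RESET:
 * it only records the request and touches no queue state. */
static void
model_on_reset_event(void)
{
	atomic_store(&reset_pending, true);
}

/*
 * Called once per iteration of the polling loop, on the polling thread.
 * Returns true when Rx/Tx burst calls may proceed in this iteration.
 */
static bool
model_poll_iteration(uint16_t port_id)
{
	if (atomic_exchange(&reset_pending, false)) {
		/* Stop the queues, skip Rx/Tx, then reset, all on this thread. */
		model_stop_queues(port_id);
		(void)model_dev_reset(port_id);
		/* A real application would bring the port back up here. */
		return false;
	}
	return true;
}
```

Because the flag is consumed by the polling thread, the stop and the reset happen on the thread that would otherwise call the Rx and Tx functions, so no burst call can overlap the reset in this model.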