X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fprog_guide%2Fpoll_mode_drv.rst;h=e48c121c003c27029409dd952e2400298c56ac5a;hb=44cebef721a644dd4139596942d24d0967c1cfb3;hp=d6943e39de304b588120362740eaa36dc2cf7095;hpb=5eb379550f46cb1a79de4cd6487a0f1321bc9c71;p=dpdk.git diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst old mode 100755 new mode 100644 index d6943e39de..e48c121c00 --- a/doc/guides/prog_guide/poll_mode_drv.rst +++ b/doc/guides/prog_guide/poll_mode_drv.rst @@ -1,5 +1,5 @@ .. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + Copyright(c) 2010-2015 Intel Corporation. All rights reserved. All rights reserved. Redistribution and use in source and binary forms, with or without @@ -198,17 +198,10 @@ the Intel® 82599 10 Gigabit Ethernet Controller controllers in the testpmd appl Other features such as the L3/L4 5-Tuple packet filtering feature of a port can be configured in the same way. Ethernet* flow control (pause frame) can be configured on the individual port. Refer to the testpmd source code for details. -Also, L4 (UDP/TCP/ SCTP) checksum offload by the NIC can be enabled for an individual packet as long as the packet mbuf is set up correctly. -Refer to the testpmd source code (specifically the csumonly.c file) for details. +Also, L4 (UDP/TCP/ SCTP) checksum offload by the NIC can be enabled for an individual packet as long as the packet mbuf is set up correctly. See `Hardware Offload`_ for details. -That being said, the support of some offload features implies the addition of dedicated status bit(s) and value field(s) into the rte_mbuf -data structure, along with their appropriate handling by the receive/transmit functions exported by each PMD. - -For instance, this is the case for the IEEE1588 packet timestamp mechanism, the VLAN tagging and the IP checksum computation, as described in -the Section 7.6 "Meta Information". 
- -Configuration of Transmit and Receive Queues -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Configuration of Transmit Queues +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Each transmit queue is independently configured with the following information: @@ -256,6 +249,56 @@ One descriptor in the TX ring is used as a sentinel to avoid a hardware race con When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128. +Free Tx mbuf on Demand +~~~~~~~~~~~~~~~~~~~~~~ + +Many of the drivers do not release the mbuf back to the mempool, or local cache, +immediately after the packet has been transmitted. +Instead, they leave the mbuf in their Tx ring and +either perform a bulk release when the ``tx_rs_thresh`` has been crossed +or free the mbuf when a slot in the Tx ring is needed. + +An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API. +This API requests the driver to release mbufs that are no longer in use, +independent of whether or not the ``tx_rs_thresh`` has been crossed. +There are two scenarios when an application may want the mbuf released immediately: + +* When a given packet needs to be sent to multiple destination interfaces + (either for Layer 2 flooding or Layer 3 multi-cast). + One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated. + A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API + until the reference count on the packet is decremented. + Then the same packet can be transmitted to the next destination interface. + The application is still responsible for managing any packet manipulations needed + between the different destination interfaces, but a packet copy can be avoided. + This API is independent of whether the packet was transmitted or dropped, + only that the mbuf is no longer in use by the interface. 
+ +* Some applications are designed to make multiple runs, like a packet generator. + For performance reasons and consistency between runs, + the application may want to reset back to an initial state + between each run, where all mbufs are returned to the mempool. + In this case, it can call the ``rte_eth_tx_done_cleanup()`` API + for each destination interface it has been using + to request that it release all of its used mbufs. + +To determine if a driver supports this API, check for the *Free Tx mbuf on demand* feature +in the *Network Interface Controller Drivers* document. + +Hardware Offload +~~~~~~~~~~~~~~~~ + +Depending on driver capabilities advertised by +``rte_eth_dev_info_get()``, the PMD may support hardware offloading +features such as checksumming, TCP segmentation or VLAN insertion. + +The support of these offload features implies the addition of dedicated +status bit(s) and value field(s) into the rte_mbuf data structure, along +with their appropriate handling by the receive/transmit functions +exported by each PMD. The list of flags and their precise meaning is +described in the mbuf API documentation and in the :ref:`Mbuf Library +`, section "Meta Information". + Poll Mode Driver API -------------------- @@ -288,154 +331,64 @@ Ethernet Device API The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK API Reference*. -Vector PMD for IXGBE --------------------- - -Vector PMD uses Intel® SIMD instructions to optimize packet I/O. -It improves load/store bandwidth efficiency of L1 data cache by using a wider SSE/AVX register 1 (1). -The wider register gives space to hold multiple packet buffers so as to save instruction number when processing bulk of packets. - -There is no change to PMD API. The RX/TX handler are the only two entries for vPMD packet I/O. -They are transparently registered at runtime RX/TX execution if all condition checks pass. - -1. To date, only an SSE version of IX GBE vPMD is available. 
- To ensure that vPMD is in the binary code, ensure that the option CONFIG_RTE_IXGBE_INC_VECTOR=y is in the configure file. - -Some constraints apply as pre-conditions for specific optimizations on bulk packet transfers. -The following sections explain RX and TX constraints in the vPMD. - -RX Constraints -~~~~~~~~~~~~~~ - -Prerequisites and Pre-conditions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The following prerequisites apply: - -* To enable vPMD to work for RX, bulk allocation for Rx must be allowed. - -* The RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y configuration MACRO must be set before compiling the code. - -Ensure that the following pre-conditions are satisfied: - -* rxq->rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST - -* rxq->rx_free_thresh < rxq->nb_rx_desc - -* (rxq->nb_rx_desc % rxq->rx_free_thresh) == 0 - -* rxq->nb_rx_desc < (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST) - -These conditions are checked in the code. - -Scattered packets are not supported in this mode. -If an incoming packet is greater than the maximum acceptable length of one "mbuf" data size (by default, the size is 2 KB), -vPMD for RX would be disabled. - -By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32. - -Feature not Supported by RX Vector PMD -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Some features are not supported when trying to increase the throughput in vPMD. -They are: - -* IEEE1588 - -* FDIR - -* Header split - -* RX checksum off load - -Other features are supported using optional MACRO configuration. They include: - -* HW VLAN strip - -* HW extend dual VLAN - -* Enabled by RX_OLFLAGS (RTE_IXGBE_RX_OLFLAGS_DISABLE=n) - - -To guarantee the constraint, configuration flags in dev_conf.rxmode will be checked: - -* hw_vlan_strip - -* hw_vlan_extend - -* hw_ip_checksum - -* header_split - -* dev_conf - -fdir_conf->mode will also be checked. 
- -RX Burst Size -^^^^^^^^^^^^^ - -As vPMD is focused on high throughput, it assumes that the RX burst size is equal to or greater than 32 per burst. -It returns zero if using nb_pkt < 32 as the expected packet number in the receive handler. - -TX Constraint -~~~~~~~~~~~~~ - -Prerequisite -^^^^^^^^^^^^ - -The only prerequisite is related to tx_rs_thresh. -The tx_rs_thresh value must be greater than or equal to RTE_PMD_IXGBE_TX_MAX_BURST, -but less or equal to RTE_IXGBE_TX_MAX_FREE_BUF_SZ. -Consequently, by default the tx_rs_thresh value is in the range 32 to 64. - -Feature not Supported by RX Vector PMD -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -TX vPMD only works when txq_flags is set to IXGBE_SIMPLE_FLAGS. - -This means that it does not support TX multi-segment, VLAN offload and TX csum offload. -The following MACROs are used for these three features: - -* ETH_TXQ_FLAGS_NOMULTSEGS - -* ETH_TXQ_FLAGS_NOVLANOFFL - -* ETH_TXQ_FLAGS_NOXSUMSCTP - -* ETH_TXQ_FLAGS_NOXSUMUDP - -* ETH_TXQ_FLAGS_NOXSUMTCP - - -Sample Application Notes -~~~~~~~~~~~~~~~~~~~~~~~~ - -testpmd -^^^^^^^ - -By default, using CONFIG_RTE_IXGBE_RX_OLFLAGS_DISABLE=n: - -.. code-block:: console - - ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 - -When CONFIG_RTE_IXGBE_RX_OLFLAGS_DISABLE=y, better performance can be achieved: - -.. code-block:: console - - ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 --disable-hw-vlan - -If scatter gather lists are not required, set CONFIG_RTE_MBUF_SCATTER_GATHER=n for better throughput. - -l3fwd -^^^^^ - -When running l3fwd with vPMD, there is one thing to note. -In the configuration, ensure that port_conf.rxmode.hw_ip_checksum=0. -Otherwise, by default, RX vPMD is disabled. 
- -load_balancer -^^^^^^^^^^^^^ - -As in the case of l3fwd, set configure port_conf.rxmode.hw_ip_checksum=0 to enable vPMD. -In addition, for improved performance, use -bsz "(32,32),(64,64),(32,32)" in load_balancer to avoid using the default burst size of 144. +Extended Statistics API +~~~~~~~~~~~~~~~~~~~~~~~ + +The extended statistics API allows each individual PMD to expose a unique set +of statistics. Accessing these from application programs is done via two +functions: + +* ``rte_eth_xstats_get``: Fills in an array of ``struct rte_eth_xstat`` + with extended statistics. +* ``rte_eth_xstats_get_names``: Fills in an array of + ``struct rte_eth_xstat_name`` with extended statistic name lookup + information. + +Each ``struct rte_eth_xstat`` contains an identifier and value pair, and +each ``struct rte_eth_xstat_name`` contains a string. Each identifier +within the ``struct rte_eth_xstat`` lookup array must have a corresponding +entry in the ``struct rte_eth_xstat_name`` lookup array. Within the latter, +the index of the entry is the identifier the string is associated with. +These identifiers, as well as the number of extended statistics exposed, must +remain constant during runtime. Note that extended statistic identifiers are +driver-specific, and hence might not be the same for different ports. + +A naming scheme exists for the strings exposed to clients of the API. This is +to allow scraping of the API for statistics of interest. The naming scheme uses +strings split by a single underscore ``_``. The scheme is as follows: + +* direction +* detail 1 +* detail 2 +* detail n +* unit + +Examples of common xstats strings, formatted to comply with the scheme +proposed above: + +* ``rx_bytes`` +* ``rx_crc_errors`` +* ``tx_multicast_packets`` + +The scheme, although quite simple, allows flexibility in presenting and reading +information from the statistic strings. The following example illustrates the +naming scheme: ``rx_packets``. 
In this example, the string is split into two +components. The first component ``rx`` indicates that the statistic is +associated with the receive side of the NIC. The second component ``packets`` +indicates that the unit of measure is packets. + +A more complicated example: ``tx_size_128_to_255_packets``. In this example, +``tx`` indicates transmission, ``size`` is the first detail, ``128`` etc. are +more details, and ``packets`` indicates that this is a packet counter. + +Some additions in the metadata scheme are as follows: + +* If the first part does not match ``rx`` or ``tx``, the statistic does not + have an affinity with either receive or transmit. + +* If the first letter of the second part is ``q`` and this ``q`` is followed + by a number, this statistic is part of a specific queue. + +An example where queue numbers are used is as follows: ``tx_q7_bytes``, which +indicates that this statistic applies to queue number 7, and represents the number +of transmitted bytes on that queue.
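
The naming scheme described above lends itself to simple string parsing. The following is a minimal sketch in plain C and is not part of the DPDK API: the ``xstat_name_parse()`` helper, the ``struct xstat_name_info`` type and the ``enum xstat_dir`` values are hypothetical names chosen for illustration. It assumes that a name has at least two underscore-separated components and that the last component is the unit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper types for illustration; not part of the DPDK API. */
enum xstat_dir { XSTAT_DIR_NONE, XSTAT_DIR_RX, XSTAT_DIR_TX };

struct xstat_name_info {
	enum xstat_dir dir; /* rx, tx, or no receive/transmit affinity */
	int queue;          /* queue number, or -1 if not queue-specific */
	char unit[32];      /* last underscore-separated component */
};

/*
 * Split an xstats name according to the naming scheme.
 * Returns 0 on success, -1 if the name has fewer than two components
 * or the unit does not fit in info->unit (assumptions of this sketch).
 */
static int
xstat_name_parse(const char *name, struct xstat_name_info *info)
{
	const char *sep, *second, *last;
	size_t first_len;

	assert(name != NULL && info != NULL);

	info->dir = XSTAT_DIR_NONE;
	info->queue = -1;
	info->unit[0] = '\0';

	/* First component: direction, if it is exactly "rx" or "tx". */
	sep = strchr(name, '_');
	if (sep == NULL)
		return -1;
	first_len = (size_t)(sep - name);
	if (first_len == 2 && strncmp(name, "rx", 2) == 0)
		info->dir = XSTAT_DIR_RX;
	else if (first_len == 2 && strncmp(name, "tx", 2) == 0)
		info->dir = XSTAT_DIR_TX;

	/* Second component: "q" followed by a number marks a queue. */
	second = sep + 1;
	if (second[0] == 'q' && isdigit((unsigned char)second[1])) {
		char *end;
		long q = strtol(second + 1, &end, 10);

		if ((*end == '_' || *end == '\0') && q <= 0xFFFF)
			info->queue = (int)q;
	}

	/* Last component: unit. strrchr() cannot fail here, sep exists. */
	last = strrchr(name, '_') + 1;
	if (*last == '\0' || strlen(last) >= sizeof(info->unit))
		return -1;
	strcpy(info->unit, last);
	return 0;
}
```

An application could apply such a helper to each string returned by ``rte_eth_xstats_get_names`` to select, for example, only the per-queue byte counters. Because identifiers are driver-specific, any such filtering would need to be repeated for each port.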