X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fprog_guide%2Fring_lib.rst;h=54e0bb4b686d6348525d43d1048e3c600809a84d;hb=81b0fbb85b3a69912f3fa9be8ed73538ac9df68b;hp=8547b384e3306d44829ed620658076ecc242f87b;hpb=ba9e05cb6b002016b01adf4e8700f206f3d04fd6;p=dpdk.git diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst index 8547b384e3..54e0bb4b68 100644 --- a/doc/guides/prog_guide/ring_lib.rst +++ b/doc/guides/prog_guide/ring_lib.rst @@ -1,32 +1,5 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2010-2014 Intel Corporation. .. _Ring_Library: @@ -38,7 +11,9 @@ Instead of having a linked list of infinite size, the rte_ring has the following * FIFO -* Maximum size is fixed, the pointers are stored in a table +* Maximum size is fixed, the objects are stored in a table + +* Objects can be pointers or elements of multiple of 4 byte size * Lockless implementation @@ -56,29 +31,28 @@ Instead of having a linked list of infinite size, the rte_ring has the following The advantages of this data structure over a linked list queue are as follows: -* Faster; only requires a single Compare-And-Swap instruction of sizeof(void \*) instead of several double-Compare-And-Swap instructions. +* Faster; only requires a single 32 bit Compare-And-Swap instruction instead of several pointer size Compare-And-Swap instructions. * Simpler than a full lockless queue. * Adapted to bulk enqueue/dequeue operations. - As pointers are stored in a table, a dequeue of several objects will not produce as many cache misses as in a linked queue. + As objects are stored in a table, a dequeue of several objects will not produce as many cache misses as in a linked queue. Also, a bulk dequeue of many objects does not cost more than a dequeue of a simple object. The disadvantages: * Size is fixed -* Having many rings costs more in terms of memory than a linked list queue. An empty ring contains at least N pointers. +* Having many rings costs more in terms of memory than a linked list queue. An empty ring contains at least N objects. A simplified representation of a Ring is shown in with consumer and producer head and tail pointers to objects stored in the data structure. -.. _pg_figure_4: +.. _figure_ring1: -**Figure 4. Ring Structure** +.. figure:: img/ring1.* -.. image5_png has been replaced + Ring Structure -|ring1| References for Ring Implementation in FreeBSD* ---------------------------------------------- @@ -103,21 +77,6 @@ Name A ring is identified by a unique name. It is not possible to create two rings with the same name (rte_ring_create() returns NULL if this is attempted). -Water Marking -~~~~~~~~~~~~~ - -The ring can have a high water mark (threshold). -Once an enqueue operation reaches the high water mark, the producer is notified, if the water mark is configured. - -This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE. - -Debug -~~~~~ - -When debug is enabled (CONFIG_RTE_LIBRTE_RING_DEBUG is set), -the library stores some per-ring statistic counters about the number of enqueues/dequeues. -These statistics are per-core to avoid concurrent accesses or atomic operations. - Use Cases --------- @@ -155,20 +114,28 @@ The prod_next local variable points to the next element of the table, or several If there is not enough room in the ring (this is detected by checking cons_tail), it returns an error. -.. image6_png has been replaced -|ring-enqueue1| +.. _figure_ring-enqueue1: + +.. figure:: img/ring-enqueue1.* + + Enqueue first step + Enqueue Second Step ^^^^^^^^^^^^^^^^^^^ The second step is to modify *ring->prod_head* in ring structure to point to the same location as prod_next. -A pointer to the added object is copied in the ring (obj4). +The added object is copied in the ring (obj4). -.. image7_png has been replaced -|ring-enqueue2| +.. _figure_ring-enqueue2: + +.. figure:: img/ring-enqueue2.* + + Enqueue second step + Enqueue Last Step ^^^^^^^^^^^^^^^^^ @@ -176,9 +143,13 @@ Enqueue Last Step Once the object is added in the ring, ring->prod_tail in the ring structure is modified to point to the same location as *ring->prod_head*. The enqueue operation is finished. -.. image8_png has been replaced -|ring-enqueue3| +.. _figure_ring-enqueue3: + +.. figure:: img/ring-enqueue3.* + + Enqueue last step + Single Consumer Dequeue ~~~~~~~~~~~~~~~~~~~~~~~ @@ -196,20 +167,28 @@ The cons_next local variable points to the next element of the table, or several If there are not enough objects in the ring (this is detected by checking prod_tail), it returns an error. -.. image9_png has been replaced -|ring-dequeue1| +.. _figure_ring-dequeue1: + +.. figure:: img/ring-dequeue1.* + + Dequeue last step + Dequeue Second Step ^^^^^^^^^^^^^^^^^^^ The second step is to modify ring->cons_head in the ring structure to point to the same location as cons_next. -The pointer to the dequeued object (obj1) is copied in the pointer given by the user. +The dequeued object (obj1) is copied in the pointer given by the user. + -.. image10_png has been replaced +.. _figure_ring-dequeue2: + +.. figure:: img/ring-dequeue2.* + + Dequeue second step -|ring-dequeue2| Dequeue Last Step ^^^^^^^^^^^^^^^^^ @@ -217,9 +196,13 @@ Dequeue Last Step Finally, ring->cons_tail in the ring structure is modified to point to the same location as ring->cons_head. The dequeue operation is finished. -.. image11_png has been replaced -|ring-dequeue3| +.. _figure_ring-dequeue3: + +.. figure:: img/ring-dequeue3.* + + Dequeue last step + Multiple Producers Enqueue ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -229,8 +212,8 @@ In this example, only the producer head and tail (prod_head and prod_tail) are m The initial state is to have a prod_head and prod_tail pointing at the same location. -MC Enqueue First Step -^^^^^^^^^^^^^^^^^^^^^ +Multiple Producers Enqueue First Step +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ On both cores, *ring->prod_head* and ring->cons_tail are copied in local variables. The prod_next local variable points to the next element of the table, @@ -238,12 +221,16 @@ or several elements after in the case of bulk enqueue. If there is not enough room in the ring (this is detected by checking cons_tail), it returns an error. -.. image12_png has been replaced -|ring-mp-enqueue1| +.. _figure_ring-mp-enqueue1: -MC Enqueue Second Step -^^^^^^^^^^^^^^^^^^^^^^ +.. figure:: img/ring-mp-enqueue1.* + + Multiple producer enqueue first step + + +Multiple Producers Enqueue Second Step +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The second step is to modify ring->prod_head in the ring structure to point to the same location as prod_next. This operation is done using a Compare And Swap (CAS) instruction, which does the following operations atomically: @@ -256,48 +243,64 @@ This operation is done using a Compare And Swap (CAS) instruction, which does th In the figure, the operation succeeded on core 1, and step one restarted on core 2. -.. image13_png has been replaced -|ring-mp-enqueue2| +.. _figure_ring-mp-enqueue2: + +.. figure:: img/ring-mp-enqueue2.* + + Multiple producer enqueue second step -MC Enqueue Third Step -^^^^^^^^^^^^^^^^^^^^^ + +Multiple Producers Enqueue Third Step +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The CAS operation is retried on core 2 with success. The core 1 updates one element of the ring(obj4), and the core 2 updates another one (obj5). -.. image14_png has been replaced -|ring-mp-enqueue3| +.. _figure_ring-mp-enqueue3: + +.. figure:: img/ring-mp-enqueue3.* + + Multiple producer enqueue third step -MC Enqueue Fourth Step -^^^^^^^^^^^^^^^^^^^^^^ + +Multiple Producers Enqueue Fourth Step +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Each core now wants to update ring->prod_tail. A core can only update it if ring->prod_tail is equal to the prod_head local variable. This is only true on core 1. The operation is finished on core 1. -.. image15_png has been replaced -|ring-mp-enqueue4| +.. _figure_ring-mp-enqueue4: + +.. figure:: img/ring-mp-enqueue4.* -MC Enqueue Last Step -^^^^^^^^^^^^^^^^^^^^ + Multiple producer enqueue fourth step + + +Multiple Producers Enqueue Last Step +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Once ring->prod_tail is updated by core 1, core 2 is allowed to update it too. The operation is also finished on core 2. -.. image16_png has been replaced -|ring-mp-enqueue5| +.. _figure_ring-mp-enqueue5: + +.. figure:: img/ring-mp-enqueue5.* + + Multiple producer enqueue last step + Modulo 32-bit Indexes ~~~~~~~~~~~~~~~~~~~~~ In the preceding figures, the prod_head, prod_tail, cons_head and cons_tail indexes are represented by arrows. In the actual implementation, these values are not between 0 and size(ring)-1 as would be assumed. -The indexes are between 0 and 2^32 -1, and we mask their value when we access the pointer table (the ring itself). +The indexes are between 0 and 2^32 -1, and we mask their value when we access the object table (the ring itself). 32-bit modulo also implies that operations on indexes (such as, add/subtract) will automatically do 2^32 modulo if the result overflows the 32-bit number range. @@ -309,15 +312,23 @@ The following are two examples that help to explain how indexes are used in a ri In addition, the four indexes are defined as unsigned 16-bit integers, as opposed to unsigned 32-bit integers in the more realistic case. -.. image17_png has been replaced -|ring-modulo1| +.. _figure_ring-modulo1: + +.. figure:: img/ring-modulo1.* + + Modulo 32-bit indexes - Example 1 + This ring contains 11000 entries. -.. image18_png has been replaced -|ring-modulo2| +.. _figure_ring-modulo2: + +.. figure:: img/ring-modulo2.* + + Modulo 32-bit indexes - Example 2 + This ring contains 12536 entries. @@ -338,39 +349,155 @@ even if only the first term of subtraction has overflowed: uint32_t entries = (prod_tail - cons_head); uint32_t free_entries = (mask + cons_tail -prod_head); -References ----------- +Producer/consumer synchronization modes +--------------------------------------- - * `bufring.h in FreeBSD `_ (version 8) +rte_ring supports different synchronization modes for producers and consumers. +These modes can be specified at ring creation/init time via ``flags`` +parameter. +That should help users to configure ring in the most suitable way for his +specific usage scenarios. +Currently supported modes: - * `bufring.c in FreeBSD `_ (version 8) +.. _Ring_Library_MPMC_Mode: - * `Linux Lockless Ring Buffer Design `_ +MP/MC (default one) +~~~~~~~~~~~~~~~~~~~ -.. |ring1| image:: img/ring1.* +Multi-producer (/multi-consumer) mode. This is a default enqueue (/dequeue) +mode for the ring. In this mode multiple threads can enqueue (/dequeue) +objects to (/from) the ring. For 'classic' DPDK deployments (with one thread +per core) this is usually the most suitable and fastest synchronization mode. +As a well known limitation - it can perform quite pure on some overcommitted +scenarios. -.. |ring-enqueue1| image:: img/ring-enqueue1.* +.. _Ring_Library_SPSC_Mode: -.. |ring-enqueue2| image:: img/ring-enqueue2.* +SP/SC +~~~~~ +Single-producer (/single-consumer) mode. In this mode only one thread at a time +is allowed to enqueue (/dequeue) objects to (/from) the ring. + +.. _Ring_Library_MT_RTS_Mode: + +MP_RTS/MC_RTS +~~~~~~~~~~~~~ + +Multi-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode. +The main difference from the original MP/MC algorithm is that +tail value is increased not by every thread that finished enqueue/dequeue, +but only by the last one. +That allows threads to avoid spinning on ring tail value, +leaving actual tail value change to the last thread at a given instance. +That technique helps to avoid the Lock-Waiter-Preemption (LWP) problem on tail +update and improves average enqueue/dequeue times on overcommitted systems. +To achieve that RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation: +one for head update, second for tail update. +In comparison the original MP/MC algorithm requires one 32-bit CAS +for head update and waiting/spinning on tail value. + +.. _Ring_Library_MT_HTS_Mode: + +MP_HTS/MC_HTS +~~~~~~~~~~~~~ + +Multi-producer (/multi-consumer) with Head/Tail Sync (HTS) mode. +In that mode enqueue/dequeue operation is fully serialized: +at any given moment only one enqueue/dequeue operation can proceed. +This is achieved by allowing a thread to proceed with changing ``head.value`` +only when ``head.value == tail.value``. +Both head and tail values are updated atomically (as one 64-bit value). +To achieve that 64-bit CAS is used by head update routine. +That technique also avoids the Lock-Waiter-Preemption (LWP) problem on tail +update and helps to improve ring enqueue/dequeue behavior in overcommitted +scenarios. Another advantage of fully serialized producer/consumer - +it provides the ability to implement MT safe peek API for rte_ring. + +Ring Peek API +------------- -.. |ring-enqueue3| image:: img/ring-enqueue3.* +For ring with serialized producer/consumer (HTS sync mode) it is possible +to split public enqueue/dequeue API into two phases: -.. |ring-dequeue1| image:: img/ring-dequeue1.* +* enqueue/dequeue start -.. |ring-dequeue2| image:: img/ring-dequeue2.* +* enqueue/dequeue finish -.. |ring-dequeue3| image:: img/ring-dequeue3.* +That allows user to inspect objects in the ring without removing them +from it (aka MT safe peek) and reserve space for the objects in the ring +before actual enqueue. +Note that this API is available only for two sync modes: -.. |ring-mp-enqueue1| image:: img/ring-mp-enqueue1.* +* Single Producer/Single Consumer (SP/SC) -.. |ring-mp-enqueue2| image:: img/ring-mp-enqueue2.* +* Multi-producer/Multi-consumer with Head/Tail Sync (HTS) -.. |ring-mp-enqueue3| image:: img/ring-mp-enqueue3.* +It is a user responsibility to create/init ring with appropriate sync modes +selected. As an example of usage: + +.. code-block:: c -.. |ring-mp-enqueue4| image:: img/ring-mp-enqueue4.* + /* read 1 elem from the ring: */ + uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL); + if (n != 0) { + /* examine object */ + if (object_examine(obj) == KEEP) + /* decided to keep it in the ring. */ + rte_ring_dequeue_finish(ring, 0); + else + /* decided to remove it from the ring. */ + rte_ring_dequeue_finish(ring, n); + } -.. |ring-mp-enqueue5| image:: img/ring-mp-enqueue5.* +Note that between ``_start_`` and ``_finish_`` none other thread can proceed +with enqueue(/dequeue) operation till ``_finish_`` completes. -.. |ring-modulo1| image:: img/ring-modulo1.* +Ring Peek Zero Copy API +----------------------- -.. |ring-modulo2| image:: img/ring-modulo2.* +Along with the advantages of the peek APIs, zero copy APIs provide the ability +to copy the data to the ring memory directly without the need for temporary +storage (for ex: array of mbufs on the stack). + +These APIs make it possible to split public enqueue/dequeue API into 3 phases: + +* enqueue/dequeue start + +* copy data to/from the ring + +* enqueue/dequeue finish + +Note that this API is available only for two sync modes: + +* Single Producer/Single Consumer (SP/SC) + +* Multi-producer/Multi-consumer with Head/Tail Sync (HTS) + +It is a user responsibility to create/init ring with appropriate sync modes. +Following is an example of usage: + +.. code-block:: c + + /* Reserve space on the ring */ + n = rte_ring_enqueue_zc_burst_start(r, 32, &zcd, NULL); + /* Pkt I/O core polls packets from the NIC */ + if (n != 0) { + nb_rx = rte_eth_rx_burst(portid, queueid, zcd->ptr1, zcd->n1); + if (nb_rx == zcd->n1 && n != zcd->n1) + nb_rx += rte_eth_rx_burst(portid, queueid, zcd->ptr2, + n - zcd->n1); + /* Provide packets to the packet processing cores */ + rte_ring_enqueue_zc_finish(r, nb_rx); + } + +Note that between ``_start_`` and ``_finish_`` no other thread can proceed +with enqueue(/dequeue) operation till ``_finish_`` completes. + +References +---------- + + * `bufring.h in FreeBSD `_ (version 8) + + * `bufring.c in FreeBSD `_ (version 8) + + * `Linux Lockless Ring Buffer Design `_