..  BSD LICENSE
    Copyright(c) 2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Generic Receive Offload Library
===============================

Generic Receive Offload (GRO) is a widely used software-based offloading
technique that reduces per-packet processing overhead. By reassembling
small packets into larger ones, GRO lets applications process fewer,
larger packets. To benefit DPDK-based applications such as Open vSwitch,
DPDK provides its own GRO implementation as a standalone library.
Applications call the GRO library explicitly to reassemble packets.

Overview
--------

The GRO library defines multiple GRO types, one per packet type. Each GRO
type is in charge of processing one kind of packet. For example, TCP/IPv4
GRO processes TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own algorithm
and table structure to reassemble packets. Input packets are assigned to
the corresponding GRO function according to ``MBUF->packet_type``.

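The dispatch step can be pictured with a small Python model. It is only an
illustrative sketch, not the library's C code: the packet-type strings, the
``GRO_HANDLERS`` table and the ``dispatch()`` helper are hypothetical names
invented for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical packet-type label; the real library reads the packet type
# from the mbuf rather than from a string.
TCP_IPV4 = "tcp_ipv4"


def tcp_ipv4_reassemble(pkt: dict) -> None:
    """Stand-in for the TCP/IPv4 reassembly function (does nothing here)."""


# One reassembly function per supported GRO type.
GRO_HANDLERS: Dict[str, Callable[[dict], None]] = {
    TCP_IPV4: tcp_ipv4_reassemble,
}


def dispatch(pkts: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split packets into those handed to a GRO type and those left untouched."""
    processed, unprocessed = [], []
    for pkt in pkts:
        handler = GRO_HANDLERS.get(pkt["packet_type"])
        if handler is None:
            unprocessed.append(pkt)  # no GRO type for this packet type
        else:
            handler(pkt)
            processed.append(pkt)
    return processed, unprocessed
```

In this model, a packet whose type has no registered handler is simply
passed back unchanged.
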
The GRO library doesn't check whether input packets have correct checksums
and doesn't re-calculate checksums for merged packets. When IP
fragmentation is possible (i.e., DF==0), the GRO library assumes that the
packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
complies with RFC 6864 when processing the IPv4 ID field.

Currently, the GRO library provides GRO support for TCP/IPv4 packets.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of API.
One is the lightweight mode API, which enables applications to merge a
small number of packets rapidly. The other is the heavyweight mode API,
which gives applications fine-grained control and supports merging a
large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
which processes N packets at a time. Merging packets with the lightweight
mode API is simple: calling ``rte_gro_reassemble_burst()`` is enough. The
GROed packets are returned to the application as soon as the function
finishes.

In ``rte_gro_reassemble_burst()``, the table structures of the different
GRO types are allocated on the stack. This design simplifies applications'
operations. However, because the stack size is limited, the number of
packets that ``rte_gro_reassemble_burst()`` processes in one invocation
must be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.

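One way to respect this limit is to split a larger batch into bursts before
each call. The Python sketch below only models that splitting step.
``MAX_BURST_ITEM_NUM`` stands in for ``RTE_GRO_MAX_BURST_ITEM_NUM``; its
value of 128 is an assumption made for the example and should be checked
against the library header. ``split_into_bursts()`` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence

# Assumed value for illustration; check the real constant in the header.
MAX_BURST_ITEM_NUM = 128


def split_into_bursts(pkts: Sequence, limit: int = MAX_BURST_ITEM_NUM) -> Iterator[List]:
    """Yield consecutive slices of `pkts`, each holding at most `limit` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, len(pkts), limit):
        yield list(pkts[start:start + limit])
```

Because each lightweight call returns its merged packets as soon as it
finishes, packets that land in different bursts would not be merged with
each other under this scheme.
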
Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, using the heavyweight mode API is
relatively complex. First, applications create a GRO context with
``rte_gro_ctx_create()``, which allocates table structures on the heap
and stores their pointers in the GRO context. Second, applications use
``rte_gro_reassemble()`` to merge packets. If input packets have invalid
parameters, ``rte_gro_reassemble()`` returns them to the application. For
example, packets of unsupported GRO types and TCP SYN packets are
returned. Otherwise, the input packets are either merged with existing
packets in the tables or inserted into the tables. Finally, applications
use ``rte_gro_timeout_flush()`` to flush packets from the tables when they
want to retrieve the GROed packets.

Note that update and lookup operations on the GRO context are not thread
safe. If different processes or threads access the same context object
simultaneously, an external synchronization mechanism must be used.

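The create, reassemble and flush sequence can be modelled in a few lines of
Python. This is a toy stand-in for the three steps, not the library's API:
the ``GroContextModel`` class, the dictionary-based packets, and the rule
that a packet is flushed once it has waited at least ``timeout`` time units
are all assumptions made for the example.

```python
from typing import Dict, List


class GroContextModel:
    """Toy GRO context: its tables persist across reassemble() calls."""

    def __init__(self) -> None:
        # flow name -> stored packets (each a dict with seq, payload, start)
        self.tables: Dict[str, List[dict]] = {}

    def reassemble(self, pkts: List[dict], now: int) -> List[dict]:
        """Merge or store packets; return the ones that cannot be processed."""
        unprocessed = []
        for pkt in pkts:
            if pkt["type"] != "tcp_ipv4" or pkt.get("syn"):
                unprocessed.append(pkt)  # unsupported type or TCP SYN
                continue
            items = self.tables.setdefault(pkt["flow"], [])
            for item in items:
                # This toy only appends data that directly follows an item.
                if item["seq"] + len(item["payload"]) == pkt["seq"]:
                    item["payload"] += pkt["payload"]
                    break
            else:
                items.append({"seq": pkt["seq"], "payload": pkt["payload"],
                              "start": now})
        return unprocessed

    def timeout_flush(self, now: int, timeout: int) -> List[dict]:
        """Remove and return packets stored for at least `timeout` units."""
        flushed = []
        for flow in list(self.tables):
            keep = []
            for item in self.tables[flow]:
                (flushed if now - item["start"] >= timeout else keep).append(item)
            if keep:
                self.tables[flow] = keep
            else:
                del self.tables[flow]
        return flushed
```

A caller of this model would invoke ``reassemble()`` for each received
batch and ``timeout_flush()`` periodically. The model does no locking, so
it has the same single-threaded restriction as the GRO context.
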
Reassembly Algorithm
--------------------

The reassembly algorithm is used for reassembling packets. In the GRO
library, different GRO types can use different algorithms. This section
introduces the algorithm used by TCP/IPv4 GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm or implementation would cause packet drops in a
  high-speed network.

- packet reordering makes it hard to merge packets. For example, Linux
  GRO fails to merge packets when it encounters packet reordering.

These two challenges require the algorithm to be:

- lightweight enough to scale with fast network speeds

- capable of handling packet reordering

DPDK GRO uses a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (called the "key"). To process an input packet, the algorithm
first searches for a matching "flow" (i.e., one with the same key value),
then checks all packets in that "flow" and tries to find a "neighbor" for
the input packet. If a "neighbor" is found, the two packets are merged.
If no "neighbor" is found, the packet is stored in its "flow". If no
matching "flow" is found, a new "flow" is inserted and the packet is
stored in it.

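The procedure above can be sketched as a short Python model. It is a
simplified illustration under stated assumptions, not the library's C
implementation: the ``Segment`` class and ``process()`` function are
hypothetical, and a "neighbor" is taken here to mean a stored packet whose
byte range is directly adjacent to the input packet.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List


@dataclass
class Segment:
    seq: int        # position of the first payload byte
    payload: bytes

    @property
    def end(self) -> int:
        """Position just past the last payload byte."""
        return self.seq + len(self.payload)


def process(flows: Dict[Hashable, List[Segment]], key: Hashable, pkt: Segment) -> str:
    """Apply one step of the key-based procedure and report what happened."""
    items = flows.get(key)
    if items is None:                 # no matching flow: insert a new one
        flows[key] = [pkt]
        return "new flow"
    for item in items:
        if item.end == pkt.seq:       # pkt directly follows item
            item.payload += pkt.payload
            return "merged"
        if pkt.end == item.seq:       # pkt directly precedes item
            item.payload = pkt.payload + item.payload
            item.seq = pkt.seq
            return "merged"
    items.append(pkt)                 # no neighbor yet: keep for later
    return "stored"
```

Storing a packet that has no neighbor yet is what lets a later,
out-of-order packet still merge with it. Note that this sketch does not
join two stored packets that become adjacent after a merge.
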
.. note::
        When packets in the same "flow" can't be merged, the cause is
        always packet reordering.

The key-based algorithm has two characteristics:

- classifying packets into "flows" is a simple way to accelerate packet
  aggregation (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm

TCP/IPv4 GRO
------------

The table structure used by TCP/IPv4 GRO contains two arrays: a flow array
and an item array. The flow array keeps flow information, and the item
array keeps packet information.

The header fields used to define a TCP/IPv4 flow include:

- source and destination: Ethernet and IP addresses, TCP ports

- TCP acknowledgment number

TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.

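As an illustration, the flow key and the flag filter can be written as the
following Python sketch. The ``FlowKey`` tuple, the ``flow_key()`` helper
and the dictionary-based header are hypothetical and do not reflect the
library's actual data structures; only the listed fields and flag bits come
from the text above.

```python
from typing import NamedTuple, Optional

# TCP flag bits as laid out in the TCP header (FIN is the lowest bit).
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))

# Packets with any of these bits set are not processed.
UNSUPPORTED_FLAGS = FIN | SYN | RST | URG | PSH | ECE | CWR


class FlowKey(NamedTuple):
    eth_src: str
    eth_dst: str
    ip_src: str
    ip_dst: str
    tcp_sport: int
    tcp_dport: int
    tcp_ack: int


def flow_key(hdr: dict) -> Optional[FlowKey]:
    """Return the flow key, or None if the packet is not eligible for GRO."""
    if hdr["tcp_flags"] & UNSUPPORTED_FLAGS:
        return None
    return FlowKey(*(hdr[field] for field in FlowKey._fields))
```

Two packets belong to the same flow in this model exactly when their
``FlowKey`` values compare equal, so the key can be used directly as a
dictionary key.
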
The header fields that decide whether two packets are neighbors include:

- TCP sequence number

- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID must increase by 1
  from one packet to the next.

.. note::
        We comply with RFC 6864 when processing the IPv4 ID field.
        Specifically, we check the IPv4 ID fields of packets whose DF bit
        is 0 and ignore the IPv4 ID fields of packets whose DF bit is 1.
        Additionally, packets that have different DF bit values can't
        be merged.
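
The neighbor rules can be summarised in a small Python sketch. It is a
minimal model under stated assumptions: ``is_neighbor()`` and the
dictionary fields are hypothetical names, only the "append after" direction
is shown, and the wrap-around of the 32-bit sequence number and the 16-bit
IPv4 ID is an assumption of the sketch rather than documented library
behaviour.

```python
def is_neighbor(prev: dict, nxt: dict) -> bool:
    """Return True if `nxt` can be appended directly after `prev`."""
    if prev["df"] != nxt["df"]:
        return False  # packets with different DF bits are never merged
    expected_seq = (prev["seq"] + prev["payload_len"]) % 2**32
    if nxt["seq"] != expected_seq:
        return False  # payloads are not contiguous
    if prev["df"] == 0:
        # DF == 0: the IPv4 ID must increase by exactly 1.
        return nxt["ip_id"] == (prev["ip_id"] + 1) % 2**16
    return True       # DF == 1: the IPv4 ID is ignored
```

For example, a packet with ``seq=1000`` and 100 bytes of payload is
followed by a neighbor with ``seq=1100``; with DF set to 0 the neighbor
must also carry the next IPv4 ID.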