..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

.. _Mbuf_Library:

Mbuf Library
============

The mbuf library provides the ability to allocate and free buffers (mbufs)
that may be used by the DPDK application to store message buffers.
The message buffers are stored in a mempool, using the :ref:`Mempool Library <Mempool_Library>`.

A rte_mbuf struct generally carries network packet buffers, but it can actually
be any data (control data, events, ...).
The rte_mbuf header structure is kept as small as possible and currently uses
just two cache lines, with the most frequently used fields being on the first
of the two cache lines.
  18
  19 Design of Packet Buffers
  20 ------------------------
  21
  22 For the storage of the packet data (including protocol headers), two approaches were considered:
  23
  24 #.  Embed metadata within a single memory buffer the structure followed by a fixed size area for the packet data.
  25
  26 #.  Use separate memory buffers for the metadata structure and for the packet data.
  27
  28 The advantage of the first method is that it only needs one operation to allocate/free the whole memory representation of a packet.
  29 On the other hand, the second method is more flexible and allows
  30 the complete separation of the allocation of metadata structures from the allocation of packet data buffers.
  31
  32 The first method was chosen for the DPDK.
  33 The metadata contains control information such as message type, length,
  34 offset to the start of the data and a pointer for additional mbuf structures allowing buffer chaining.
  35
  36 Message buffers that are used to carry network packets can handle buffer chaining
  37 where multiple buffers are required to hold the complete packet.
  38 This is the case for jumbo frames that are composed of many mbufs linked together through their next field.
  39
  40 For a newly allocated mbuf, the area at which the data begins in the message buffer is
  41 RTE_PKTMBUF_HEADROOM bytes after the beginning of the buffer, which is cache aligned.
  42 Message buffers may be used to carry control information, packets, events,
  43 and so on between different entities in the system.
  44 Message buffers may also use their buffer pointers to point to other message buffer data sections or other structures.
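
The layout described above can be illustrated with a minimal, self-contained sketch.
It is a toy model and not the DPDK API: ``struct toy_mbuf``, the ``toy_*`` helpers,
``TOY_HEADROOM`` and ``TOY_BUF_LEN`` are hypothetical names with assumed values,
chosen only to show metadata embedded in front of a fixed-size data area,
data starting after the headroom, and segments chained through a next pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEADROOM 128  /* stand-in for RTE_PKTMBUF_HEADROOM; the value is assumed */
#define TOY_BUF_LEN 2048  /* assumed size of the embedded data area */

/* Hypothetical, simplified layout; this is not the real struct rte_mbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;    /* next segment of the packet, NULL for the last one */
    uint16_t data_off;        /* offset of the first data byte inside buf */
    uint16_t data_len;        /* number of data bytes held by this segment */
    uint8_t buf[TOY_BUF_LEN]; /* fixed-size data area that follows the metadata */
};

/* Put a segment in the state described for a newly allocated mbuf. */
static void toy_reset(struct toy_mbuf *m)
{
    m->next = NULL;
    m->data_off = TOY_HEADROOM;
    m->data_len = 0;
}

/* Address at which the data of this segment begins. */
static uint8_t *toy_data(struct toy_mbuf *m)
{
    return m->buf + m->data_off;
}

/* Total packet length: the sum of data_len over every segment in the chain. */
static uint32_t toy_pkt_len(const struct toy_mbuf *m)
{
    uint32_t total = 0;

    for (; m != NULL; m = m->next)
        total += m->data_len;
    return total;
}

/* Number of segments reachable through the next pointers. */
static unsigned int toy_nb_segs(const struct toy_mbuf *m)
{
    unsigned int n = 0;

    for (; m != NULL; m = m->next)
        n++;
    return n;
}
```

The sketch recomputes the packet length and segment count by walking the chain;
this is a simplification made for illustration only.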

:numref:`figure_mbuf1` and :numref:`figure_mbuf2` show some of these scenarios.

.. _figure_mbuf1:

.. figure:: img/mbuf1.*

   An mbuf with One Segment


.. _figure_mbuf2:

.. figure:: img/mbuf2.*

   An mbuf with Three Segments


The Buffer Manager implements a fairly standard set of buffer access functions to manipulate network packets.

Buffers Stored in Memory Pools
------------------------------

The Buffer Manager uses the :ref:`Mempool Library <Mempool_Library>` to allocate buffers.
Therefore, it ensures that the packet header is interleaved optimally across the channels and ranks for L3 processing.
An mbuf contains a field indicating the pool that it originated from.
When calling rte_pktmbuf_free(m), the mbuf returns to its original pool.

Constructors
------------

Packet mbuf constructors are provided by the API.
The rte_pktmbuf_init() function initializes some fields in the mbuf structure that
are not modified by the user once created (mbuf type, origin pool, buffer start address, and so on).
This function is given as a callback function to the rte_mempool_create() function at pool creation time.

Allocating and Freeing mbufs
----------------------------

Allocating a new mbuf requires the user to specify the mempool from which the mbuf should be taken.
A newly allocated mbuf contains one segment, with a length of 0.
The offset to data is initialized to have some bytes of headroom in the buffer (RTE_PKTMBUF_HEADROOM).

Freeing an mbuf means returning it to its original mempool.
The content of an mbuf is not modified when it is stored in a pool (as a free mbuf).
Fields initialized by the constructor do not need to be re-initialized at mbuf allocation.

When freeing a packet mbuf that contains several segments, all of them are freed and returned to their original mempool.

Manipulating mbufs
------------------

This library provides some functions for manipulating the data in a packet mbuf. For instance:

    *   Get data length

    *   Get a pointer to the start of data

    *   Prepend data before the existing data

    *   Append data after the existing data

    *   Remove data at the beginning of the buffer (rte_pktmbuf_adj())

    *   Remove data at the end of the buffer (rte_pktmbuf_trim())

Refer to the *DPDK API Reference* for details.
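
The effect of these operations on the data offset and data length can be sketched
with a toy single-segment buffer. This is an illustrative model under stated assumptions,
not the library implementation: ``struct pkt_buf``, the ``pb_*`` functions,
``PB_HEADROOM`` and ``PB_BUF_LEN`` are hypothetical names, and the return conventions
(NULL or -1 when the request does not fit) are choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PB_HEADROOM 128 /* assumed headroom reserved in front of the data */
#define PB_BUF_LEN 512  /* assumed size of the data area */

/* Hypothetical single-segment buffer; not the real struct rte_mbuf. */
struct pkt_buf {
    uint16_t data_off; /* offset of the first data byte inside buf */
    uint16_t data_len; /* number of data bytes currently stored */
    uint8_t buf[PB_BUF_LEN];
};

static void pb_reset(struct pkt_buf *m)
{
    m->data_off = PB_HEADROOM;
    m->data_len = 0;
}

static uint8_t *pb_data(struct pkt_buf *m)
{
    return m->buf + m->data_off;
}

/* Free bytes between the end of the data and the end of the buffer. */
static uint16_t pb_tailroom(const struct pkt_buf *m)
{
    return (uint16_t)(PB_BUF_LEN - m->data_off - m->data_len);
}

/* Grow the data area towards the headroom; returns the new start of data. */
static uint8_t *pb_prepend(struct pkt_buf *m, uint16_t len)
{
    if (len > m->data_off)
        return NULL; /* not enough headroom */
    m->data_off = (uint16_t)(m->data_off - len);
    m->data_len = (uint16_t)(m->data_len + len);
    return pb_data(m);
}

/* Grow the data area towards the tailroom; returns the start of the new bytes. */
static uint8_t *pb_append(struct pkt_buf *m, uint16_t len)
{
    uint8_t *tail;

    if (len > pb_tailroom(m))
        return NULL; /* not enough tailroom */
    tail = pb_data(m) + m->data_len;
    m->data_len = (uint16_t)(m->data_len + len);
    return tail;
}

/* Remove len bytes at the beginning of the data; returns the new start of data. */
static uint8_t *pb_adj(struct pkt_buf *m, uint16_t len)
{
    if (len > m->data_len)
        return NULL;
    m->data_off = (uint16_t)(m->data_off + len);
    m->data_len = (uint16_t)(m->data_len - len);
    return pb_data(m);
}

/* Remove len bytes at the end of the data; returns 0 on success, -1 otherwise. */
static int pb_trim(struct pkt_buf *m, uint16_t len)
{
    if (len > m->data_len)
        return -1;
    m->data_len = (uint16_t)(m->data_len - len);
    return 0;
}
```

In this model, prepending consumes headroom and appending consumes tailroom,
while none of the four operations moves the bytes that are already stored.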

Meta Information
----------------

Some information is retrieved by the network driver and stored in an mbuf to make processing easier.
Examples include the VLAN tag, the RSS hash result (see :ref:`Poll Mode Driver <Poll_Mode_Driver>`)
and a flag indicating that the checksum was computed by hardware.

An mbuf also contains the input port (where it comes from), and the number of segment mbufs in the chain.

For chained buffers, only the first mbuf of the chain stores this meta information.

For instance, this is the case on the RX side for the IEEE1588 packet
timestamp mechanism, the VLAN tagging and the IP checksum computation.

On the TX side, it is also possible for an application to delegate some
processing to the hardware if it supports it. For instance, the
PKT_TX_IP_CKSUM flag offloads the computation of the IPv4
checksum.

The following examples explain how to configure different TX offloads on
a VXLAN-encapsulated TCP packet:
``out_eth/out_ip/out_udp/vxlan/in_eth/in_ip/in_tcp/payload``

- calculate checksum of out_ip::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
    set out_ip checksum to 0 in the packet

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM.

- calculate checksum of out_ip and out_udp::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM
    set out_ip checksum to 0 in the packet
    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM
  and DEV_TX_OFFLOAD_UDP_CKSUM.

- calculate checksum of in_ip::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
    set in_ip checksum to 0 in the packet

  This is similar to case 1), but l2_len is different. It is supported
  on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM.
  Note that it can only work if outer L4 checksum is 0.

- calculate checksum of in_ip and in_tcp::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is similar to case 2), but l2_len is different. It is supported
  on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and
  DEV_TX_OFFLOAD_TCP_CKSUM.
  Note that it can only work if outer L4 checksum is 0.

- segment inner TCP::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->l4_len = len(in_tcp)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
      PKT_TX_TCP_SEG;
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header without including the IP
      payload length using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_TCP_TSO.
  Note that it can only work if outer L4 checksum is 0.

- calculate checksum of out_ip, in_ip, in_tcp::

    mb->outer_l2_len = len(out_eth)
    mb->outer_l3_len = len(out_ip)
    mb->l2_len = len(out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM | \
      PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
    set out_ip checksum to 0 in the packet
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM,
  DEV_TX_OFFLOAD_UDP_CKSUM and DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM.

The list of flags and their precise meaning is described in the mbuf API
documentation (rte_mbuf.h). Also refer to the testpmd source code
(specifically the csumonly.c file) for details.
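
Several of the examples above seed the L4 checksum field with a pseudo-header checksum.
The sketch below shows what such a value contains, using the standard IPv4 pseudo header
(source address, destination address, a zero byte, the protocol number and the L4 length).
It is a stand-alone illustration, not the code of rte_ipv4_phdr_cksum():
``fold16()`` and ``phdr_sum()`` are hypothetical helpers, the arguments are taken in
host byte order for readability, and returning the sum without complementing it is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Hypothetical helper: ones'-complement sum of the IPv4 pseudo header.
 * src and dst are IPv4 addresses in host byte order, proto is the L4
 * protocol number (6 for TCP, 17 for UDP) and l4_len is the length of the
 * L4 header plus payload. Passing l4_len = 0 models the segmentation case,
 * where the payload length is left out of the seed.
 */
static uint16_t phdr_sum(uint32_t src, uint32_t dst, uint8_t proto,
                         uint16_t l4_len)
{
    uint32_t sum = 0;

    sum += src >> 16;    /* high 16 bits of the source address */
    sum += src & 0xffff; /* low 16 bits of the source address */
    sum += dst >> 16;    /* high 16 bits of the destination address */
    sum += dst & 0xffff; /* low 16 bits of the destination address */
    sum += proto;        /* zero byte followed by the protocol number */
    sum += l4_len;       /* L4 length, or 0 when it must be excluded */
    return fold16(sum);
}
```

For example, with source 192.168.0.1, destination 192.168.0.2 and TCP,
the sketch yields 0x816E for a 20-byte TCP segment and 0x815A when the length is excluded.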

Dynamic fields and flags
~~~~~~~~~~~~~~~~~~~~~~~~

The size of the mbuf is constrained,
while the amount of metadata that could be saved for each packet is practically unlimited.
The most basic networking information already has its place
in the existing mbuf fields and flags.

If new features need to be added, the new fields and flags should fit
in the "dynamic space", by registering some room in the mbuf structure:

dynamic field
   named area in the mbuf structure,
   with a given size (at least 1 byte) and alignment constraint.

dynamic flag
   named bit in the mbuf structure,
   stored in the field ``ol_flags``.

The dynamic fields and flags are managed with the functions ``rte_mbuf_dyn*``.

It is not possible to unregister fields or flags.
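
The idea of registering a named area with a size and an alignment constraint can be
sketched with a small registry. This is a toy model and not the ``rte_mbuf_dyn*`` implementation:
``dyn_register()``, ``DYN_AREA_SIZE`` and ``DYN_MAX_FIELDS`` are hypothetical, the sizes are
assumed values, and the simple "next free offset" placement is a simplification chosen for brevity.
As in the text above, the sketch offers no way to unregister a field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DYN_AREA_SIZE 36 /* assumed size of the dynamic area, illustrative only */
#define DYN_MAX_FIELDS 8 /* assumed capacity of the toy registry */

/* One registered field: a named area with a size, an alignment and an offset. */
struct dyn_field {
    char name[32];
    size_t size;
    size_t align;
    size_t offset;
};

static struct dyn_field dyn_fields[DYN_MAX_FIELDS];
static int dyn_count;        /* number of registered fields */
static size_t dyn_next_free; /* first byte not yet handed out */

/*
 * Hypothetical registration function. Returns the offset of the field in the
 * dynamic area, or -1 on error. In this sketch, registering an existing name
 * again with the same parameters returns the offset already assigned.
 */
static int dyn_register(const char *name, size_t size, size_t align)
{
    size_t off;
    int i;

    /* The size must be at least 1 byte and the alignment a power of two. */
    if (size == 0 || align == 0 || (align & (align - 1)) != 0)
        return -1;
    if (strlen(name) >= sizeof(dyn_fields[0].name))
        return -1;

    for (i = 0; i < dyn_count; i++) {
        if (strcmp(dyn_fields[i].name, name) != 0)
            continue;
        if (dyn_fields[i].size == size && dyn_fields[i].align == align)
            return (int)dyn_fields[i].offset;
        return -1; /* same name, conflicting parameters */
    }

    /* Round the next free offset up to the requested alignment. */
    off = (dyn_next_free + align - 1) & ~(align - 1);
    if (dyn_count == DYN_MAX_FIELDS || off + size > DYN_AREA_SIZE)
        return -1; /* no room left */

    strcpy(dyn_fields[dyn_count].name, name);
    dyn_fields[dyn_count].size = size;
    dyn_fields[dyn_count].align = align;
    dyn_fields[dyn_count].offset = off;
    dyn_count++;
    dyn_next_free = off + size;
    return (int)off;
}
```

A caller would keep the returned offset and use it to reach the field in each buffer;
because the space is limited, a registration can fail once the area is full.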

.. _direct_indirect_buffer:

Direct and Indirect Buffers
---------------------------

A direct buffer is a buffer that is completely separate and self-contained.
An indirect buffer behaves like a direct buffer, except that the buffer pointer and
data offset in it refer to data in another direct buffer.
This is useful in situations where packets need to be duplicated or fragmented,
since indirect buffers provide the means to reuse the same packet data across multiple buffers.

A buffer becomes indirect when it is "attached" to a direct buffer using the rte_pktmbuf_attach() function.
Each buffer has a reference counter field and whenever an indirect buffer is attached to the direct buffer,
the reference counter on the direct buffer is incremented.
Similarly, whenever the indirect buffer is detached, the reference counter on the direct buffer is decremented.
If the resulting reference counter is equal to 0, the direct buffer is freed since it is no longer in use.

There are a few things to remember when dealing with indirect buffers.
First of all, an indirect buffer is never attached to another indirect buffer.
Attempting to attach buffer A to an indirect buffer B that is itself attached to C
makes rte_pktmbuf_attach() automatically attach A to C, effectively cloning B.
Secondly, for a buffer to become indirect, its reference counter must be equal to 1,
that is, it must not be already referenced by another indirect buffer.
Finally, it is not possible to reattach an indirect buffer to the direct buffer (unless it is detached first).
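
The reference counting rules above can be traced with a small model.
It is a sketch under stated assumptions, not the library code: ``struct rc_buf`` and the
``rc_*`` functions are hypothetical, a ``freed`` marker stands in for returning a buffer to its pool,
and returning -1 from ``rc_attach()`` is a choice made for this sketch to make the
preconditions visible (how the real functions react to a violated precondition is not modelled).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer header reduced to what the reference counting needs. */
struct rc_buf {
    int refcnt;              /* number of users of this buffer's data */
    struct rc_buf *attached; /* direct buffer whose data is shared, NULL if direct */
    int freed;               /* set when the sketch would return the buffer to its pool */
};

static void rc_init(struct rc_buf *b)
{
    b->refcnt = 1;
    b->attached = NULL;
    b->freed = 0;
}

static int rc_is_indirect(const struct rc_buf *b)
{
    return b->attached != NULL;
}

/* Make mi an indirect buffer sharing the data of m. Returns 0, or -1 if refused. */
static int rc_attach(struct rc_buf *mi, struct rc_buf *m)
{
    struct rc_buf *direct;

    /* mi must be direct and must not be referenced by any other buffer. */
    if (rc_is_indirect(mi) || mi->refcnt != 1)
        return -1;

    /* Never chain indirect buffers: follow m to the direct buffer behind it. */
    direct = rc_is_indirect(m) ? m->attached : m;
    direct->refcnt++;
    mi->attached = direct;
    return 0;
}

/* Detach mi from its direct buffer, freeing the direct buffer on the last reference. */
static void rc_detach(struct rc_buf *mi)
{
    struct rc_buf *direct = mi->attached;

    if (direct == NULL)
        return; /* already a direct buffer */
    mi->attached = NULL;
    if (--direct->refcnt == 0)
        direct->freed = 1;
}

/* Drop the caller's own reference to b, detaching it first if it is indirect. */
static void rc_free(struct rc_buf *b)
{
    rc_detach(b);
    if (--b->refcnt == 0)
        b->freed = 1;
}
```

In this model, attaching A to an indirect buffer B leaves B's counter untouched and
increments the counter of the direct buffer behind B, which matches the "effectively cloning B" rule.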

While the attach/detach operations can be invoked directly using the rte_pktmbuf_attach() and rte_pktmbuf_detach() functions,
it is suggested to use the higher-level rte_pktmbuf_clone() function,
which takes care of the correct initialization of an indirect buffer and can clone buffers with multiple segments.

Since indirect buffers are not supposed to actually hold any data,
the memory pool for indirect buffers should be configured to indicate the reduced memory consumption.
Examples of the initialization of a memory pool for indirect buffers (as well as use case examples for indirect buffers)
can be found in several of the sample applications, for example, the IPv4 Multicast sample application.

Debug
-----

In debug mode, the functions of the mbuf library perform sanity checks before any operation
(for example, checks for buffer corruption or a bad type).

Use Cases
---------

All networking applications should use mbufs to transport network packets.