From: Mark Kavanagh Date: Sat, 7 Oct 2017 14:56:44 +0000 (+0800) Subject: doc: add GSO programmer's guide X-Git-Tag: spdx-start~1585 X-Git-Url: http://git.droids-corp.org/?a=commitdiff_plain;h=f6010c7655cc;p=dpdk.git doc: add GSO programmer's guide Add programmer's guide doc to explain the design and use of the GSO library. Signed-off-by: Mark Kavanagh Signed-off-by: Jiayu Hu Acked-by: John McNamara Acked-by: Konstantin Ananyev --- diff --git a/MAINTAINERS b/MAINTAINERS index 2f3e106829..8268ae10d0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -655,6 +655,12 @@ M: Jiayu Hu F: lib/librte_gro/ F: doc/guides/prog_guide/generic_receive_offload_lib.rst +Generic Segmentation Offload +M: Jiayu Hu +M: Mark Kavanagh +F: lib/librte_gso/ +F: doc/guides/prog_guide/generic_segmentation_offload_lib.rst + Distributor M: Bruce Richardson M: David Hunt diff --git a/doc/guides/prog_guide/generic_segmentation_offload_lib.rst b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst new file mode 100644 index 0000000000..5e78f1625e --- /dev/null +++ b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst @@ -0,0 +1,256 @@ +.. BSD LICENSE + Copyright(c) 2017 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. 
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Generic Segmentation Offload Library
+====================================
+
+Overview
+--------
+Generic Segmentation Offload (GSO) is a widely used software implementation of
+TCP Segmentation Offload (TSO), which reduces per-packet processing overhead.
+Much like TSO, GSO gains performance by enabling upper layer applications to
+process a smaller number of large packets (e.g. 64KB in size), instead of
+processing a higher number of small packets (e.g. 1500B in size), thus
+reducing per-packet overhead.
+
+For example, GSO allows guest kernel stacks to transmit over-sized TCP segments
+that far exceed the kernel interface's MTU; this eliminates the need to segment
+packets within the guest, and improves the data-to-overhead ratio of both the
+guest-host link and the PCI bus. The expectation of the guest network stack in
+this scenario is that segmentation of egress frames will take place either in
+the NIC hardware or, where that hardware capability is unavailable, in the host
+application or network stack.
+
+Bearing that in mind, the GSO library enables DPDK applications to segment
+packets in software.
Note, however, that GSO is implemented as a standalone
+library, and not via a 'fallback' mechanism (i.e. for when TSO is unsupported
+in the underlying hardware); that is, applications must explicitly invoke the
+GSO library to segment packets. The size of GSO segments (``segsz``) is
+configurable by the application.
+
+Limitations
+-----------
+
+#. The GSO library doesn't check if input packets have correct checksums.
+
+#. In addition, the GSO library doesn't re-calculate checksums for segmented
+   packets (that task is left to the application).
+
+#. IP fragments are unsupported by the GSO library.
+
+#. The egress interface's driver must support multi-segment packets.
+
+#. Currently, the GSO library supports the following IPv4 packet types:
+
+   - TCP
+   - VxLAN
+   - GRE
+
+   See `Supported GSO Packet Types`_ for further details.
+
+Packet Segmentation
+-------------------
+
+The ``rte_gso_segment()`` function is the GSO library's primary
+segmentation API.
+
+Before performing segmentation, an application must create a GSO context object
+(``struct rte_gso_ctx``), which provides the library with some of the
+information required to understand how the packet should be segmented. Refer to
+`How to Segment a Packet`_ for additional details. Once the GSO context has been
+created and populated, the application can then use the ``rte_gso_segment()``
+function to segment packets.
+
+The GSO library typically stores each segment that it creates in two parts: the
+first part contains a copy of the original packet's headers, while the second
+part contains a pointer to an offset within the original packet. This mechanism
+is explained in more detail in `GSO Output Segment Format`_.
+
+The GSO library supports both single- and multi-segment input mbufs.
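To make the two-part layout concrete, the following self-contained C model may help. It is a hypothetical sketch, not the GSO library's code: ``struct model_gso_seg`` and ``model_make_seg()`` are illustrative names, the header part is a plain byte array standing in for a direct mbuf, and the data part is a pointer plus a length standing in for an indirect mbuf. It assumes each output segment carries at most ``unit`` payload bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one two-part GSO output segment. This is NOT
 * struct rte_mbuf; it only mimics the direct/indirect split. */
#define MODEL_MAX_HDR 128

struct model_gso_seg {
	uint8_t hdr[MODEL_MAX_HDR]; /* part 1 ("direct"): private copy of the headers */
	size_t hdr_len;
	const uint8_t *data;        /* part 2 ("indirect"): points into the original */
	size_t data_len;
};

/*
 * Fill output segment number 'idx' of a packet whose first 'hdr_len' bytes
 * are headers, where each segment carries at most 'unit' payload bytes.
 * Returns 0 on success, or -1 if 'idx' lies past the end of the payload.
 */
static int model_make_seg(const uint8_t *pkt, size_t pkt_len, size_t hdr_len,
			  size_t unit, size_t idx, struct model_gso_seg *seg)
{
	size_t off;

	assert(unit > 0 && hdr_len <= MODEL_MAX_HDR && hdr_len <= pkt_len);
	off = hdr_len + idx * unit;
	if (off >= pkt_len)
		return -1;

	memcpy(seg->hdr, pkt, hdr_len); /* the headers are copied ... */
	seg->hdr_len = hdr_len;
	seg->data = pkt + off;          /* ... the payload is only referenced */
	seg->data_len = (pkt_len - off < unit) ? pkt_len - off : unit;
	return 0;
}
```

Only the headers are copied; every data part aliases the original buffer. In the library itself this is why the original packet's data must remain valid while any output segment still references it (attaching an indirect mbuf increments the reference count of the mbuf it points to).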
+
+GSO Output Segment Format
+~~~~~~~~~~~~~~~~~~~~~~~~~
+To reduce the number of expensive memcpy operations required when segmenting a
+packet, the GSO library typically stores each segment that it creates as a
+two-part mbuf (technically, this is termed a 'two-segment' mbuf; however, since
+the elements produced by the API are also called 'segments', for clarity the
+term 'part' is used here instead).
+
+The first part of each output segment is a direct mbuf that contains a copy of
+the original packet's headers. Since these headers must be present at the
+start of every output segment, they are copied from the original packet into
+each one.
+
+The second part of each output segment represents a section of data from the
+original packet, i.e. a data segment. Rather than copy the data directly from
+the original packet into the output segment (which would impact performance
+considerably), the second part of each output segment is an indirect mbuf,
+which contains no actual data, but simply points to an offset within the
+original packet.
+
+The combination of the 'header' part and the 'data' part constitutes a
+single logical output GSO segment of the original packet. This is illustrated
+in :numref:`figure_gso-output-segment-format`.
+
+.. _figure_gso-output-segment-format:
+
+.. figure:: img/gso-output-segment-format.svg
+   :align: center
+
+   Two-part GSO output segment
+
+In one situation, the output segment may contain additional 'data' parts.
+This only occurs when:
+
+- the input packet on which GSO is to be performed is represented by a
+  multi-segment mbuf.
+
+- the output segment is required to contain data that spans the boundaries
+  between segments of the input multi-segment mbuf.
+
+The GSO library traverses each segment of the input packet and produces
+numerous output segments; for optimal performance, the number of output
+segments is kept to a minimum.
Consequently, the GSO library maximizes the
+amount of data contained within each output segment, i.e. each output segment
+contains ``segsz`` bytes, headers included. The only exception to this is the
+very final output segment, which is smaller than the rest when the payload
+length (``pkt_len`` minus the header length) is not an exact multiple of the
+payload carried by each segment (``segsz`` minus the header length).
+
+In order for an output segment to meet its MSS, it may need to include data from
+multiple input segments. Due to the nature of indirect mbufs (each indirect mbuf
+can point to only one direct mbuf), the solution here is to add another indirect
+mbuf to the output segment; this additional indirect mbuf then points to the
+next input segment. If necessary, this chaining process is repeated until the
+sum of all of the data 'contained' in the output segment reaches ``segsz``. This
+ensures that the amount of data contained within each output segment is uniform,
+with the possible exception of the last segment, as previously described.
+
+:numref:`figure_gso-three-seg-mbuf` illustrates an example of a three-part
+output segment. In this example, the output segment needs to include data from
+the end of one input segment and the beginning of another. To achieve this,
+an additional indirect mbuf is chained to the second part of the output segment,
+and is attached to the next input segment (i.e. it points to the data in the
+next input segment).
+
+.. _figure_gso-three-seg-mbuf:
+
+.. figure:: img/gso-three-seg-mbuf.svg
+   :align: center
+
+   Three-part GSO output segment
+
+Supported GSO Packet Types
+--------------------------
+
+TCP/IPv4 GSO
+~~~~~~~~~~~~
+TCP/IPv4 GSO supports segmentation of suitably large TCP/IPv4 packets, which
+may also contain a VLAN tag.
+
+VxLAN GSO
+~~~~~~~~~
+VxLAN GSO supports segmentation of suitably large VxLAN packets,
+which contain an outer IPv4 header, inner TCP/IPv4 headers, and optional
+inner and/or outer VLAN tag(s).
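The chaining rule from `GSO Output Segment Format`_ can be sketched as a small C model. This is a hypothetical illustration, not the library's implementation: ``model_count_parts()`` is an invented name, input segments are reduced to their payload lengths (headers excluded), and ``unit`` is the payload carried by each full output segment, assumed here to be ``segsz`` minus the header length because ``segsz`` counts the headers. For each output segment, the model reports how many indirect 'data' parts must be chained behind its header part.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of GSO output-segment chaining over a multi-segment
 * input. in_len[i] is the payload held by input segment i (headers
 * excluded); 'unit' is the payload carried by each full output segment.
 * On return, parts_out[k] is the number of indirect 'data' parts that
 * output segment k needs (its header part is not counted).
 * Returns the number of output segments, or -1 if max_out is too small.
 */
static int model_count_parts(const size_t *in_len, size_t nb_in, size_t unit,
			     size_t *parts_out, size_t max_out)
{
	size_t in = 0, in_off = 0, nb_out = 0;

	assert(unit > 0);
	for (;;) {
		size_t need = unit, parts = 0;

		/* Skip exhausted (or empty) input segments. */
		while (in < nb_in && in_off == in_len[in]) {
			in++;
			in_off = 0;
		}
		if (in == nb_in)
			break;          /* all payload has been consumed */
		if (nb_out == max_out)
			return -1;      /* output array too small */

		/* Chain one indirect part per input segment touched. */
		while (need > 0 && in < nb_in) {
			size_t avail = in_len[in] - in_off;
			size_t take = avail < need ? avail : need;

			if (take > 0)
				parts++;
			need -= take;
			in_off += take;
			if (in_off == in_len[in]) {
				in++;
				in_off = 0;
			}
		}
		parts_out[nb_out++] = parts;
	}
	return (int)nb_out;
}
```

With three 1000-byte input segments and ``unit`` set to 1400, the model yields three output segments needing 2, 2 and 1 data parts; the first two are three-part segments like the one in :numref:`figure_gso-three-seg-mbuf`. A return value of -1 mirrors the case where the application provides too little space for the output segments.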
+
+GRE GSO
+~~~~~~~
+GRE GSO supports segmentation of suitably large GRE packets, which contain
+an outer IPv4 header, inner TCP/IPv4 headers, and an optional VLAN tag.
+
+How to Segment a Packet
+-----------------------
+
+To segment an outgoing packet, an application must:
+
+#. First create a GSO context (``struct rte_gso_ctx``); this contains:
+
+   - a pointer to the mbuf pool for allocating the direct buffers, which are
+     used to store the GSO segments' packet headers.
+
+   - a pointer to the mbuf pool for allocating indirect buffers, which are
+     used to locate GSO segments' packet payloads.
+
+     .. note::
+
+        An application may use the same pool for both direct and indirect
+        buffers. However, since each indirect mbuf simply stores a pointer, the
+        application may reduce its memory consumption by creating a separate
+        memory pool, containing smaller elements, for the indirect pool.
+
+   - the size of each output segment, including packet headers and payload,
+     measured in bytes.
+
+   - the bit mask of required GSO types. The GSO library uses the same macros as
+     those that describe a physical device's TX offloading capabilities (i.e.
+     ``DEV_TX_OFFLOAD_*_TSO``) for ``gso_types``. For example, if an application
+     wants to segment TCP/IPv4 packets, it should set ``gso_types`` to
+     ``DEV_TX_OFFLOAD_TCP_TSO``. The only other values currently supported for
+     ``gso_types`` are ``DEV_TX_OFFLOAD_VXLAN_TNL_TSO`` and
+     ``DEV_TX_OFFLOAD_GRE_TNL_TSO``; a combination of these macros is also
+     allowed.
+
+   - a flag that indicates whether the IPv4 headers of output segments should
+     contain fixed or incremental ID values.
+
+#. Set the appropriate ``ol_flags`` in the mbuf.
+
+   - The GSO library uses the value of an mbuf's ``ol_flags`` attribute to
+     determine how a packet should be segmented. It is the application's
+     responsibility to ensure that these flags are set.
+
+   - For example, in order to segment TCP/IPv4 packets, the application should
+     add the ``PKT_TX_IPV4`` and ``PKT_TX_TCP_SEG`` flags to the mbuf's
+     ``ol_flags``.
+
+   - If checksum calculation in hardware is required, the application should
+     also add the ``PKT_TX_TCP_CKSUM`` and ``PKT_TX_IP_CKSUM`` flags.
+
+#. Check whether the packet should be processed. Packets with any of the
+   following properties are not processed by the GSO library and are returned
+   immediately:
+
+   - Packet length is less than ``segsz`` (i.e. GSO is not required).
+
+   - Packet type is not supported by the GSO library (see
+     `Supported GSO Packet Types`_).
+
+   - Application has not enabled GSO support for the packet type.
+
+   - Packet's ``ol_flags`` have been incorrectly set.
+
+#. Allocate space in which to store the output GSO segments. If the amount of
+   space allocated by the application is insufficient, segmentation will fail.
+
+#. Invoke the GSO segmentation API, ``rte_gso_segment()``.
+
+#. If required, update the L3 and L4 checksums of the newly created segments.
+   For tunneled packets, the outer IPv4 headers' checksums should also be
+   updated. Alternatively, the application may offload checksum calculation
+   to hardware.
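Because the GSO library leaves checksum updates to the application, each output segment's IPv4 header checksum has to be recomputed when it is not offloaded to hardware. The sketch below shows the standard RFC 1071 Internet checksum over raw IPv4 header bytes; ``ipv4_hdr_cksum()`` and ``ipv4_hdr_update_cksum()`` are hypothetical helper names operating on byte arrays rather than DPDK structures. In a DPDK application, the existing ``rte_ipv4_cksum()`` helper from ``rte_ip.h`` (which expects the checksum field to be zeroed before the call) or the ``PKT_TX_IP_CKSUM`` offload would normally be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum of an IPv4 header stored in network byte
 * order. The checksum field (bytes 10-11) is skipped, so it does not need
 * to be zeroed first.
 */
static uint16_t ipv4_hdr_cksum(const uint8_t *hdr)
{
	size_t hdr_len = (size_t)(hdr[0] & 0x0f) * 4; /* IHL is in 32-bit words */
	uint32_t sum = 0;
	size_t i;

	assert(hdr_len >= 20);
	for (i = 0; i < hdr_len; i += 2) {
		if (i == 10)
			continue;       /* skip the checksum field itself */
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	}
	while (sum >> 16)               /* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Write a freshly computed checksum into an output segment's IPv4 header. */
static void ipv4_hdr_update_cksum(uint8_t *hdr)
{
	uint16_t ck = ipv4_hdr_cksum(hdr);

	hdr[10] = (uint8_t)(ck >> 8);
	hdr[11] = (uint8_t)(ck & 0xff);
}
```

For tunneled packets, the same routine would be applied to both the outer and the inner IPv4 header of every output segment. The TCP checksum, which also covers a pseudo-header and the payload, is not shown here.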
+
diff --git a/doc/guides/prog_guide/img/gso-output-segment-format.svg b/doc/guides/prog_guide/img/gso-output-segment-format.svg
new file mode 100644
index 0000000000..bdb5ec3325
--- /dev/null
+++ b/doc/guides/prog_guide/img/gso-output-segment-format.svg
@@ -0,0 +1,313 @@
[SVG figure "Two-part output segment": an input packet (Header, Payload 0/1/2) split into segsz-sized logical output segments, each a direct mbuf (copy of headers, memory copy) linked via "next" to an indirect mbuf (pointer to data, no memory copy).]
diff --git a/doc/guides/prog_guide/img/gso-three-seg-mbuf.svg b/doc/guides/prog_guide/img/gso-three-seg-mbuf.svg
new file mode 100644
index 0000000000..f18a327d17
--- /dev/null
+++ b/doc/guides/prog_guide/img/gso-three-seg-mbuf.svg
@@ -0,0 +1,477 @@
[SVG figure "GSO three-part output segment": a multi-segment input packet (Header, Payload 1, Payload 2) yields a logical output segment made of a direct mbuf (copy of headers) chained via "next" to two indirect mbufs (pointers to data) spanning the input segment boundary; labels: segsz, pkt_len % segsz, parts 1-3.]
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 7ff514407f..b5ad6b8385 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -57,6 +57,7 @@ Programmer's Guide
     reorder_lib
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
+    generic_segmentation_offload_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface