..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Load Balancer Sample Application
================================

The Load Balancer sample application demonstrates the concept of isolating the packet I/O task
from the application-specific workload.
Depending on the performance target,
a number of logical cores (lcores) are dedicated to handling the interaction with the NIC ports (I/O lcores),
while the rest of the lcores are dedicated to performing the application processing (worker lcores).
The worker lcores are unaware of the details of the packet I/O activity and
use the NIC-agnostic interface provided by software rings to exchange packets with the I/O lcores.

Overview
--------

The architecture of the Load Balancer application is presented in the following figure.

.. _figure_load_bal_app_arch:

.. figure:: img/load_bal_app_arch.*

   Load Balancer Application Architecture


For the sake of simplicity, the diagram illustrates a specific case of two I/O RX and two I/O TX lcores offloading the packet I/O
overhead incurred by four NIC ports from four worker lcores, with each I/O lcore handling RX/TX for two NIC ports.

I/O RX Logical Cores
~~~~~~~~~~~~~~~~~~~~

Each I/O RX lcore performs packet RX from its assigned NIC RX rings and then distributes the received packets to the worker threads.
The application allows each I/O RX lcore to communicate with any of the worker threads,
so each (I/O RX lcore, worker lcore) pair is connected through a dedicated single-producer, single-consumer software ring.

The worker lcore that handles the current packet is determined by reading a predefined 1-byte field from the input packet:

worker_id = packet[load_balancing_field] % n_workers

Since all the packets that are part of the same traffic flow are expected to have the same value for the load balancing field,
this scheme also ensures that all the packets that are part of the same traffic flow are directed to the same worker lcore (flow affinity)
in the same order they enter the system (packet ordering).
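
The selection rule above can be sketched in C as follows.
This is a minimal illustration, not the application's own code:
the function name ``select_worker`` and its parameters are hypothetical,
with ``pos_lb`` standing for the position of the load balancing field.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helper (not taken from the application sources):
     * map one byte of the packet to a worker ID in [0, n_workers).
     * pos_lb is the offset of the 1-byte load balancing field.
     */
    static inline uint32_t
    select_worker(const uint8_t *packet, size_t pos_lb, uint32_t n_workers)
    {
        return (uint32_t)packet[pos_lb] % n_workers;
    }

Because the result depends only on the field value and the number of workers,
two packets carrying the same field value always map to the same worker ID.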

I/O TX Logical Cores
~~~~~~~~~~~~~~~~~~~~

Each I/O TX lcore owns the packet TX for a predefined set of NIC ports. To enable each worker thread to send packets to any NIC TX port,
the application creates a software ring for each (worker lcore, NIC TX port) pair,
with each I/O TX lcore handling the software rings associated with the NIC ports that it handles.
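
To get a feel for how the ring topology grows, the pairings described in this guide can be counted with a short C sketch.
The function names are hypothetical and the sketch only restates the two pairing rules
(one ring per (I/O RX lcore, worker lcore) pair and one ring per (worker lcore, NIC TX port) pair).

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helpers, for illustration only. */

    /* One software ring per (I/O RX lcore, worker lcore) pair. */
    static uint32_t
    count_rx_to_worker_rings(uint32_t n_io_rx_lcores, uint32_t n_workers)
    {
        return n_io_rx_lcores * n_workers;
    }

    /* One software ring per (worker lcore, NIC TX port) pair. */
    static uint32_t
    count_worker_to_tx_rings(uint32_t n_workers, uint32_t n_tx_ports)
    {
        return n_workers * n_tx_ports;
    }

For the case shown in the architecture figure (two I/O RX lcores, four worker lcores and four NIC ports),
this gives 8 rings on the RX side and 16 rings on the TX side.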

Worker Logical Cores
~~~~~~~~~~~~~~~~~~~~

Each worker lcore reads packets from its set of input software rings and
routes them to the NIC ports for transmission by dispatching them to output software rings.
The routing logic is LPM-based, with all the worker threads sharing the same LPM rules.
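
The longest prefix match (LPM) decision taken by a worker can be illustrated with the sketch below.
It is a plain linear scan written for clarity and does not use the DPDK LPM library;
``struct lpm_rule``, ``depth_to_mask`` and ``lpm_lookup`` are hypothetical names,
and addresses are assumed to be in host byte order.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical rule type: packets matching prefix/depth go to port. */
    struct lpm_rule {
        uint32_t prefix; /* IPv4 prefix, host byte order */
        uint8_t depth;   /* prefix length in bits, 0 to 32 */
        uint8_t port;    /* output NIC port */
    };

    /* Build the netmask for a prefix length; depth 0 matches any address. */
    static uint32_t
    depth_to_mask(uint8_t depth)
    {
        return depth == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - depth));
    }

    /* Return the port of the longest matching rule, or -1 if none matches. */
    static int
    lpm_lookup(const struct lpm_rule *rules, size_t n_rules, uint32_t dst_ip)
    {
        int best_port = -1;
        int best_depth = -1;
        size_t i;

        for (i = 0; i < n_rules; i++) {
            uint32_t mask = depth_to_mask(rules[i].depth);

            if ((dst_ip & mask) == (rules[i].prefix & mask) &&
                rules[i].depth > best_depth) {
                best_depth = rules[i].depth;
                best_port = rules[i].port;
            }
        }
        return best_port;
    }

A linear scan is only reasonable for a handful of rules; it is used here because it makes the matching rule easy to read.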

Compiling the Application
-------------------------

The sequence of steps used to build the application is:

#.  Export the required environment variables:

    .. code-block:: console

        export RTE_SDK=<Path to the DPDK installation folder>
        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application executable file:

    .. code-block:: console

        cd ${RTE_SDK}/examples/load_balancer
        make

    For more details on how to build the DPDK libraries and sample applications,
    please refer to the *DPDK Getting Started Guide*.

Running the Application
-----------------------

To run the application successfully,
the command line used to start the application has to be in sync with the traffic flows configured on the traffic generator side.

For examples of application command lines and traffic generator flows, please refer to the *DPDK Test Report*.
For more details on how to set up and run the sample applications provided with the DPDK package,
please refer to the *DPDK Getting Started Guide*.

Explanation
-----------

Application Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

The application run-time configuration is done through the application command line parameters.
Any parameter that is not specified as mandatory is optional,
with the default value hard-coded in the main.h header file in the application folder.

The application command line parameters are listed below:

#.  --rx "(PORT, QUEUE, LCORE), ...": The list of NIC RX ports and queues handled by the I/O RX lcores.
    This parameter also implicitly defines the list of I/O RX lcores. This is a mandatory parameter.

#.  --tx "(PORT, LCORE), ...": The list of NIC TX ports handled by the I/O TX lcores.
    This parameter also implicitly defines the list of I/O TX lcores.
    This is a mandatory parameter.

#.  --w "LCORE, ...": The list of the worker lcores. This is a mandatory parameter.

#.  --lpm "IP / PREFIX => PORT; ...": The list of LPM rules used by the worker lcores for packet forwarding.
    This is a mandatory parameter.

#.  --rsz "A, B, C, D": Ring sizes:

    #.  A = The size (in number of buffer descriptors) of each of the NIC RX rings read by the I/O RX lcores.

    #.  B = The size (in number of elements) of each of the software rings used by the I/O RX lcores to send packets to worker lcores.

    #.  C = The size (in number of elements) of each of the software rings used by the worker lcores to send packets to I/O TX lcores.

    #.  D = The size (in number of buffer descriptors) of each of the NIC TX rings written by I/O TX lcores.

#.  --bsz "(A, B), (C, D), (E, F)": Burst sizes:

    #.  A = The I/O RX lcore read burst size from NIC RX.

    #.  B = The I/O RX lcore write burst size to the output software rings.

    #.  C = The worker lcore read burst size from the input software rings.

    #.  D = The worker lcore write burst size to the output software rings.

    #.  E = The I/O TX lcore read burst size from the input software rings.

    #.  F = The I/O TX lcore write burst size to the NIC TX.

#.  --pos-lb POS: The position of the 1-byte field within the input packet used by the I/O RX lcores
    to identify the worker lcore for the current packet.
    This field needs to be within the first 64 bytes of the input packet.

The infrastructure of software rings connecting I/O lcores and worker lcores is built by the application
from the configuration provided by the user through the application command line parameters.

A specific lcore performing the I/O RX role for a specific set of NIC ports can also perform the I/O TX role
for the same or a different set of NIC ports.
A specific lcore cannot perform both the I/O role (either RX or TX) and the worker role during the same session.
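
The constraint that an lcore cannot be both an I/O lcore and a worker lcore can be expressed as a simple check over the two lcore lists.
The sketch below is an illustration under that reading of the rule;
``roles_are_disjoint`` is a hypothetical name and the way the application itself validates its arguments is not shown here.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical check: return true when no lcore ID appears both in the
     * I/O list (RX and TX lcores together) and in the worker list.
     * Duplicates inside the I/O list are allowed, since one lcore may
     * perform both the I/O RX and the I/O TX role.
     */
    static bool
    roles_are_disjoint(const uint32_t *io_lcores, size_t n_io,
                       const uint32_t *worker_lcores, size_t n_workers)
    {
        size_t i, j;

        for (i = 0; i < n_io; i++) {
            for (j = 0; j < n_workers; j++) {
                if (io_lcores[i] == worker_lcores[j])
                    return false;
            }
        }
        return true;
    }

With I/O lcores {3, 3} and worker lcores {4, 5, 6, 7} the check passes; adding lcore 3 to the worker list makes it fail.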

Example:

.. code-block:: console

    ./load_balancer -c 0xf8 -n 4 -- --rx "(0,0,3),(1,0,3)" --tx "(0,3),(1,3)" --w "4,5,6,7" --lpm "1.0.0.0/24=>0; 1.0.1.0/24=>1;" --pos-lb 29

There is a single I/O lcore (lcore 3) that handles RX and TX for two NIC ports (ports 0 and 1) and
exchanges packets with four worker lcores (lcores 4, 5, 6 and 7), which
are assigned worker IDs 0 to 3 (worker ID for lcore 4 is 0, for lcore 5 is 1, for lcore 6 is 2 and for lcore 7 is 3).

Assuming that all the input packets are IPv4 packets with no VLAN tag and the source IP address of the current packet is A.B.C.D,
the worker lcore for the current packet is determined by byte D
(the last byte of the source IP address, which is byte 29 counting from 0 at the start of the Ethernet frame).
There are two LPM rules that are used by each worker lcore to route packets to the output NIC ports.
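
The value 29 follows from the header layout: an untagged Ethernet header is 14 bytes long,
the source address starts at offset 12 of the IPv4 header, and byte D is the fourth byte of that address.
The C sketch below derives the offset from simplified header structures;
the structure and function names are hypothetical and defined only for this illustration.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified untagged Ethernet header (illustration only), 14 bytes. */
    struct sketch_eth_hdr {
        uint8_t dst[6];
        uint8_t src[6];
        uint16_t ether_type;
    };

    /* Simplified IPv4 header without options (illustration only). */
    struct sketch_ipv4_hdr {
        uint8_t version_ihl;
        uint8_t tos;
        uint16_t total_length;
        uint16_t packet_id;
        uint16_t fragment_offset;
        uint8_t ttl;
        uint8_t proto;
        uint16_t checksum;
        uint8_t src_addr[4]; /* A.B.C.D, byte D is src_addr[3] */
        uint8_t dst_addr[4];
    };

    /* Offset of byte D of the source IP address from the start of the frame. */
    static size_t
    src_ip_last_byte_offset(void)
    {
        return sizeof(struct sketch_eth_hdr) +
               offsetof(struct sketch_ipv4_hdr, src_addr) + 3;
    }

A VLAN tag would add 4 bytes in front of the IPv4 header, so a different position would be needed for tagged traffic.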

The following table illustrates the packet flow through the system for several possible traffic flows:

+------------+----------------+-----------------+------------------------------+--------------+
| **Flow #** | **Source**     | **Destination** | **Worker ID (Worker lcore)** | **Output**   |
|            | **IP Address** | **IP Address**  |                              | **NIC Port** |
|            |                |                 |                              |              |
+============+================+=================+==============================+==============+
| 1          | 0.0.0.0        | 1.0.0.1         | 0 (4)                        | 0            |
|            |                |                 |                              |              |
+------------+----------------+-----------------+------------------------------+--------------+
| 2          | 0.0.0.1        | 1.0.1.2         | 1 (5)                        | 1            |
|            |                |                 |                              |              |
+------------+----------------+-----------------+------------------------------+--------------+
| 3          | 0.0.0.14       | 1.0.0.3         | 2 (6)                        | 0            |
|            |                |                 |                              |              |
+------------+----------------+-----------------+------------------------------+--------------+
| 4          | 0.0.0.15       | 1.0.1.4         | 3 (7)                        | 1            |
|            |                |                 |                              |              |
+------------+----------------+-----------------+------------------------------+--------------+

NUMA Support
~~~~~~~~~~~~

The application has built-in performance enhancements for the NUMA case:

#.  One buffer pool per CPU socket.

#.  One LPM table per CPU socket.

#.  Memory for the NIC RX or TX rings is allocated on the same socket as the lcore handling the respective ring.

In the case where multiple CPU sockets are used in the system,
it is recommended to enable at least one lcore on each CPU socket to fulfill the I/O role for the NIC ports that
are directly attached to that CPU socket through the PCI Express* bus.
It is always recommended to handle the packet I/O with lcores from the same CPU socket as the NICs.

Depending on whether the I/O RX lcore (same CPU socket as NIC RX),
the worker lcore and the I/O TX lcore (same CPU socket as NIC TX) handling a specific input packet
are on the same or different CPU sockets, the following run-time scenarios are possible:

#.  AAA: The packet is received, processed and transmitted without going across CPU sockets.

#.  AAB: The packet is received and processed on socket A,
    but as it has to be transmitted on a NIC port connected to socket B,
    the packet is sent to socket B through software rings.

#.  ABB: The packet is received on socket A, but as it has to be processed by a worker lcore on socket B,
    the packet is sent to socket B through software rings.
    The packet is transmitted by a NIC port connected to the same CPU socket as the worker lcore that processed it.

#.  ABC: The packet is received on socket A, processed by a worker lcore on socket B,
    and then transmitted by a NIC port connected to socket C.
    The performance price for crossing the CPU socket boundary is paid twice for this packet.
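
The scenarios above differ in how many times a packet crosses a CPU socket boundary.
A minimal C sketch of that count is given below;
``count_socket_crossings`` is a hypothetical helper that takes the socket IDs of the I/O RX lcore,
the worker lcore and the I/O TX lcore handling a packet.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: count the socket boundary crossings of a packet.
     * One crossing is counted for the RX-to-worker hop and one for the
     * worker-to-TX hop, each only when the two sockets differ.
     */
    static unsigned int
    count_socket_crossings(uint32_t rx_socket, uint32_t worker_socket,
                           uint32_t tx_socket)
    {
        return (unsigned int)(rx_socket != worker_socket) +
               (unsigned int)(worker_socket != tx_socket);
    }

With sockets numbered A = 0, B = 1 and C = 2, the AAA case gives 0 crossings, AAB and ABB give 1, and ABC gives 2.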