..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Quota and Watermark Sample Application
======================================

The Quota and Watermark sample application is a simple example of packet processing using Intel® Data Plane Development Kit (Intel® DPDK) that
showcases the use of a quota as the maximum number of packets enqueued or dequeued at a time, and of low and high watermarks
to signal low and high ring usage, respectively.

Additionally, it shows how ring watermarks can be used to feed congestion notifications back to data producers by
temporarily stopping the processing of overloaded rings and sending Ethernet flow control frames.

This sample application is split into two parts:

* qw - The core quota and watermark sample application

* qwctl - A command line tool to alter quota and watermarks while qw is running

Overview
--------

The Quota and Watermark sample application performs forwarding for each packet that is received on a given port.
The destination port is the adjacent port from the enabled port mask, that is,
if the first four ports are enabled (port mask 0xf), ports 0 and 1 forward into each other,
and ports 2 and 3 forward into each other.
The MAC addresses of the forwarded Ethernet frames are not affected.

Internally, packets are pulled from the ports by the master logical core and put on a variable-length processing pipeline
whose stages are connected by rings, as shown in Figure 12.

**Figure 12. Pipeline Overview**

.. image15_png has been renamed

|pipeline_overview|

An adjustable quota value controls how many packets are moved through the pipeline per enqueue and dequeue.
Adjustable watermark values associated with the rings control a back-off mechanism that
tries to prevent the pipeline from being overloaded by:

* Stopping enqueuing on rings for which the usage has crossed the high watermark threshold

* Sending Ethernet pause frames

* Only resuming enqueuing on a ring once its usage drops to or below a global low watermark threshold

This mechanism allows congestion notifications to go up the ring pipeline and
eventually lead to an Ethernet flow control frame being sent to the source.

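For reference, the Ethernet flow control frame in question is an IEEE 802.3x PAUSE frame.
The following is a minimal sketch in plain C, not the application's send_pause_frame(), whose body is not shown in this guide;
build_pause_frame(), pause_duration_ns() and PAUSE_FRAME_LEN are hypothetical names.
It assumes the pause time is expressed in quanta of 512 bit times, as 802.3x defines it,
so that, for example, 1337 quanta on a 10 Gbit/s link last roughly 68.5 microseconds.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    /* Minimum Ethernet frame size without the FCS (the NIC normally appends the FCS) */
    #define PAUSE_FRAME_LEN 60

    /* Hypothetical helper: writes the 18 header bytes of an 802.3x PAUSE
     * frame and zero-pads the rest up to the 60-byte minimum. */
    void
    build_pause_frame(uint8_t frame[PAUSE_FRAME_LEN], const uint8_t src_mac[6],
                      uint16_t pause_quanta)
    {
        /* Reserved multicast address for MAC control frames */
        static const uint8_t pause_dst[6] = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x01 };

        memset(frame, 0, PAUSE_FRAME_LEN);           /* padding included */
        memcpy(frame, pause_dst, 6);                 /* destination MAC */
        memcpy(frame + 6, src_mac, 6);               /* source MAC */
        frame[12] = 0x88;                            /* EtherType 0x8808: MAC control */
        frame[13] = 0x08;
        frame[14] = 0x00;                            /* opcode 0x0001: PAUSE */
        frame[15] = 0x01;
        frame[16] = (uint8_t) (pause_quanta >> 8);   /* pause time, big endian */
        frame[17] = (uint8_t) (pause_quanta & 0xFF);
    }

    /* Pause duration in nanoseconds: one quantum is 512 bit times */
    double
    pause_duration_ns(uint16_t pause_quanta, double link_bps)
    {
        return (double) pause_quanta * 512.0 * 1e9 / link_bps;
    }

The big-endian byte order of the pause time follows the Ethernet wire format, independent of the host CPU.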
On top of serving as an example of quota and watermark usage,
this application can be used to benchmark the performance of ring-based processing pipelines using a traffic generator,
as shown in Figure 13.

**Figure 13. Ring-based Processing Pipeline Performance Setup**

.. image16_png has been renamed

|ring_pipeline_perf_setup|

Compiling the Application
-------------------------

#. Go to the example directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/quota_watermark

#. Set the target (a default target is used if not specified). For example:

   .. code-block:: console

       export RTE_TARGET=x86_64-native-linuxapp-gcc

   See the *Intel® DPDK Getting Started Guide* for possible RTE_TARGET values.

#. Build the application:

   .. code-block:: console

       make

Running the Application
-----------------------

The core application, qw, has to be started first.

Once it is up and running, the quota and watermarks can be altered while it runs by using the control application, qwctl.

Running the Core Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The application requires a single command line option:

.. code-block:: console

    ./qw/build/qw [EAL options] -- -p PORTMASK

where:

* -p PORTMASK: A hexadecimal bitmask of the ports to configure

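To run the application in a linuxapp environment with four logical cores and ports 0 and 2,
issue the following command:

.. code-block:: console

    ./qw/build/qw -c f -n 4 -- -p 5

Here, -c f is the hexadecimal core mask selecting logical cores 0 to 3,
and the port mask 5 (binary 101) selects ports 0 and 2.
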
Refer to the *Intel® DPDK Getting Started Guide* for general information on running applications and
the Environment Abstraction Layer (EAL) options.

Running the Control Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The control application requires a number of command line options:

.. code-block:: console

    ./qwctl/build/qwctl [EAL options] --proc-type=secondary

The --proc-type=secondary option is necessary for the EAL to properly initialize the control application to
use the same huge pages as the core application and thus be able to access its rings.

To run the application in a linuxapp environment on logical core 0, issue the following command:

.. code-block:: console

    ./qwctl/build/qwctl -c 1 -n 4 --proc-type=secondary

Refer to the *Intel® DPDK Getting Started Guide* for general information on running applications and
the Environment Abstraction Layer (EAL) options.

qwctl is an interactive command line that lets the user change variables in a running instance of qw.
The help command gives a list of available commands:

.. code-block:: console

    help

Explanation
-----------

The following sections provide a quick guide to the application's source code.

Core Application - qw
~~~~~~~~~~~~~~~~~~~~~

EAL and Drivers Setup
^^^^^^^^^^^^^^^^^^^^^

The EAL arguments are parsed at the beginning of the MAIN() function:

.. code-block:: c

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Cannot initialize EAL\n");

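rte_eal_init() returns the number of arguments it consumed, and DPDK applications typically shift argc and argv by that amount
before parsing their own options that follow the -- separator (the shift is not part of the excerpt above).
The sketch below imitates that convention without DPDK: fake_eal_init() is a hypothetical stand-in that parses no EAL options at all.
It only locates --, overwrites that slot with the program name and returns its index, so that argv + ret looks like a fresh argument vector.

.. code-block:: c

    #include <string.h>

    /* Hypothetical stand-in for the argument handling of rte_eal_init().
     * Everything before "--" is treated as an EAL option (and ignored). */
    int
    fake_eal_init(int argc, char **argv)
    {
        int i;

        /* Find the "--" separator; fall back to the last argument if absent */
        for (i = 1; i < argc; i++)
            if (strcmp(argv[i], "--") == 0)
                break;
        if (i == argc)
            i = argc - 1;

        argv[i] = argv[0];  /* program name becomes argv[0] of the remainder */
        return i;           /* number of arguments consumed */
    }

The caller then runs argc -= ret; argv += ret; so that the application's own parser, such as parse_qw_args(), sees only the program name followed by -p PORTMASK.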
Then, a call to init_dpdk(), defined in init.c, is made to initialize the poll mode drivers:

.. code-block:: c

    void
    init_dpdk(void)
    {
        int ret;

        /* Bind the drivers to usable devices */
        ret = rte_eal_pci_probe();
        if (ret < 0)
            rte_exit(EXIT_FAILURE, "rte_eal_pci_probe(): error %d\n", ret);

        if (rte_eth_dev_count() < 2)
            rte_exit(EXIT_FAILURE, "Not enough ethernet port available\n");
    }

To fully understand this code, it is recommended to study the chapters that relate to the *Poll Mode Driver*
in the *Intel® DPDK Programmer's Guide* and the *Intel® DPDK API Reference*.

218 Shared Variables Setup
219 ^^^^^^^^^^^^^^^^^^^^^^
The quota and low_watermark shared variables are put into an rte_memzone using a call to setup_shared_variables():

.. code-block:: c

    void
    setup_shared_variables(void)
    {
        const struct rte_memzone *qw_memzone;

        qw_memzone = rte_memzone_reserve(QUOTA_WATERMARK_MEMZONE_NAME, 2 * sizeof(int), rte_socket_id(), RTE_MEMZONE_2MB);
        if (qw_memzone == NULL)
            rte_exit(EXIT_FAILURE, "%s\n", rte_strerror(rte_errno));

        quota = qw_memzone->addr;
        low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
    }

These two variables are initialized to default values in MAIN() and
can be changed while qw is running using the qwctl control program.

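Pointer arithmetic on an unsigned int pointer is scaled by the element size, so the expression (unsigned int \*) qw_memzone->addr + sizeof(int)
used for low_watermark advances sizeof(int) elements, that is sizeof(int) \* sizeof(unsigned int) bytes, rather than sizeof(int) bytes.
qw and qwctl agree on the location because both use the same expression, whereas the second int of a 2 \* sizeof(int) reservation sits one element past quota.
The sketch below uses a hypothetical static array, zone, as a stand-in for the memzone memory and measures both offsets.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical stand-in for qw_memzone->addr. It is made larger than
     * 2 * sizeof(int) so that every pointer formed below stays in bounds. */
    static unsigned int zone[8];

    /* Byte distance between the start of the zone and p */
    static size_t
    offset_of(const unsigned int *p)
    {
        return (size_t) ((const char *) p - (const char *) zone);
    }

    /* Same expression as the listing: scaled by sizeof(unsigned int) */
    size_t
    listing_low_watermark_offset(void)
    {
        return offset_of((unsigned int *) zone + sizeof(int));
    }

    /* One element past quota: the second int of a two-int zone */
    size_t
    second_int_offset(void)
    {
        return offset_of((unsigned int *) zone + 1);
    }

On a platform with 4-byte ints, the first function reports 16 bytes and the second 4 bytes.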
Application Arguments
^^^^^^^^^^^^^^^^^^^^^

The qw application only takes one argument: a port mask that specifies which ports should be used by the application.
At least two ports are needed to run the application, and the port mask should contain an even number of ports.

The port mask parsing is done in parse_qw_args(), defined in args.c.

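The body of parse_qw_args() is not reproduced in this guide.
As a rough sketch of the checks described above, the hypothetical parse_portmask_sketch() below reads a hexadecimal mask with strtoul(),
counts the set bits, and rejects masks that select fewer than two ports or an odd number of ports.

.. code-block:: c

    #include <stdlib.h>

    /* Hypothetical port mask parser. Returns the mask, or 0 on error. */
    unsigned long
    parse_portmask_sketch(const char *arg)
    {
        char *end;
        unsigned long mask;
        unsigned long m;
        unsigned int nb_ports = 0;

        /* Base 16 also accepts an optional "0x" prefix */
        mask = strtoul(arg, &end, 16);
        if (end == arg || *end != '\0')
            return 0;               /* empty string or trailing garbage */

        /* Count the enabled ports */
        for (m = mask; m != 0; m >>= 1)
            nb_ports += (unsigned int) (m & 1);

        if (nb_ports < 2 || nb_ports % 2 != 0)
            return 0;               /* need an even number, at least two */

        return mask;
    }

With this sketch, "5" and "f" are accepted, while "1" (a single port) and "7" (three ports) are rejected.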
250 Mbuf Pool Initialization
251 ^^^^^^^^^^^^^^^^^^^^^^^^
Once the application's arguments are parsed, an mbuf pool is created.
It contains a set of mbuf objects that are used by the driver and the application to store network packets:

.. code-block:: c

    /* Create a pool of mbuf to store packets */
    mbuf_pool = rte_mempool_create("mbuf_pool", MBUF_PER_POOL, MBUF_SIZE, 32, sizeof(struct rte_pktmbuf_pool_private),
                                   rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0);

    if (mbuf_pool == NULL)
        rte_panic("%s\n", rte_strerror(rte_errno));

The rte_mempool is a generic structure used to handle pools of objects.
In this case, it is necessary to create a pool that will be used by the driver,
which expects to have some reserved space in the mempool structure, sizeof(struct rte_pktmbuf_pool_private) bytes.

The number of allocated pkt mbufs is MBUF_PER_POOL, with a size of MBUF_SIZE each.
A per-lcore cache of 32 mbufs is kept.
The memory is allocated on the master lcore's socket, but it is possible to extend this code to allocate one mbuf pool per socket.

Two callback pointers are also given to the rte_mempool_create() function:

* The first callback pointer is to rte_pktmbuf_pool_init() and is used to initialize the private data of the mempool,
  which is needed by the driver.
  This function is provided by the mbuf API, but can be copied and extended by the developer.

* The second callback pointer given to rte_mempool_create() is the mbuf initializer.

  The default is used, that is, rte_pktmbuf_init(), which is provided in the rte_mbuf library.
  If a more complex application wants to extend the rte_pktmbuf structure for its own needs,
  a new function derived from rte_pktmbuf_init() can be created.

286 Ports Configuration and Pairing
287 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each port in the port mask is configured and a corresponding ring is created in the master lcore's array of rings.
This ring is the first in the pipeline and holds the packets coming directly from the port.

.. code-block:: c

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
        if (is_bit_set(port_id, portmask)) {
            configure_eth_port(port_id);
            init_ring(master_lcore_id, port_id);
        }

    pair_ports();

The configure_eth_port() and init_ring() functions are used to configure a port and a ring, respectively, and are defined in init.c.
They make use of the Intel® DPDK APIs defined in rte_ethdev.h and rte_ring.h.

pair_ports() builds the port_pairs[] array so that its key-value pairs are a mapping between reception and transmission ports.
It is defined in init.c.

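The body of pair_ports() is not shown here.
A minimal sketch of the pairing rule given in the overview, where enabled ports are taken in increasing order and paired two by two,
could look like the following; pair_ports_sketch() and MAX_PORTS are hypothetical names standing in for the real function and RTE_MAX_ETHPORTS.

.. code-block:: c

    #include <stdint.h>

    #define MAX_PORTS 32   /* hypothetical bound, stands in for RTE_MAX_ETHPORTS */

    /* Hypothetical pairing: each enabled port transmits on its partner.
     * Entries of disabled ports are left untouched.
     * Returns the number of ports that were paired. */
    unsigned int
    pair_ports_sketch(uint32_t portmask, unsigned int port_pairs[MAX_PORTS])
    {
        unsigned int port_id;
        unsigned int first = 0;
        int have_first = 0;
        unsigned int nb_paired = 0;

        for (port_id = 0; port_id < MAX_PORTS; port_id++) {
            if (!(portmask & (UINT32_C(1) << port_id)))
                continue;
            if (!have_first) {
                first = port_id;            /* first port of a new pair */
                have_first = 1;
            } else {
                port_pairs[first] = port_id;
                port_pairs[port_id] = first;
                have_first = 0;
                nb_paired += 2;
            }
        }
        return nb_paired;
    }

With the port mask 0x5 used earlier, ports 0 and 2 become partners even though they are not numerically adjacent.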
308 Logical Cores Assignment
309 ^^^^^^^^^^^^^^^^^^^^^^^^
The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port.

Each logical core except the master and the last runs pipeline_stage() after a ring for each used port is initialized on that core.
pipeline_stage() on core X dequeues packets from the rings of the previous enabled core (core X-1 when the core mask is contiguous)
and enqueues them on its own rings. See Figure 14.

.. code-block:: c

    /* Start pipeline_stage() on all the available slave lcores but the last */
    for (lcore_id = 0 ; lcore_id < last_lcore_id; lcore_id++) {
        if (rte_lcore_is_enabled(lcore_id) && lcore_id != master_lcore_id) {
            for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
                if (is_bit_set(port_id, portmask))
                    init_ring(lcore_id, port_id);

            rte_eal_remote_launch(pipeline_stage, NULL, lcore_id);
        }
    }

The last available logical core runs send_stage(),
which is the last stage of the pipeline, dequeuing packets from the last ring in the pipeline and
sending them out on the destination port set up by pair_ports().

.. code-block:: c

    /* Start send_stage() on the last slave core */
    rte_eal_remote_launch(send_stage, NULL, last_lcore_id);

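The resulting role of each enabled logical core can be tabulated without DPDK.
In the hypothetical sketch below, assign_lcores_sketch(), MAX_LCORES and enum lcore_role are invented names,
and prev[] plays the part of get_previous_lcore_id() under the assumption that it returns the previous enabled lcore in the core mask.
With -c f and lcore 0 as master, lcore 0 receives, lcores 1 and 2 run pipeline_stage() and lcore 3 runs send_stage().

.. code-block:: c

    #include <stdint.h>

    #define MAX_LCORES 64  /* hypothetical bound, stands in for RTE_MAX_LCORE */

    enum lcore_role { ROLE_UNUSED, ROLE_RECEIVE, ROLE_PIPELINE, ROLE_SEND };

    /* Hypothetical role assignment: master receives, the highest enabled
     * lcore sends, every other enabled lcore runs a pipeline stage.
     * prev[] gets the previous enabled lcore, or -1 for the first one. */
    void
    assign_lcores_sketch(uint64_t coremask, unsigned int master,
                         enum lcore_role role[MAX_LCORES], int prev[MAX_LCORES])
    {
        unsigned int id;
        int last = -1;
        int previous = -1;

        /* Find the last enabled lcore */
        for (id = 0; id < MAX_LCORES; id++)
            if (coremask & (UINT64_C(1) << id))
                last = (int) id;

        for (id = 0; id < MAX_LCORES; id++) {
            role[id] = ROLE_UNUSED;
            prev[id] = -1;
            if (!(coremask & (UINT64_C(1) << id)))
                continue;
            if (id == master)
                role[id] = ROLE_RECEIVE;
            else if ((int) id == last)
                role[id] = ROLE_SEND;
            else
                role[id] = ROLE_PIPELINE;
            prev[id] = previous;        /* rings this lcore dequeues from */
            previous = (int) id;
        }
    }

Adding logical cores to the mask lengthens the pipeline by one stage per extra core, since only the master and the last core have fixed roles.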
Receive, Process and Transmit Packets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Figure 14 shows where each thread in the pipeline is.
It should be used as a reference while reading the rest of this section.

**Figure 14. Threads and Pipelines**

.. image17_png has been renamed

|threads_pipelines|

In the receive_stage() function running on the master logical core,
the main task is to read ingress packets from the RX ports and enqueue them
on the port's corresponding first ring in the pipeline.
This is done using the following code:

.. code-block:: c

    lcore_id = rte_lcore_id();

    /* Process each port round robin style */
    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        ring = rings[lcore_id][port_id];

        if (ring_state[port_id] != RING_READY) {
            if (rte_ring_count(ring) > *low_watermark)
                continue;
            else
                ring_state[port_id] = RING_READY;
        }

        /* Enqueue received packets on the RX ring */
        nb_rx_pkts = rte_eth_rx_burst(port_id, 0, pkts, *quota);

        ret = rte_ring_enqueue_bulk(ring, (void *) pkts, nb_rx_pkts);
        if (ret == -EDQUOT) {
            ring_state[port_id] = RING_OVERLOADED;
            send_pause_frame(port_id, 1337);
        }
    }

For each port in the port mask, the corresponding ring's pointer is fetched into ring and that ring's state is checked:

* If it is in the RING_READY state, up to \*quota packets are grabbed from the port and put on the ring.
  Should this operation make the ring's usage cross its high watermark,
  the ring is marked as overloaded and an Ethernet flow control frame is sent to the source.

* If it is not in the RING_READY state, this port is ignored until the ring's usage drops to or below the \*low_watermark value,
  at which point the ring is marked RING_READY again.

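The per-port state handling above is a hysteresis: a ring becomes overloaded when an enqueue crosses its high watermark,
and it is only used again once its usage has fallen to the global low watermark.
The following hypothetical sketch isolates that logic from DPDK; may_enqueue() and after_enqueue() are invented names,
and the -EDQUOT result of the enqueue is reduced to a flag.

.. code-block:: c

    enum ring_state { RING_READY, RING_OVERLOADED };

    /* Before polling a port: an overloaded ring is skipped while its usage
     * is still above the low watermark. Returns 1 if the port is processed. */
    int
    may_enqueue(enum ring_state *state, unsigned int ring_count,
                unsigned int low_watermark)
    {
        if (*state != RING_READY) {
            if (ring_count > low_watermark)
                return 0;               /* still draining: skip this port */
            *state = RING_READY;        /* drained enough: resume */
        }
        return 1;
    }

    /* After enqueuing: crossing the high watermark marks the ring overloaded */
    void
    after_enqueue(enum ring_state *state, int crossed_high_watermark)
    {
        if (crossed_high_watermark)
            *state = RING_OVERLOADED;
    }

Keeping the low watermark well below the high watermark prevents a ring from flapping between the two states on every iteration.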
The pipeline_stage() function's task is to process and move packets from the preceding pipeline stage.
This thread runs on most of the logical cores to create an arbitrarily long pipeline.

.. code-block:: c

    lcore_id = rte_lcore_id();

    previous_lcore_id = get_previous_lcore_id(lcore_id);

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        tx = rings[lcore_id][port_id];
        rx = rings[previous_lcore_id][port_id];

        if (ring_state[port_id] != RING_READY) {
            if (rte_ring_count(tx) > *low_watermark)
                continue;
            else
                ring_state[port_id] = RING_READY;
        }

        /* Dequeue up to quota mbuf from rx */
        nb_dq_pkts = rte_ring_dequeue_burst(rx, (void *) pkts, *quota);

        /* The burst call returns a count, never a negative value: skip empty bursts */
        if (unlikely(nb_dq_pkts == 0))
            continue;

        /* Enqueue them on tx */
        ret = rte_ring_enqueue_bulk(tx, (void *) pkts, nb_dq_pkts);
        if (ret == -EDQUOT)
            ring_state[port_id] = RING_OVERLOADED;
    }

The thread's logic works mostly like receive_stage(),
except that packets are moved from ring to ring instead of port to ring.

In this example, no actual processing is done on the packets,
but pipeline_stage() is an ideal place to perform any processing required by the application.

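To make the quota semantics concrete, the sketch below models one pipeline_stage() step with a hypothetical single-threaded toy ring instead of rte_ring.
The burst dequeue takes up to quota objects, and the bulk enqueue is all-or-nothing and reports -EDQUOT when the ring ends up above its high watermark,
mirroring the return value checked in the code above.
All names (struct toy_ring, toy_dequeue_burst(), toy_enqueue_bulk(), toy_pipeline_step(), RING_SIZE) are invented for illustration, and the toy is not thread-safe.

.. code-block:: c

    #include <errno.h>

    #define RING_SIZE 16   /* hypothetical capacity */

    struct toy_ring {
        void *slots[RING_SIZE];
        unsigned int head;            /* next slot to dequeue */
        unsigned int count;           /* objects currently stored */
        unsigned int high_watermark;
    };

    /* Dequeue up to n objects; returns how many were dequeued */
    unsigned int
    toy_dequeue_burst(struct toy_ring *r, void **objs, unsigned int n)
    {
        unsigned int i;

        if (n > r->count)
            n = r->count;
        for (i = 0; i < n; i++) {
            objs[i] = r->slots[r->head];
            r->head = (r->head + 1) % RING_SIZE;
        }
        r->count -= n;
        return n;
    }

    /* Enqueue all n objects or none. Returns 0, -EDQUOT if enqueued but
     * above the high watermark, or -ENOBUFS if there was not enough room
     * (in that case the caller still owns objs). */
    int
    toy_enqueue_bulk(struct toy_ring *r, void * const *objs, unsigned int n)
    {
        unsigned int i;

        if (RING_SIZE - r->count < n)
            return -ENOBUFS;
        for (i = 0; i < n; i++)
            r->slots[(r->head + r->count + i) % RING_SIZE] = objs[i];
        r->count += n;
        return r->count > r->high_watermark ? -EDQUOT : 0;
    }

    /* One pipeline step: move up to quota objects from rx to tx */
    int
    toy_pipeline_step(struct toy_ring *rx, struct toy_ring *tx, unsigned int quota)
    {
        void *objs[RING_SIZE];
        unsigned int n;

        if (quota > RING_SIZE)
            quota = RING_SIZE;
        n = toy_dequeue_burst(rx, objs, quota);
        if (n == 0)
            return 0;
        return toy_enqueue_bulk(tx, objs, n);
    }

For example, with 10 objects on rx, a quota of 4 and a high watermark of 6 on tx, the first step returns 0 and the second returns -EDQUOT, which is the point at which pipeline_stage() would mark the ring overloaded.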
Finally, the send_stage() function's task is to read packets from the last ring in a pipeline and
send them on the destination port defined in the port_pairs[] array.
It runs only on the last available logical core.

.. code-block:: c

    lcore_id = rte_lcore_id();

    previous_lcore_id = get_previous_lcore_id(lcore_id);

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        dest_port_id = port_pairs[port_id];
        tx = rings[previous_lcore_id][port_id];

        if (rte_ring_empty(tx))
            continue;

        /* Dequeue packets from tx and send them */
        nb_dq_pkts = rte_ring_dequeue_burst(tx, (void *) tx_pkts, *quota);
        nb_tx_pkts = rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
    }

For each port in the port mask, up to \*quota packets are pulled from the last ring in its pipeline and
sent on the destination port paired with the current port.

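rte_eth_tx_burst() returns the number of packets actually queued for transmission, which can be smaller than nb_dq_pkts;
the remaining mbufs stay owned by the caller, and the code above does not handle them.
One common pattern is to free them, for example with rte_pktmbuf_free().
The sketch below shows that pattern with hypothetical function-pointer stand-ins (tx_fn_t, free_fn_t, send_or_free()) for the transmit and free calls, so that it builds without DPDK.

.. code-block:: c

    /* Hypothetical stand-ins for rte_eth_tx_burst() and rte_pktmbuf_free() */
    typedef unsigned int (*tx_fn_t)(void **pkts, unsigned int n);
    typedef void (*free_fn_t)(void *pkt);

    /* Transmit a burst and free whatever the transmit call did not accept.
     * Returns the number of packets sent. */
    unsigned int
    send_or_free(tx_fn_t tx_fn, free_fn_t free_fn, void **pkts, unsigned int n)
    {
        unsigned int sent = tx_fn(pkts, n);
        unsigned int i;

        /* Packets [sent, n) were not queued by the NIC: drop them */
        for (i = sent; i < n; i++)
            free_fn(pkts[i]);
        return sent;
    }

Retrying the unsent packets on the next iteration is an alternative, at the cost of keeping per-port leftover state.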
Control Application - qwctl
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The qwctl application uses the rte_cmdline library to provide the user with an interactive command line that
can be used to modify and inspect parameters in a running qw application.
Those parameters are the global quota and low_watermark values as well as each ring's built-in high watermark.

The available commands are defined in commands.c.

It is advised to use the cmdline sample application user guide as a reference for everything related to the rte_cmdline library.

480 Accessing Shared Variables
481 ^^^^^^^^^^^^^^^^^^^^^^^^^^
The setup_shared_variables() function retrieves the shared variables quota and
low_watermark from the rte_memzone previously created by qw.

.. code-block:: c

    void
    setup_shared_variables(void)
    {
        const struct rte_memzone *qw_memzone;

        qw_memzone = rte_memzone_lookup(QUOTA_WATERMARK_MEMZONE_NAME);
        if (qw_memzone == NULL)
            rte_exit(EXIT_FAILURE, "Couldn't find memzone\n");

        quota = qw_memzone->addr;

        low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
    }

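Because qwctl writes straight into memory that qw reads on every loop iteration, a control tool may want to sanity-check values before storing them.
This guide does not show whether qwctl does so. The hypothetical set_shared_sketch() below illustrates such a check,
assuming the quota must stay between 1 and the capacity of the packet arrays used by qw (max_quota here),
and that the low watermark must stay below the smallest ring high watermark (min_high_watermark here) so the hysteresis described earlier still applies.

.. code-block:: c

    /* Hypothetical validation before updating the shared variables.
     * Returns 0 on success, -1 if the new values are rejected. */
    int
    set_shared_sketch(unsigned int *quota, unsigned int *low_watermark,
                      unsigned int new_quota, unsigned int new_low,
                      unsigned int max_quota, unsigned int min_high_watermark)
    {
        /* A zero quota stalls traffic; a larger one than max_quota would
         * overrun packet arrays sized for max_quota entries */
        if (new_quota == 0 || new_quota > max_quota)
            return -1;

        /* Otherwise an overloaded ring could resume without draining */
        if (new_low >= min_high_watermark)
            return -1;

        *quota = new_quota;
        *low_watermark = new_low;
        return 0;
    }

On rejection, the previously stored values are left unchanged, so a running qw never observes a half-applied update from this helper.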
.. |pipeline_overview| image:: img/pipeline_overview.png

.. |ring_pipeline_perf_setup| image:: img/ring_pipeline_perf_setup.png

.. |threads_pipelines| image:: img/threads_pipelines.png