2 Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
5 Redistribution and use in source and binary forms, with or without
6 modification, are permitted provided that the following conditions
9 * Redistributions of source code must retain the above copyright
10 notice, this list of conditions and the following disclaimer.
11 * Redistributions in binary form must reproduce the above copyright
12 notice, this list of conditions and the following disclaimer in
13 the documentation and/or other materials provided with the
15 * Neither the name of Intel Corporation nor the names of its
16 contributors may be used to endorse or promote products derived
17 from this software without specific prior written permission.
19 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22 A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23 OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
31 Quota and Watermark Sample Application
32 ======================================
The Quota and Watermark sample application is a simple example of packet
processing using the Data Plane Development Kit (DPDK). It showcases the use
of a quota as the maximum number of packets enqueued or dequeued at a time, and
of low and high thresholds, or watermarks, to signal low and high ring usage.

Additionally, it shows how the thresholds can be used to feed congestion notifications back to data producers by
temporarily stopping the processing of overloaded rings and sending Ethernet flow control frames.

This sample application is split into two parts:
45 * qw - The core quota and watermark sample application
47 * qwctl - A command line tool to alter quota and watermarks while qw is running
52 The Quota and Watermark sample application performs forwarding for each packet that is received on a given port.
53 The destination port is the adjacent port from the enabled port mask, that is,
54 if the first four ports are enabled (port mask 0xf), ports 0 and 1 forward into each other,
55 and ports 2 and 3 forward into each other.
56 The MAC addresses of the forwarded Ethernet frames are not affected.
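
The pairing rule described above can be sketched in plain C.
This is not the application's own pair_ports() function: the names ``pair_enabled_ports``, ``MAX_PORTS`` and ``NO_PEER`` are hypothetical,
and the sketch only assumes that enabled ports are paired in the order in which they appear in the port mask.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_PORTS 32    /* illustrative limit, not DPDK's RTE_MAX_ETHPORTS */
    #define NO_PEER   0xff  /* marks a port that has no partner */

    /*
     * Pair enabled ports in mask order: the first enabled port with the
     * second, the third with the fourth, and so on.
     * Returns the number of pairs, or -1 if an enabled port is left alone.
     */
    static int
    pair_enabled_ports(uint32_t portmask, uint8_t peers[MAX_PORTS])
    {
        int pending = -1;   /* enabled port still waiting for a partner */
        int pairs = 0;

        assert(peers != NULL);
        for (int port = 0; port < MAX_PORTS; port++) {
            peers[port] = NO_PEER;
            if (!(portmask & (UINT32_C(1) << port)))
                continue;
            if (pending < 0) {
                pending = port;
            } else {
                peers[pending] = (uint8_t)port;
                peers[port] = (uint8_t)pending;
                pending = -1;
                pairs++;
            }
        }
        return pending < 0 ? pairs : -1;
    }

With a port mask of 0xf, this sketch pairs port 0 with port 1 and port 2 with port 3, matching the example in the text.
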
Internally, packets are pulled from the ports by the master logical core and put on a variable-length processing pipeline
whose stages are connected by rings, as shown in :numref:`figure_pipeline_overview`.
61 .. _figure_pipeline_overview:
63 .. figure:: img/pipeline_overview.*
An adjustable quota value controls how many packets are moved through the pipeline per enqueue and dequeue operation.
69 Adjustable threshold values associated with the rings control a back-off mechanism that
70 tries to prevent the pipeline from being overloaded by:
72 * Stopping enqueuing on rings for which the usage has crossed the high watermark threshold
74 * Sending Ethernet pause frames
76 * Only resuming enqueuing on a ring once its usage goes below a global low watermark threshold
This mechanism allows congestion notifications to travel up the ring pipeline and
eventually lead to an Ethernet flow control frame being sent to the source.
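
The back-off mechanism amounts to a small hysteresis state machine per ring.
The following is a minimal sketch under stated assumptions: ``struct ring_gate``, ``gate_may_enqueue()`` and ``gate_after_enqueue()`` are hypothetical names that do not exist in the application,
and ring usage is passed in as a plain number instead of being read from an rte_ring.
Only the RING_READY and RING_OVERLOADED state names are taken from the sample code.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    enum ring_state { RING_READY, RING_OVERLOADED };

    struct ring_gate {
        enum ring_state state;
        unsigned int low_watermark;
        unsigned int high_watermark;
    };

    /* Called before enqueuing: may the producer use this ring right now? */
    static bool
    gate_may_enqueue(struct ring_gate *g, unsigned int used)
    {
        assert(g->low_watermark <= g->high_watermark);
        if (g->state == RING_OVERLOADED) {
            if (used > g->low_watermark)
                return false;   /* still draining, keep backing off */
            g->state = RING_READY;
        }
        return true;
    }

    /* Called after enqueuing: returns true when a pause frame is due. */
    static bool
    gate_after_enqueue(struct ring_gate *g, unsigned int used)
    {
        if (used > g->high_watermark) {
            g->state = RING_OVERLOADED;
            return true;
        }
        return false;
    }

Because a ring is stopped above the high watermark but only resumed at or below the low watermark, its state does not flap when usage hovers around a single threshold.
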
Besides serving as an example of quota and watermark usage,
this application can be used to benchmark the performance of ring-based processing pipelines using a traffic generator,
as shown in :numref:`figure_ring_pipeline_perf_setup`.
85 .. _figure_ring_pipeline_perf_setup:
87 .. figure:: img/ring_pipeline_perf_setup.*
89 Ring-based Processing Pipeline Performance Setup
91 Compiling the Application
92 -------------------------
To compile the sample application, see :doc:`compiling`.
96 The application is located in the ``quota_watermark`` sub-directory.
98 Running the Application
99 -----------------------
101 The core application, qw, has to be started first.
103 Once it is up and running, one can alter quota and watermarks while it runs using the control application, qwctl.
105 Running the Core Application
106 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
108 The application requires a single command line option:
110 .. code-block:: console
112 ./qw/build/qw [EAL options] -- -p PORTMASK
116 -p PORTMASK: A hexadecimal bitmask of the ports to configure
118 To run the application in a linuxapp environment with four logical cores and ports 0 and 2,
119 issue the following command:
121 .. code-block:: console
123 ./qw/build/qw -l 0-3 -n 4 -- -p 5
125 Refer to the *DPDK Getting Started Guide* for general information on running applications and
126 the Environment Abstraction Layer (EAL) options.
128 Running the Control Application
129 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
131 The control application requires a number of command line options:
133 .. code-block:: console
135 ./qwctl/build/qwctl [EAL options] --proc-type=secondary
The ``--proc-type=secondary`` option is necessary for the EAL to properly initialize the control application to
use the same huge pages as the core application and thus be able to access its rings.
140 To run the application in a linuxapp environment on logical core 0, issue the following command:
142 .. code-block:: console
144 ./qwctl/build/qwctl -l 0 -n 4 --proc-type=secondary
Refer to the *DPDK Getting Started Guide* for general information on running applications and
the Environment Abstraction Layer (EAL) options.
qwctl is an interactive command line that lets the user change variables in a running instance of qw.
The help command gives a list of available commands:
152 .. code-block:: console
159 The following sections provide a quick guide to the application's source code.
161 Core Application - qw
162 ~~~~~~~~~~~~~~~~~~~~~
164 EAL and Drivers Setup
165 ^^^^^^^^^^^^^^^^^^^^^
The EAL arguments are parsed at the beginning of the main() function:

.. code-block:: c

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Cannot initialize EAL\n");

Then, a call to init_dpdk(), defined in init.c, is made to initialize the poll mode drivers:

.. code-block:: c

    /* Bind the drivers to usable devices */
    ret = rte_pci_probe();
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_pci_probe(): error %d\n", ret);

    if (rte_eth_dev_count() < 2)
        rte_exit(EXIT_FAILURE, "Not enough Ethernet port available\n");

197 To fully understand this code, it is recommended to study the chapters that relate to the *Poll Mode Driver*
198 in the *DPDK Getting Started Guide* and the *DPDK API Reference*.
200 Shared Variables Setup
201 ^^^^^^^^^^^^^^^^^^^^^^
The quota and the low and high watermark shared variables are put into an rte_memzone using a call to setup_shared_variables():

.. code-block:: c

    void
    setup_shared_variables(void)
    {
        const struct rte_memzone *qw_memzone;

        /* Reserve room for three integers: quota and both watermarks */
        qw_memzone = rte_memzone_reserve(QUOTA_WATERMARK_MEMZONE_NAME,
                3 * sizeof(int), rte_socket_id(), 0);
        if (qw_memzone == NULL)
            rte_exit(EXIT_FAILURE, "%s\n", rte_strerror(rte_errno));

        quota = qw_memzone->addr;
        low_watermark = (unsigned int *) qw_memzone->addr + 1;
        high_watermark = (unsigned int *) qw_memzone->addr + 2;
    }

222 These three variables are initialized to a default value in main() and
223 can be changed while qw is running using the qwctl control program.
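
The layout of the three variables inside the shared zone can be illustrated without DPDK.
In the sketch below, an ordinary array stands in for the memzone, and ``struct qw_shared`` and ``map_shared_variables()`` are hypothetical names introduced for illustration.
It assumes, as the code above does, that the zone holds three consecutive integer-sized slots and is suitably aligned for ``int``.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    struct qw_shared {
        int *quota;
        unsigned int *low_watermark;
        unsigned int *high_watermark;
    };

    /* Point the three variables at consecutive int-sized slots of the zone. */
    static void
    map_shared_variables(void *zone, size_t zone_len, struct qw_shared *out)
    {
        assert(zone != NULL && zone_len >= 3 * sizeof(int));
        out->quota = zone;
        out->low_watermark = (unsigned int *)zone + 1;
        out->high_watermark = (unsigned int *)zone + 2;
    }

Two processes that map the same zone this way see each other's writes, which is what lets a control program adjust the values while the pipeline is running.
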
225 Application Arguments
226 ^^^^^^^^^^^^^^^^^^^^^
228 The qw application only takes one argument: a port mask that specifies which ports should be used by the application.
229 At least two ports are needed to run the application and there should be an even number of ports given in the port mask.
231 The port mask parsing is done in parse_qw_args(), defined in args.c.
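
As an illustration of the constraints above, a port mask check might look like the following.
This is a sketch and not the code of parse_qw_args(): the function name ``parse_portmask`` and the 32-bit mask width are assumptions made for this example.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Parse a hexadecimal port mask and check that it enables an even
     * number of ports, and at least two.
     * Returns 0 and stores the mask on success, -1 otherwise.
     */
    static int
    parse_portmask(const char *arg, uint32_t *mask)
    {
        char *end = NULL;
        unsigned long value;
        unsigned int count = 0;

        assert(arg != NULL && mask != NULL);
        errno = 0;
        value = strtoul(arg, &end, 16);
        if (errno != 0 || end == arg || *end != '\0' || value > UINT32_MAX)
            return -1;

        /* Count the enabled ports */
        for (unsigned long v = value; v != 0; v >>= 1)
            count += (unsigned int)(v & 1u);
        if (count < 2 || count % 2 != 0)
            return -1;

        *mask = (uint32_t)value;
        return 0;
    }

For example, the mask ``5`` used earlier enables ports 0 and 2 and is accepted, whereas ``7`` enables three ports and is rejected.
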
233 Mbuf Pool Initialization
234 ^^^^^^^^^^^^^^^^^^^^^^^^
236 Once the application's arguments are parsed, an mbuf pool is created.
237 It contains a set of mbuf objects that are used by the driver and the application to store network packets:

.. code-block:: c

    /* Create a pool of mbuf to store packets */
    mbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", MBUF_PER_POOL, 32, 0,
            MBUF_DATA_SIZE, rte_socket_id());

    if (mbuf_pool == NULL)
        rte_panic("%s\n", rte_strerror(rte_errno));

248 The rte_mempool is a generic structure used to handle pools of objects.
249 In this case, it is necessary to create a pool that will be used by the driver.
251 The number of allocated pkt mbufs is MBUF_PER_POOL, with a data room size
252 of MBUF_DATA_SIZE each.
253 A per-lcore cache of 32 mbufs is kept.
The memory is allocated on the master lcore's socket, but it is possible to extend this code to allocate one mbuf pool per socket.
256 The rte_pktmbuf_pool_create() function uses the default mbuf pool and mbuf
257 initializers, respectively rte_pktmbuf_pool_init() and rte_pktmbuf_init().
258 An advanced application may want to use the mempool API to create the
259 mbuf pool with more control.
261 Ports Configuration and Pairing
262 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
264 Each port in the port mask is configured and a corresponding ring is created in the master lcore's array of rings.
265 This ring is the first in the pipeline and will hold the packets directly coming from the port.

.. code-block:: c

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
        if (is_bit_set(port_id, portmask)) {
            configure_eth_port(port_id);
            init_ring(master_lcore_id, port_id);
        }

The configure_eth_port() and init_ring() functions are used to configure a port and a ring respectively and are defined in init.c.
They make use of the DPDK APIs defined in rte_ethdev.h and rte_ring.h.
280 pair_ports() builds the port_pairs[] array so that its key-value pairs are a mapping between reception and transmission ports.
281 It is defined in init.c.
283 Logical Cores Assignment
284 ^^^^^^^^^^^^^^^^^^^^^^^^
286 The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port.
288 Each logical core except the last runs pipeline_stage() after a ring for each used port is initialized on that core.
pipeline_stage() on core X dequeues packets from core X-1's rings and enqueues them on its own rings. See :numref:`figure_threads_pipelines`.

.. code-block:: c

    /* Start pipeline_stage() on all the available slave lcores but the last */
    for (lcore_id = 0; lcore_id < last_lcore_id; lcore_id++) {
        if (rte_lcore_is_enabled(lcore_id) && lcore_id != master_lcore_id) {
            for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
                if (is_bit_set(port_id, portmask))
                    init_ring(lcore_id, port_id);

            rte_eal_remote_launch(pipeline_stage, NULL, lcore_id);
        }
    }

The last available logical core runs send_stage(), the final stage of the pipeline.
It dequeues packets from the last ring in the pipeline and
sends them out on the destination port set up by pair_ports().

.. code-block:: c

    /* Start send_stage() on the last slave core */
    rte_eal_remote_launch(send_stage, NULL, last_lcore_id);

315 Receive, Process and Transmit Packets
316 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
318 .. _figure_threads_pipelines:
320 .. figure:: img/threads_pipelines.*
322 Threads and Pipelines
325 In the receive_stage() function running on the master logical core,
326 the main task is to read ingress packets from the RX ports and enqueue them
327 on the port's corresponding first ring in the pipeline.
328 This is done using the following code:

.. code-block:: c

    lcore_id = rte_lcore_id();

    /* Process each port round robin style */
    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        ring = rings[lcore_id][port_id];

        /* Skip an overloaded ring until it drains to the low watermark */
        if (ring_state[port_id] != RING_READY) {
            if (rte_ring_count(ring) > *low_watermark)
                continue;
            else
                ring_state[port_id] = RING_READY;
        }

        /* Enqueue received packets on the RX ring */
        nb_rx_pkts = rte_eth_rx_burst(port_id, 0, pkts,
                (uint16_t) *quota);
        ret = rte_ring_enqueue_bulk(ring, (void *) pkts,
                nb_rx_pkts, &free);
        if (RING_SIZE - free > *high_watermark) {
            ring_state[port_id] = RING_OVERLOADED;
            send_pause_frame(port_id, 1337);
        }

        if (ret == 0) {
            /*
             * Return mbufs to the pool,
             * effectively dropping packets
             */
            for (i = 0; i < nb_rx_pkts; i++)
                rte_pktmbuf_free(pkts[i]);
        }
    }

370 For each port in the port mask, the corresponding ring's pointer is fetched into ring and that ring's state is checked:
372 * If it is in the RING_READY state, \*quota packets are grabbed from the port and put on the ring.
373 Should this operation make the ring's usage cross its high watermark,
374 the ring is marked as overloaded and an Ethernet flow control frame is sent to the source.
* If it is not in the RING_READY state, this port is ignored until the ring's usage drops to the \*low_watermark value or below.
378 The pipeline_stage() function's task is to process and move packets from the preceding pipeline stage.
This thread runs on most of the logical cores to create an arbitrarily long pipeline.

.. code-block:: c

    lcore_id = rte_lcore_id();

    previous_lcore_id = get_previous_lcore_id(lcore_id);

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        tx = rings[lcore_id][port_id];
        rx = rings[previous_lcore_id][port_id];

        /* Skip an overloaded ring until it drains to the low watermark */
        if (ring_state[port_id] != RING_READY) {
            if (rte_ring_count(tx) > *low_watermark)
                continue;
            else
                ring_state[port_id] = RING_READY;
        }

        /* Dequeue up to quota mbuf from rx */
        nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts,
                *quota, NULL);
        if (unlikely(nb_dq_pkts < 0))
            continue;

        /* Enqueue them on tx */
        ret = rte_ring_enqueue_bulk(tx, pkts,
                nb_dq_pkts, &free);
        if (RING_SIZE - free > *high_watermark)
            ring_state[port_id] = RING_OVERLOADED;

        if (ret == 0) {
            /*
             * Return mbufs to the pool,
             * effectively dropping packets
             */
            for (i = 0; i < nb_dq_pkts; i++)
                rte_pktmbuf_free(pkts[i]);
        }
    }

424 The thread's logic works mostly like receive_stage(),
425 except that packets are moved from ring to ring instead of port to ring.
427 In this example, no actual processing is done on the packets,
428 but pipeline_stage() is an ideal place to perform any processing required by the application.
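
The ring-to-ring move can be tried out without DPDK using a toy ring.
Everything in the sketch below (``struct toy_ring``, ``toy_dequeue_burst()``, ``toy_enqueue_bulk()``, ``move_with_quota()``) is hypothetical and single-threaded.
It only mirrors the two behaviours the stage relies on: a burst dequeue returns up to the requested number of objects, and a bulk enqueue either stores all objects or none.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define TOY_RING_SIZE 8   /* capacity of the toy ring (hypothetical) */

    struct toy_ring {
        void *slots[TOY_RING_SIZE];
        unsigned int head;    /* next slot to dequeue */
        unsigned int count;   /* slots currently in use */
    };

    /* Dequeue up to n objects; returns how many were actually dequeued. */
    static unsigned int
    toy_dequeue_burst(struct toy_ring *r, void **objs, unsigned int n)
    {
        unsigned int i;

        if (n > r->count)
            n = r->count;
        for (i = 0; i < n; i++)
            objs[i] = r->slots[(r->head + i) % TOY_RING_SIZE];
        r->head = (r->head + n) % TOY_RING_SIZE;
        r->count -= n;
        return n;
    }

    /* All-or-nothing enqueue; returns n on success, 0 if there is no room. */
    static unsigned int
    toy_enqueue_bulk(struct toy_ring *r, void **objs, unsigned int n)
    {
        unsigned int i;

        if (n > TOY_RING_SIZE - r->count)
            return 0;
        for (i = 0; i < n; i++)
            r->slots[(r->head + r->count + i) % TOY_RING_SIZE] = objs[i];
        r->count += n;
        return n;
    }

    /* Move at most quota objects from rx to tx; returns the number dropped. */
    static unsigned int
    move_with_quota(struct toy_ring *rx, struct toy_ring *tx, unsigned int quota)
    {
        void *objs[TOY_RING_SIZE];
        unsigned int n;

        assert(quota <= TOY_RING_SIZE);
        n = toy_dequeue_burst(rx, objs, quota);
        if (n != 0 && toy_enqueue_bulk(tx, objs, n) == 0)
            return n;   /* the sample application frees the mbufs here */
        return 0;
    }

A per-packet processing step would naturally sit between the dequeue and the enqueue in ``move_with_quota()``.
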
430 Finally, the send_stage() function's task is to read packets from the last ring in a pipeline and
431 send them on the destination port defined in the port_pairs[] array.
It runs on the last available logical core only.

.. code-block:: c

    lcore_id = rte_lcore_id();

    previous_lcore_id = get_previous_lcore_id(lcore_id);

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        dest_port_id = port_pairs[port_id];
        tx = rings[previous_lcore_id][port_id];

        if (rte_ring_empty(tx))
            continue;

        /* Dequeue packets from tx and send them */
        nb_dq_pkts = rte_ring_dequeue_burst(tx, (void *) tx_pkts, *quota, NULL);
        nb_tx_pkts = rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
    }

454 For each port in the port mask, up to \*quota packets are pulled from the last ring in its pipeline and
455 sent on the destination port paired with the current port.
457 Control Application - qwctl
458 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
460 The qwctl application uses the rte_cmdline library to provide the user with an interactive command line that
461 can be used to modify and inspect parameters in a running qw application.
462 Those parameters are the global quota and low_watermark value as well as each ring's built-in high watermark.
467 The available commands are defined in commands.c.
469 It is advised to use the cmdline sample application user guide as a reference for everything related to the rte_cmdline library.
471 Accessing Shared Variables
472 ^^^^^^^^^^^^^^^^^^^^^^^^^^
The setup_shared_variables() function retrieves the shared variables quota,
low_watermark and high_watermark from the rte_memzone previously created by qw.

.. code-block:: c

    static void
    setup_shared_variables(void)
    {
        const struct rte_memzone *qw_memzone;

        /* Look up the memzone reserved by the qw primary process */
        qw_memzone = rte_memzone_lookup(QUOTA_WATERMARK_MEMZONE_NAME);
        if (qw_memzone == NULL)
            rte_exit(EXIT_FAILURE, "Couldn't find memzone\n");

        quota = qw_memzone->addr;
        low_watermark = (unsigned int *) qw_memzone->addr + 1;
        high_watermark = (unsigned int *) qw_memzone->addr + 2;
    }