..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019 Intel Corporation.

.. include:: <isonum.txt>

Packet copying using Intel\ |reg| QuickData Technology
=======================================================

Overview
--------

This sample is intended as a demonstration of the basic components of a DPDK
forwarding application and an example of how to use the IOAT driver API to make
a packet copy application.

Also while forwarding, the MAC addresses are affected as follows:

* The source MAC address is replaced by the TX port MAC address

* The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID

This application can be used to compare the performance of a software packet
copy with a copy done using a DMA device, for different packet sizes.
The example prints out statistics every second. The stats show the number of
received/sent packets and of packets dropped or failed to copy.

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``ioat`` sub-directory.

Running the Application
-----------------------

In order to run the hardware copy application, the copying device
needs to be bound to a user-space IO driver.

Refer to the "IOAT Rawdev Driver" chapter in the "Rawdev Drivers" document
for information on using the driver.

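For example, assuming a CBDMA copy engine shows up as PCI device
``0000:00:04.0`` (the address here is purely illustrative), it could be bound
to the ``vfio-pci`` driver with the ``dpdk-devbind.py`` tool:

.. code-block:: console

    $ dpdk-devbind.py --bind=vfio-pci 0000:00:04.0
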
The application requires a number of command line options:

.. code-block:: console

    ./<build_dir>/examples/dpdk-ioat [EAL options] -- [-p MASK] [-q NQ] [-s RS] [-c <sw|hw>]

where,

* p MASK: A hexadecimal bitmask of the ports to configure (default is all)

* q NQ: Number of Rx queues used per port equivalent to CBDMA channels
  per port (default is 1)

* c CT: Performed packet copy type: software (sw) or hardware using
  DMA (hw) (default is hw)

* s RS: Size of IOAT rawdev ring for hardware copy mode or rte_ring for
  software copy mode (default is 2048)

* --[no-]mac-updating: Whether MAC address of packets should be changed
  or not (default is mac-updating)

The application can be launched in various configurations depending on
provided parameters. The app can use up to 2 lcores: one of them receives
incoming traffic and makes a copy of each packet. The second lcore then
updates the MAC address and sends the copy. If one lcore per port is used,
both operations are done sequentially. For each configuration an additional
lcore is needed since the main lcore does not handle traffic but is
responsible for configuration, statistics printing and safe shutdown of
all ports and devices.

The application can use a maximum of 8 ports.

To run the application in a Linux environment with 3 lcores (the main lcore,
plus two forwarding cores), a single port (port 0), software copying and MAC
updating issue the command:

.. code-block:: console

    $ ./<build_dir>/examples/dpdk-ioat -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw

To run the application in a Linux environment with 2 lcores (the main lcore,
plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
updating issue the command:

.. code-block:: console

    $ ./<build_dir>/examples/dpdk-ioat -l 0-1 -n 1 -- -p 0x3 --no-mac-updating -c hw

Refer to the *DPDK Getting Started Guide* for general information on
running applications and the Environment Abstraction Layer (EAL) options.

Explanation
-----------

The following sections provide an explanation of the main components of the
code.

All DPDK library functions used in the sample code are prefixed with
``rte_`` and are explained in detail in the *DPDK API Documentation*.

The Main Function
~~~~~~~~~~~~~~~~~

The ``main()`` function performs the initialization and calls the execution
threads for each lcore.

The first task is to initialize the Environment Abstraction Layer (EAL).
The ``argc`` and ``argv`` arguments are provided to the ``rte_eal_init()``
function. The value returned is the number of parsed arguments:

.. code-block:: c

    /* init EAL */
    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");

The ``main()`` function also allocates a mempool to hold the mbufs
(Message Buffers) used by the application:

.. code-block:: c

    nb_mbufs = RTE_MAX(rte_eth_dev_count_avail() * (nb_rxd + nb_txd
        + MAX_PKT_BURST + rte_lcore_count() * MEMPOOL_CACHE_SIZE),
        MIN_POOL_SIZE);

    /* Create the mbuf pool */
    ioat_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", nb_mbufs,
        MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
        rte_socket_id());
    if (ioat_pktmbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");

Mbufs are the packet buffer structure used by DPDK. They are explained in
detail in the "Mbuf Library" section of the *DPDK Programmer's Guide*.

The ``main()`` function also initializes the ports:

.. code-block:: c

    /* Initialise each port */
    RTE_ETH_FOREACH_DEV(portid) {
        port_init(portid, ioat_pktmbuf_pool);
    }

Each port is configured using the ``port_init()`` function. The Ethernet
ports are configured with local settings using the ``rte_eth_dev_configure()``
function and the ``port_conf`` struct. RSS is enabled so that
multiple Rx queues can be used for packet receiving and copying by
multiple CBDMA channels per port:

.. code-block:: c

    /* configuring port to use RSS for multiple RX queues */
    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .mq_mode = ETH_MQ_RX_RSS,
            .max_rx_pkt_len = RTE_ETHER_MAX_LEN
        },
        .rx_adv_conf = {
            .rss_conf = { .rss_hf = ETH_RSS_PROTO_MASK, }
        }
    };

For this example the ports are set up with the number of Rx queues provided
with the -q option and 1 Tx queue using the ``rte_eth_rx_queue_setup()``
and ``rte_eth_tx_queue_setup()`` functions.

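The queue setup calls themselves are not part of the excerpt above. A minimal
sketch of what such a setup could look like is shown below; it is illustrative
only, and assumes ``nb_queues``, ``nb_rxd``, ``nb_txd`` and ``mbuf_pool`` hold
the values chosen by the application, with default queue configurations:

.. code-block:: c

    /* Illustrative sketch, not the exact sample code. */
    for (i = 0; i < nb_queues; i++) {
        ret = rte_eth_rx_queue_setup(portid, i, nb_rxd,
            rte_eth_dev_socket_id(portid), NULL, mbuf_pool);
        if (ret < 0)
            rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup failed\n");
    }

    /* A single Tx queue (index 0) is used to send out the copies. */
    ret = rte_eth_tx_queue_setup(portid, 0, nb_txd,
        rte_eth_dev_socket_id(portid), NULL);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup failed\n");
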
The Ethernet port is then started:

.. code-block:: c

    ret = rte_eth_dev_start(portid);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_dev_start:err=%d, port=%u\n",
            ret, portid);

Finally the Rx port is set in promiscuous mode:

.. code-block:: c

    rte_eth_promiscuous_enable(portid);

After that, the application assigns the resources needed for each port:

.. code-block:: c

    /* Check link status of enabled ports */
    check_link_status(ioat_enabled_port_mask);

    if (cfg.nb_ports == 0)
        rte_exit(EXIT_FAILURE,
            "All available ports are disabled. Please set portmask.\n");

    /* Check if there is enough lcores for all ports. */
    cfg.nb_lcores = rte_lcore_count() - 1;
    if (cfg.nb_lcores < 1)
        rte_exit(EXIT_FAILURE,
            "There should be at least one worker lcore.\n");

    if (copy_mode == COPY_MODE_IOAT_NUM) {
        assign_rawdevs();
    } else /* copy_mode == COPY_MODE_SW_NUM */ {
        assign_rings();
    }

Depending on the mode set (whether the copy should be done by software or by
hardware), special structures are assigned to each port. If software copy was
chosen, the application has to assign ring structures for packet exchange
between the lcores:

.. code-block:: c

    for (i = 0; i < cfg.nb_ports; i++) {
        char ring_name[20];

        snprintf(ring_name, 20, "rx_to_tx_ring_%u", i);
        /* Create ring for inter core communication */
        cfg.ports[i].rx_to_tx_ring = rte_ring_create(
            ring_name, ring_size,
            rte_socket_id(), RING_F_SP_ENQ);

        if (cfg.ports[i].rx_to_tx_ring == NULL)
            rte_exit(EXIT_FAILURE, "%s\n",
                rte_strerror(rte_errno));
    }

When using hardware copy each Rx queue of the port is assigned an
IOAT device (``assign_rawdevs()``) using IOAT Rawdev Driver API
functions:

.. code-block:: c

    static void
    assign_rawdevs(void)
    {
        uint16_t nb_rawdev = 0, rdev_id = 0;
        uint32_t i, j;

        for (i = 0; i < cfg.nb_ports; i++) {
            for (j = 0; j < cfg.ports[i].nb_queues; j++) {
                struct rte_rawdev_info rdev_info = { 0 };

                do {
                    if (rdev_id == rte_rawdev_count())
                        goto end;
                    rte_rawdev_info_get(rdev_id++, &rdev_info, 0);
                } while (strcmp(rdev_info.driver_name,
                    IOAT_PMD_RAWDEV_NAME_STR) != 0);

                cfg.ports[i].ioat_ids[j] = rdev_id - 1;
                configure_rawdev_queue(cfg.ports[i].ioat_ids[j]);
                ++nb_rawdev;
            }
        }
    end:
        if (nb_rawdev < cfg.nb_ports * cfg.ports[0].nb_queues)
            rte_exit(EXIT_FAILURE,
                "Not enough IOAT rawdevs (%u) for all queues (%u).\n",
                nb_rawdev, cfg.nb_ports * cfg.ports[0].nb_queues);
        RTE_LOG(INFO, IOAT, "Number of used rawdevs: %u.\n", nb_rawdev);
    }

The initialization of the hardware device is done by the ``rte_rawdev_configure()``
function using the ``rte_rawdev_info`` struct. After configuration the device is
started using the ``rte_rawdev_start()`` function. Each of the above operations
is done in ``configure_rawdev_queue()``:

.. code-block:: c

    static void
    configure_rawdev_queue(uint32_t dev_id)
    {
        struct rte_ioat_rawdev_config dev_config = { .ring_size = ring_size };
        struct rte_rawdev_info info = { .dev_private = &dev_config };

        if (rte_rawdev_configure(dev_id, &info, sizeof(dev_config)) != 0) {
            rte_exit(EXIT_FAILURE,
                "Error with rte_rawdev_configure()\n");
        }
        if (rte_rawdev_start(dev_id) != 0) {
            rte_exit(EXIT_FAILURE,
                "Error with rte_rawdev_start()\n");
        }
    }

If initialization is successful, memory for hardware device
statistics is allocated.

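A rough sketch of how such statistics memory could be sized and later read back
through the generic rawdev xstats API is shown below. This is a hypothetical
illustration only (array names invented, error handling omitted), not the exact
code of the sample:

.. code-block:: c

    /* Hypothetical sketch: query how many xstats the rawdev exposes,
     * allocate arrays for their ids and values, then read them. */
    int nb_xstats = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
    unsigned int *ids = malloc(nb_xstats * sizeof(*ids));
    uint64_t *values = malloc(nb_xstats * sizeof(*values));
    int i;

    for (i = 0; i < nb_xstats; i++)
        ids[i] = i;
    rte_rawdev_xstats_get(dev_id, ids, values, nb_xstats);
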
Finally, the ``main()`` function starts all packet handling lcores and starts
printing stats in a loop on the main lcore. The application can be
interrupted and closed using ``Ctrl-C``. The main lcore waits for
all worker lcores to finish, deallocates resources and exits.

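A condensed sketch of this final part of ``main()`` might look as follows;
``print_stats()`` is a stand-in name for the statistics loop and the exact
cleanup sequence in the sample may differ:

.. code-block:: c

    start_forwarding_cores();
    /* the main lcore prints statistics until the user presses Ctrl-C */
    print_stats();

    /* wait for all worker lcores to finish */
    RTE_LCORE_FOREACH_WORKER(lcore_id)
        if (rte_eal_wait_lcore(lcore_id) < 0)
            rte_exit(EXIT_FAILURE, "Worker lcore returned an error\n");

    /* stop and close all ports before exiting */
    RTE_ETH_FOREACH_DEV(portid) {
        rte_eth_dev_stop(portid);
        rte_eth_dev_close(portid);
    }
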
The functions launching the processing lcores are described below.

The Lcores Launching Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described above, the ``main()`` function invokes the ``start_forwarding_cores()``
function in order to start processing for each lcore:

.. code-block:: c

    static void start_forwarding_cores(void)
    {
        uint32_t lcore_id = rte_lcore_id();

        RTE_LOG(INFO, IOAT, "Entering %s on lcore %u\n",
            __func__, rte_lcore_id());

        if (cfg.nb_lcores == 1) {
            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)rxtx_main_loop,
                NULL, lcore_id);
        } else if (cfg.nb_lcores > 1) {
            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)rx_main_loop,
                NULL, lcore_id);

            lcore_id = rte_get_next_lcore(lcore_id, true, true);
            rte_eal_remote_launch((lcore_function_t *)tx_main_loop, NULL,
                lcore_id);
        }
    }

The function launches Rx/Tx processing functions on the configured lcores
using ``rte_eal_remote_launch()``. The configured ports, their number
and the number of assigned lcores are stored in the user-defined
``rxtx_transmission_config`` struct:

.. code-block:: c

    struct rxtx_transmission_config {
        struct rxtx_port_config ports[RTE_MAX_ETHPORTS];
        uint16_t nb_ports;
        uint16_t nb_lcores;
    };

The structure is initialized in the ``main()`` function with the values
corresponding to the ports and lcores configuration provided by the user.

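For illustration, the port-related part of that initialization could look
roughly like the hypothetical sketch below, which reuses the names from the
excerpts above; the actual sample code may differ in detail:

.. code-block:: c

    /* Hypothetical sketch of filling rxtx_transmission_config in main(). */
    cfg.nb_ports = 0;
    RTE_ETH_FOREACH_DEV(portid) {
        /* skip ports that are not enabled in the portmask */
        if ((ioat_enabled_port_mask & (1 << portid)) == 0)
            continue;
        cfg.ports[cfg.nb_ports].rxtx_port = portid;
        cfg.ports[cfg.nb_ports++].nb_queues = nb_queues;
    }
    cfg.nb_lcores = rte_lcore_count() - 1;
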
The Lcores Processing Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For receiving packets on each port, the ``ioat_rx_port()`` function is used.
The function receives packets on each configured Rx queue. Depending on the
mode the user chose, it will enqueue packets to IOAT rawdev channels and
then invoke the copy process (hardware copy), or perform a software copy of
each packet using the ``pktmbuf_sw_copy()`` function and enqueue them to an
rte_ring:

.. code-block:: c

    /* Receive packets on one port and enqueue to IOAT rawdev or rte_ring. */
    static void
    ioat_rx_port(struct rxtx_port_config *rx_config)
    {
        uint32_t nb_rx, nb_enq, i, j;
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];

        for (i = 0; i < rx_config->nb_queues; i++) {

            nb_rx = rte_eth_rx_burst(rx_config->rxtx_port, i,
                pkts_burst, MAX_PKT_BURST);

            if (nb_rx == 0)
                continue;

            port_statistics.rx[rx_config->rxtx_port] += nb_rx;

            if (copy_mode == COPY_MODE_IOAT_NUM) {
                /* Perform packet hardware copy */
                nb_enq = ioat_enqueue_packets(pkts_burst,
                    nb_rx, rx_config->ioat_ids[i]);
                if (nb_enq > 0)
                    rte_ioat_perform_ops(rx_config->ioat_ids[i]);
            } else {
                /* Perform packet software copy, free source packets */
                int ret;
                struct rte_mbuf *pkts_burst_copy[MAX_PKT_BURST];

                ret = rte_mempool_get_bulk(ioat_pktmbuf_pool,
                    (void *)pkts_burst_copy, nb_rx);

                if (unlikely(ret < 0))
                    rte_exit(EXIT_FAILURE,
                        "Unable to allocate memory.\n");

                for (j = 0; j < nb_rx; j++)
                    pktmbuf_sw_copy(pkts_burst[j],
                        pkts_burst_copy[j]);

                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)pkts_burst, nb_rx);

                nb_enq = rte_ring_enqueue_burst(
                    rx_config->rx_to_tx_ring,
                    (void *)pkts_burst_copy, nb_rx, NULL);

                /* Free any not enqueued packets. */
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)&pkts_burst_copy[nb_enq],
                    nb_rx - nb_enq);
            }

            port_statistics.copy_dropped[rx_config->rxtx_port] +=
                (nb_rx - nb_enq);
        }
    }

The packets are received in burst mode using the ``rte_eth_rx_burst()``
function. When using hardware copy mode the packets are enqueued in the
copying device's buffer using ``ioat_enqueue_packets()``, which calls
``rte_ioat_enqueue_copy()``. When all received packets are in the
buffer the copy operations are started by calling ``rte_ioat_perform_ops()``.
The ``rte_ioat_enqueue_copy()`` function operates on the physical address of
the packet. The ``rte_mbuf`` structure contains only the physical address of
the start of the data buffer (``buf_iova``). Thus the address is adjusted
by the ``addr_offset`` value in order to get the address of the ``rearm_data``
member of ``rte_mbuf``. That way both the packet data and metadata can
be copied in a single operation. This method can be used because the mbufs
are direct mbufs allocated by the app. If another app uses external buffers,
or indirect mbufs, then multiple copy operations must be used.

.. code-block:: c

    static uint32_t
    ioat_enqueue_packets(struct rte_mbuf **pkts,
        uint32_t nb_rx, uint16_t dev_id)
    {
        int ret;
        uint32_t i;
        struct rte_mbuf *pkts_copy[MAX_PKT_BURST];

        const uint64_t addr_offset = RTE_PTR_DIFF(pkts[0]->buf_addr,
            &pkts[0]->rearm_data);

        ret = rte_mempool_get_bulk(ioat_pktmbuf_pool,
            (void *)pkts_copy, nb_rx);
        if (unlikely(ret < 0))
            rte_exit(EXIT_FAILURE, "Unable to allocate memory.\n");

        for (i = 0; i < nb_rx; i++) {
            /* Perform data copy */
            ret = rte_ioat_enqueue_copy(dev_id,
                pkts[i]->buf_iova - addr_offset,
                pkts_copy[i]->buf_iova - addr_offset,
                rte_pktmbuf_data_len(pkts[i]) + addr_offset,
                (uintptr_t)pkts[i],
                (uintptr_t)pkts_copy[i]);
            if (ret != 1)
                break;
        }

        /* Free any not enqueued packets. */
        rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts[i], nb_rx - i);
        rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts_copy[i],
            nb_rx - i);

        return i;
    }

All completed copies are processed by the ``ioat_tx_port()`` function. When using
hardware copy mode the function invokes ``rte_ioat_completed_ops()``
on each assigned IOAT channel to gather the copied packets. If software copy
mode is used the function dequeues the copied packets from the rte_ring. Then
each packet MAC address is changed if MAC updating was enabled. After that the
copies are sent in burst mode using ``rte_eth_tx_burst()``.

.. code-block:: c

    /* Transmit packets from IOAT rawdev/rte_ring for one port. */
    static void
    ioat_tx_port(struct rxtx_port_config *tx_config)
    {
        uint32_t i, j, nb_dq = 0;
        struct rte_mbuf *mbufs_src[MAX_PKT_BURST];
        struct rte_mbuf *mbufs_dst[MAX_PKT_BURST];

        for (i = 0; i < tx_config->nb_queues; i++) {
            if (copy_mode == COPY_MODE_IOAT_NUM) {
                /* Deque the mbufs from IOAT device. */
                nb_dq = rte_ioat_completed_ops(
                    tx_config->ioat_ids[i], MAX_PKT_BURST,
                    (void *)mbufs_src, (void *)mbufs_dst);
            } else {
                /* Deque the mbufs from rx_to_tx_ring. */
                nb_dq = rte_ring_dequeue_burst(
                    tx_config->rx_to_tx_ring, (void *)mbufs_dst,
                    MAX_PKT_BURST, NULL);
            }

            if ((int32_t)nb_dq <= 0)
                return;

            if (copy_mode == COPY_MODE_IOAT_NUM)
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)mbufs_src, nb_dq);

            /* Update macs if enabled */
            if (mac_updating) {
                for (j = 0; j < nb_dq; j++)
                    update_mac_addrs(mbufs_dst[j],
                        tx_config->rxtx_port);
            }

            const uint16_t nb_tx = rte_eth_tx_burst(
                tx_config->rxtx_port, 0,
                (void *)mbufs_dst, nb_dq);

            port_statistics.tx[tx_config->rxtx_port] += nb_tx;

            /* Free any unsent packets. */
            if (unlikely(nb_tx < nb_dq))
                rte_mempool_put_bulk(ioat_pktmbuf_pool,
                    (void *)&mbufs_dst[nb_tx],
                    nb_dq - nb_tx);
        }
    }

The Packet Copying Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to perform a packet copy there is a user-defined function
``pktmbuf_sw_copy()`` used. It copies a whole packet by copying the
metadata from the source packet to a new mbuf, and then copying the data
chunk of the source packet. Both memory copies are done using
``rte_memcpy()``:

.. code-block:: c

    static inline void
    pktmbuf_sw_copy(struct rte_mbuf *src, struct rte_mbuf *dst)
    {
        /* Copy packet metadata */
        rte_memcpy(&dst->rearm_data,
            &src->rearm_data,
            offsetof(struct rte_mbuf, cacheline1)
            - offsetof(struct rte_mbuf, rearm_data));

        /* Copy packet data */
        rte_memcpy(rte_pktmbuf_mtod(dst, char *),
            rte_pktmbuf_mtod(src, char *), src->data_len);
    }

The metadata in this example is copied from the ``rearm_data`` member of
the ``rte_mbuf`` struct up to ``cacheline1``.

In order to understand why software packet copying is done as shown
above please refer to the "Mbuf Library" section of the
*DPDK Programmer's Guide*.