..
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which upon receiving the IOCTL call accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets.
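
The idea of queues that carry pointers rather than packet data can be sketched in a few lines of C.
This is only an illustration under stated assumptions: the ``ptr_fifo`` structure, its capacity and
the function names are hypothetical and do not reflect the layout used by the KNI library,
and the sketch is single-threaded, so it omits the memory barriers a real producer/consumer pair would need.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define FIFO_SLOTS 8u   /* hypothetical capacity; must be a power of two */

    /* Hypothetical ring of packet pointers (not the KNI library's structure). */
    struct ptr_fifo {
        unsigned head;              /* count of pointers written so far */
        unsigned tail;              /* count of pointers read so far */
        void *slots[FIFO_SLOTS];    /* pointers to packets, never packet data */
    };

    /* Enqueue up to n pointers; returns how many were accepted. */
    static unsigned
    ptr_fifo_put(struct ptr_fifo *f, void **pkts, unsigned n)
    {
        unsigned i;

        assert(f != NULL && pkts != NULL);
        for (i = 0; i < n && f->head - f->tail < FIFO_SLOTS; i++)
            f->slots[f->head++ & (FIFO_SLOTS - 1)] = pkts[i];
        return i;
    }

    /* Dequeue up to n pointers; returns how many were retrieved. */
    static unsigned
    ptr_fifo_get(struct ptr_fifo *f, void **pkts, unsigned n)
    {
        unsigned i;

        assert(f != NULL && pkts != NULL);
        for (i = 0; i < n && f->tail != f->head; i++)
            pkts[i] = f->slots[f->tail++ & (FIFO_SLOTS - 1)];
        return i;
    }

Only the pointer values cross the queue, so the cost of an enqueue or dequeue does not depend on the packet size.
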

The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and, with the kernel module's support, allocates one or more KNI devices for each physical NIC port.
For a physical NIC port, one thread reads from the port and writes to KNI devices,
and another thread reads from KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port is intended
only for performance testing, or for working together with VMDq support in the future.

The packet flow through the Kernel NIC Interface application is shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow


Compiling the Application
-------------------------

Compile the application as follows:

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/kni

#.  Set the target (a default target is used if not specified).

    This application is intended as a linuxapp only.

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application:

    .. code-block:: console

        make


Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
In this mode, only one kernel thread is created for all KNI devices for packet receiving on the kernel side:

.. code-block:: console

    #insmod rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command tries to pin the kni_thread to lcore 20
(the hexadecimal mask 100000 has only bit 20 set, and lcore numbering starts at 0),
so first check that this lcore is available on the board.
The command must be issued after the application has been launched, as insmod does not start the kni thread.
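
The relationship between an lcore number and the taskset mask can be checked with a small helper.
This is a minimal sketch; the function name ``lcore_to_affinity_mask`` is hypothetical and is not part of the DPDK or the sample application.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /*
     * Return the CPU affinity mask that selects a single lcore.
     * Bit N of the mask corresponds to lcore N, counting from 0.
     */
    static uint64_t
    lcore_to_affinity_mask(unsigned lcore)
    {
        assert(lcore < 64);     /* a 64-bit mask covers lcores 0..63 only */
        return UINT64_C(1) << lcore;
    }

Printing ``lcore_to_affinity_mask(20)`` in hexadecimal gives 100000, the value passed to taskset above.
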

For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.

To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread for all KNI devices for packet receiving on the kernel side.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set using the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates one kernel thread for each KNI device for packet receiving on the kernel side.
    The core affinity of each kernel thread is set when the KNI device is created.
    The lcore ID for each kernel thread is provided on the command line used to launch the application.
    Multiple kernel thread mode can provide higher, more scalable performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode involves ring enqueue/dequeue operations and sk buffer copies in kernel space.


Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]"

Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,(port,lcore_rx,lcore_tx[,lcore_kthread,...])]":
    Determines which lcores are used for RX, TX and the kernel thread of each port.

Refer to the *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask or -l corelist parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are only used to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports listed in --config, neither more nor fewer.
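
As an illustration of the PORTMASK rule above, the following minimal sketch derives the hexadecimal mask
from the port IDs listed in --config.
The helper name ``portmask_from_ports`` is hypothetical and is not part of the sample application.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Build a port bitmask in which bit N is set for every listed port N. */
    static uint32_t
    portmask_from_ports(const unsigned *ports, size_t nb_ports)
    {
        uint32_t mask = 0;
        size_t i;

        for (i = 0; i < nb_ports; i++) {
            assert(ports[i] < 32);      /* a 32-bit mask covers ports 0..31 */
            mask |= UINT32_C(1) << ports[i];
        }
        return mask;
    }

For a --config that names ports 0 and 1, the helper returns 0x3, which is the value to pass with -p.
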

The lcore_kthread field in --config can list zero, one or more lcore IDs.
In multiple kernel thread mode, if no lcore ID is given, one KNI device is allocated for each port
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are given, one KNI device is allocated per lcore ID for each port,
and the kernel thread of each device is pinned to the corresponding lcore.
In single kernel thread mode, if no lcore ID is given, one KNI device is allocated for each port.
If one or more lcore IDs are given,
one KNI device is allocated per lcore ID for each port, but
no lcore affinity is set, as there is only one kernel thread for all KNI devices.
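
The rules above can be summarized as two small functions.
This is a sketch for illustration only: the ``kthread_mode`` enumeration and both function names are hypothetical,
and the device count follows the ``nb_lcore_k ? nb_lcore_k : 1`` expression used by the sample application when it allocates KNI devices.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical representation of the kthread_mode module parameter. */
    enum kthread_mode { KTHREAD_SINGLE, KTHREAD_MULTIPLE };

    /* Number of KNI devices allocated for a port given its lcore_kthread count. */
    static unsigned
    kni_devices_for_port(unsigned nb_lcore_k)
    {
        return nb_lcore_k ? nb_lcore_k : 1;
    }

    /* True when the configured lcore IDs are actually used to pin kernel threads. */
    static bool
    kthread_affinity_applied(enum kthread_mode mode, unsigned nb_lcore_k)
    {
        assert(mode == KTHREAD_SINGLE || mode == KTHREAD_MULTIPLE);
        return mode == KTHREAD_MULTIPLE && nb_lcore_k > 0;
    }
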

For example, to run the application with two ports served by six lcores, with one lcore for RX, one lcore for TX,
and one lcore for the kernel thread of each port:

.. code-block:: console

    ./build/kni -l 4-7 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"


Once the KNI application is started, one can use different Linux* commands to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device will be paired to the physical device.
Operations on other KNI devices will not affect the physical port handled in the user space application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

The following sections provide some explanation of the code.

Setup of the mbuf pool, driver and queues is similar to the setup done in the :doc:`l2_forward_real_virtual`.
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint8_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        /* One KNI device per configured kernel thread lcore, or a single device */
        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ? params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));

                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                        "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }

        return 0;
    }

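
The device naming scheme used in the allocation code explains the interface names seen from Linux, such as vEth0_0.
The following standalone sketch reproduces only that naming rule; the helper ``kni_device_name`` is hypothetical and does not exist in the sample application.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /*
     * Write the KNI interface name for a port into buf.
     * With kernel thread lcores configured, the name is "vEth<port>_<index>";
     * otherwise it is "vEth<port>".
     * Returns the value reported by snprintf(), that is, the name length.
     */
    static int
    kni_device_name(char *buf, size_t len, unsigned port_id,
            unsigned kni_index, unsigned nb_lcore_k)
    {
        assert(buf != NULL && len > 0);
        if (nb_lcore_k)
            return snprintf(buf, len, "vEth%u_%u", port_id, kni_index);
        return snprintf(buf, len, "vEth%u", port_id);
    }

With ``--config="(0,4,6,8)"`` port 0 has one kernel thread lcore, so its only KNI device is named vEth0_0;
with ``--config="(0,4,6)"`` the device is named vEth0.
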

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads.

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores on which the kernel threads are pinned, one by one

This is done by using the `kni_port_params_array[]` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint8_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        /* Each "(port,lcore_rx,lcore_tx[,lcore_kthread,...])" group is parsed in turn */
        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;
            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            snprintf(s, sizeof(s), "%.*s", (int)size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');

            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint8_t)int_fld[i++];

            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n", port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] = (struct kni_port_params*)rte_zmalloc("KNI_port_params", sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];

            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE || kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx, kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }

            /* Any remaining fields are lcore IDs for the kernel threads */
            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        return 0;

    fail:
        /* Release everything allocated so far */
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;
    }

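
The numeric validation performed on each field (reset errno, call strtoul(), then check both errno and the end pointer)
can be tried in isolation with the following sketch.
It is a standalone simplification that assumes comma-separated fields and uses only the C standard library;
``parse_uint_fields`` is a hypothetical name, and the sample application itself splits the string with rte_strsplit() instead.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>
    #include <stdlib.h>

    /*
     * Parse a string such as "0,4,6,8" into vals[].
     * Returns the number of fields parsed, or -1 if a field is not a number,
     * a value overflows, or there are more than max_vals fields.
     */
    static int
    parse_uint_fields(const char *s, unsigned long *vals, int max_vals)
    {
        int n = 0;
        char *end;

        assert(s != NULL && vals != NULL);
        while (n < max_vals) {
            errno = 0;
            vals[n] = strtoul(s, &end, 0);
            if (errno != 0 || end == s)
                return -1;      /* overflow, or no digits were consumed */
            n++;
            if (*end == '\0')
                return n;       /* the whole string has been consumed */
            if (*end != ',')
                return -1;      /* unexpected character after the number */
            s = end + 1;        /* continue after the comma */
        }
        return -1;              /* more fields than the caller allows */
    }

For the tuple ``0,4,6,8`` the function returns 4, with the port in ``vals[0]``, lcore_rx in ``vals[1]``,
lcore_tx in ``vals[2]`` and one kernel thread lcore in ``vals[3]``.
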

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to kernel NIC interfaces.

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
The packet transmission is done by sending mbufs into the kernel NIC interfaces by rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */
    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }


For the other case that reads from kernel NIC interfaces and writes to a physical NIC port, packets are retrieved by reading
mbufs from kernel NIC interfaces by `rte_kni_rx_burst()`.
The packet transmission is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).

.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */
    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

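
Both forwarding directions handle a partially accepted burst in the same way:
the accepted packets are counted as forwarded, and the remainder is freed and counted as dropped.
The following standalone sketch shows only that bookkeeping;
the ``fwd_stats`` structure and ``account_burst`` function are hypothetical stand-ins and do not use any DPDK type.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical per-direction counters. */
    struct fwd_stats {
        unsigned long forwarded;    /* packets accepted by the destination */
        unsigned long dropped;      /* packets the destination did not accept */
    };

    /*
     * Record one burst in which `received` packets were read and the
     * destination accepted the first `sent` of them.
     * Returns the number of buffers the caller still owns and must free,
     * which start at index `sent` in the burst array.
     */
    static unsigned
    account_burst(struct fwd_stats *st, unsigned received, unsigned sent)
    {
        assert(st != NULL && sent <= received);
        st->forwarded += sent;
        st->dropped += received - sent;
        return received - sent;
    }

If 32 packets are received and only 30 are accepted, the caller frees the two buffers at indexes 30 and 31.
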

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and filled in the struct rte_kni_ops structure.
Currently, setting a new MTU and configuring the network interface (up/down) are supported.

.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint8_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */
        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;

        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint8_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);

        return ret;
    }

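
The length calculation in the MTU callback (MTU plus Ethernet header plus FCS) can be verified with a short standalone sketch.
The constant values below are assumptions made for this illustration:
14 bytes for an Ethernet header without VLAN tag and 4 bytes for the FCS,
standing in for KNI_ENET_HEADER_SIZE and KNI_ENET_FCS_SIZE.
The function name ``max_rx_pkt_len_for_mtu`` is hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <limits.h>

    #define SKETCH_ENET_HEADER_SIZE 14u /* assumed: dst MAC + src MAC + EtherType */
    #define SKETCH_ENET_FCS_SIZE     4u /* assumed: frame check sequence */

    /* mtu + length of header + length of FCS = max pkt length */
    static unsigned
    max_rx_pkt_len_for_mtu(unsigned mtu)
    {
        /* Guard against wrap-around of the unsigned sum */
        assert(mtu <= UINT_MAX - SKETCH_ENET_HEADER_SIZE - SKETCH_ENET_FCS_SIZE);
        return mtu + SKETCH_ENET_HEADER_SIZE + SKETCH_ENET_FCS_SIZE;
    }

Under these assumptions, the standard MTU of 1500 bytes corresponds to a maximum frame length of 1518 bytes,
and an MTU of 9000 bytes to 9018 bytes.
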