..  Copyright(c) 2010-2014 Intel Corporation. All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is an Intel® DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, Intel® DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the Intel® DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The Intel® DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which, upon receiving the IOCTL call, accesses the Intel® DPDK's FIFO queues to
receive/transmit packets from/to the Intel® DPDK userspace application.
The FIFO queues contain pointers to data packets in the Intel® DPDK. This design:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the Intel® DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets.

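The following is a minimal C sketch of the pointer-FIFO idea: the producer enqueues only packet addresses and the consumer dequeues the same addresses, so packet data is never copied. It is not the KNI library's FIFO code. The names ``ptr_fifo``, ``fifo_put()`` and ``fifo_get()`` and the size of 8 are hypothetical, and, unlike a real FIFO shared between user space and the kernel, the sketch omits the memory barriers needed when producer and consumer run concurrently.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical pointer FIFO; FIFO_SIZE must be a power of two. */
    #define FIFO_SIZE 8

    struct ptr_fifo {
        unsigned write;            /* next slot the producer writes */
        unsigned read;             /* next slot the consumer reads */
        void *buffer[FIFO_SIZE];   /* stores packet pointers, never payloads */
    };

    /* Enqueue up to num pointers; returns how many were enqueued. */
    static unsigned
    fifo_put(struct ptr_fifo *f, void **data, unsigned num)
    {
        unsigned i;

        for (i = 0; i < num; i++) {
            unsigned next = (f->write + 1) & (FIFO_SIZE - 1);

            if (next == f->read)   /* full: one slot always stays empty */
                break;
            f->buffer[f->write] = data[i];
            f->write = next;
        }
        return i;
    }

    /* Dequeue up to num pointers; returns how many were dequeued. */
    static unsigned
    fifo_get(struct ptr_fifo *f, void **data, unsigned num)
    {
        unsigned i;

        for (i = 0; i < num; i++) {
            if (f->read == f->write)   /* empty */
                break;
            data[i] = f->buffer[f->read];
            f->read = (f->read + 1) & (FIFO_SIZE - 1);
        }
        return i;
    }

Because one slot is kept empty to tell "full" from "empty", this 8-slot FIFO holds at most 7 pointers at a time.
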
The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the Intel® DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the Intel® DPDK ports.
The application allows the use of standard Linux* tools (ethtool, ifconfig, tcpdump) with the Intel® DPDK ports and
also the exchange of packets between the Intel® DPDK application and the Linux* kernel.

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the support of the kernel module.
For a physical NIC port, one thread reads from the port and writes to the KNI devices,
and another thread reads from the KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port is intended only for performance testing,
or for use together with VMDq support in the future.

The packet flow through the Kernel NIC Interface application is shown in the following figure.

**Figure 2. Kernel NIC Application Packet Flow**

|kernel_nic|

Compiling the Application
-------------------------

Compile the application as follows:

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/kni

#.  Set the target (a default target is used if not specified).

    This application is intended as a linuxapp only.

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application:

    .. code-block:: console

        make

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way an Intel® DPDK application
gets packets into and out of the kernel net stack.
In this case, only one kernel thread is created on the kernel side to receive packets for all KNI devices:

.. code-block:: console

    #insmod kmod/rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -l kni_thread | awk '{print $1}'`

This command pins the kni_thread to lcore 20 (bit 20 of the hexadecimal mask 0x100000; lcore numbering starts at 0),
so first check that this lcore is available on the board.
This command must be issued after the application has been launched, because insmod does not start the kni thread.

For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.

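The hexadecimal argument of ``taskset -p`` is a CPU bitmask in which bit *n* selects lcore *n*, so lcore 20 corresponds to 1 << 20 = 0x100000. The short C sketch below performs that conversion; the helper names are hypothetical and it assumes lcore IDs below 64.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* CPU affinity mask selecting a single lcore (valid for lcore_id < 64). */
    static uint64_t
    lcore_to_mask(unsigned lcore_id)
    {
        return (uint64_t)1 << lcore_id;
    }

    /* Write the mask as hexadecimal without a "0x" prefix, the form used in
     * the taskset command above. Returns the number of characters written. */
    static int
    format_mask(uint64_t mask, char *buf, size_t len)
    {
        return snprintf(buf, len, "%llx", (unsigned long long)mask);
    }

For example, ``format_mask(lcore_to_mask(20), buf, sizeof(buf))`` writes the string 100000 used in the taskset command.
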
To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the Intel® DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   ``#insmod rte_kni.ko kthread_mode=single``

    This mode creates only one kernel thread for all KNI devices for packet receiving on the kernel side.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set using the Linux* taskset command.

*   ``#insmod rte_kni.ko kthread_mode=multiple``

    This mode creates one kernel thread for each KNI device for packet receiving on the kernel side.
    The core affinity of each kernel thread is set when the KNI device is created.
    The lcore ID for each kernel thread is provided on the command line when launching the application.
    Multiple kernel thread mode can provide higher, more scalable performance.

To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the Intel® DPDK target directory,
can be loaded with the lo_mode parameter as follows:

*   ``#insmod rte_kni.ko lo_mode=lo_mode_fifo``

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   ``#insmod rte_kni.ko lo_mode=lo_mode_fifo_skb``

    This loopback mode involves ring enqueue/dequeue operations and sk buffer copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]"

where,

*   ``-P``: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   ``-p PORTMASK``: Hexadecimal bitmask of ports to configure.

*   ``--config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]"``:
    Determines which lcores of RX, TX and kernel thread are mapped to which ports.

Refer to the *Intel® DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The ``-c coremask`` parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but it does not need to include the lcores indicated by lcore_kthread, because those are used only to pin the kernel threads.
The ``-p PORTMASK`` parameter should include exactly the ports indicated by port in ``--config``, neither more nor less.

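These two rules can be checked mechanically. The sketch below is not part of the sample application: the ``port_lcores`` structure and the ``check_masks()`` helper are hypothetical, the sketch records only the RX and TX lcores of each ``--config`` group (lcore_kthread values need no coremask bit), and it assumes port and lcore IDs below 64.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical record of one "(port,lcore_rx,lcore_tx,...)" group;
     * lcore_kthread values are deliberately not stored. */
    struct port_lcores {
        unsigned port;
        unsigned lcore_rx;
        unsigned lcore_tx;
    };

    /* Returns 1 if the EAL coremask (-c) enables every RX/TX lcore and the
     * PORTMASK (-p) names exactly the configured ports, 0 otherwise. */
    static int
    check_masks(uint64_t coremask, uint64_t portmask,
                const struct port_lcores *cfg, unsigned n)
    {
        uint64_t ports_in_config = 0;
        unsigned i;

        for (i = 0; i < n; i++) {
            uint64_t needed = ((uint64_t)1 << cfg[i].lcore_rx) |
                              ((uint64_t)1 << cfg[i].lcore_tx);

            if ((coremask & needed) != needed)
                return 0;   /* an RX or TX lcore is missing from -c */
            ports_in_config |= (uint64_t)1 << cfg[i].port;
        }
        return portmask == ports_in_config;   /* neither more nor less */
    }

With the values of the example command line later in this section (``-c 0xf0``, ``-p 0x3``, RX/TX lcores 4/6 for port 0 and 5/7 for port 1), ``check_masks()`` returns 1: bits 4 to 7 of 0xf0 enable lcores 4, 5, 6 and 7, and 0x3 names exactly ports 0 and 1.
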
The lcore_kthread list in ``--config`` can contain zero, one or more lcore IDs.
In multiple kernel thread mode, if none is configured, one KNI device is allocated for each port,
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are configured, one KNI device is allocated for each lcore ID of the port,
and each kernel thread is pinned to its configured lcore.
In single kernel thread mode, if none is configured, one KNI device is allocated for each port.
If one or more lcore IDs are configured,
one KNI device is allocated for each lcore ID of the port, but
no lcore affinity is set, because there is only one kernel thread for all KNI devices.

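The allocation rules above fit in a few lines of C. ``kni_devices_for_port()`` mirrors the expression used in kni_alloc() (``nb_lcore_k ? nb_lcore_k : 1``); ``kernel_threads()`` and ``kthread_affinity_set()`` are hypothetical helpers that restate the single and multiple kthread_mode behavior described above.

.. code-block:: c

    #include <stdbool.h>

    /* Mirrors kni_alloc(): one KNI device per lcore_kthread entry, else one. */
    static unsigned
    kni_devices_for_port(unsigned nb_lcore_k)
    {
        return nb_lcore_k ? nb_lcore_k : 1;
    }

    /* Hypothetical: total kernel threads for nb_ports ports.
     * kthread_mode=single: one thread shared by all KNI devices.
     * kthread_mode=multiple: one thread per KNI device. */
    static unsigned
    kernel_threads(bool multiple_mode, const unsigned *nb_lcore_k,
                   unsigned nb_ports)
    {
        unsigned p, total = 0;

        if (!multiple_mode)
            return 1;
        for (p = 0; p < nb_ports; p++)
            total += kni_devices_for_port(nb_lcore_k[p]);
        return total;
    }

    /* Hypothetical: a kernel thread is pinned only in multiple mode and only
     * when lcore_kthread IDs were given for the port. */
    static bool
    kthread_affinity_set(bool multiple_mode, unsigned nb_lcore_k)
    {
        return multiple_mode && nb_lcore_k > 0;
    }

For the two-port example that follows (one lcore_kthread entry per port), multiple mode gives two KNI devices served by two pinned kernel threads, while single mode gives the same two devices served by one unpinned kernel thread.
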
For example, to run the application with two ports served by six lcores (one lcore for RX, one lcore for TX,
and one lcore for the kernel thread of each port):

.. code-block:: console

    ./build/kni -c 0xf0 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

Once the KNI application is started, standard Linux* commands can be used to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device is paired to the physical device.
Operations on the other KNI devices do not affect the physical port handled in the user space application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the Intel® DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

The following sections provide some explanation of the code.

Setup of the mbuf pool, driver and queues is similar to the setup done in the L2 Forwarding sample application
(see Chapter 9 "L2 Forwarding Sample Application (in Real and Virtualized Environments)" for details).
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for creating the kernel NIC interface for a specific port is as follows:

.. code-block:: c

    kni = rte_kni_create(port, MAX_PACKET_SZ, pktmbuf_pool, &kni_ops);
    if (kni == NULL)
        rte_exit(EXIT_FAILURE, "Fail to create kni dev "
            "for port: %d\n", port);

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint8_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ? params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                rte_snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);
                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));
                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                    "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }
        return 0;
    }

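The names chosen in kni_alloc() are the interface names to pass to ifconfig, ethtool or tcpdump: a port configured with lcore_kthread IDs gets devices named ``vEth<port>_<index>``, otherwise a single device named ``vEth<port>``. The plain-C sketch below reproduces that rule with the standard ``snprintf()`` instead of ``rte_snprintf()``; ``kni_name()`` is hypothetical and ``KNI_NAMESIZE`` is an assumed stand-in for ``RTE_KNI_NAMESIZE``.

.. code-block:: c

    #include <stdio.h>

    #define KNI_NAMESIZE 32   /* assumption: stand-in for RTE_KNI_NAMESIZE */

    /* Build the KNI interface name the way kni_alloc() does. buf must hold
     * at least KNI_NAMESIZE bytes. Returns the length of the name. */
    static int
    kni_name(char *buf, unsigned port_id, unsigned idx, unsigned nb_lcore_k)
    {
        if (nb_lcore_k)
            return snprintf(buf, KNI_NAMESIZE, "vEth%u_%u", port_id, idx);
        return snprintf(buf, KNI_NAMESIZE, "vEth%u", port_id);
    }

With ``--config="(0,4,6,8),(1,5,7,9)"`` each port has one lcore_kthread entry, so port 0 gets ``vEth0_0``, the name used in the interface commands earlier; without lcore_kthread entries the interface would be ``vEth0``.
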
The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads:

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores for pinning the kernel threads on one by one

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint8_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));
        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;
            size = p0 - p;
            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }
            rte_snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');
            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }
            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint8_t)int_fld[i++];
            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n", port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }
            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }
            kni_port_params_array[port_id] = (struct kni_port_params*)rte_zmalloc("KNI_port_params", sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            if (kni_port_params_array[port_id] == NULL)
                goto fail;
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];
            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE || kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                    "exceed the maximum %u\n",
                    kni_port_params_array[port_id]->lcore_rx, kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }
            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
            nb_kni_port_params++;
        }
        return 0;

    fail:
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }
        return -1;
    }

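For experimenting with the ``--config`` syntax outside a DPDK build, the following self-contained sketch parses the same "(port,lcore_rx,lcore_tx[,lcore_kthread,...])" groups using only the C standard library. It is a simplified re-implementation, not the sample's parse_config(): ``port_cfg``, ``parse_config_sketch()``, ``MAX_KTHREAD`` and ``MAX_PORTS`` are hypothetical names and assumed limits. It rejects a group with more than ``MAX_KTHREAD`` lcore_kthread values and does not check for duplicate ports or lcore ID ranges.

.. code-block:: c

    #include <errno.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_KTHREAD 4   /* assumption: stands in for KNI_MAX_KTHREAD */
    #define MAX_PORTS   8   /* assumption: stands in for RTE_MAX_ETHPORTS */

    /* Hypothetical plain-C counterpart of struct kni_port_params. */
    struct port_cfg {
        unsigned port, lcore_rx, lcore_tx;
        unsigned lcore_k[MAX_KTHREAD];
        unsigned nb_lcore_k;
    };

    /* Parse "(port,rx,tx[,kthread...])[,(...)]" into cfg[]. Returns the
     * number of groups parsed, or -1 on malformed input. */
    static int
    parse_config_sketch(const char *arg, struct port_cfg *cfg, int max)
    {
        const char *p = arg;
        int n = 0;

        while ((p = strchr(p, '(')) != NULL) {
            unsigned long fld[MAX_KTHREAD + 3];
            unsigned nb = 0, k;
            char *end;

            if (n >= max)
                return -1;                 /* too many groups */
            p++;                           /* skip '(' */
            for (;;) {
                if (nb == MAX_KTHREAD + 3)
                    return -1;             /* too many fields */
                errno = 0;
                fld[nb] = strtoul(p, &end, 0);
                if (errno != 0 || end == p)
                    return -1;             /* field is not a number */
                nb++;
                if (*end == ',') {
                    p = end + 1;           /* next field */
                } else if (*end == ')') {
                    p = end + 1;           /* end of this group */
                    break;
                } else {
                    return -1;             /* missing ')' or stray character */
                }
            }
            if (nb < 3 || fld[0] >= MAX_PORTS)
                return -1;                 /* need port, lcore_rx, lcore_tx */

            cfg[n].port = (unsigned)fld[0];
            cfg[n].lcore_rx = (unsigned)fld[1];
            cfg[n].lcore_tx = (unsigned)fld[2];
            for (k = 3; k < nb; k++)
                cfg[n].lcore_k[k - 3] = (unsigned)fld[k];
            cfg[n].nb_lcore_k = nb - 3;
            n++;
        }
        return n;
    }

For example, "(0,4,6,8),(1,5,7,9)" yields two groups, each with one lcore_kthread entry, while "(0,4)" is rejected because lcore_tx is missing.
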
After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user-provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to kernel NIC interfaces.

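The lcore role check can be sketched in isolation. main_loop() itself is not shown in this guide, so the ``port_roles`` structure, the ``lcore_role_of()`` function and the ``lcore_role`` values below are hypothetical; they only illustrate comparing an lcore ID with each port's lcore_rx and lcore_tx to choose between the kni_ingress() and kni_egress() paths.

.. code-block:: c

    #include <stddef.h>

    enum lcore_role { LCORE_NONE, LCORE_RX, LCORE_TX };

    /* Hypothetical per-port lcore assignment taken from --config. */
    struct port_roles {
        unsigned port_id;
        unsigned lcore_rx;   /* reads the NIC port, writes KNI (ingress) */
        unsigned lcore_tx;   /* reads KNI, writes the NIC port (egress) */
    };

    /* Decide what this lcore does by comparing its ID with each port's
     * lcore_rx and lcore_tx; stores the matching port index in *port_idx. */
    static enum lcore_role
    lcore_role_of(unsigned lcore_id, const struct port_roles *ports,
                  size_t nb_ports, size_t *port_idx)
    {
        size_t i;

        for (i = 0; i < nb_ports; i++) {
            if (ports[i].lcore_rx == lcore_id) {
                *port_idx = i;
                return LCORE_RX;
            }
            if (ports[i].lcore_tx == lcore_id) {
                *port_idx = i;
                return LCORE_TX;
            }
        }
        return LCORE_NONE;   /* no user space KNI work on this lcore */
    }

With the example configuration "(0,4,6,8),(1,5,7,9)", lcore 4 reads port 0 and lcore 7 writes port 1, while lcores 8 and 9 have no user space role because they only host kernel threads.
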
For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").
The packet transmission is done by sending mbufs into the kernel NIC interfaces using ``rte_kni_tx_burst()``.
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */
    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;
        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }
            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);
            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }

For the other case that reads from kernel NIC interfaces and writes to a physical NIC port, packets are retrieved by reading
mbufs from kernel NIC interfaces using ``rte_kni_rx_burst()``.
The packet transmission is the same as in the L2 Forwarding sample application
(see Section 9.4.6 "Receive, Process and Transmit Packets").

.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */
    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;
        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }
            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;
            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

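Both kni_ingress() and kni_egress() handle a partially accepted burst the same way: count the packets that were taken, then free the rest and count them as dropped, so that no buffer leaks. The self-contained sketch below shows that pattern with ``malloc()``/``free()`` standing in for mbufs; ``tx_stub()``, ``send_burst()`` and ``burst_stats`` are hypothetical and model no DPDK behavior beyond a burst call returning fewer packets than requested.

.. code-block:: c

    #include <stdlib.h>

    /* Hypothetical per-port counters, modelled on kni_stats[]. */
    struct burst_stats {
        unsigned long sent;
        unsigned long dropped;
    };

    /* Hypothetical transmit stub: accepts at most 'room' packets, like a
     * burst call that takes fewer packets when the destination is full. */
    static unsigned
    tx_stub(void **pkts, unsigned n, unsigned room)
    {
        (void)pkts;
        return n < room ? n : room;
    }

    /* Count what was accepted, then free and count as dropped whatever was
     * not. free() stands in for releasing an mbuf. */
    static void
    send_burst(void **pkts, unsigned n, unsigned room, struct burst_stats *st)
    {
        unsigned sent = tx_stub(pkts, n, room);
        unsigned i;

        st->sent += sent;
        for (i = sent; i < n; i++) {
            free(pkts[i]);
            pkts[i] = NULL;
        }
        st->dropped += n - sent;
    }

Freeing only the tail ``pkts[sent..n-1]`` matches the ``&pkts_burst[num]`` and ``&pkts_burst[nb_tx]`` arguments in the sample: the accepted packets now belong to the receiver and must not be freed by the sender.
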
Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and set in the ``struct rte_kni_ops`` structure.
Currently, setting a new MTU and configuring the network interface (up/down) are supported.

.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint8_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }
        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);
        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */
        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;
        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }
        return 0;
    }

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint8_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }
        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
            port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
        return ret;
    }

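The frame length programmed by kni_change_mtu() is plain arithmetic. The sketch below reproduces it with assumed constant values (a 14-byte Ethernet header, a 4-byte FCS and a 1518-byte ETHER_MAX_LEN, the usual Ethernet sizes); the macro and function names are hypothetical stand-ins for KNI_ENET_HEADER_SIZE, KNI_ENET_FCS_SIZE and ETHER_MAX_LEN.

.. code-block:: c

    #include <stdbool.h>

    /* Assumed values for the sample's KNI_ENET_HEADER_SIZE,
     * KNI_ENET_FCS_SIZE and ETHER_MAX_LEN. */
    #define ENET_HEADER_SIZE 14    /* dst MAC + src MAC + EtherType */
    #define ENET_FCS_SIZE     4    /* frame check sequence */
    #define ENET_MAX_LEN   1518    /* largest standard frame, FCS included */

    /* mtu + length of header + length of FCS = max pkt length */
    static unsigned
    max_rx_pkt_len(unsigned mtu)
    {
        return mtu + ENET_HEADER_SIZE + ENET_FCS_SIZE;
    }

    /* Same test as kni_change_mtu(): jumbo mode when new_mtu > ETHER_MAX_LEN. */
    static bool
    needs_jumbo(unsigned mtu)
    {
        return mtu > ENET_MAX_LEN;
    }

For an MTU of 1500, max_rx_pkt_len is 1500 + 14 + 4 = 1518 and jumbo mode stays off; for 9000 it is 9018 with jumbo mode on. Because the callback compares the MTU, not the resulting frame length, with ETHER_MAX_LEN, an MTU from 1501 to 1518 gives a max_rx_pkt_len above 1518 without enabling jumbo mode; comparing ``max_rx_pkt_len(new_mtu)`` instead would close that gap.
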
.. |kernel_nic| image:: img/kernel_nic.png