..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kernel NIC Interface Sample Application
=======================================

The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.

The KNI kernel loadable module is a standard net driver,
which, upon receiving the IOCTL call, accesses the DPDK's FIFO queues to
receive packets from, and transmit packets to, the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates using standard Linux* userspace net tools (tcpdump, ftp, and so on) with the DPDK

*   Eliminates the ``copy_to_user`` and ``copy_from_user`` operations on packets.
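
The pointer-passing idea behind these FIFOs can be sketched as a single-producer, single-consumer ring.
This is a minimal illustration under stated assumptions, not the KNI library's actual FIFO code:
the type ``ptr_fifo``, the functions ``fifo_init()``, ``fifo_put()`` and ``fifo_get()`` and the fixed
capacity ``FIFO_SIZE`` are hypothetical, the capacity is assumed to be a power of two, and C11
acquire/release atomics stand in for the memory barriers that a real ring shared between kernel and
userspace needs.

.. code-block:: c

    #include <stdatomic.h>
    #include <stddef.h>

    /* Hypothetical SPSC ring of pointers: only pointers are queued,
     * so the packet data itself is never copied. */
    #define FIFO_SIZE 8u /* assumed capacity, must be a power of two */

    struct ptr_fifo {
        _Atomic unsigned write; /* next slot the producer fills */
        _Atomic unsigned read;  /* next slot the consumer drains */
        void *buffer[FIFO_SIZE];
    };

    static void fifo_init(struct ptr_fifo *f)
    {
        atomic_init(&f->write, 0);
        atomic_init(&f->read, 0);
    }

    /* Producer: enqueue up to num pointers, return how many fitted.
     * One slot is kept empty to tell "full" apart from "empty". */
    static unsigned fifo_put(struct ptr_fifo *f, void **data, unsigned num)
    {
        unsigned w = atomic_load_explicit(&f->write, memory_order_relaxed);
        unsigned r = atomic_load_explicit(&f->read, memory_order_acquire);
        unsigned i;

        for (i = 0; i < num; i++) {
            unsigned next = (w + 1) & (FIFO_SIZE - 1);
            if (next == r)
                break; /* ring is full */
            f->buffer[w] = data[i];
            w = next;
        }
        /* Publish the new entries to the consumer. */
        atomic_store_explicit(&f->write, w, memory_order_release);
        return i;
    }

    /* Consumer: dequeue up to num pointers, return how many were read. */
    static unsigned fifo_get(struct ptr_fifo *f, void **data, unsigned num)
    {
        unsigned r = atomic_load_explicit(&f->read, memory_order_relaxed);
        unsigned w = atomic_load_explicit(&f->write, memory_order_acquire);
        unsigned i;

        for (i = 0; i < num && r != w; i++) {
            data[i] = f->buffer[r];
            r = (r + 1) & (FIFO_SIZE - 1);
        }
        /* Hand the drained slots back to the producer. */
        atomic_store_explicit(&f->read, r, memory_order_release);
        return i;
    }

Because only pointers move through the ring, enqueueing a packet costs one pointer store regardless of
the packet size, which is the property the list above relies on.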

The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the support of the KNI kernel module.
For a physical NIC port, one thread reads from the port and writes to the KNI devices,
and another thread reads from the KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device for a physical NIC port is intended only for performance testing,
or for use together with VMDq support in the future.

The packet flow through the Kernel NIC Interface application is shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``kni`` sub-directory.

This application is intended as a linuxapp only.

Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameters is the typical way a DPDK application
gets packets into and out of the kernel net stack.
In this mode, only one kernel thread is created to receive packets for all KNI devices on the kernel side:

.. code-block:: console

    #insmod kmod/rte_kni.ko

Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command pins the kni_thread to lcore 20: the hexadecimal mask 100000 has only bit 20 set,
and lcore numbering starts at 0. Check first that this lcore is available on the platform.
The command must be issued after the application has been launched, because insmod does not start the KNI kernel thread.

For optimum performance,
the lcore in the mask should be on the same socket as the lcores used by the KNI application.
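
The taskset mask is a bitmap in which bit N selects lcore N.
The hypothetical helpers below (not part of DPDK or taskset) show how the mask ``100000`` used above is
derived for lcore 20; they assume lcore IDs below 64.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    /* Bit N of the taskset mask selects lcore N (numbering starts at 0).
     * Assumes lcore < 64 so the shift stays within 64 bits. */
    static uint64_t lcore_to_mask(unsigned lcore)
    {
        return (uint64_t)1 << lcore;
    }

    /* Format the mask in hex without a "0x" prefix, as in the example above. */
    static void format_mask(unsigned lcore, char *buf, size_t len)
    {
        snprintf(buf, len, "%llx", (unsigned long long)lcore_to_mask(lcore));
    }

For instance, ``format_mask(20, buf, sizeof(buf))`` writes ``100000`` into ``buf``.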

To provide flexibility of performance, the KNI kernel module,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the kthread_mode parameter as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode creates only one kernel thread to receive packets for all KNI devices on the kernel side.
    This single kernel thread mode is the default.
    Core affinity for this kernel thread can be set using the Linux taskset command.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode creates one kernel thread for each KNI device to receive packets on the kernel side.
    The core affinity of each kernel thread is set when the KNI device is created,
    using the lcore IDs provided on the command line when launching the application.
    Multiple kernel thread mode can provide scalable, higher performance.

To measure the throughput in loopback mode, the KNI kernel module,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode involves ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode involves ring enqueue/dequeue operations and sk buffer copies in kernel space.

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]"

where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]":
    Determines which lcores for RX, TX and the kernel threads are mapped to which ports.
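
Each set bit N in PORTMASK selects port N, so ``0x3`` selects ports 0 and 1.
The hypothetical helper ``portmask_to_ports()`` below is a sketch of that decoding only;
it is not the sample application's own option parser, which is not shown in this guide.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Decode a hexadecimal PORTMASK such as "0x3" into the port IDs it selects.
     * Returns the number of IDs written to ports[], or -1 if the string is not
     * a valid, non-zero hexadecimal mask. */
    static int portmask_to_ports(const char *mask, uint16_t *ports, int max_ports)
    {
        char *end = NULL;
        unsigned long pm;
        int n = 0;

        errno = 0;
        pm = strtoul(mask, &end, 16); /* base 16 also accepts a "0x" prefix */
        if (errno != 0 || end == mask || *end != '\0' || pm == 0)
            return -1;

        for (unsigned id = 0; id < 8 * sizeof(pm) && n < max_ports; id++)
            if (pm & (1UL << id))
                ports[n++] = (uint16_t)id;
        return n;
    }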

Refer to the *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.

The -c coremask or -l corelist parameter of the EAL options must include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are used only to pin the kernel threads.
The -p PORTMASK parameter must include exactly the ports listed in --config, neither more nor less.

The lcore_kthread part of --config can list zero, one or more lcore IDs.
In multiple kernel thread mode, if none is given, one KNI device is allocated for each port
and no specific lcore affinity is set for its kernel thread.
If one or more lcore IDs are given, one KNI device per lcore ID is allocated for each port,
and the kernel thread of each KNI device is pinned to the corresponding lcore.
In single kernel thread mode, if none is given, one KNI device is allocated for each port.
If one or more lcore IDs are given,
one KNI device per lcore ID is allocated for each port, but
no lcore affinity is set, as there is only one kernel thread for all KNI devices.
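
The device-count and naming rules above can be summarized in a short sketch.
The helper ``kni_device_names()`` and the constant ``NAME_SIZE`` are hypothetical; the
``nb_lcore_k ? nb_lcore_k : 1`` count and the ``vEth%u`` / ``vEth%u_%u`` name formats are taken from
the ``kni_alloc()`` code explained later in this guide.

.. code-block:: c

    #include <stdio.h>

    #define NAME_SIZE 32 /* assumed buffer size for this sketch */

    /* With no lcore_kthread values, one device named vEth<port> is created;
     * otherwise one device per value, named vEth<port>_<index>.
     * Returns the device count and writes at most max_names names. */
    static unsigned kni_device_names(unsigned port_id, unsigned nb_lcore_k,
                                     char names[][NAME_SIZE], unsigned max_names)
    {
        unsigned nb_kni = nb_lcore_k ? nb_lcore_k : 1;

        for (unsigned i = 0; i < nb_kni && i < max_names; i++) {
            if (nb_lcore_k)
                snprintf(names[i], NAME_SIZE, "vEth%u_%u", port_id, i);
            else
                snprintf(names[i], NAME_SIZE, "vEth%u", port_id);
        }
        return nb_kni;
    }

With ``--config="(0,4,6,8),(1,5,7,9)"`` each port has one lcore_kthread value, which yields the
devices vEth0_0 and vEth1_0.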

For example, to run the application with two ports served by six lcores (one RX lcore, one TX lcore
and one kernel thread lcore for each port):

.. code-block:: console

    ./build/kni -l 4-7 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"

Lcores 8 and 9 only host the kernel threads, so they do not appear in the EAL -l list.

Once the KNI application is started, standard Linux* commands can be used to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device is paired with the physical device.
Operations on the other KNI devices do not affect the physical port handled by the userspace application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

Changing the MAC address:

.. code-block:: console

    #ifconfig vEth0_0 hw ether 0C:01:02:03:04:08

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.

The following sections provide some explanation of the code.

The setup of the mbuf pool, driver and queues is similar to the setup done in the :doc:`l2_forward_real_virtual`.
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:

.. code-block:: c

    static int
    kni_alloc(uint16_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ? params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
                conf.force_bind = 1;
            } else
                snprintf(conf.name, RTE_KNI_NAMESIZE, "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;

            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                /* Get the interface default mac address */
                rte_eth_macaddr_get(port_id, (struct ether_addr *)&conf.mac_addr);

                memset(&ops, 0, sizeof(ops));
                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;
                ops.config_mac_address = kni_config_mac_address;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                         "port: %d\n", port_id);

            params[port_id]->kni[i] = kni;
        }

        return 0;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads:

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores on which the kernel threads are pinned, one per kernel thread

This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:

.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint16_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));
        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;
            size = p0 - p;
            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }
            snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');
            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }
            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint16_t)int_fld[i++];
            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n", port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }
            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }
            kni_port_params_array[port_id] = (struct kni_port_params*)rte_zmalloc("KNI_port_params", sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];
            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE || kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                       "exceed the maximum %u\n",
                       kni_port_params_array[port_id]->lcore_rx, kni_port_params_array[port_id]->lcore_tx, RTE_MAX_LCORE);
                goto fail;
            }
            /* Any remaining fields are the lcore_kthread IDs */
            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        return 0;

    fail:
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;
    }

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user-provided lcore_rx and lcore_tx
to determine whether this lcore reads from or writes to the kernel NIC interfaces.
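
The lcore role check can be sketched as follows.
``struct port_params``, ``lcore_role_for()`` and ``MAX_PORTS`` are hypothetical simplifications of
``struct kni_port_params`` and ``RTE_MAX_ETHPORTS``, written without DPDK so the logic compiles on its
own; the sample's actual main_loop() code is not reproduced here.

.. code-block:: c

    #include <stddef.h>

    #define MAX_PORTS 4 /* assumed; the sample uses RTE_MAX_ETHPORTS */

    /* Hypothetical, simplified stand-in for struct kni_port_params. */
    struct port_params {
        unsigned port_id;
        unsigned lcore_rx;
        unsigned lcore_tx;
    };

    enum lcore_role { ROLE_NONE, ROLE_RX, ROLE_TX };

    /* Compare lcore_id with each configured port's lcore_rx and lcore_tx.
     * On a match, *port receives the port ID. */
    static enum lcore_role lcore_role_for(unsigned lcore_id,
                                          struct port_params *params[MAX_PORTS],
                                          unsigned *port)
    {
        for (unsigned i = 0; i < MAX_PORTS; i++) {
            if (params[i] == NULL)
                continue;
            if (params[i]->lcore_rx == lcore_id) {
                *port = params[i]->port_id;
                return ROLE_RX; /* read from the NIC, write to KNI */
            }
            if (params[i]->lcore_tx == lcore_id) {
                *port = params[i]->port_id;
                return ROLE_TX; /* read from KNI, write to the NIC */
            }
        }
        return ROLE_NONE;
    }

With the example configuration ``(0,4,6,8),(1,5,7,9)``, lcore 4 receives for port 0, lcore 7 transmits
for port 1, and lcore 8, a kernel thread lcore rather than an EAL lcore, has no role.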

For the case that reads from a NIC port and writes to the kernel NIC interfaces,
packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
Packet transmission is done by sending mbufs to the kernel NIC interfaces with rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.

.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */
    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni;
        uint16_t port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;
        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }
            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);
            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }
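
The partial-burst handling above (free whatever the receiver did not accept and count it as dropped,
so that no mbuf leaks) can be reproduced with plain heap buffers.
``fake_tx_burst()``, ``send_burst()`` and ``struct burst_stats`` are hypothetical stand-ins for
``rte_kni_tx_burst()``, ``kni_burst_free_mbufs()`` and ``kni_stats``; the fake transmit function simply
accepts at most ``room`` packets, as a nearly full queue would.

.. code-block:: c

    #include <stdlib.h>

    struct burst_stats {
        unsigned long sent;
        unsigned long dropped;
    };

    /* Accept at most `room` packets; ownership of accepted ones moves to sink. */
    static unsigned fake_tx_burst(void **pkts, unsigned n, unsigned room,
                                  void **sink)
    {
        unsigned k = n < room ? n : room;

        for (unsigned i = 0; i < k; i++)
            sink[i] = pkts[i];
        return k;
    }

    /* Send a burst, free the packets that were not accepted, update counters. */
    static void send_burst(void **pkts, unsigned nb_rx, unsigned room,
                           void **sink, struct burst_stats *st)
    {
        unsigned num = fake_tx_burst(pkts, nb_rx, room, sink);

        st->sent += num;
        if (num < nb_rx) {
            for (unsigned i = num; i < nb_rx; i++)
                free(pkts[i]);
            st->dropped += nb_rx - num;
        }
    }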

For the other case, which reads from the kernel NIC interfaces and writes to a physical NIC port,
packets are retrieved by reading mbufs from the kernel NIC interfaces with ``rte_kni_rx_burst()``.
Packet transmission is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).

.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */
    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni;
        uint16_t port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;
        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }
            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;
            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }

Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and filled in the struct rte_kni_ops structure.
Currently, setting a new MTU, changing the MAC address, configuring promiscuous mode and
configuring the network interface (up/down) are supported.
The rte_kni library provides default implementations of the following callbacks,
so an application may choose not to implement them:

- ``config_mac_address``
- ``config_promiscusity``

.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
        .config_mac_address = kni_config_mac_address,
        .config_promiscusity = kni_config_promiscusity,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint16_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));
        /* Set new MTU */
        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;
        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint16_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);

        return ret;
    }

    /* Callback for request of configuring device mac address */
    static int
    kni_config_mac_address(uint16_t port_id, uint8_t mac_addr[])
    {
        /* ... */
    }

    /* Callback for request of configuring promiscuous mode */
    static int
    kni_config_promiscusity(uint16_t port_id, uint8_t to_on)
    {
        /* ... */
    }
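
The max_rx_pkt_len arithmetic in kni_change_mtu() can be checked in isolation.
The sketch below assumes the standard Ethernet sizes (a 14-byte header, a 4-byte FCS and a 1518-byte
maximum standard frame, which is the value of ETHER_MAX_LEN); the helper names are hypothetical.
Note one difference: the callback above compares new_mtu itself with ETHER_MAX_LEN, while this sketch
compares the resulting frame length, so the two disagree for MTUs from 1501 to 1518.

.. code-block:: c

    #include <stdbool.h>

    #define ENET_HEADER_SIZE 14u   /* assumed standard Ethernet header */
    #define ENET_FCS_SIZE    4u    /* assumed standard frame check sequence */
    #define STD_MAX_FRAME    1518u /* standard maximum frame length */

    /* mtu + length of header + length of FCS = max pkt length */
    static unsigned max_rx_pkt_len(unsigned mtu)
    {
        return mtu + ENET_HEADER_SIZE + ENET_FCS_SIZE;
    }

    /* Jumbo frames are needed once the resulting frame exceeds the standard size. */
    static bool needs_jumbo(unsigned mtu)
    {
        return max_rx_pkt_len(mtu) > STD_MAX_FRAME;
    }

For example, an MTU of 1500 gives a 1518-byte maximum frame, while an MTU of 9000 gives 9018 bytes and
requires jumbo frame support.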