..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Kernel NIC Interface Sample Application
=======================================
The Kernel NIC Interface (KNI) is a DPDK control plane solution that
allows userspace applications to exchange packets with the kernel networking stack.
To accomplish this, DPDK userspace applications use an IOCTL call
to request the creation of a KNI virtual device in the Linux* kernel.
The IOCTL call provides interface information and the DPDK's physical address space,
which is re-mapped into the kernel address space by the KNI kernel loadable module
that saves the information to a virtual device context.
The DPDK creates FIFO queues for packet ingress and egress
to the kernel module for each device allocated.
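To make the FIFO idea concrete, the sketch below shows a minimal single-producer/single-consumer ring of packet pointers. This is only an illustration of the mechanism, not the actual rte_kni FIFO layout; the names ``pkt_fifo``, ``fifo_put`` and ``fifo_get`` are invented for this example:

```c
#include <stddef.h>

#define FIFO_SIZE 8 /* power of two; one slot is kept empty */

/* Minimal single-producer/single-consumer ring of packet pointers. */
struct pkt_fifo {
    void *slots[FIFO_SIZE];
    volatile unsigned write; /* producer index */
    volatile unsigned read;  /* consumer index */
};

/* Enqueue one packet pointer; returns 0 on success, -1 if the ring is full. */
static int fifo_put(struct pkt_fifo *f, void *pkt)
{
    unsigned next = (f->write + 1) % FIFO_SIZE;
    if (next == f->read)
        return -1; /* full */
    f->slots[f->write] = pkt;
    f->write = next;
    return 0;
}

/* Dequeue one packet pointer; returns NULL if the ring is empty. */
static void *fifo_get(struct pkt_fifo *f)
{
    void *pkt;
    if (f->read == f->write)
        return NULL; /* empty */
    pkt = f->slots[f->read];
    f->read = (f->read + 1) % FIFO_SIZE;
    return pkt;
}
```

Because only pointers move through the ring, no packet data is copied between the two sides, which is the key property the KNI FIFOs exploit.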
The KNI kernel loadable module is a standard net driver,
which, upon receiving the IOCTL call, accesses the DPDK's FIFO queues to
receive/transmit packets from/to the DPDK userspace application.
The FIFO queues contain pointers to data packets in the DPDK. This:

*   Provides a faster mechanism to interface with the kernel net stack and eliminates system calls

*   Facilitates the DPDK using standard Linux* userspace net tools (tcpdump, ftp, and so on)

*   Eliminates the copy_to_user and copy_from_user operations on packets.
The Kernel NIC Interface sample application is a simple example that demonstrates the use
of the DPDK to create a path for packets to go through the Linux* kernel.
This is done by creating one or more kernel net devices for each of the DPDK ports.
The application allows the use of standard Linux tools (ethtool, ifconfig, tcpdump) with the DPDK ports and
also the exchange of packets between the DPDK application and the Linux* kernel.
Overview
--------

The Kernel NIC Interface sample application uses two threads in user space for each physical NIC port being used,
and allocates one or more KNI devices for each physical NIC port with the kernel module's support.
For a physical NIC port, one thread reads from the port and writes to KNI devices,
and another thread reads from KNI devices and writes the data unmodified to the physical NIC port.
It is recommended to configure one KNI device for each physical NIC port.
Configuring more than one KNI device per physical NIC port is intended only for performance testing,
or for working together with VMDq support in the future.
The packet flow through the Kernel NIC Interface application is as shown in the following figure.

.. _figure_kernel_nic:

.. figure:: img/kernel_nic.*

   Kernel NIC Application Packet Flow
Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``kni`` sub-directory.

.. note::

        This application is intended as a linuxapp only.
Loading the Kernel Module
-------------------------

Loading the KNI kernel module without any parameter is the typical way a DPDK application
gets packets into and out of the kernel net stack.
This way, only one kernel thread is created for all KNI devices for packet receiving in kernel side:

.. code-block:: console

    #insmod rte_kni.ko
Pinning the kernel thread to a specific core can be done using a taskset command such as the following:

.. code-block:: console

    #taskset -p 100000 `pgrep -fl kni_thread | awk '{print $1}'`

This command line tries to pin the specific kni_thread on the 20th lcore (lcore numbering starts at 0),
which means it needs to check if that lcore is available on the board.
This command must be sent after the application has been launched, as insmod does not start the kni thread.
For optimum performance,
the lcore in the mask must be selected to be on the same socket as the lcores used in the KNI application.
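The mask arithmetic behind the taskset example is just a left shift: the hexadecimal mask 0x100000 has only bit 20 set, which selects lcore 20. A tiny sketch makes this explicit (the helper name ``lcore_to_mask`` is invented for illustration; it is not a DPDK or taskset API):

```c
/* Affinity mask with only the given lcore's bit set (illustrative helper). */
static unsigned long lcore_to_mask(unsigned lcore_id)
{
    return 1UL << lcore_id;
}
```

For example, ``lcore_to_mask(20)`` yields 0x100000, the value passed to ``taskset -p`` above.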
To provide flexibility of performance, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with the parameter kthread_mode as follows:

*   #insmod rte_kni.ko kthread_mode=single

    This mode will create only one kernel thread for all KNI devices for packet receiving in kernel side.
    By default, it is in this single kernel thread mode.
    Core affinity for this kernel thread can be set by using the Linux command taskset.

*   #insmod rte_kni.ko kthread_mode=multiple

    This mode will create a kernel thread for each KNI device for packet receiving in kernel side.
    The core affinity of each kernel thread is set when creating the KNI device.
    The lcore ID for each kernel thread is provided in the command line of launching the application.
    Multiple kernel thread mode can provide scalable higher performance.
To measure the throughput in a loopback mode, the kernel module of the KNI,
located in the kmod sub-directory of the DPDK target directory,
can be loaded with parameters as follows:

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo

    This loopback mode will involve ring enqueue/dequeue operations in kernel space.

*   #insmod rte_kni.ko lo_mode=lo_mode_fifo_skb

    This loopback mode will involve ring enqueue/dequeue operations and sk buffer copies in kernel space.
Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    kni [EAL options] -- -P -p PORTMASK --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]"
Where:

*   -P: Set all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address.
    Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted.

*   -p PORTMASK: Hexadecimal bitmask of ports to configure.

*   --config="(port,lcore_rx,lcore_tx[,lcore_kthread,...])[,port,lcore_rx,lcore_tx[,lcore_kthread,...]]":
    Determines which lcores of RX, TX, and kernel thread are mapped to which ports.
Refer to the *DPDK Getting Started Guide* for general information on running applications and the Environment Abstraction Layer (EAL) options.
The -c coremask or -l corelist parameter of the EAL options should include the lcores indicated by lcore_rx and lcore_tx,
but does not need to include the lcores indicated by lcore_kthread, as those are used only to pin the kernel threads.
The -p PORTMASK parameter should include exactly the ports given in --config, neither more nor fewer.
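How a hexadecimal PORTMASK selects ports can be sketched as a per-bit test (an illustrative helper, not part of the sample application; the name ``port_in_mask`` is invented here):

```c
#include <stdint.h>

/* Non-zero if the given port is selected by the -p PORTMASK bitmask. */
static int port_in_mask(uint32_t portmask, unsigned port_id)
{
    return (portmask >> port_id) & 1;
}
```

With the mask 0x3 used in the example below, ports 0 and 1 are selected and all higher ports are excluded.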
The lcore_kthread field in --config can be configured with none, one or more lcore IDs.
In multiple kernel thread mode, if none are configured, a KNI device will be allocated for each port,
and no specific lcore affinity will be set for its kernel thread.
If one or more lcore IDs are configured, one or more KNI devices will be allocated for each port,
and specific lcore affinity will be set for each kernel thread.
In single kernel thread mode, if none are configured, a KNI device will be allocated for each port.
If one or more lcore IDs are configured,
one or more KNI devices will be allocated for each port, but
no lcore affinity will be set, as there is only one kernel thread for all KNI devices.
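In both modes, the allocation rule above reduces to: the number of KNI devices for a port equals the number of lcore_kthread IDs supplied, or one if none are supplied. A hypothetical helper (the name is invented here; the same ternary expression appears later in kni_alloc()) states the rule in code:

```c
/* KNI devices allocated for a port, given how many
 * lcore_kthread IDs were supplied for it in --config. */
static unsigned nb_kni_for_port(unsigned nb_lcore_k)
{
    return nb_lcore_k ? nb_lcore_k : 1;
}
```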
For example, to run the application with two ports served by six lcores, one lcore of RX, one lcore of TX,
and one lcore of kernel thread for each port:

.. code-block:: console

    ./build/kni -l 4-7 -n 4 -- -P -p 0x3 --config="(0,4,6,8),(1,5,7,9)"
KNI Operations
--------------

Once the KNI application is started, one can use different Linux* commands to manage the net interfaces.
If more than one KNI device is configured for a physical port,
only the first KNI device will be paired to the physical device.
Operations on other KNI devices will not affect the physical port handled in the userspace application.

Assigning an IP address:

.. code-block:: console

    #ifconfig vEth0_0 192.168.0.1

Displaying the NIC registers:

.. code-block:: console

    #ethtool -d vEth0_0

Dumping the network traffic:

.. code-block:: console

    #tcpdump -i vEth0_0

When the DPDK userspace application is closed, all the KNI devices are deleted from Linux*.
Explanation
-----------

The following sections provide some explanation of the code.

Initialization
~~~~~~~~~~~~~~
Setup of the mbuf pool, driver and queues is similar to the setup done in the :doc:`l2_forward_real_virtual`.
In addition, one or more kernel NIC interfaces are allocated for each
of the configured ports according to the command line parameters.

The code for allocating the kernel NIC interfaces for a specific port is as follows:
.. code-block:: c

    static int
    kni_alloc(uint16_t port_id)
    {
        uint8_t i;
        struct rte_kni *kni;
        struct rte_kni_conf conf;
        struct kni_port_params **params = kni_port_params_array;

        if (port_id >= RTE_MAX_ETHPORTS || !params[port_id])
            return -1;

        params[port_id]->nb_kni = params[port_id]->nb_lcore_k ?
                    params[port_id]->nb_lcore_k : 1;

        for (i = 0; i < params[port_id]->nb_kni; i++) {
            /* Clear conf at first */
            memset(&conf, 0, sizeof(conf));
            if (params[port_id]->nb_lcore_k) {
                snprintf(conf.name, RTE_KNI_NAMESIZE,
                        "vEth%u_%u", port_id, i);
                conf.core_id = params[port_id]->lcore_k[i];
            } else
                snprintf(conf.name, RTE_KNI_NAMESIZE,
                        "vEth%u", port_id);
            conf.group_id = (uint16_t)port_id;
            conf.mbuf_size = MAX_PACKET_SZ;
            /*
             * The first KNI device associated to a port
             * is the master, for multiple kernel thread
             * environment.
             */
            if (i == 0) {
                struct rte_kni_ops ops;
                struct rte_eth_dev_info dev_info;

                memset(&dev_info, 0, sizeof(dev_info));
                rte_eth_dev_info_get(port_id, &dev_info);

                conf.addr = dev_info.pci_dev->addr;
                conf.id = dev_info.pci_dev->id;

                memset(&ops, 0, sizeof(ops));
                ops.port_id = port_id;
                ops.change_mtu = kni_change_mtu;
                ops.config_network_if = kni_config_network_interface;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, &ops);
            } else
                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);

            if (!kni)
                rte_exit(EXIT_FAILURE, "Fail to create kni for "
                            "port: %d\n", port_id);
            params[port_id]->kni[i] = kni;
        }

        return 0;
    }
The other step in the initialization process that is unique to this sample application
is the association of each port with lcores for RX, TX and kernel threads:

*   One lcore to read from the port and write to the associated one or more KNI devices

*   Another lcore to read from one or more KNI devices and write to the port

*   Other lcores to pin the kernel threads on, one by one
This is done by using the ``kni_port_params_array[]`` array, which is indexed by the port ID.
The code is as follows:
.. code-block:: c

    static int
    parse_config(const char *arg)
    {
        const char *p, *p0 = arg;
        char s[256], *end;
        unsigned size;
        enum fieldnames {
            FLD_PORT = 0,
            FLD_LCORE_RX,
            FLD_LCORE_TX,
            _NUM_FLD = KNI_MAX_KTHREAD + 3,
        };
        int i, j, nb_token;
        char *str_fld[_NUM_FLD];
        unsigned long int_fld[_NUM_FLD];
        uint16_t port_id, nb_kni_port_params = 0;

        memset(&kni_port_params_array, 0, sizeof(kni_port_params_array));

        while (((p = strchr(p0, '(')) != NULL) && nb_kni_port_params < RTE_MAX_ETHPORTS) {
            p++;
            if ((p0 = strchr(p, ')')) == NULL)
                goto fail;

            size = p0 - p;
            if (size >= sizeof(s)) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            snprintf(s, sizeof(s), "%.*s", size, p);
            nb_token = rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',');
            if (nb_token <= FLD_LCORE_TX) {
                printf("Invalid config parameters\n");
                goto fail;
            }

            for (i = 0; i < nb_token; i++) {
                errno = 0;
                int_fld[i] = strtoul(str_fld[i], &end, 0);
                if (errno != 0 || end == str_fld[i]) {
                    printf("Invalid config parameters\n");
                    goto fail;
                }
            }

            i = 0;
            port_id = (uint8_t)int_fld[i++];
            if (port_id >= RTE_MAX_ETHPORTS) {
                printf("Port ID %u could not exceed the maximum %u\n",
                        port_id, RTE_MAX_ETHPORTS);
                goto fail;
            }

            if (kni_port_params_array[port_id]) {
                printf("Port %u has been configured\n", port_id);
                goto fail;
            }

            kni_port_params_array[port_id] =
                (struct kni_port_params*)rte_zmalloc("KNI_port_params",
                        sizeof(struct kni_port_params), RTE_CACHE_LINE_SIZE);
            kni_port_params_array[port_id]->port_id = port_id;
            kni_port_params_array[port_id]->lcore_rx = (uint8_t)int_fld[i++];
            kni_port_params_array[port_id]->lcore_tx = (uint8_t)int_fld[i++];
            if (kni_port_params_array[port_id]->lcore_rx >= RTE_MAX_LCORE ||
                kni_port_params_array[port_id]->lcore_tx >= RTE_MAX_LCORE) {
                printf("lcore_rx %u or lcore_tx %u ID could not "
                        "exceed the maximum %u\n",
                        kni_port_params_array[port_id]->lcore_rx,
                        kni_port_params_array[port_id]->lcore_tx,
                        RTE_MAX_LCORE);
                goto fail;
            }

            for (j = 0; i < nb_token && j < KNI_MAX_KTHREAD; i++, j++)
                kni_port_params_array[port_id]->lcore_k[j] = (uint8_t)int_fld[i];
            kni_port_params_array[port_id]->nb_lcore_k = j;
        }

        return 0;

    fail:
        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (kni_port_params_array[i]) {
                rte_free(kni_port_params_array[i]);
                kni_port_params_array[i] = NULL;
            }
        }

        return -1;
    }
Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are completed, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user provided lcore_rx and lcore_tx
to see if this lcore is reading from or writing to the kernel NIC interfaces.
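The role check described above can be sketched as follows. This is a simplified illustration rather than the application's actual main_loop(); the enum and function names are invented for the example:

```c
#include <assert.h>

/* Possible roles for an lcore, per the --config mapping. */
enum lcore_role { ROLE_RX, ROLE_TX, ROLE_NONE };

/* Pick a role by comparing the running lcore against the
 * configured lcore_rx and lcore_tx for a port. */
static enum lcore_role
lcore_role(unsigned lcore_id, unsigned lcore_rx, unsigned lcore_tx)
{
    if (lcore_id == lcore_rx)
        return ROLE_RX;  /* read from the NIC port, write to KNI devices */
    if (lcore_id == lcore_tx)
        return ROLE_TX;  /* read from KNI devices, write to the NIC port */
    return ROLE_NONE;    /* this lcore has no forwarding work */
}
```

With the earlier example config "(0,4,6,8)", lcore 4 would take the RX role and lcore 6 the TX role for port 0.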
For the case that reads from a NIC port and writes to the kernel NIC interfaces,
the packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
The packet transmission is done by sending mbufs into the kernel NIC interfaces with rte_kni_tx_burst().
The KNI library automatically frees the mbufs after the kernel has successfully copied them.
.. code-block:: c

    /**
     * Interface to burst rx and enqueue mbufs into rx_q
     */
    static void
    kni_ingress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_rx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from eth */
            nb_rx = rte_eth_rx_burst(port_id, 0, pkts_burst, PKT_BURST_SZ);
            if (unlikely(nb_rx > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from eth\n");
                return;
            }

            /* Burst tx to kni */
            num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
            kni_stats[port_id].rx_packets += num;
            rte_kni_handle_request(p->kni[i]);

            if (unlikely(num < nb_rx)) {
                /* Free mbufs not tx to kni interface */
                kni_burst_free_mbufs(&pkts_burst[num], nb_rx - num);
                kni_stats[port_id].rx_dropped += nb_rx - num;
            }
        }
    }
For the other case, which reads from the kernel NIC interfaces and writes to a physical NIC port, packets are retrieved by reading
mbufs from the kernel NIC interfaces with rte_kni_rx_burst().
The packet transmission is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
.. code-block:: c

    /**
     * Interface to dequeue mbufs from tx_q and burst tx
     */
    static void
    kni_egress(struct kni_port_params *p)
    {
        uint8_t i, nb_kni, port_id;
        unsigned nb_tx, num;
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];

        if (p == NULL)
            return;

        nb_kni = p->nb_kni;
        port_id = p->port_id;

        for (i = 0; i < nb_kni; i++) {
            /* Burst rx from kni */
            num = rte_kni_rx_burst(p->kni[i], pkts_burst, PKT_BURST_SZ);
            if (unlikely(num > PKT_BURST_SZ)) {
                RTE_LOG(ERR, APP, "Error receiving from KNI\n");
                return;
            }

            /* Burst tx to eth */
            nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
            kni_stats[port_id].tx_packets += nb_tx;

            if (unlikely(nb_tx < num)) {
                /* Free mbufs not tx to NIC */
                kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
                kni_stats[port_id].tx_dropped += num - nb_tx;
            }
        }
    }
Callbacks for Kernel Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To execute specific PMD operations in user space requested by some Linux* commands,
callbacks must be implemented and filled in the struct rte_kni_ops structure.
Currently, setting a new MTU and configuring the network interface (up/down) are supported.
.. code-block:: c

    static struct rte_kni_ops kni_ops = {
        .change_mtu = kni_change_mtu,
        .config_network_if = kni_config_network_interface,
    };

    /* Callback for request of changing MTU */
    static int
    kni_change_mtu(uint16_t port_id, unsigned new_mtu)
    {
        int ret;
        struct rte_eth_conf conf;

        if (port_id >= rte_eth_dev_count()) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Change MTU of port %d to %u\n", port_id, new_mtu);

        /* Stop specific port */
        rte_eth_dev_stop(port_id);

        memcpy(&conf, &port_conf, sizeof(conf));

        /* Set new MTU */
        if (new_mtu > ETHER_MAX_LEN)
            conf.rxmode.jumbo_frame = 1;
        else
            conf.rxmode.jumbo_frame = 0;

        /* mtu + length of header + length of FCS = max pkt length */
        conf.rxmode.max_rx_pkt_len = new_mtu + KNI_ENET_HEADER_SIZE + KNI_ENET_FCS_SIZE;
        ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
            return ret;
        }

        /* Restart specific port */
        ret = rte_eth_dev_start(port_id);
        if (ret < 0) {
            RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
            return ret;
        }

        return 0;
    }
.. code-block:: c

    /* Callback for request of configuring network interface up/down */
    static int
    kni_config_network_interface(uint16_t port_id, uint8_t if_up)
    {
        int ret = 0;

        if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
            RTE_LOG(ERR, APP, "Invalid port id %d\n", port_id);
            return -EINVAL;
        }

        RTE_LOG(INFO, APP, "Configure network interface of %d %s\n",
                port_id, if_up ? "up" : "down");

        if (if_up != 0) {
            /* Configure network interface up */
            rte_eth_dev_stop(port_id);
            ret = rte_eth_dev_start(port_id);
        } else /* Configure network interface down */
            rte_eth_dev_stop(port_id);

        if (ret < 0)
            RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);

        return ret;
    }