..
    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD (Elastic Flow
Distributor) library as a flow-level load balancer. For more information about
the EFD library, please refer to the DPDK Programmer's Guide.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
in which a specific target node is assigned to each flow, instead of
distributing packets in a round-robin fashion as the original load-balancing
sample application does.

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server) that uses the
EFD library to create a load-balancing table in which a target back-end worker
node is specified for each flow. Unlike a regular hash table, the EFD table
does not store the flow keys, so it can load-balance millions of individual
flows (the number of targets multiplied by the maximum number of flows that
fit in each target's flow table) while still fitting in the CPU cache.

It should be noted that although they are referred to as nodes, the front-end
server and worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initialization, the front-end server node (process) creates a flow
distributor table (based on the EFD library), which is populated with each
flow and its intended target node.

The sample application assigns a specific target node_id (process) to each
IP destination address as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

Then the ``<key, target>`` pair is inserted into the flow distributor table.
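
Because flow ``i`` is keyed by destination address ``i`` in network byte order,
a packet sent to ``a.b.c.d`` belongs to flow index
``(a << 24) | (b << 16) | (c << 8) | d`` and, if that index is below NUM_FLOWS,
is served by node ``index % NUM_NODES``. The following DPDK-free sketch uses a
hypothetical helper, ``node_for_dst()``, which is not part of the sample; it
may help when checking that the destination addresses configured on a traffic
generator line up with the ``-f`` and ``-n`` options.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Hypothetical helper (not part of the sample): map a dotted-quad
     * destination address to the node that should receive it. The server keys
     * flow i by rte_cpu_to_be_32(i), so a.b.c.d is flow (a<<24 | b<<16 | c<<8 | d).
     * Returns the node id, or -1 if the address is malformed or is not one of
     * the NUM_FLOWS inserted flows (a worker would drop such packets).
     */
    static int node_for_dst(const char *dst, uint32_t num_flows, uint32_t num_nodes)
    {
        unsigned int a, b, c, d;
        uint32_t flow;

        if (sscanf(dst, "%u.%u.%u.%u", &a, &b, &c, &d) != 4 ||
            a > 255 || b > 255 || c > 255 || d > 255)
            return -1;

        flow = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
               ((uint32_t)c << 8) | (uint32_t)d;
        if (flow >= num_flows)
            return -1;

        /* Same assignment as the server: node_id = i % num_nodes */
        return (int)(flow % num_nodes);
    }

For example, with ``-f 16 -n 4``, destination ``0.0.0.5`` maps to node 1,
while ``0.0.0.16`` is outside the configured flows.
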

The main loop of the server process receives a burst of packets and, for each
packet, extracts a flow key (the IP destination address). The flow distributor
table is looked up, returning the target node id, and the packet is then
enqueued to that target node.
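
The sample reads the destination address through ``rte_pktmbuf_mtod_offset()``
and ``struct ipv4_hdr``. As a DPDK-free sketch, assuming untagged Ethernet
frames carrying IPv4 (no VLAN tag), the key is the 4 bytes at offset
14 + 16 = 30 in the frame. The hypothetical ``extract_dst_key()`` copies them
without byte swapping, so the key stays in network byte order, just as
``ipv4_hdr->dst_addr`` does in the sample.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_HDR_LEN     14  /* untagged Ethernet header (assumption) */
    #define IPV4_DST_OFFSET 16  /* dst address offset inside the IPv4 header */

    /*
     * Hypothetical helper: copy the IPv4 destination address of an
     * Ethernet/IPv4 frame into *key, left in network byte order.
     * Returns 0 on success, -1 if the frame is too short.
     */
    static int extract_dst_key(const uint8_t *frame, size_t len, uint32_t *key)
    {
        const size_t off = ETH_HDR_LEN + IPV4_DST_OFFSET;

        if (len < off + sizeof(*key))
            return -1;
        memcpy(key, frame + off, sizeof(*key)); /* no byte swap */
        return 0;
    }
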

It should be noted that the flow distributor table is not a membership test
table. That is, if a key has already been inserted, the returned target node id
is correct, but for a new key the flow distributor table still returns a value
(which can be a valid node id).
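
A toy illustration of this property follows. It is *not* the EFD algorithm: the
hypothetical ``toy_table`` stores one node id per bucket and never the key,
which is enough to show that a lookup for a key that was never inserted can
return either a valid-looking node id or an out-of-range value that the server
can detect and drop.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    #define TOY_BUCKETS 8
    #define TOY_EMPTY   0xFF  /* marker for a bucket that was never written */

    /*
     * Toy value-only table (NOT the EFD algorithm): each bucket holds just a
     * node id, never the key, so inserted and new keys are indistinguishable.
     */
    struct toy_table {
        uint8_t value[TOY_BUCKETS];
    };

    static void toy_init(struct toy_table *t)
    {
        memset(t->value, TOY_EMPTY, sizeof(t->value));
    }

    static void toy_insert(struct toy_table *t, uint32_t key, uint8_t node)
    {
        t->value[key % TOY_BUCKETS] = node;
    }

    /* Always returns a value, whether or not the key was ever inserted. */
    static uint8_t toy_lookup(const struct toy_table *t, uint32_t key)
    {
        return t->value[key % TOY_BUCKETS];
    }

With eight buckets, key 11 shares a bucket with key 3, so after inserting
``3 -> node 1`` a lookup of 11 also yields node 1, while key 4 yields the
out-of-range ``TOY_EMPTY``. Telling such new flows apart requires a table that
stores the keys, which is the job of the back-end workers.
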

Back-end Worker Nodes
~~~~~~~~~~~~~~~~~~~~~

Upon initialization, the worker node (process) creates a flow table (a regular
hash table that stores the keys; its default size is 1M flows), which is
populated with only the flows serviced by this node. Storing the flow keys is
essential for identifying new keys that have not been inserted before.

The worker node's main loop simply receives packets and performs a hash table
lookup for each one. If a match occurs, the statistics for flows serviced by
this node are updated. If no match is found in the local hash table, the packet
belongs to a new flow and is dropped.
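
The worker's filtering step can be sketched with a minimal open-addressing
(linear probing) set that stores the keys, standing in for the ``rte_hash``
table the sample actually uses. The names ``flow_set``, ``set_add()``,
``set_contains()`` and ``filter_keys()`` are hypothetical; the set has room for
twice the expected number of keys, mirroring the 50% load used by the worker.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define SET_SLOTS 16  /* power of two; 2x the expected keys (50% load) */

    /* Minimal key-storing set: unlike a value-only table, it can say "absent". */
    struct flow_set {
        uint32_t key[SET_SLOTS];
        bool used[SET_SLOTS];
    };

    static void set_init(struct flow_set *s)
    {
        memset(s, 0, sizeof(*s));
    }

    /* Insert with linear probing; returns false only if the set is full. */
    static bool set_add(struct flow_set *s, uint32_t key)
    {
        for (uint32_t n = 0; n < SET_SLOTS; n++) {
            uint32_t slot = (key + n) & (SET_SLOTS - 1);

            if (!s->used[slot]) {
                s->used[slot] = true;
                s->key[slot] = key;
                return true;
            }
            if (s->key[slot] == key)
                return true; /* already present */
        }
        return false;
    }

    /* Probe until the key or an empty slot is found. */
    static bool set_contains(const struct flow_set *s, uint32_t key)
    {
        for (uint32_t n = 0; n < SET_SLOTS; n++) {
            uint32_t slot = (key + n) & (SET_SLOTS - 1);

            if (!s->used[slot])
                return false;
            if (s->key[slot] == key)
                return true;
        }
        return false;
    }

    /* Pass/drop accounting in the spirit of the worker's main loop. */
    static void filter_keys(const struct flow_set *s, const uint32_t *keys,
                            unsigned int n, uint64_t *passed, uint64_t *dropped)
    {
        for (unsigned int i = 0; i < n; i++) {
            if (set_contains(s, keys[i]))
                (*passed)++;
            else
                (*dropped)++; /* new flow: not serviced by this node */
        }
    }
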

Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``server_node_efd`` sub-directory.

Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.

The front-end server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where:

* ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES:`` Number of back-end nodes that will be used
* ``-f NUM_FLOWS:`` Number of flows to be added in the EFD table (1 million, by default)

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where:

* ``-n NODE_ID:`` Node ID, which must be lower than NUM_NODES
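
A minimal sketch of how these options might be validated, using only the C
standard library. The helpers ``parse_portmask()`` and ``node_id_valid()`` are
hypothetical and do not reproduce the sample's own argument parsing; rejecting
an all-zero port mask is an assumption.

.. code-block:: c

    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Hypothetical helper: parse a hexadecimal PORTMASK such as "3" or "0x3"
     * (strtoull with base 16 accepts an optional 0x prefix).
     * Returns 0 on success, -1 on malformed input or an empty mask.
     */
    static int parse_portmask(const char *arg, uint64_t *mask)
    {
        char *end = NULL;
        unsigned long long v;

        errno = 0;
        v = strtoull(arg, &end, 16);
        if (errno != 0 || end == arg || *end != '\0' || v == 0)
            return -1;
        *mask = (uint64_t)v;
        return 0;
    }

    /* Node ids are 0-based, so NODE_ID must be lower than NUM_NODES. */
    static bool node_id_valid(unsigned int node_id, unsigned int num_nodes)
    {
        return node_id < num_nodes;
    }
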

First, the server app must be launched with the number of nodes that will be run.
Once it has started, the node instances can be run, each with a different NODE_ID.
These instances must be run as secondary processes, by passing ``--proc-type=secondary``
in the EAL options. Secondary processes attach to the primary process memory and can
therefore access the queues created by the primary process to distribute packets.

To successfully run the application, the command line used to start the
application has to be in sync with the traffic flows configured on the traffic
generator side.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.

Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates and populates the EFD table,
which is used to distribute packets to nodes, with the number of flows
specified on the command line (1 million, by default).

.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                        1 << socket_id, socket_id);
        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;
            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                            (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                                    "EFD table\n", i);
        }
        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }


After initialization, packets are received from the enabled ports, and the
destination IPv4 address of each packet is used as the key for an EFD table
lookup, which returns the node to which the packet has to be distributed.

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                    (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

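
The per-packet decision in the second loop can be isolated into a small,
DPDK-free function. The sketch assumes 8-bit lookup values, matching the
sample's cast of ``data[i]`` to ``uint8_t``; ``dispatch_burst()`` is a
hypothetical name, and counting packets per node stands in for
``enqueue_rx_packet()``.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: a looked-up value that is not a valid node id
     * means the flow was never inserted, so the packet is dropped; any other
     * value is counted against its node (in-range values can still belong to
     * new flows, which the workers filter out later).
     */
    static void dispatch_burst(const uint8_t *values, unsigned int n,
                               uint8_t num_nodes, unsigned int *per_node,
                               uint64_t *distributed, uint64_t *dropped)
    {
        for (unsigned int i = 0; i < n; i++) {
            uint8_t node = values[i];

            if (node >= num_nodes) {
                (*dropped)++;
            } else {
                per_node[node]++;
                (*distributed)++;
            }
        }
    }
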

The received packets are first placed in temporary per-node buffers, and each
buffer is then enqueued in bulk into the ring shared between the server and
that node. After this, a new burst of packets is received and the process is
repeated.

.. code-block:: c

    static inline void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;
        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count){
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;
        cl_rx_buf[node].count = 0;
    }

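
``rte_ring_enqueue_bulk()`` is all-or-nothing: it enqueues either every object
or none, returning 0 in the latter case. The sketch below models only the ring
occupancy, with hypothetical types (``toy_ring``, ``node_buf``,
``node_stats``) and an assumed ring capacity of 8, to show the resulting
accounting in ``flush_rx_queue()``: when the ring cannot take the whole
buffered burst, the entire burst counts as ``rx_drop``, even if some slots
were free.

.. code-block:: c

    #include <stdint.h>

    #define RING_CAP 8  /* hypothetical ring capacity */

    struct toy_ring   { unsigned int count; };  /* only occupancy is modeled */
    struct node_buf   { unsigned int count; };  /* packets buffered for a node */
    struct node_stats { uint64_t rx, rx_drop; };

    /* All-or-nothing enqueue: returns n if all n objects fit, otherwise 0. */
    static unsigned int toy_enqueue_bulk(struct toy_ring *r, unsigned int n)
    {
        if (RING_CAP - r->count < n)
            return 0;
        r->count += n;
        return n;
    }

    /* Same accounting as flush_rx_queue(); the sample also frees each mbuf. */
    static void toy_flush(struct node_buf *b, struct toy_ring *r,
                          struct node_stats *st)
    {
        if (b->count == 0)
            return;
        if (toy_enqueue_bulk(r, b->count) != b->count)
            st->rx_drop += b->count;
        else
            st->rx += b->count;
        b->count = 0;
    }

After bursts of 4 and 3 packets the ring holds 7 entries, so a following burst
of 2 is dropped as a whole.
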

The second process, the back-end node, receives the packets from the ring
shared with the server and sends them out if they belong to the node.

At initialization, it attaches to the server process memory to gain
access to the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);


Then, the hash table that contains the flows that will be handled
by the node is created and populated.

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);
        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }


.. code-block:: c

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;
            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;
        }
        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }

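
``create_hash_table()`` sizes the table from ``info->num_flows / info->num_nodes``,
while ``populate_hash_table()`` inserts every flow ``i`` with
``i % num_nodes == node_id``. When NUM_FLOWS is not a multiple of NUM_NODES,
the first ``NUM_FLOWS % NUM_NODES`` nodes receive one flow more than the integer
quotient. The hypothetical helper below computes the exact count.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: number of flows i in [0, num_flows) with
     * i % num_nodes == node_id, i.e. the keys populate_hash_table() inserts.
     * Requires num_nodes > 0.
     */
    static uint32_t flows_for_node(uint32_t num_flows, uint32_t num_nodes,
                                   uint32_t node_id)
    {
        uint32_t base = num_flows / num_nodes;

        /* The first (num_flows % num_nodes) nodes get one extra flow. */
        return base + (node_id < num_flows % num_nodes ? 1 : 0);
    }

As long as NUM_FLOWS is at least NUM_NODES, the quotient is at least 1, so the
extra flow always fits within the ``num_flows_node * 2`` entries requested for
50% load. For example, with 10 flows and 3 nodes, node 0 inserts 4 keys into a
table sized for 6.
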

After initialization, packets are dequeued from the ring shared with the server
and, as in the server process, the destination IPv4 address of each packet is
used as the key for a hash table lookup. If there is a hit, the packet is stored
in a buffer to be eventually transmitted on one of the enabled ports. If the key
is not found, the packet is dropped, since the flow is not handled by this node.

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belongs to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }


Finally, note that both processes update statistics, such as transmitted,
received and dropped packets, which are displayed and refreshed by the server app.
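
The TX figures in this display are derived by summation: each node records
per-port TX counters, and the server adds them up by port and by node. Viewing
the counters as a node-by-port matrix, the per-port totals are its column sums
and the per-node totals its row sums. The sketch uses hypothetical fixed sizes
(``N_NODES``, ``N_PORTS``) and a hypothetical ``sum_tx()``, and it ignores the
``info->id[]`` port-id indirection used by the sample.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    #define N_NODES 2  /* hypothetical sizes for the sketch */
    #define N_PORTS 2

    /* tx[n][p]: packets node n sent on port p. */
    static void sum_tx(const uint64_t tx[N_NODES][N_PORTS],
                       uint64_t port_tx[N_PORTS], uint64_t node_tx[N_NODES])
    {
        memset(port_tx, 0, N_PORTS * sizeof(port_tx[0]));
        memset(node_tx, 0, N_NODES * sizeof(node_tx[0]));
        for (unsigned int n = 0; n < N_NODES; n++) {
            for (unsigned int p = 0; p < N_PORTS; p++) {
                port_tx[p] += tx[n][p]; /* column sum: per-port total */
                node_tx[n] += tx[n][p]; /* row sum: per-node total */
            }
        }
    }
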

.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));

        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    " tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    " filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);