..  BSD LICENSE
    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD library as a flow-level
load balancer. For more information about the EFD library, please refer to the
DPDK Programmer's Guide.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
in which a specific target node is specified for each flow
(rather than assigned in round-robin fashion, as in the original load-balancing
sample application).

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows.
For each flow, a target back-end worker node is specified. Unlike a regular
hash table, the EFD table does not store the flow key, and hence it can
individually load-balance millions of flows (number of targets * maximum number
of flows that fit in a flow table per target) while still fitting in CPU cache.

Note that, although they are referred to as nodes, the front-end
server and worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initializing, the front-end server node (process) creates a flow
distributor table (based on the EFD library) which is populated with flow
information and its intended target node.

The sample application assigns a specific target node_id (process) for each of
the IP destination addresses as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

The <key, target> pair is then inserted into the flow distribution table.
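
The assignment rule can be illustrated in plain C, without DPDK. This is a
minimal sketch only: ``flow_target_node()`` and ``flow_key_bytes()`` are
hypothetical helper names that do not exist in the sample application, and
``flow_key_bytes()`` stands in for ``rte_cpu_to_be_32()`` by writing the flow
index in network (big-endian) byte order. It shows which destination address
the traffic generator would need to use for a given flow index.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helper: target node for flow index i (i % num_nodes). */
    static uint32_t
    flow_target_node(uint32_t i, uint32_t num_nodes)
    {
        return i % num_nodes;
    }

    /*
     * Hypothetical helper: write flow index i as the four bytes of an IPv4
     * destination address in network (big-endian) byte order.
     */
    static void
    flow_key_bytes(uint32_t i, uint8_t addr[4])
    {
        addr[0] = (uint8_t)(i >> 24);
        addr[1] = (uint8_t)(i >> 16);
        addr[2] = (uint8_t)(i >> 8);
        addr[3] = (uint8_t)i;
    }

For example, with 4 nodes, flow index 258 corresponds to destination address
0.0.1.2 and is assigned to node 2.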

The main loop of the server process receives a burst of packets. For
each packet, a flow key (IP destination address) is extracted, the flow
distributor table is looked up and the target node id is returned. Packets are
then enqueued to the specified target node id.

Note that the flow distributor table is not a membership test table.
If a key has already been inserted, the returned target node id is correct,
but for a key that was never inserted the table still returns a value (which
may be a valid node id).
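
The lack of a membership test can be illustrated with a toy table that, like
the EFD table, stores values but no keys. This is a sketch for illustration
only and is not how the EFD library is implemented: ``toy_table``,
``toy_insert()`` and ``toy_lookup()`` are hypothetical names, and the slot
selection by simple modulo is an assumption made to keep the example short.
Unlike this toy, the EFD table returns the correct target for every key that
has been inserted.

.. code-block:: c

    #include <stdint.h>

    #define TOY_SLOTS 8 /* hypothetical table size, for illustration only */

    /* Toy table: one stored value per slot and no stored keys. */
    struct toy_table {
        uint8_t value[TOY_SLOTS];
    };

    static uint32_t
    toy_slot(uint32_t key)
    {
        return key % TOY_SLOTS;
    }

    static void
    toy_insert(struct toy_table *t, uint32_t key, uint8_t value)
    {
        t->value[toy_slot(key)] = value;
    }

    /*
     * Always returns a value: since no key is stored, there is nothing
     * to compare the looked-up key against.
     */
    static uint8_t
    toy_lookup(const struct toy_table *t, uint32_t key)
    {
        return t->value[toy_slot(key)];
    }

After inserting key 3 with target 1, a lookup of key 11 (never inserted)
also returns 1, which is why the worker nodes keep their own hash table
with the keys stored.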

Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initializing, the worker node (process) creates a flow table (a regular
hash table that stores the keys, with a default size of 1M flows) which is
populated with only the flow information that is serviced at this node.
Storing the flow keys is essential for identifying new keys that have not been
inserted before.

The worker node's main loop simply receives packets and then performs a hash
table lookup. If a match occurs, statistics are updated for flows serviced by
this node. If no match is found in the local hash table, this indicates
that the packet belongs to a new flow, and it is dropped.


Compiling the Application
-------------------------

The sequence of steps used to build the application is:

#.  Export the required environment variables:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application executable file:

    .. code-block:: console

        cd ${RTE_SDK}/examples/server_node_efd/
        make

    For more details on how to build the DPDK libraries and sample
    applications,
    please refer to the *DPDK Getting Started Guide.*


Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.

The frontend server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where,

* ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES:`` Number of back-end nodes that will be used
* ``-f NUM_FLOWS:`` Number of flows to be added in the EFD table (1 million, by default)
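
As an illustration of how a hexadecimal port bitmask selects ports, the
following plain C sketch parses a mask and tests individual port bits. It is
not the application's own option parser: ``parse_portmask_sketch()`` and
``port_is_enabled()`` are hypothetical names, and the 32-port limit is an
assumption made for the example.

.. code-block:: c

    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical helper: parse a hexadecimal port mask; returns 0 on error. */
    static uint32_t
    parse_portmask_sketch(const char *arg)
    {
        char *end = NULL;
        unsigned long mask = strtoul(arg, &end, 16);

        /* Reject empty input, trailing characters and oversized masks. */
        if (end == arg || *end != '\0' || mask > UINT32_MAX)
            return 0;
        return (uint32_t)mask;
    }

    /* Hypothetical helper: non-zero if bit port_id is set in the mask. */
    static int
    port_is_enabled(uint32_t mask, unsigned int port_id)
    {
        return port_id < 32 && ((mask >> port_id) & 1u);
    }

With this interpretation, a mask of ``0x3`` selects ports 0 and 1, and a mask
of ``0x5`` selects ports 0 and 2.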

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where,

* ``-n NODE_ID:`` Node ID, which must be lower than NUM_NODES


First, the server app must be launched, with the number of nodes that will be run.
Once it has been started, the node instances can be run, each with a different NODE_ID.
These instances have to be run as secondary processes, with ``--proc-type=secondary``
in the EAL options, so that they attach to the primary process memory and can
therefore access the queues created by the primary process to distribute packets.

To successfully run the application, the command line used to start the
application has to be in sync with the traffic flows configured on the traffic
generator side.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates and populates the EFD table,
which is used to distribute packets to nodes, with the number of flows
specified in the command line (1 million, by default).

.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                        1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                            (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                                    "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }

After initialization, packets are received from the enabled ports, and the IPv4
destination address of each packet is used as a key to look up in the EFD table,
which returns the node to which the packet has to be distributed.

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                    (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

The received burst of packets is first stored in temporary per-node buffers,
which are then enqueued in the shared ring between the server and each node.
After this, a new burst of packets is received and this process is
repeated indefinitely.

.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count){
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }

The second process, the back-end node, receives the packets from the shared
ring with the server and sends them out, if they belong to the node.

At initialization, it attaches to the server process memory to gain
access to the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);

Then, the hash table that contains the flows that will be handled
by the node is created and populated.

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;

        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }
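
The sizing arithmetic used by these two functions can be checked with a small
plain C sketch. The helper names ``flows_for_node()`` and
``hash_entries_for_node()`` are hypothetical and are not part of the sample
application; they only restate the ``i % num_nodes`` assignment and the
``(num_flows / num_nodes) * 2`` table size shown above.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: number of flow indices i in [0, num_flows)
     * for which i % num_nodes == node_id.
     */
    static uint32_t
    flows_for_node(uint32_t num_flows, uint32_t num_nodes, uint32_t node_id)
    {
        uint32_t base = num_flows / num_nodes;
        uint32_t extra = (node_id < num_flows % num_nodes) ? 1 : 0;

        return base + extra;
    }

    /* Hypothetical helper: hash table entries requested for each node. */
    static uint32_t
    hash_entries_for_node(uint32_t num_flows, uint32_t num_nodes)
    {
        return (num_flows / num_nodes) * 2;
    }

For instance, with 10 flows and 4 nodes, nodes 0 and 1 each hold 3 keys and
nodes 2 and 3 each hold 2, while every node requests a table of 4 entries.
The 50% load in the code comment is therefore exact only when the number of
flows is a multiple of the number of nodes.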

After initialization, packets are dequeued from the shared ring
(from the server) and, as in the server process,
the IPv4 destination address of each packet is used as a key to look up in the
hash table. If there is a hit, the packet is stored in a buffer, to be
eventually transmitted on one of the enabled ports. If the key is not there,
the packet is dropped, since the flow is not handled by the node.

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belong to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as transmitted, received
and dropped packets, which are shown and refreshed by the server app.

.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "            filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }