..  BSD LICENSE
    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD library as a flow-level
load balancer. For more information about the EFD library, please refer to the
DPDK programmer's guide.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
in which a specific target node is assigned to each flow, instead of packets
being distributed in a round-robin fashion as in the original client-server
application.

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows,
in which a target back-end worker node is specified for each flow.
Unlike a regular hash table, the EFD table does not store the flow key, and
hence it can individually load-balance millions of flows (the number of targets
multiplied by the maximum number of flows that fit in a flow table per target)
while still fitting in CPU cache.

It should be noted that although they are referred to as nodes, the front-end
server and the worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initialization, the front-end server node (process) creates a flow
distributor table (based on the EFD library), which is populated with flow
information and the intended target node of each flow.

The sample application assigns a specific target node_id (process) to each
IP destination address as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

Each ``<key, target>`` pair is then inserted into the flow distributor table.

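The assignment rule can be tried out on its own. The following is a minimal, self-contained sketch and not part of the sample application. The helper names ``flow_to_node()``, ``flow_to_key()`` and ``key_to_dotted()`` are hypothetical, and ``htonl()`` stands in for ``rte_cpu_to_be_32()`` so that the sketch builds without DPDK. It shows that flow ``i`` goes to node ``i % num_nodes``. It also shows that the flow's key, read as an IPv4 destination address, is ``i`` in dotted-quad form: for example, flow 258 is ``0.0.1.2``.

.. code-block:: c

    #include <arpa/inet.h>
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: target node for flow index i (round-robin over nodes). */
    static uint32_t
    flow_to_node(uint32_t i, uint32_t num_nodes)
    {
        assert(num_nodes > 0);
        return i % num_nodes;
    }

    /* Hypothetical helper: EFD key for flow index i, in network byte order.
     * htonl() is used here in place of rte_cpu_to_be_32(). */
    static uint32_t
    flow_to_key(uint32_t i)
    {
        return htonl(i);
    }

    /* Hypothetical helper: format a big-endian key as a dotted-quad address. */
    static void
    key_to_dotted(uint32_t key_be, char *buf, size_t len)
    {
        uint8_t b[4];

        memcpy(b, &key_be, sizeof(b)); /* b[0] is the most significant byte */
        snprintf(buf, len, "%u.%u.%u.%u", (unsigned int)b[0],
                (unsigned int)b[1], (unsigned int)b[2], (unsigned int)b[3]);
    }

Passing a few flow indices through ``flow_to_key()`` and ``key_to_dotted()`` is a quick way to work out which destination addresses a traffic generator profile should use.
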
The main loop of the server process receives a burst of packets and extracts a
flow key (the IP destination address) from each packet. The flow distributor
table is then looked up, returning the target node ID, and each packet is
enqueued to its target node.

Note that the flow distributor table is not a membership test table:
if a key has already been inserted, the returned target node ID will be correct,
but for a key that was never inserted the table still returns a value,
which may happen to be a valid node ID.

Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initialization, the worker node (process) creates a flow table (a regular
hash table that stores the keys, sized for this node's share of the flows, out
of 1 million flows in total by default), which is populated with only the flows
serviced by this node. Storing the flow key is essential to detect new keys that
have not been inserted before.

The worker node's main loop simply receives packets and performs a hash table
lookup on each one. If a match occurs, statistics are updated for the flows
serviced by this node and the packet is transmitted. If no match is found in the
local hash table, the packet belongs to a new flow and is dropped.


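The EFD lookup on the server can return an in-range node ID for a flow that was never inserted, so unknown flows are filtered in two stages. The function below is a hypothetical model of that decision. ``classify_packet()`` and ``enum pkt_fate`` are illustrative names, not part of the sample. The model mirrors the checks made in ``process_packets()`` on the server and ``handle_packets()`` on the node, both shown in the Explanation section. The EFD value is treated as 8 bits wide, matching the ``uint8_t`` cast in ``process_packets()``.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical outcome of a packet after both lookup stages. */
    enum pkt_fate {
        DROP_AT_SERVER, /* EFD value is not a valid node ID */
        DROP_AT_NODE,   /* valid node ID, but flow absent from that node's table */
        TRANSMITTED     /* flow found in the target node's table */
    };

    /*
     * efd_value:     value returned by the EFD lookup on the server
     * in_node_table: result of the hash lookup on the target node
     *                (only consulted when efd_value < num_nodes)
     */
    static enum pkt_fate
    classify_packet(uint8_t efd_value, unsigned int num_nodes, bool in_node_table)
    {
        assert(num_nodes > 0);
        if (efd_value >= num_nodes)
            return DROP_AT_SERVER;
        if (!in_node_table)
            return DROP_AT_NODE;
        return TRANSMITTED;
    }

Only the node's hash table can reliably reject unknown flows. The server stage discards just those flows whose EFD value happens to be out of range.
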
Compiling the Application
-------------------------

To compile the sample application, see :doc:`compiling`.

The application is located in the ``server_node_efd`` sub-directory.

Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.

The front-end server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where,

* ``-p PORTMASK``: Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES``: Number of back-end nodes that will be used
* ``-f NUM_FLOWS``: Number of flows to be added in the EFD table (1 million, by default)

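PORTMASK selects ports by bit position. Bit *n* set means port ID *n* is used, so ``0x3`` enables ports 0 and 1. The sketch below shows one way such a mask can be parsed and its ports counted. ``parse_portmask()`` and ``count_ports()`` are hypothetical helpers written for illustration, and the sample application's own argument parser may differ in details such as error handling.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical parser: "0x3" or "3" -> 0x3; returns 0 on malformed input. */
    static uint64_t
    parse_portmask(const char *arg)
    {
        char *end = NULL;
        unsigned long long pm;

        assert(arg != NULL);
        errno = 0;
        pm = strtoull(arg, &end, 16); /* base 16 also accepts a "0x" prefix */
        if (errno != 0 || end == arg || *end != '\0')
            return 0;
        return (uint64_t)pm;
    }

    /* Number of ports enabled in the mask (one bit per port ID). */
    static unsigned int
    count_ports(uint64_t mask)
    {
        unsigned int n = 0;

        for (; mask != 0; mask &= mask - 1) /* clear the lowest set bit */
            n++;
        return n;
    }

Here a result of 0 doubles as the error value, since a mask with no ports enabled leaves nothing to configure anyway.
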
The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where,

* ``-n NODE_ID``: Node ID, which cannot be equal to or higher than NUM_NODES
  (valid IDs range from 0 to NUM_NODES - 1)


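The NODE_ID range rule can be expressed as a small check. The sketch below is hypothetical (``parse_node_id()`` is not a function of the sample application). It only illustrates the constraint: a node ID is accepted if it is a non-negative decimal number strictly below the number of nodes configured on the server.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /* Hypothetical check: accept a decimal ID in the range 0 .. num_nodes - 1. */
    static bool
    parse_node_id(const char *arg, unsigned int num_nodes, unsigned int *node_id)
    {
        char *end = NULL;
        unsigned long id;

        assert(arg != NULL && node_id != NULL);
        if (arg[0] == '-') /* strtoul() would silently wrap negative input */
            return false;
        errno = 0;
        id = strtoul(arg, &end, 10);
        if (errno != 0 || end == arg || *end != '\0')
            return false;
        if (id >= num_nodes) /* equal to or higher than NUM_NODES is invalid */
            return false;
        *node_id = (unsigned int)id;
        return true;
    }
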
First, the server application must be launched with the number of nodes that
will be run. Once it has started, the node instances can be run, each with a
different NODE_ID. These instances have to be run as secondary processes, by
passing ``--proc-type=secondary`` in the EAL options, so that they attach to the
memory of the primary (server) process and can therefore access the queues it
created to distribute packets.

To successfully run the application, the command line used to start the
application has to be in sync with the traffic flows configured on the traffic
generator side: the server only inserts keys for the first NUM_FLOWS IPv4
destination addresses, starting at 0.0.0.0.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates the EFD table used to
distribute packets to nodes and populates it with the number of flows
specified on the command line (1 million, by default).


.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                        1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                            (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                                    "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }

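Both tables are deliberately over-provisioned. ``create_efd_table()`` requests room for twice the number of flows. On each node, ``create_hash_table()`` (shown later) requests twice that node's share of the flows, which its comment describes as a 50% table load. The sketch below spells out that arithmetic with hypothetical helper names. The numbers are the sizes passed to ``rte_efd_create()`` and ``rte_hash_create()``, not measured memory usage, and the sketch widens them to 64 bits to avoid overflow.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Capacity requested by create_efd_table(): twice the number of flows. */
    static uint64_t
    efd_table_entries(uint32_t num_flows)
    {
        return (uint64_t)num_flows * 2;
    }

    /* Entries requested by create_hash_table() on each node ("table load = 50%"). */
    static uint64_t
    node_hash_entries(uint32_t num_flows, uint32_t num_nodes)
    {
        assert(num_nodes > 0);
        return (uint64_t)(num_flows / num_nodes) * 2;
    }

    /* Largest number of flows any single node receives with i % num_nodes. */
    static uint32_t
    max_flows_per_node(uint32_t num_flows, uint32_t num_nodes)
    {
        assert(num_nodes > 0);
        return num_flows / num_nodes + (num_flows % num_nodes != 0);
    }

Because the per-node size is based on integer division, some nodes receive one flow more than ``num_flows / num_nodes``. The factor of two still leaves room for that extra flow whenever NUM_FLOWS is at least NUM_NODES.
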
After initialization, packets are received from the enabled ports, and the
destination IPv4 address of each packet is used as the key for a lookup in the
EFD table, which returns the node to which the packet has to be distributed.

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                    (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

The received packets are first placed in temporary per-node buffers, which are
then flushed to the ring shared between the server and each node. Since the
bulk enqueue is all-or-nothing, if a ring cannot take the whole buffer, all of
the buffered packets are freed and counted as dropped. After this, a new burst
of packets is received and the process is repeated indefinitely.

.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count) {
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }

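The all-or-nothing behaviour of ``rte_ring_enqueue_bulk()`` is what makes ``flush_rx_queue()`` either hand over or drop an entire per-node buffer. The toy model below reproduces only that accounting, with the ring reduced to a count of free slots. ``struct toy_node`` and ``toy_flush()`` are hypothetical and do not use DPDK.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Toy stand-in for a node and its shared ring: only free space is modelled. */
    struct toy_node {
        unsigned int ring_free;  /* free slots left in the ring */
        unsigned int buffered;   /* packets waiting in the per-node buffer */
        uint64_t rx;             /* packets handed over to the node */
        uint64_t rx_drop;        /* packets dropped because the ring was too full */
    };

    /* Mirrors the accounting in flush_rx_queue(): enqueue all or nothing. */
    static void
    toy_flush(struct toy_node *n)
    {
        assert(n != NULL);
        if (n->buffered == 0)
            return;
        if (n->ring_free < n->buffered) {
            n->rx_drop += n->buffered; /* whole batch is freed */
        } else {
            n->ring_free -= n->buffered;
            n->rx += n->buffered;
        }
        n->buffered = 0;
    }

Dropping the whole batch keeps the enqueue path simple. A partial enqueue with ``rte_ring_enqueue_burst()`` would be the alternative if fewer drops under back-pressure were preferred, at the cost of handling the leftover packets.
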
The second process, the back-end node, receives packets from the ring it
shares with the server and sends them out if they belong to a flow handled by
that node.

At initialization, it attaches to the memory of the server process to gain
access to the shared ring, the mbuf pool, the parameters and the statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);

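These lookups work because the server and the node derive the same object names independently. For ``rte_ring_lookup()`` to succeed, the server must have created each node's ring under the name that ``get_rx_queue_name(node_id)`` rebuilds on the node side. The sketch below shows the idea with a hypothetical format string, ``"node_%u_rx"``, and a hypothetical helper, ``make_rx_queue_name()``. The real name format is defined elsewhere in the sample and may differ.

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical name format, not the one used by the sample. */
    #define RXQ_NAME_FMT "node_%u_rx"

    /*
     * Build the ring name for node_id into buf. Both processes call this with
     * the same ID and therefore obtain the same name.
     * Returns 0 on success, -1 on encoding error or truncation.
     */
    static int
    make_rx_queue_name(unsigned int node_id, char *buf, size_t len)
    {
        int n;

        assert(buf != NULL && len > 0);
        n = snprintf(buf, len, RXQ_NAME_FMT, node_id);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

Writing into a caller-provided buffer keeps such a helper reentrant. A helper returning a static buffer is simpler, but each call overwrites the previous result.
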
Then, the hash table that contains the flows that will be handled
by the node is created and populated.

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;

        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }

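``populate_hash_table()`` walks all NUM_FLOWS indices and keeps only those mapped to this node, so every node makes a full pass over the flow space. An equivalent strided loop that starts at ``node_id`` and steps by ``num_nodes`` visits only the node's own flows. The sketch compares the two approaches by the number of flows each selects. Both functions are illustrative and not part of the sample.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* As in populate_hash_table(): scan every flow, keep i % num_nodes == node_id. */
    static uint32_t
    count_by_scan(uint32_t num_flows, uint32_t num_nodes, uint32_t node_id)
    {
        uint32_t i, n = 0;

        assert(num_nodes > 0 && node_id < num_nodes);
        for (i = 0; i < num_flows; i++)
            if (i % num_nodes == node_id)
                n++;
        return n;
    }

    /* Alternative: step by num_nodes, visiting only this node's flows. */
    static uint32_t
    count_by_stride(uint32_t num_flows, uint32_t num_nodes, uint32_t node_id)
    {
        uint64_t i; /* 64-bit so that i += num_nodes cannot wrap around */
        uint32_t n = 0;

        assert(num_nodes > 0 && node_id < num_nodes);
        for (i = node_id; i < num_flows; i += num_nodes)
            n++;
        return n;
    }

The strided form performs about ``num_flows / num_nodes`` iterations instead of ``num_flows``. Since this loop only runs at start-up, the simpler full scan used by the sample is likely a reasonable trade-off.
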
After initialization, packets are dequeued from the shared ring (filled by the
server) and, as in the server process, the destination IPv4 address of each
packet is used as the key for a lookup, this time in the node's hash table.
If there is a hit, the packet is stored in a buffer, to be eventually
transmitted on one of the enabled ports. If the key is not found, the packet is
dropped, since its flow is not handled by this node.

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check whether each packet belongs to a flow handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as the number of
transmitted, received and dropped packets, which are shown and refreshed by
the server application.

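The first part of the display routine aggregates the per-node TX counters in two directions: each port's total is the sum over all nodes, and each node's total is the sum over all ports. The sketch below isolates that step on a plain two-dimensional array. ``sum_tx()``, ``N_NODES`` and ``N_PORTS`` are hypothetical. The sample itself indexes its counters by port ID through ``info->id[]`` and sizes its arrays with ``MAX_NODES`` and ``RTE_MAX_ETHPORTS``.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define N_NODES 2 /* hypothetical sizes for the example */
    #define N_PORTS 2

    /* tx[node][port]: packets sent by each node on each port. */
    static void
    sum_tx(const uint64_t tx[N_NODES][N_PORTS],
           uint64_t port_tx[N_PORTS], uint64_t node_tx[N_NODES])
    {
        unsigned int i, j;

        assert(tx != NULL && port_tx != NULL && node_tx != NULL);
        for (j = 0; j < N_PORTS; j++)
            port_tx[j] = 0;
        for (i = 0; i < N_NODES; i++) {
            node_tx[i] = 0;
            for (j = 0; j < N_PORTS; j++) {
                port_tx[j] += tx[i][j]; /* column sum: per-port total */
                node_tx[i] += tx[i][j]; /* row sum: per-node total */
            }
        }
    }
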
.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "            filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }