doc/guides/prog_guide/efd_lib.rst

   1 ..  BSD LICENSE
   2     Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
   3     All rights reserved.
   4
   5     Redistribution and use in source and binary forms, with or without
   6     modification, are permitted provided that the following conditions
   7     are met:
   8
   9     * Redistributions of source code must retain the above copyright
  10     notice, this list of conditions and the following disclaimer.
  11     * Redistributions in binary form must reproduce the above copyright
  12     notice, this list of conditions and the following disclaimer in
  13     the documentation and/or other materials provided with the
  14     distribution.
  15     * Neither the name of Intel Corporation nor the names of its
  16     contributors may be used to endorse or promote products derived
  17     from this software without specific prior written permission.
  18
  19     THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  20     "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  21     LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
  22     A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
  23     OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  24     SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
  25     LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  26     DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  27     THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  28     (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  29     OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  30
  31 .. _Efd_Library:
  32
  33 Elastic Flow Distributor Library
  34 ================================
  35
  36 Introduction
  37 ------------
  38
  39 In Data Centers today, clustering and scheduling of distributed workloads
  40 is a very common task. Many workloads require a deterministic
  41 partitioning of a flat key space among a cluster of machines. When a
  42 packet enters the cluster, the ingress node will direct the packet to
  43 its handling node. For example, data-centers with disaggregated storage
  44 use storage metadata tables to forward I/O requests to the correct back end
  45 storage cluster, stateful packet inspection will use match incoming
  46 flows to signatures in flow tables to send incoming packets to their
  47 intended deep packet inspection (DPI) devices, and so on.
  48
  49 EFD is a distributor library that uses perfect hashing to determine a
  50 target/value for a given incoming flow key. It has the following
  51 advantages: first, because it uses perfect hashing it does not store the
  52 key itself and hence lookup performance is not dependent on the key
  53 size. Second, the target/value can be any arbitrary value hence the
  54 system designer and/or operator can better optimize service rates and
  55 inter-cluster network traffic locating. Third, since the storage
  56 requirement is much smaller than a hash-based flow table (i.e. better
  57 fit for CPU cache), EFD can scale to millions of flow keys. Finally,
  58 with the current optimized library implementation, performance is fully
  59 scalable with any number of CPU cores.
  60
  61 Flow Based Distribution
  62 -----------------------
  63
  64 Computation Based Schemes
  65 ~~~~~~~~~~~~~~~~~~~~~~~~~
  66
  67 Flow distribution and/or load balancing can be simply done using a
  68 stateless computation, for instance using round-robin or a simple
  69 computation based on the flow key as an input. For example, a hash
  70 function can be used to direct a certain flow to a target based on
  71 the flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the
  72 flow key and n is the number of possible targets.
  73
  74 .. _figure_efd1:
  75
  76 .. figure:: img/efd_i1.*
  77
  78   Load Balancing Using Front End Node
  79
  80 In this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer
  81 extracts the flow key from the input packet and applies a computation to determine where
  82 this flow should be directed. Intuitively, this scheme is very simple
  83 and requires no state to be kept at the front end node, and hence,
  84 storage requirements are minimum.
  85
  86 .. _figure_efd2:
  87
  88 .. figure:: img/efd_i2.*
  89
  90   Consistent Hashing
  91
  92 A widely used flow distributor that belongs to the same category of
  93 computation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`.
  94 Target destinations (shown in red) are hashed into the same space as the flow
  95 keys (shown in blue), and keys are mapped to the nearest target in a clockwise
  96 fashion. Dynamically adding and removing targets with consistent hashing
  97 requires only K/n keys to be remapped on average, where K is the number of
  98 keys, and n is the number of targets. In contrast, in a traditional hash-based
  99 scheme, a change in the number of targets causes nearly all keys to be
 100 remapped.
 101
 102 Although computation-based schemes are simple and need very little
 103 storage requirement, they suffer from the drawback that the system
 104 designer/operator can’t fully control the target to assign a specific
 105 key, as this is dictated by the hash function.
 106 Deterministically co-locating of keys together (for example, to minimize
 107 inter-server traffic or to optimize for network traffic conditions,
 108 target load, etc.) is simply not possible.
 109
 110 Flow-Table Based Schemes
 111 ~~~~~~~~~~~~~~~~~~~~~~~~
 112
 113 When using a Flow-Table based scheme to handle flow distribution/load
 114 balancing, in contrast with computation-based schemes, the system designer
 115 has the flexibility of assigning a given flow to any given
 116 target. The flow table (e.g. DPDK RTE Hash Library) will simply store
 117 both the flow key and the target value.
 118
 119 .. _figure_efd3:
 120
 121 .. figure:: img/efd_i3.*
 122
 123   Table Based Flow Distribution
 124
 125 As shown in :numref:`figure_efd3`, when doing a lookup, the flow-table
 126 is indexed with the hash of the flow key and the keys (more than one is possible,
 127 because of hash collision) stored in this index and corresponding values
 128 are retrieved. The retrieved key(s) is matched with the input flow key
 129 and if there is a match the value (target id) is returned.
 130
 131 The drawback of using a hash table for flow distribution/load balancing
 132 is the storage requirement, since the flow table need to store keys,
 133 signatures and target values. This doesn't allow this scheme to scale to
 134 millions of flow keys. Large tables will usually not fit in
 135 the CPU cache, and hence, the lookup performance is degraded because of
 136 the latency to access the main memory.
 137
 138 EFD Based Scheme
 139 ~~~~~~~~~~~~~~~~
 140
 141 EFD combines the advantages of both flow-table based and computation-based
 142 schemes. It doesn't require the large storage necessary for
 143 flow-table based schemes (because EFD doesn't store the key as explained
 144 below), and it supports any arbitrary value for any given key.
 145
 146 .. _figure_efd4:
 147
 148 .. figure:: img/efd_i4.*
 149
 150   Searching for Perfect Hash Function
 151
 152 The basic idea of EFD is when a given key is to be inserted, a family of
 153 hash functions is searched until the correct hash function that maps the
 154 input key to the correct value is found, as shown in :numref:`figure_efd4`.
 155 However, rather than explicitly storing all keys and their associated values,
 156 EFD stores only indices of hash functions that map keys to values, and
 157 thereby consumes much less space than conventional flow-based tables.
 158 The lookup operation is very simple, similar to a computational-based
 159 scheme: given an input key the lookup operation is reduced to hashing
 160 that key with the correct hash function.
 161
 162 .. _figure_efd5:
 163
 164 .. figure:: img/efd_i5.*
 165
 166   Divide and Conquer for Millions of Keys
 167
 168 Intuitively, finding a hash function that maps each of a large number
 169 (millions) of input keys to the correct output value is effectively
 170 impossible, as a result EFD, as shown in :numref:`figure_efd5`,
 171 breaks the problem into smaller pieces (divide and conquer).
 172 EFD divides the entire input key set into many small groups.
 173 Each group consists of approximately 20-28 keys (a configurable parameter
 174 for the library), then, for each small group, a brute force search to find
 175 a hash function that produces the correct outputs for each key in the group.
 176
 177 It should be mentioned that, since the online lookup table for EFD
 178 doesn't store the key itself, the size of the EFD table is independent
 179 of the key size and hence EFD lookup performance which is almost
 180 constant irrespective of the length of the key which is a highly
 181 desirable feature especially for longer keys.
 182
 183 In summary, EFD is a set separation data structure that supports millions of
 184 keys. It is used to distribute a given key to an intended target. By itself
 185 EFD is not a FIB data structure with an exact match the input flow key.
 186
 187 .. _Efd_example:
 188
 189 Example of EFD Library Usage
 190 ----------------------------
 191
 192 EFD can be used along the data path of many network functions and middleboxes.
 193 As previously mentioned, it can used as an index table for
 194 <key,value> pairs, meta-data for objects, a flow-level load balancer, etc.
 195 :numref:`figure_efd6` shows an example of using EFD as a flow-level load
 196 balancer, where flows are received at a front end server before being forwarded
 197 to the target back end server for processing. The system designer would
 198 deterministically co-locate flows together in order to minimize cross-server
 199 interaction.
 200 (For example, flows requesting certain webpage objects are co-located
 201 together, to minimize forwarding of common objects across servers).
 202
 203 .. _figure_efd6:
 204
 205 .. figure:: img/efd_i6.*
 206
 207   EFD as a Flow-Level Load Balancer
 208
 209 As shown in :numref:`figure_efd6`, the front end server will have an EFD table that
 210 stores for each group what is the perfect hash index that satisfies the
 211 correct output. Because the table size is small and fits in cache (since
 212 keys are not stored), it sustains a large number of flows (N*X, where N
 213 is the maximum number of flows served by each back end server of the X
 214 possible targets).
 215
 216 With an input flow key, the group id is computed (for example, using
 217 last few bits of CRC hash) and then the EFD table is indexed with the
 218 group id to retrieve the corresponding hash index to use. Once the index
 219 is retrieved the key is hashed using this hash function and the result
 220 will be the intended correct target where this flow is supposed to be
 221 processed.
 222
 223 It should be noted that as a result of EFD not matching the exact key but
 224 rather distributing the flows to a target back end node based on the
 225 perfect hash index, a key that has not been inserted before
 226 will be distributed to a valid target. Hence, a local table which stores
 227 the flows served at each node is used and is
 228 exact matched with the input key to rule out new never seen before
 229 flows.
 230
 231 .. _Efd_api:
 232
 233 Library API Overview
 234 --------------------
 235
 236 The EFD library API is created with a very similar semantics of a
 237 hash-index or a flow table. The application creates an EFD table for a
 238 given maximum number of flows, a function is called to insert a flow key
 239 with a specific target value, and another function is used to retrieve
 240 target values for a given individual flow key or a bulk of keys.
 241
 242 EFD Table Create
 243 ~~~~~~~~~~~~~~~~
 244
 245 The function ``rte_efd_create()`` is used to create and return a pointer
 246 to an EFD table that is sized to hold up to num_flows key.
 247 The online version of the EFD table (the one that does
 248 not store the keys and is used for lookups) will be allocated and
 249 created in the last level cache (LLC) of the socket defined by the
 250 online_socket_bitmask, while the offline EFD table (the one that
 251 stores the keys and is used for key inserts and for computing the
 252 perfect hashing) is allocated and created in the LLC of the socket
 253 defined by offline_socket_bitmask. It should be noted, that for
 254 highest performance the socket id should match that where the thread is
 255 running, i.e. the online EFD lookup table should be created on the same
 256 socket as where the lookup thread is running.
 257
 258 EFD Insert and Update
 259 ~~~~~~~~~~~~~~~~~~~~~
 260
 261 The EFD function to insert a key or update a key to a new value is
 262 ``rte_efd_update()``. This function will update an existing key to
 263 a new value (target) if the key has already been inserted
 264 before, or will insert the <key,value> pair if this key has not been inserted
 265 before. It will return 0 upon success. It will return
 266 ``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the
 267 last available space in the key's group was just used. It will return
 268 ``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it
 269 failed to find a suitable perfect hash or the group was full). The function
 270 will return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD
 271 table (i.e, same value already exists).
 272
 273 .. Note::
 274
 275    This function is not multi-thread safe and should only be called
 276    from one thread.
 277
 278 EFD Lookup
 279 ~~~~~~~~~~
 280
 281 To lookup a certain key in an EFD table, the function ``rte_efd_lookup()``
 282 is used to return the value associated with single key.
 283 As previously mentioned, if the key has been inserted, the correct value
 284 inserted is returned, if the key has not been inserted before,
 285 a ‘random’ value (based on hashing of the key) is returned.
 286 For better performance and to decrease the overhead of
 287 function calls per key, it is always recommended to use a bulk lookup
 288 function (simultaneous lookup of multiple keys) instead of a single key
 289 lookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function,
 290 that looks up num_keys simultaneously stored in the key_list and the
 291 corresponding return values will be returned in the value_list.
 292
 293 .. Note::
 294
 295    This function is multi-thread safe, but there should not be other threads
 296    writing in the EFD table, unless locks are used.
 297
 298 EFD Delete
 299 ~~~~~~~~~~
 300
 301 To delete a certain key in an EFD table, the function
 302 ``rte_efd_delete()`` can be used. The function returns zero upon success
 303 when the key has been found and deleted. Socket_id is the parameter to
 304 use to lookup the existing value, which is ideally the caller's socket id.
 305 The previous value associated with this key will be returned
 306 in the prev_value argument.
 307
 308 .. Note::
 309
 310    This function is not multi-thread safe and should only be called
 311    from one thread.
 312
 313 .. _Efd_internals:
 314
 315 Library Internals
 316 -----------------
 317
 318 This section provides the brief high-level idea and an overview
 319 of the library internals to accompany the RFC. The intent of this
 320 section is to explain to readers the high-level implementation of
 321 insert, lookup and group rebalancing in the EFD library.
 322
 323 Insert Function Internals
 324 ~~~~~~~~~~~~~~~~~~~~~~~~~
 325
 326 As previously mentioned the EFD divides the whole set of keys into
 327 groups of a manageable size (e.g. 28 keys) and then searches for the
 328 perfect hash that satisfies the intended target value for each key. EFD
 329 stores two version of the <key,value> table:
 330
 331 -  Offline Version (in memory): Only used for the insertion/update
 332    operation, which is less frequent than the lookup operation. In the
 333    offline version the exact keys for each group is stored. When a new
 334    key is added, the hash function is updated that will satisfy the
 335    value for the new key together with the all old keys already inserted
 336    in this group.
 337
 338 -  Online Version (in cache): Used for the frequent lookup operation. In
 339    the online version, as previously mentioned, the keys are not stored
 340    but rather only the hash index for each group.
 341
 342 .. _figure_efd7:
 343
 344 .. figure:: img/efd_i7.*
 345
 346   Group Assignment
 347
 348 :numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example.
 349 Given a flow key, a hash function (in our implementation CRC hash) is
 350 used to get the group id. As shown in the figure, the groups can be
 351 unbalanced. (We highlight group rebalancing further below).
 352
 353 .. _figure_efd8:
 354
 355 .. figure:: img/efd_i8.*
 356
 357   Perfect Hash Search - Assigned Keys & Target Value
 358
 359 Focusing on one group that has four keys, :numref:`figure_efd8` depicts the search
 360 algorithm to find the perfect hash function. Assuming that the target
 361 value bit for the keys is as shown in the figure, then the online EFD
 362 table will store a 16 bit hash index and 16 bit lookup table per group
 363 per value bit.
 364
 365 .. _figure_efd9:
 366
 367 .. figure:: img/efd_i9.*
 368
 369   Perfect Hash Search - Satisfy Target Values
 370
 371 For a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))``
 372 is used to point to certain bit index in the 16bit lookup_table value,
 373 as shown in :numref:`figure_efd9`.
 374 The insert function will brute force search for all possible values for the
 375 hash index until a non conflicting lookup_table is found.
 376
 377 .. _figure_efd10:
 378
 379 .. figure:: img/efd_i10.*
 380
 381   Finding Hash Index for Conflict Free lookup_table
 382
 383 For example, since both key3 and key7 have a target bit value of 1, it
 384 is okay if the hash function of both keys point to the same bit in the
 385 lookup table. A conflict will occur if a hash index is used that maps
 386 both Key4 and Key7 to the same index in the lookup_table,
 387 as shown in :numref:`figure_efd10`, since their target value bit are not the same.
 388 Once a hash index is found that produces a lookup_table with no
 389 contradictions, this index is stored for this group. This procedure is
 390 repeated for each bit of target value.
 391
 392 Lookup Function Internals
 393 ~~~~~~~~~~~~~~~~~~~~~~~~~
 394
 395 The design principle of EFD is that lookups are much more frequent than
 396 inserts, and hence, EFD's design optimizes for the lookups which are
 397 faster and much simpler than the slower insert procedure (inserts are
 398 slow, because of perfect hash search as previously discussed).
 399
 400 .. _figure_efd11:
 401
 402 .. figure:: img/efd_i11.*
 403
 404   EFD Lookup Operation
 405
 406 :numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key,
 407 the group id is computed (using CRC hash) and then the hash index for this
 408 group is retrieved from the EFD table. Using the retrieved hash index,
 409 the hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will
 410 result in an index in the lookup_table, the bit corresponding to this
 411 index will be the target value bit. This procedure is repeated for each
 412 bit of the target value.
 413
 414 Group Rebalancing Function Internals
 415 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 416
 417 When discussing EFD inserts and lookups, the discussion is simplified by
 418 assuming that a group id is simply a result of hash function. However,
 419 since hashing in general is not perfect and will not always produce a
 420 uniform output, this simplified assumption will lead to unbalanced
 421 groups, i.e., some group will have more keys than other groups.
 422 Typically, and to minimize insert time with an increasing number of keys,
 423 it is preferable that all groups will have a balanced number of keys, so
 424 the brute force search for the perfect hash terminates with a valid hash
 425 index. In order to achieve this target, groups are rebalanced during
 426 runtime inserts, and keys are moved around from a busy group to a less
 427 crowded group as the more keys are inserted.
 428
 429 .. _figure_efd12:
 430
 431 .. figure:: img/efd_i12.*
 432
 433   Runtime Group Rebalancing
 434
 435 :numref:`figure_efd12` depicts the high level idea of group rebalancing, given an
 436 input key the hash result is split into two parts a chunk id and 8-bit
 437 bin id. A chunk contains 64 different groups and 256 bins (i.e. for any
 438 given bin it can map to 4 distinct groups). When a key is inserted, the
 439 bin id is computed, for example in :numref:`figure_efd12` bin_id=2,
 440 and since each bin can be mapped to one of four different groups (2 bit storage),
 441 the four possible mappings are evaluated and the one that will result in a
 442 balanced key distribution across these four is selected the mapping result
 443 is stored in these two bits.
 444
 445
 446 .. _Efd_references:
 447
 448 References
 449 -----------
 450
 451 1- EFD is based on collaborative research work between Intel and
 452 Carnegie Mellon University (CMU), interested readers can refer to the paper
 453 “Scaling Up Clustered Network Appliances with ScaleBricks;” Dong Zhou et al.
 454 at SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`)
 455 for more information.