doc/guides/prog_guide/member_lib.rst

   1 ..  BSD LICENSE
   2     Copyright(c) 2017 Intel Corporation. All rights reserved.
   3     All rights reserved.
   4
   5     Redistribution and use in source and binary forms, with or without
   6     modification, are permitted provided that the following conditions
   7     are met:
   8
   9     * Redistributions of source code must retain the above copyright
  10     notice, this list of conditions and the following disclaimer.
  11     * Redistributions in binary form must reproduce the above copyright
  12     notice, this list of conditions and the following disclaimer in
  13     the documentation and/or other materials provided with the
  14     distribution.
  15     * Neither the name of Intel Corporation nor the names of its
  16     contributors may be used to endorse or promote products derived
  17     from this software without specific prior written permission.
  18
  19     THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  20     "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  21     LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
  22     A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
  23     OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  24     SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
  25     LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  26     DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  27     THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  28     (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  29     OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  30
  31
  32 .. _member_library:
  33
  34 Membership Library
  35 ==================
  36
  37 Introduction
  38 ------------
  39
  40 The DPDK Membership Library provides an API for DPDK applications to insert a
  41 new member, delete an existing member, or query the existence of a member in a
  42 given set, or a group of sets. For the case of a group of sets, the library
  43 will return not only whether the element has been inserted before in one of
  44 the sets but also which set it belongs to.  The Membership Library is an
  45 extension and generalization of a traditional filter structure (for example
  46 Bloom Filter [Member-bloom]) that has multiple usages in a wide variety of
  47 workloads and applications. In general, the Membership Library is a data
  48 structure that provides a "set-summary" on whether a member belongs to a set,
  49 and as discussed in detail later, there are two advantages of using such a
  50 set-summary rather than operating on a "full-blown" complete list of elements:
  51 first, it has a much smaller storage requirement than storing the whole list of
  52 elements themselves, and secondly checking an element membership (or other
  53 operations) in this set-summary is much faster than checking it for the
  54 original full-blown complete list of elements.
  55
  56 We use the term "Set-Summary" in this guide to refer to the space-efficient,
  57 probabilistic membership data structure that is provided by the library. A
  58 membership test for an element will return the set this element belongs to or
  59 that the element is "not-found" with very high probability of accuracy. Set-summary
  60 is a fundamental data aggregation component that can be used in many network
  61 (and other) applications. It is a crucial structure to address performance and
  62 scalability issues of diverse network applications including overlay networks,
  63 data-centric networks, flow table summaries, network statistics and
  64 traffic monitoring. A set-summary is useful for applications who need to
  65 include a list of elements while a complete list requires too much space
  66 and/or too much processing cost. In these situations, the set-summary works as
  67 a lossy hash-based representation of a set of members. It can dramatically
  68 reduce space requirement and significantly improve the performance of set
  69 membership queries at the cost of introducing a very small membership test error
  70 probability.
  71
  72 .. _figure_membership1:
  73 .. figure:: img/member_i1.*
  74
  75   Example Usages of Membership Library
  76
  77 There are various usages for a Membership Library in a very
  78 large set of applications and workloads. Interested readers can refer to
  79 [Member-survey] for a survey of possible networking usages. The above figure
  80 provide a small set of examples of using the Membership Library:
  81
  82 * Sub-figure (a)
  83   depicts a distributed web cache architecture where a collection of proxies
  84   attempt to share their web caches (cached from a set of back-end web servers) to
  85   provide faster responses to clients, and the proxies use the Membership
  86   Library to share summaries of what web pages/objects they are caching. With the
  87   Membership Library, a proxy receiving an http request will inquire the
  88   set-summary to find its location and quickly determine whether to retrieve the
  89   requested web page from a nearby proxy or from a back-end web server.
  90
  91 * Sub-figure (b) depicts another example for using the Membership Library to
  92   prevent routing loops which is typically done using slow TTL countdown and
  93   dropping packets when TTL expires. As shown in Sub-figure (b), an embedded
  94   set-summary in the packet header itself can be used to summarize the set of
  95   nodes a packet has gone through, and each node upon receiving a packet can check
  96   whether its id is a member of the set of visited nodes, and if it is, then a
  97   routing loop is detected.
  98
  99 * Sub-Figure (c) presents another usage of the Membership
 100   Library to load-balance flows to worker threads with in-order guarantee where a
 101   set-summary is used to query if a packet belongs to an existing flow or a new
 102   flow. Packets belonging to a new flow are forwarded to the current least loaded
 103   worker thread, while those belonging to an existing flow are forwarded to the
 104   pre-assigned thread to guarantee in-order processing.
 105
 106 * Sub-figure (d) highlights
 107   yet another usage example in the database domain where a set-summary is used to
 108   determine joins between sets instead of creating a join by comparing each
 109   element of a set against the other elements in a different set, a join is done
 110   on the summaries since they can efficiently encode members of a given set.
 111
 112 Membership Library is a configurable library that is optimized to cover set
 113 membership functionality for both a single set and multi-set scenarios. Two set-summary
 114 schemes are presented including (a) vector of Bloom Filters and (b) Hash-Table based
 115 set-summary schemes with and without false negative probability.
 116 This guide first briefly describes these different types of set-summaries, usage examples for each,
 117 and then it highlights the Membership Library API.
 118
 119 Vector of Bloom Filters
 120 -----------------------
 121
 122 Bloom Filter (BF) [Member-bloom] is a well-known space-efficient
 123 probabilistic data structure that answers set membership queries (test whether
 124 an element is a member of a set) with some probability of false positives and
 125 zero false negatives; a query for an element returns either it is "possibly in
 126 a set" (with very high probability) or "definitely not in a set".
 127
 128 The BF is a method for representing a set of ``n`` elements (for example flow keys
 129 in network applications domain) to support membership queries. The idea of BF is
 130 to allocate a bit-vector ``v`` with ``m`` bits, which are initially all set to 0. Then
 131 it chooses ``k`` independent hash functions ``h1``, ``h2``, ... ``hk`` with hash values range from
 132 ``0`` to ``m-1`` to perform hashing calculations on each element to be inserted. Every time when an
 133 element ``X`` being inserted into the set, the bits at positions ``h1(X)``, ``h2(X)``, ...
 134 ``hk(X)`` in ``v`` are set to 1 (any particular bit might be set to 1 multiple times
 135 for multiple different inserted elements). Given a query for any element ``Y``, the
 136 bits at positions ``h1(Y)``, ``h2(Y)``, ... ``hk(Y)`` are checked. If any of them is 0,
 137 then Y is definitely not in the set. Otherwise there is a high probability that
 138 Y is a member of the set with certain false positive probability. As shown in
 139 the next equation, the false positive probability can be made arbitrarily small
 140 by changing the number of hash functions (``k``) and the vector length (``m``).
 141
 142 .. _figure_membership2:
 143 .. figure:: img/member_i2.*
 144
 145   Bloom Filter False Positive Probability
 146
 147 Without BF, an accurate membership testing could involve a costly hash table
 148 lookup and full element comparison. The advantage of using a BF is to simplify
 149 the membership test into a series of hash calculations and memory accesses for a
 150 small bit-vector, which can be easily optimized. Hence the lookup throughput
 151 (set membership test) can be significantly faster than a normal hash table
 152 lookup with element comparison.
 153
 154 .. _figure_membership3:
 155 .. figure:: img/member_i3.*
 156
 157   Detecting Routing Loops Using BF
 158
 159 BF is used for applications that need only one set, and the
 160 membership of elements is checked against the BF. The example discussed
 161 in the above figure is one example of potential applications that uses only one
 162 set to capture the node IDs that have been visited so far by the packet. Each
 163 node will then check this embedded BF in the packet header for its own id, and
 164 if the BF indicates that the current node is definitely not in the set then a
 165 loop-free route is guaranteed.
 166
 167
 168 .. _figure_membership4:
 169 .. figure:: img/member_i4.*
 170
 171   Vector Bloom Filter (vBF) Overview
 172
 173 To support membership test for both multiple sets and a single set,
 174 the library implements a Vector Bloom Filter (vBF) scheme.
 175 vBF basically composes multiple bloom filters into a vector of bloom filers.
 176 The membership test is conducted on all of the
 177 bloom filters concurrently to determine which set(s) it belongs to or none of
 178 them. The basic idea of vBF is shown in the above figure where an element is
 179 used to address multiple bloom filters concurrently and the bloom filter
 180 index(es) with a hit is returned.
 181
 182 .. _figure_membership5:
 183 .. figure:: img/member_i5.*
 184
 185   vBF for Flow Scheduling to Worker Thread
 186
 187 As previously mentioned, there are many usages of such structures. vBF is used
 188 for applications that need to check membership against multiple sets
 189 simultaneously. The example shown in the above figure uses a set to capture
 190 all flows being assigned for processing at a given worker thread. Upon receiving
 191 a packet the vBF is used to quickly figure out if this packet belongs to a new flow
 192 so as to be forwarded to the current least loaded worker thread, or otherwise it
 193 should be queued for an existing thread to guarantee in-order processing (i.e.
 194 the property of vBF to indicate right away that a given flow is a new one or
 195 not is critical to minimize response time latency).
 196
 197 It should be noted that vBF can be implemented using a set of single bloom
 198 filters with sequential lookup of each BF. However, being able to concurrently
 199 search all set-summaries is a big throughput advantage. In the library, certain
 200 parallelism is realized by the implementation of checking all bloom filters
 201 together.
 202
 203
 204 Hash-Table based Set-Summaries
 205 ------------------------------
 206
 207 Hash-table based set-summary (HTSS) is another scheme in the membership library.
 208 Cuckoo filter [Member-cfilter] is an example of HTSS.
 209 HTSS supports multi-set membership testing like
 210 vBF does. However, while vBF is better for a small number of targets, HTSS is more suitable
 211 and can easily outperform vBF when the number of sets is
 212 large, since HTSS uses a single hash table for membership testing while vBF
 213 requires testing a series of Bloom Filters each corresponding to one set.
 214 As a result, generally speaking vBF is more adequate for the case of a small limited number of sets
 215 while HTSS should be used with a larger number of sets.
 216
 217 .. _figure_membership6:
 218 .. figure:: img/member_i6.*
 219
 220   Using HTSS for Attack Signature Matching
 221
 222 As shown in the above figure, attack signature matching where each set
 223 represents a certain signature length (for correctness of this example, an
 224 attack signature should not be a subset of another one) in the payload is a good
 225 example for using HTSS with 0% false negative (i.e., when an element returns not
 226 found, it has a 100% certainty that it is not a member of any set).  The packet
 227 inspection application benefits from knowing right away that the current payload
 228 does not match any attack signatures in the database to establish its
 229 legitimacy, otherwise a deep inspection of the packet is needed.
 230
 231 HTSS employs a similar but simpler data structure to a traditional hash table,
 232 and the major difference is that HTSS stores only the signatures but not the
 233 full keys/elements which can significantly reduce the footprint of the table.
 234 Along with the signature, HTSS also stores a value to indicate the target set.
 235 When looking up an element, the element is hashed and the HTSS is addressed
 236 to retrieve the signature stored. If the signature matches then the value is
 237 retrieved corresponding to the index of the target set which the element belongs
 238 to. Because signatures can collide, HTSS can still has false positive
 239 probability. Furthermore, if elements are allowed to be
 240 overwritten or evicted when the hash table becomes full, it will also have a
 241 false negative probability. We discuss this case in the next section.
 242
 243 Set-Summaries with False Negative Probability
 244 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 245
 246 As previously mentioned, traditional set-summaries (e.g. Bloom Filters) do not
 247 have a false negative probability, i.e., it is 100% certain when an element
 248 returns "not to be present" for a given set. However, the Membership Library
 249 also supports a set-summary probabilistic data structure based on HTSS which
 250 allows for false negative probability.
 251
 252 In HTSS, when the hash table becomes full, keys/elements will fail to be added
 253 into the table and the hash table has to be resized to accommodate for these new
 254 elements, which can be expensive. However, if we allow new elements to overwrite
 255 or evict existing elements (as a cache typically does), then the resulting
 256 set-summary will begin to have false negative probability. This is because the
 257 element that was evicted from the set-summary may still be present in the target
 258 set. For subsequent inquiries the set-summary will falsely report the element
 259 not being in the set, hence having a false negative probability.
 260
 261 The major usage of HTSS with false negative is to use it as a cache for
 262 distributing elements to different target sets. By allowing HTSS to evict old
 263 elements, the set-summary can keep track of the most recent elements
 264 (i.e. active) as a cache typically does. Old inactive elements (infrequently
 265 used elements) will automatically and eventually get evicted from the
 266 set-summary. It is worth noting that the set-summary still has false positive
 267 probability, which means the application either can tolerate certain false positive
 268 or it has fall-back path when false positive happens.
 269
 270 .. _figure_membership7:
 271 .. figure:: img/member_i7.*
 272
 273   Using HTSS with False Negatives for Wild Card Classification
 274
 275 HTSS with false negative (i.e. a cache) also has its wide set of applications.
 276 For example wild card flow classification (e.g. ACL rules) highlighted in the
 277 above figure is an example of such application. In that case each target set
 278 represents a sub-table with rules defined by a certain flow mask. The flow masks
 279 are non-overlapping, and for flows matching more than one rule only the highest
 280 priority one is inserted in the corresponding sub-table (interested readers can
 281 refer to the Open vSwitch (OvS) design of Mega Flow Cache (MFC) [Member-OvS]
 282 for further details). Typically the rules will have a large number of distinct
 283 unique masks and hence, a large number of target sets each corresponding to one
 284 mask. Because the active set of flows varies widely based on the network
 285 traffic, HTSS with false negative will act as a cache for <flowid, target ACL
 286 sub-table> pair for the current active set of flows. When a miss occurs (as
 287 shown in red in the above figure) the sub-tables will be searched sequentially
 288 one by one for a possible match, and when found the flow key and target
 289 sub-table will be inserted into the set-summary (i.e. cache insertion) so
 290 subsequent packets from the same flow don’t incur the overhead of the
 291 sequential search of sub-tables.
 292
 293 Library API Overview
 294 --------------------
 295
 296 The design goal of the Membership Library API is to be as generic as possible to
 297 support all the different types of set-summaries we discussed in previous
 298 sections and beyond. Fundamentally, the APIs need to include creation,
 299 insertion, deletion, and lookup.
 300
 301
 302 Set-summary Create
 303 ~~~~~~~~~~~~~~~~~~
 304
 305 The ``rte_member_create()`` function is used to create a set-summary structure, the input parameter
 306 is a struct to pass in parameters that needed to initialize the set-summary, while the function returns the
 307 pointer to the created set-summary or ``NULL`` if the creation failed.
 308
 309 The general input arguments used when creating the set-summary should include ``name``
 310 which is the name of the created set-summary, *type* which is one of the types
 311 supported by the library (e.g. ``RTE_MEMBER_TYPE_HT`` for HTSS or ``RTE_MEMBER_TYPE_VBF`` for vBF), and ``key_len``
 312 which is the length of the element/key. There are other parameters
 313 are only used for certain type of set-summary, or which have a slightly different meaning for different types of set-summary.
 314 For example, ``num_keys`` parameter means the maximum number of entries for Hash table based set-summary.
 315 However, for bloom filter, this value means the expected number of keys that could be
 316 inserted into the bloom filter(s). The value is used to calculate the size of each
 317 bloom filter.
 318
 319 We also pass two seeds: ``prim_hash_seed`` and
 320 ``sec_hash_seed`` for the primary and secondary hash functions to calculate two independent hash values.
 321 ``socket_id`` parameter is the NUMA socket ID for the memory used to create the
 322 set-summary. For HTSS, another parameter ``is_cache`` is used to indicate
 323 if this set-summary is a cache (i.e. with false negative probability) or not.
 324 For vBF, extra parameters are needed. For example, ``num_set`` is the number of
 325 sets needed to initialize the vector bloom filters. This number is equal to the
 326 number of bloom filters will be created.
 327 ``false_pos_rate`` is the false positive rate. num_keys and false_pos_rate will be used to determine
 328 the number of hash functions and the bloom filter size.
 329
 330
 331 Set-summary Element Insertion
 332 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 333
 334 The ``rte_member_add()`` function is used to insert an element/key into a set-summary structure. If it fails an
 335 error is returned. For success the returned value is dependent on the
 336 set-summary mode to provide extra information for the users. For vBF
 337 mode, a return value of 0 means a successful insert. For HTSS mode without false negative, the insert
 338 could fail with ``-ENOSPC`` if the table is full. With false negative (i.e. cache mode),
 339 for insert that does not cause any eviction (i.e. no overwriting happens to an
 340 existing entry) the return value is 0. For insertion that causes eviction, the return
 341 value is 1 to indicate such situation, but it is not an error.
 342
 343 The input arguments for the function should include the ``key`` which is a pointer to the element/key that needs to
 344 be added to the set-summary, and ``set_id`` which is the set id associated
 345 with the key that needs to be added.
 346
 347
 348 Set-summary Element Lookup
 349 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 350
 351 The ``rte_member_lookup()`` function looks up a single key/element in the set-summary structure. It
 352 returns as soon as the first match is found. The return value is 1 if a
 353 match is found and 0 otherwise. The arguments for the function include ``key`` which is a pointer to the
 354 element/key that needs to be looked up, and ``set_id`` which is used to return the
 355 first target set id where the key has matched, if any.
 356
 357 The ``rte_member_lookup_bulk()`` function is used to look up a bulk of keys/elements in the
 358 set-summary structure for their first match. Each key lookup returns as soon as the first match is found. The
 359 return value is the number of keys that find a match. The arguments of the function include ``keys``
 360 which is a pointer to a bulk of keys that are to be looked up,
 361 ``num_keys`` is the number
 362 of keys that will be looked up, and ``set_ids`` are the return target set
 363 ids for the first match found for each of the input keys. ``set_ids`` is an array
 364 needs to be sized according to the ``num_keys``. If there is no match, the set id
 365 for that key will be set to RTE_MEMBER_NO_MATCH.
 366
 367 The ``rte_member_lookup_multi()`` function looks up a single key/element in the
 368 set-summary structure for multiple matches. It
 369 returns ALL the matches (possibly more than one) found for this key when it
 370 is matched against all target sets (it is worth noting that for cache mode HTSS,
 371 the current implementation matches at most one target set). The return value is
 372 the number of matches
 373 that was found for this key (for cache mode HTSS the return value
 374 should be at most 1). The arguments for the function include ``key`` which is a pointer to the
 375 element/key that needs to be looked up, ``max_match_per_key`` which is to indicate the maximum number of matches
 376 the user expects to find for each key, and ``set_id`` which is used to return all
 377 target set ids where the key has matched, if any. The ``set_id`` array should be sized
 378 according to ``max_match_per_key``. For vBF, the maximum number of matches per key is equal
 379 to the number of sets. For HTSS, the maximum number of matches per key is equal to two time
 380 entry count per bucket. ``max_match_per_key`` should be equal or smaller than the maximum number of
 381 possible matches.
 382
 383 The ``rte_membership_lookup_multi_bulk()`` function looks up a bulk of keys/elements in the
 384 set-summary structure for multiple matches, each key lookup returns ALL the matches (possibly more
 385 than one) found for this key when it is matched against all target sets (cache mode HTSS
 386 matches at most one target set). The
 387 return value is the number of keys that find one or more matches in the
 388 set-summary structure. The arguments of the
 389 function include ``keys`` which is
 390 a pointer to a bulk of keys that are to be looked up, ``num_keys`` is the number
 391 of keys that will be looked up, ``max_match_per_key`` is the possible
 392 maximum number of matches for each key, ``match_count`` which is the returned number
 393 of matches for each key, and ``set_ids`` are the returned target set
 394 ids for all matches found for each keys. ``set_ids`` is 2-D array
 395 containing a 1-D array for each key (the size of 1-D array per key should be set by the user according to ``max_match_per_key``).
 396 ``max_match_per_key`` should be equal or smaller than the maximum number of
 397 possible matches, similar to ``rte_member_lookup_multi``.
 398
 399
 400 Set-summary Element Delete
 401 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 402
 403 The ``rte_membership_delete()`` function deletes an element/key from a set-summary structure, if it fails
 404 an error is returned. The input arguments should include ``key`` which is a pointer to the
 405 element/key that needs to be deleted from the set-summary, and ``set_id``
 406 which is the set id associated with the key to delete. It is worth noting that current
 407 implementation of vBF does not support deletion [1]_. An error code ``-EINVAL`` will be returned.
 408
 409 .. [1] Traditional bloom filter does not support proactive deletion. Supporting proactive deletion require additional implementation and performance overhead.
 410
 411 References
 412 -----------
 413
 414 [Member-bloom] B H Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors," Communications of the ACM, 1970.
 415
 416 [Member-survey] A Broder and M Mitzenmacher, "Network Applications of Bloom Filters: A Survey," in Internet Mathematics, 2005.
 417
 418 [Member-cfilter] B Fan, D G Andersen and M Kaminsky, "Cuckoo Filter: Practically Better Than Bloom," in Conference on emerging Networking Experiments and Technologies, 2014.
 419
 420 [Member-OvS] B Pfaff, "The Design and Implementation of Open vSwitch," in NSDI, 2015.