..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2017-2018 Cavium Networks.

Compression Device Library
===========================

The compression framework provides a generic set of APIs to perform compression services
and to query and configure compression devices, both physical (hardware) and virtual (software),
that perform those services. The framework currently supports only lossless compression schemes.


Physical compression devices are discovered during the EAL bus probe,
which is executed at DPDK initialization, based on their unique device identifier.
For example, PCI devices can be identified using their PCI BDF (bus/bridge, device, function).
Specific physical compression devices, like other physical devices in DPDK, can be
white-listed or black-listed using the EAL command line options.

Virtual devices can be created by two mechanisms: either using the EAL command
line options or from within the application using an EAL API directly.

From the command line, using the ``--vdev`` EAL option:

.. code-block:: console

   --vdev '<pmd name>,socket_id=0'

* If a DPDK application requires multiple software compression PMD devices, then the required
  number of ``--vdev`` options with the appropriate libraries must be added.

* An application with multiple compression device instances exposed by the same PMD must
  specify a unique name for each device.

  Example: ``--vdev 'pmd0' --vdev 'pmd1'``

Or, by using the ``rte_vdev_init`` API within the application code:

.. code-block:: c

   rte_vdev_init("<pmd_name>","socket_id=0")

All virtual compression devices support the following initialization parameters:

* ``socket_id`` - socket on which to allocate the device resources.

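
As an illustration of the unique-name requirement, the value passed to each ``--vdev`` option (or to ``rte_vdev_init``) could be built as sketched below. The helper ``format_vdev_arg`` and its parameters are hypothetical and not part of DPDK; the base name must be whatever name the chosen PMD documents.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one vdev argument such as "pmd1,socket_id=0".
 * The index is appended to the base name so that every instance is unique.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
format_vdev_arg(char *buf, size_t len, const char *base_name,
                unsigned int index, int socket_id)
{
    int n;

    assert(buf != NULL && base_name != NULL);
    n = snprintf(buf, len, "%s%u,socket_id=%d", base_name, index, socket_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling the helper once per instance with indexes 0 and 1 yields two distinct names, matching the ``pmd0`` / ``pmd1`` example above.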

Each device, whether virtual or physical, is uniquely designated by two
identifiers:

- A unique device index used to designate the compression device in all functions
  exported by the compressdev API.

- A device name used to designate the compression device in console messages, for
  administration or debugging purposes.

The configuration of each compression device includes the following operations:

- Allocation of resources, including hardware resources if a physical device.
- Resetting the device into a well-known default state.
- Initialization of statistics counters.

The ``rte_compressdev_configure`` API is used to configure a compression device.

The ``rte_compressdev_config`` structure is used to pass the configuration
parameters.

See *DPDK API Reference* for details.


Configuration of Queue Pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each compression device queue pair is individually configured through the
``rte_compressdev_queue_pair_setup`` API.

The ``max_inflight_ops`` parameter passes the maximum number of
``rte_comp_op`` that can be present in a queue at a time.
The PMD can then allocate resources accordingly on the specified socket.

See *DPDK API Reference* for details.

Logical Cores, Memory and Queue Pair Relationships
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The library supports NUMA in the same way as described in the Cryptodev library section.

A queue pair cannot be shared and should be used exclusively by a single processing
context for enqueuing operations or dequeuing operations on the same compression device,
since sharing would require global locks and hinder performance. It is, however, possible
to use a different logical core to dequeue an operation on a queue pair from the logical
core on which it was enqueued. This means that the compression burst enqueue/dequeue
APIs are a logical place to transition from one logical core to another in a
data processing pipeline.


Device Features and Capabilities
---------------------------------

Compression devices define their functionality through two mechanisms: global device
features and algorithm features. Global device features identify device-wide
features which apply to the whole device, such as supported hardware
acceleration and CPU features. The list of compression device features can be seen in the
RTE_COMPDEV_FF_XXX macros.

The algorithm features list the individual features which the device supports per algorithm,
such as stateful compression/decompression, checksum operations, etc. The list of algorithm
features can be seen in the RTE_COMP_FF_XXX macros.

Each PMD has a list of capabilities, including the algorithms listed in
enum ``rte_comp_algorithm``, the associated feature flags and the
sliding window range as a log base 2 value. The sliding window range gives
the minimum and maximum size of the lookup window that the algorithm uses.

See *DPDK API Reference* for details.


Each compression poll mode driver defines its array of capabilities
for each algorithm it supports. See the PMD implementation for capability details.

Capabilities Discovery
~~~~~~~~~~~~~~~~~~~~~~

PMD capabilities and features are discovered via the ``rte_compressdev_info_get`` function.

The ``rte_compressdev_info`` structure contains all the relevant information for the device.

See *DPDK API Reference* for details.


Compression Operation
----------------------

DPDK compression supports two types of compression methodologies:

- Stateless: data associated with a compression operation is compressed without any reference
  to another compression operation.

- Stateful: data in each compression operation is compressed with reference to previous compression
  operations in the same data stream, i.e. a history of data is maintained between the operations.

For more explanation, please refer to RFC 1951: https://www.ietf.org/rfc/rfc1951.txt

Operation Representation
~~~~~~~~~~~~~~~~~~~~~~~~

A compression operation is described via ``struct rte_comp_op``, which contains both input and
output data. The operation structure includes the operation type (stateless or stateful),
the operation status and the priv_xform/stream handle, source, destination and checksum buffer
pointers. It also contains the source mempool from which the operation is allocated.
The PMD updates the consumed field with the amount of data read from the source buffer and the
produced field with the amount of data written into the destination buffer, along with the status
of the operation. See section *Produced, Consumed And Operation Status* for more details.

The compression operations mempool can also allocate private memory with the
operation for the application's purposes. Application software is responsible for specifying
all the operation-specific fields in the ``rte_comp_op`` structure, which are then used
by the compression PMD to process the requested operation.


Operation Management and Allocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The compressdev library provides an API set for managing compression operations which
utilizes the Mempool Library to allocate operation buffers. Therefore, it ensures
that the compression operations are interleaved optimally across the channels and
ranks for optimal processing.

A ``rte_comp_op`` contains a field indicating the pool it originated from.

``rte_comp_op_alloc()`` and ``rte_comp_op_bulk_alloc()`` are used to allocate
compression operations from a given compression operation mempool.
The operation is reset before being returned to the user so that it
is always in a known good state before use by the application.

``rte_comp_op_free()`` is called by the application to return an operation to
the pool it originated from.

See *DPDK API Reference* for details.

Passing source data as mbuf-chain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If input data is scattered across several different buffers, then the
application can either parse through all such buffers, make one
mbuf-chain and enqueue it for processing or, alternatively, it can
make multiple sequential enqueue_burst() calls, one for each buffer,
processing them statefully. See *Compression API Stateful Operation*
for stateful processing of ops.


Each operation carries status information, which is updated by the PMD after the operation is processed.
The following statuses are currently supported:

- RTE_COMP_OP_STATUS_SUCCESS,
  Operation is successfully completed

- RTE_COMP_OP_STATUS_NOT_PROCESSED,
  Operation has not yet been processed by the device

- RTE_COMP_OP_STATUS_INVALID_ARGS,
  Operation failed due to invalid arguments in request

- RTE_COMP_OP_STATUS_ERROR,
  Operation failed because of internal error

- RTE_COMP_OP_STATUS_INVALID_STATE,
  Operation is invoked in invalid state

- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
  Output buffer ran out of space during processing. Error case,
  PMD cannot continue from here.

- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE,
  Output buffer ran out of space before operation completed, but this
  is not an error case. Output data up to op.produced can be used and
  next op in the stream should continue on from op.consumed+1.

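
One possible way for an application to act on these statuses is sketched below. The enums and the function ``action_for_status`` are local, hypothetical stand-ins used only for illustration; a real application would switch on the ``RTE_COMP_OP_STATUS_*`` values, and the mapping to actions is an assumption drawn from the descriptions above.

```c
#include <assert.h>

/* Hypothetical local mirror of the status values listed above. */
enum op_status {
    OP_SUCCESS,
    OP_NOT_PROCESSED,
    OP_INVALID_ARGS,
    OP_ERROR,
    OP_INVALID_STATE,
    OP_OUT_OF_SPACE_TERMINATED,
    OP_OUT_OF_SPACE_RECOVERABLE
};

/* What the application does next; names are illustrative only. */
enum next_action {
    ACTION_DONE,                /* output is complete and usable */
    ACTION_RETRY_LATER,         /* op was not processed yet */
    ACTION_RESUBMIT_LARGER_DST, /* start over with a bigger output buffer */
    ACTION_CONTINUE_STREAM,     /* submit the rest of the input in a new op */
    ACTION_FAIL                 /* report the failure */
};

enum next_action
action_for_status(enum op_status status)
{
    switch (status) {
    case OP_SUCCESS:
        return ACTION_DONE;
    case OP_NOT_PROCESSED:
        return ACTION_RETRY_LATER;
    case OP_OUT_OF_SPACE_TERMINATED:
        return ACTION_RESUBMIT_LARGER_DST;
    case OP_OUT_OF_SPACE_RECOVERABLE:
        return ACTION_CONTINUE_STREAM;
    default:
        /* invalid arguments, internal error, invalid state */
        return ACTION_FAIL;
    }
}
```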

Operation status after enqueue / dequeue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some of the above values may arise in the op after an
``rte_compressdev_enqueue_burst()`` call. If the number of ops enqueued is less than the number
of ops requested, the application should check the status of the first op that was not enqueued,
i.e. the op at position nb_enqd+1. If the status is RTE_COMP_OP_STATUS_NOT_PROCESSED,
it likely indicates a full-queue case for a hardware device, and a retry after dequeuing some ops is likely
to be successful. If the op holds any other status, e.g. RTE_COMP_OP_STATUS_INVALID_ARGS, a retry with
the same op is unlikely to be successful.


Produced, Consumed And Operation Status
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- If status is RTE_COMP_OP_STATUS_SUCCESS,
  consumed = amount of data read from the input buffer, and
  produced = amount of data written to the destination buffer.
- If status is RTE_COMP_OP_STATUS_ERROR,
  consumed = produced = undefined.
- If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
  consumed = 0, and
  produced = usually 0, but in decompression cases a PMD may return > 0,
  i.e. the amount of data successfully produced until the out of space condition
  was hit. The application can consume the output data in this case, if required.
- If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE,
  consumed = amount of data read, and
  produced = amount of data successfully produced until the
  out of space condition was hit. The PMD is able to recover
  from here, so the application can submit the next op starting from
  consumed+1 (the first byte not yet consumed) and a destination buffer with available space.


Compression transforms (``rte_comp_xform``) are the mechanism
to specify the details of the compression operation such as algorithm,
window size and checksum.

Compression API Hash support
----------------------------

The compression API allows an application to enable digest calculation
alongside compression and decompression of data. A PMD reflects its
support for hash algorithms via capability algorithm feature flags.
If supported, the PMD always calculates the digest on plaintext, i.e.
before compression and after decompression.

The currently supported hash algorithms are SHA-1 and, from the SHA2 family, SHA2-256.

See *DPDK API Reference* for details.

If required, the application should set a valid hash algorithm in the compress
or decompress xforms during ``rte_compressdev_stream_create()``
or ``rte_compressdev_private_xform_create()`` and pass a valid
output buffer in the ``rte_comp_op`` hash field struct to store the
resulting digest. The buffer passed should be contiguous and large
enough to store the digest, which is 20 bytes for SHA-1 and
32 bytes for SHA2-256.

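
The digest sizes quoted above can be captured in a small lookup so that the digest buffer is never undersized. This is only a sketch: ``enum hash_algo`` and ``digest_size`` are hypothetical local names standing in for the hash algorithm selection in the xform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local stand-in for the hash algorithm selection. */
enum hash_algo {
    HASH_NONE,
    HASH_SHA1,
    HASH_SHA2_256
};

/* Minimum contiguous digest buffer size, in bytes, for each algorithm. */
size_t
digest_size(enum hash_algo algo)
{
    switch (algo) {
    case HASH_SHA1:
        return 20;
    case HASH_SHA2_256:
        return 32;
    default:
        return 0; /* no digest is produced */
    }
}
```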

Compression API Stateless operation
------------------------------------

An op is processed statelessly if it has:

- op_type set to RTE_COMP_OP_STATELESS,
- flush value set to RTE_COMP_FLUSH_FULL or RTE_COMP_FLUSH_FINAL
  (required only on the compression side),
- all required input in the source buffer.

When all of the above conditions are met, the PMD initiates stateless processing
and releases acquired resources after processing of the current operation is
complete. The application can enqueue multiple stateless ops in a single burst
and must attach a priv_xform handle to such ops.

priv_xform in Stateless operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A priv_xform is private data, managed internally by the PMD, that it maintains to do stateless processing.
A priv_xform is initialized from a generic xform structure provided by the application via a call
to ``rte_compressdev_private_xform_create()``; as output, the PMD returns an opaque priv_xform reference.
If the PMD supports SHAREABLE priv_xforms, indicated via an algorithm feature flag, then the application can
attach the same priv_xform to many stateless ops at a time. If not, then the application needs to
create as many priv_xforms as it expects to have stateless operations in flight.

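
The sizing rule in the previous paragraph can be expressed as a short helper. This is a sketch only: ``FF_SHAREABLE_PRIV_XFORM`` is a hypothetical bit with an arbitrary position, standing in for the real algorithm feature flag, and ``priv_xforms_needed`` is not a compressdev function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit standing in for the shareable priv_xform feature flag. */
#define FF_SHAREABLE_PRIV_XFORM (UINT64_C(1) << 0)

/* Number of priv_xforms to create for a given number of in-flight ops. */
uint32_t
priv_xforms_needed(uint64_t feature_flags, uint32_t max_inflight_ops)
{
    if (max_inflight_ops == 0)
        return 0;
    if (feature_flags & FF_SHAREABLE_PRIV_XFORM)
        return 1; /* one priv_xform can be attached to every op */
    return max_inflight_ops; /* one priv_xform per in-flight op */
}
```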

.. figure:: img/stateless-op.*

   Stateless Ops using Non-Shareable priv_xform


.. figure:: img/stateless-op-shared.*

   Stateless Ops using Shareable priv_xform


The application should call ``rte_compressdev_private_xform_create()`` and attach the result to each
stateless op before enqueuing them for processing, and free it via
``rte_compressdev_private_xform_free()`` during termination.

An example pseudocode to set up and process NUM_OPS stateless ops, each of length OP_LEN,
using priv_xform would look like:


.. code-block:: c

    /*
     * pseudocode for stateless compression
     */

    uint8_t cdev_id = rte_compressdev_get_dev_id(<pmd name>);

    /* configure the device. */
    if (rte_compressdev_configure(cdev_id, &conf) < 0)
        rte_exit(EXIT_FAILURE, "Failed to configure compressdev %u\n", cdev_id);

    if (rte_compressdev_queue_pair_setup(cdev_id, 0, NUM_MAX_INFLIGHT_OPS,
                            rte_socket_id()) < 0)
        rte_exit(EXIT_FAILURE, "Failed to setup queue pair\n");

    if (rte_compressdev_start(cdev_id) < 0)
        rte_exit(EXIT_FAILURE, "Failed to start device\n");

    /* setup compress transform */
    struct rte_comp_xform compress_xform = {
        .type = RTE_COMP_COMPRESS,
        .compress = {
            .algo = RTE_COMP_ALGO_DEFLATE,
            .deflate = {
                .huffman = RTE_COMP_HUFFMAN_DEFAULT
            },
            .level = RTE_COMP_LEVEL_PMD_DEFAULT,
            .chksum = RTE_COMP_CHECKSUM_NONE,
            .window_size = DEFAULT_WINDOW_SIZE,
            .hash_algo = RTE_COMP_HASH_ALGO_NONE
        }
    };

    /* create priv_xform and initialize it for the compression device. */
    void *priv_xform = NULL;
    int shareable = 1;
    rte_compressdev_info_get(cdev_id, &dev_info);
    if (dev_info.capabilities->comp_feature_flags & RTE_COMP_FF_SHAREABLE_PRIV_XFORM) {
        rte_compressdev_private_xform_create(cdev_id, &compress_xform, &priv_xform);
    } else {
        shareable = 0;
    }

    /* create operation pool via call to rte_comp_op_pool_create and alloc ops */
    rte_comp_op_bulk_alloc(op_pool, comp_ops, NUM_OPS);

    /* prepare ops for compression operations */
    for (i = 0; i < NUM_OPS; i++) {
        struct rte_comp_op *op = comp_ops[i];
        if (!shareable)
            rte_compressdev_private_xform_create(cdev_id, &compress_xform,
                            &op->private_xform);
        else
            op->private_xform = priv_xform;
        op->op_type = RTE_COMP_OP_STATELESS;
        op->flush_flag = RTE_COMP_FLUSH_FINAL;

        op->src.offset = 0;
        op->dst.offset = 0;
        op->src.length = OP_LEN;
        op->input_chksum = 0;
        /* set up op->m_src and op->m_dst here */
    }
    num_enqd = rte_compressdev_enqueue_burst(cdev_id, 0, comp_ops, NUM_OPS);
    /* wait for all enqueued ops to complete before enqueuing more */
    num_deqd = 0;
    do {
        num_deqd += rte_compressdev_dequeue_burst(cdev_id, 0,
                            &processed_ops[num_deqd], num_enqd - num_deqd);
    } while (num_deqd < num_enqd);


Stateless and OUT_OF_SPACE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OUT_OF_SPACE is a condition in which the output buffer runs out of space while the PMD
still has more data to produce. If the PMD runs into this condition, it returns the
RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED error. In that case, the PMD resets itself and can set
consumed=0 and produced=amount of output it could produce before hitting out_of_space.
The application needs to resubmit the whole input with a larger output buffer if it
wants the operation to be completed.

If hash is enabled, the digest buffer will contain valid data after the op is successfully
processed, i.e. dequeued with status = RTE_COMP_OP_STATUS_SUCCESS.

Checksum in Stateless
~~~~~~~~~~~~~~~~~~~~~

If checksum is enabled, the checksum will only be available after the op is successfully
processed, i.e. dequeued with status = RTE_COMP_OP_STATUS_SUCCESS.


Compression API Stateful operation
-----------------------------------

The compression API provides the RTE_COMP_FF_STATEFUL_COMPRESSION and
RTE_COMP_FF_STATEFUL_DECOMPRESSION feature flags for a PMD to reflect
its support for stateful operations.

A stateful operation in DPDK compression means the application invokes enqueue
burst() multiple times to process related chunks of data because the
application broke the data into several ops.

In that case:

- ops are set up with op_type RTE_COMP_OP_STATEFUL,
- all ops except the last are set to flush value RTE_COMP_FLUSH_NONE or RTE_COMP_FLUSH_SYNC,
  and the last is set to flush value RTE_COMP_FLUSH_FULL or RTE_COMP_FLUSH_FINAL.

If either one or all of the above conditions hold, the PMD initiates
stateful processing and releases acquired resources after processing of the
operation with flush value RTE_COMP_FLUSH_FULL or RTE_COMP_FLUSH_FINAL is complete.
Unlike stateless, the application can enqueue only one stateful op from
a particular stream at a time and must attach the stream handle
to each op.

Stream in Stateful operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A ``stream`` in DPDK compression is a logical entity which identifies a related set of ops. For example,
when one large file is broken into multiple chunks, the file is represented by a stream and each chunk
of that file is represented by a compression op, ``rte_comp_op``. Whenever an application wants stateful
processing of such data, it must get a stream handle via a call to ``rte_compressdev_stream_create()``
with an xform; as output, the target PMD returns an opaque stream handle to the application, which
it must attach to all of the ops carrying data of that stream. In stateful processing, every op
requires the previous op's data for compression/decompression. A PMD allocates and sets up resources such
as history, states, etc. within a stream, which are maintained during the processing of the related ops.

Unlike priv_xforms, a stream is always a NON_SHAREABLE entity. One stream handle must be attached to only
one set of related ops and cannot be reused until all of them are processed with status success or failure.

.. figure:: img/stateful-op.*

The application should call ``rte_compressdev_stream_create()`` and attach the stream to each op before
enqueuing them for processing, and free it via ``rte_compressdev_stream_free()`` during
termination. All ops that are to be processed statefully should carry the *same* stream.

See *DPDK API Reference* document for details.

An example pseudocode to set up and process a stream having NUM_CHUNKS chunks, each of size CHUNK_LEN,
would look like:


.. code-block:: c

    /*
     * pseudocode for stateful compression
     */

    uint8_t cdev_id = rte_compressdev_get_dev_id(<pmd name>);

    /* configure the device. */
    if (rte_compressdev_configure(cdev_id, &conf) < 0)
        rte_exit(EXIT_FAILURE, "Failed to configure compressdev %u\n", cdev_id);

    if (rte_compressdev_queue_pair_setup(cdev_id, 0, NUM_MAX_INFLIGHT_OPS,
                            rte_socket_id()) < 0)
        rte_exit(EXIT_FAILURE, "Failed to setup queue pair\n");

    if (rte_compressdev_start(cdev_id) < 0)
        rte_exit(EXIT_FAILURE, "Failed to start device\n");

    /* setup compress transform. */
    struct rte_comp_xform compress_xform = {
        .type = RTE_COMP_COMPRESS,
        .compress = {
            .algo = RTE_COMP_ALGO_DEFLATE,
            .deflate = {
                .huffman = RTE_COMP_HUFFMAN_DEFAULT
            },
            .level = RTE_COMP_LEVEL_PMD_DEFAULT,
            .chksum = RTE_COMP_CHECKSUM_NONE,
            .window_size = DEFAULT_WINDOW_SIZE,
            .hash_algo = RTE_COMP_HASH_ALGO_NONE
        }
    };

    /* create stream */
    void *stream = NULL;
    rte_compressdev_stream_create(cdev_id, &compress_xform, &stream);

    /* create an op pool and allocate ops */
    rte_comp_op_bulk_alloc(op_pool, comp_ops, NUM_CHUNKS);

    /* Prepare source and destination mbufs for compression operations */
    unsigned int i;
    for (i = 0; i < NUM_CHUNKS; i++) {
        if (rte_pktmbuf_append(mbufs[i], CHUNK_LEN) == NULL)
            rte_exit(EXIT_FAILURE, "Not enough room in the mbuf\n");
        comp_ops[i]->m_src = mbufs[i];
        if (rte_pktmbuf_append(dst_mbufs[i], CHUNK_LEN) == NULL)
            rte_exit(EXIT_FAILURE, "Not enough room in the mbuf\n");
        comp_ops[i]->m_dst = dst_mbufs[i];
    }

    /* Set up the compress operations. */
    for (i = 0; i < NUM_CHUNKS; i++) {
        struct rte_comp_op *op = comp_ops[i];
        /* m_src and m_dst were already set in the loop above */
        op->stream = stream;
        op->op_type = RTE_COMP_OP_STATEFUL;
        if (i == NUM_CHUNKS - 1) {
            /* set to final, if last chunk */
            op->flush_flag = RTE_COMP_FLUSH_FINAL;
        } else {
            /* set to NONE, for all intermediary ops */
            op->flush_flag = RTE_COMP_FLUSH_NONE;
        }
        op->src.offset = 0;
        op->dst.offset = 0;
        op->src.length = CHUNK_LEN;
        op->input_chksum = 0;
        num_enqd = rte_compressdev_enqueue_burst(cdev_id, 0, &op, 1);
        /* wait for this to complete before enqueuing next */
        do {
            num_deqd = rte_compressdev_dequeue_burst(cdev_id, 0, &processed_op, 1);
        } while (num_deqd < num_enqd);
        /* check op->consumed and op->produced before enqueuing the next op */
    }


Stateful and OUT_OF_SPACE
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the PMD supports stateful operation, then the OUT_OF_SPACE status is not an actual
error for the PMD. In that case, the PMD returns with status
RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE with consumed = number of input bytes
read and produced = length of the complete output buffer.
The application should enqueue the next op with its source starting at consumed+1 and an
output buffer with available space.

If hash is enabled, the digest buffer will contain a valid digest after the last op in the stream
(having flush = RTE_COMP_FLUSH_FINAL) is successfully processed, i.e. dequeued
with status = RTE_COMP_OP_STATUS_SUCCESS.

If checksum is enabled, the checksum will only be available after the last op in the stream
(having flush = RTE_COMP_FLUSH_FINAL) is successfully processed, i.e. dequeued
with status = RTE_COMP_OP_STATUS_SUCCESS.


Burst in compression API
-------------------------

Scheduling of compression operations on DPDK's application data path is
performed using a burst-oriented asynchronous API set. A queue pair on a compression
device accepts a burst of compression operations using the enqueue burst API. On physical
devices, the enqueue burst API will place the operations to be processed
on the device's hardware input queue; for virtual devices, the processing of the
operations is usually completed during the enqueue call to the compression
device. The dequeue burst API will retrieve any processed operations available
from the queue pair on the compression device; from physical devices this is usually
directly from the device's processed queue, and for virtual devices from a
``rte_ring`` where processed operations are placed after being processed on the
enqueue call.

A burst in DPDK compression can be a combination of stateless and stateful operations, with the condition
that for stateful ops only one op at a time should be enqueued from a particular stream, i.e. no two ops
should belong to the same stream in a single burst. However, a burst may contain multiple stateful ops as long
as each op is attached to a different stream, i.e. a burst can look like:

+---------------+--------------+--------------+-----------------+--------------+--------------+
| enqueue_burst | op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |
+---------------+--------------+--------------+-----------------+--------------+--------------+

Where op1 .. op5 all belong to different independent data units. op1, op2, op4 and op5 must be stateful,
as stateless ops can only use flush full or final, and op3 can be of type stateless or stateful.
Every op with type set to RTE_COMP_OP_STATELESS must be attached to a priv_xform, and
every op with type set to RTE_COMP_OP_STATEFUL *must* be attached to a stream.

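
Before enqueuing, an application could verify these burst rules with a check along the following lines. The types ``struct op_desc`` and ``enum op_kind`` and the function ``burst_is_valid`` are hypothetical simplifications of ``rte_comp_op`` used only for this sketch; the quadratic scan is a deliberate simplification that is adequate for small bursts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an op for validation purposes. */
enum op_kind { OP_KIND_STATELESS, OP_KIND_STATEFUL };

struct op_desc {
    enum op_kind kind;
    const void *handle; /* stream if stateful, priv_xform if stateless */
};

/*
 * A burst is valid when every op has a handle attached and no two
 * stateful ops in the burst share the same stream.
 */
bool
burst_is_valid(const struct op_desc *ops, size_t nb_ops)
{
    size_t i, j;

    assert(ops != NULL || nb_ops == 0);
    for (i = 0; i < nb_ops; i++) {
        if (ops[i].handle == NULL)
            return false;
        if (ops[i].kind != OP_KIND_STATEFUL)
            continue;
        for (j = i + 1; j < nb_ops; j++) {
            if (ops[j].kind == OP_KIND_STATEFUL &&
                ops[j].handle == ops[i].handle)
                return false;
        }
    }
    return true;
}
```

Stateless ops may share a handle in this sketch, which corresponds to the shareable priv_xform case.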

Since each operation in a burst is independent and thus can be completed
out of order, applications which need ordering should set up a per-op user data
area with reordering information so that they can determine the enqueue order at
dequeue.

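
One simple form of reordering information is a sequence number written at enqueue time. The sketch below assumes the application has copied that number out of each dequeued op's user data area into a ``struct completed_op`` record; the structure and the function ``restore_enqueue_order`` are hypothetical and not part of the compressdev API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record pairing a dequeued op with its enqueue sequence. */
struct completed_op {
    uint64_t seq; /* sequence number stored in the per-op user data */
    void *op;     /* the dequeued operation */
};

/* qsort comparator: ascending by sequence number, without overflow. */
static int
compare_seq(const void *a, const void *b)
{
    const struct completed_op *x = a;
    const struct completed_op *y = b;

    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Sort dequeued ops back into the order in which they were enqueued. */
void
restore_enqueue_order(struct completed_op *ops, size_t nb_ops)
{
    assert(ops != NULL || nb_ops == 0);
    if (nb_ops > 1)
        qsort(ops, nb_ops, sizeof(*ops), compare_seq);
}
```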

Also, if multiple threads call enqueue_burst() on the same queue pair, then it is the
application's responsibility to use a proper locking mechanism to ensure exclusive enqueuing
of operations.

Enqueue / Dequeue Burst APIs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The burst enqueue API uses a compression device identifier and a queue pair
identifier to specify the compression device queue pair to schedule the processing on.
The ``nb_ops`` parameter is the number of operations to process, which are
supplied in the ``ops`` array of ``rte_comp_op`` structures.
The enqueue function returns the number of operations it actually enqueued for
processing; a return value equal to ``nb_ops`` means that all operations have been
enqueued.

The dequeue API uses the same format as the enqueue API, but
the ``nb_ops`` and ``ops`` parameters are now used to specify the maximum number of processed
operations the user wishes to retrieve and the location in which to store them.
The API call returns the actual number of processed operations returned; this
can never be larger than ``nb_ops``.


There are unit test applications that show how to use the compressdev library in
app/test/test_compressdev.c.

Compression Device API
~~~~~~~~~~~~~~~~~~~~~~

The compressdev Library API is described in the *DPDK API Reference* document.