+DLB.
+
+Hardware Credits
+~~~~~~~~~~~~~~~~
+
+DLB uses a hardware credit scheme to prevent software from overflowing hardware
+event storage, with each unit of storage represented by a credit. A port spends
+a credit to enqueue an event, and hardware refills the ports with credits as the
+events are scheduled to ports. Refills come from credit pools.
+
+For DLB v2.5, there is a single credit pool used for both load-balanced and
+directed traffic.
+
+For DLB v2.0, each port is a member of both a load-balanced credit pool and a
+directed credit pool. The load-balanced credits are used to enqueue to
+load-balanced queues, and directed credits are used for directed queues.
+These pools' sizes are controlled by the nb_events_limit field in struct
+rte_event_dev_config. The load-balanced pool is sized to contain
+nb_events_limit credits, and the directed pool is sized to contain
+nb_events_limit/4 credits. The directed pool size can be overridden with the
+num_dir_credits devargs argument, like so:
+
+ .. code-block:: console
+
+ --allow ea:00.0,num_dir_credits=<value>
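+
+As a concrete illustration of the sizing rules above, here is a minimal,
+hedged sketch (the device ID and all counts are illustrative placeholders,
+not recommended values) of sizing the pools through nb_events_limit:
+
+ .. code-block:: c
+
+     #include <rte_eventdev.h>
+
+     /* Illustrative sketch only; all values are placeholders. */
+     static int
+     configure_credit_pools(uint8_t dev_id)
+     {
+         struct rte_event_dev_config config = { 0 };
+
+         config.nb_event_queues = 4;
+         config.nb_event_ports = 4;
+         config.nb_event_queue_flows = 1024;
+         config.nb_event_port_dequeue_depth = 32;
+         config.nb_event_port_enqueue_depth = 32;
+         /* On DLB v2.0, this sizes the load-balanced pool to 8192
+          * credits and the directed pool to 8192/4 = 2048 credits,
+          * unless num_dir_credits overrides the latter.
+          */
+         config.nb_events_limit = 8192;
+
+         return rte_event_dev_configure(dev_id, &config);
+     }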
+
+This can be used if the default allocation is too low or too high for the
+specific application needs. The PMD also supports a devarg that limits the
+max_num_events reported by rte_event_dev_info_get():
+
+ .. code-block:: console
+
+ --allow ea:00.0,max_num_events=<value>
+
+By default, max_num_events is reported as the total available load-balanced
+credits. If multiple DLB-based applications are being used, it may be desirable
+to control how many load-balanced credits each application uses, particularly
+when application(s) are written to configure nb_events_limit equal to the
+reported max_num_events.
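+
+A hedged sketch of the pattern described above, in which an application
+derives nb_events_limit from the (possibly devarg-limited) max_num_events;
+the function name is illustrative:
+
+ .. code-block:: c
+
+     #include <rte_eventdev.h>
+
+     /* Respect whatever max_num_events the PMD reports, which may have
+      * been reduced with the max_num_events devarg.
+      */
+     static int
+     size_from_dev_info(uint8_t dev_id, struct rte_event_dev_config *config)
+     {
+         struct rte_event_dev_info info;
+         int ret;
+
+         ret = rte_event_dev_info_get(dev_id, &info);
+         if (ret < 0)
+             return ret;
+
+         config->nb_events_limit = info.max_num_events;
+         return 0;
+     }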
+
+Each port is a member of both credit pools. A port's credit allocation is
+defined by its low watermark, high watermark, and refill quanta. These three
+parameters are calculated by the DLB PMD like so (sketched in code after the
+list):
+
+- The load-balanced high watermark is set to the port's enqueue_depth.
+ The directed high watermark is set to the minimum of the enqueue_depth and
+ the directed pool size divided by the total number of ports.
+- The refill quanta is set to half the high watermark.
+- The low watermark is set to the minimum of 16 and the refill quanta.
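+
+A hedged sketch of that arithmetic (names are illustrative, not the PMD's
+internal identifiers):
+
+ .. code-block:: c
+
+     #include <stdio.h>
+     #include <rte_common.h>
+
+     /* Compute the per-port credit parameters per the rules above. */
+     static void
+     show_credit_params(uint32_t enqueue_depth, uint32_t dir_pool_size,
+                        uint32_t num_ports)
+     {
+         uint32_t ldb_high = enqueue_depth;
+         uint32_t dir_high = RTE_MIN(enqueue_depth,
+                                     dir_pool_size / num_ports);
+         uint32_t quanta = ldb_high / 2;      /* refill quanta */
+         uint32_t low = RTE_MIN(16u, quanta); /* low watermark */
+
+         printf("ldb_high=%u dir_high=%u quanta=%u low=%u\n",
+                ldb_high, dir_high, quanta, low);
+     }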
+
+When the eventdev is started, each port is pre-allocated a high watermark's
+worth of credits. For example, if an eventdev contains four ports with enqueue
+depths of 32 and a load-balanced credit pool size of 4096, each port will start
+with 32 load-balanced credits, and there will be 3968 credits available to
+replenish the ports. Thus, a single port is not capable of enqueueing up to the
+nb_events_limit (without any events being dequeued), since the other ports are
+retaining their initial credit allocation; in short, all ports must enqueue in
+order to reach the limit.
+
+If a port attempts to enqueue and has no credits available, the enqueue
+operation will fail and the application must retry the enqueue. Credits are
+replenished asynchronously by the DLB hardware.
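+
+A minimal retry sketch (illustrative, not the PMD's internals); in looped
+pipelines the retry should be bounded, as discussed under Software Credits
+below:
+
+ .. code-block:: c
+
+     #include <rte_eventdev.h>
+     #include <rte_pause.h>
+
+     /* Retry an enqueue that failed for lack of credits; hardware
+      * replenishes credits asynchronously.
+      */
+     static void
+     enqueue_with_retry(uint8_t dev_id, uint8_t port_id,
+                        struct rte_event *ev)
+     {
+         while (rte_event_enqueue_burst(dev_id, port_id, ev, 1) != 1)
+             rte_pause();
+     }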
+
+Software Credits
+~~~~~~~~~~~~~~~~
+
+The DLB is a "closed system" event dev, and the DLB PMD layers a software
+credit scheme on top of the hardware credit scheme in order to comply with
+the per-port backpressure described in the eventdev API.
+
+The DLB's hardware scheme is local to a queue/pipeline stage: a port spends a
+credit when it enqueues to a queue, and credits are later replenished after the
+events are dequeued and released.
+
+In the software credit scheme, a credit is consumed when a new (.op =
+RTE_EVENT_OP_NEW) event is injected into the system, and the credit is
+replenished when the event is released from the system (either explicitly with
+RTE_EVENT_OP_RELEASE or implicitly in dequeue_burst()).
+
+In this model, an event is "in the system" from its first enqueue into eventdev
+until it is last dequeued. If the event goes through multiple event queues, it
+is still considered "in the system" while a worker thread is processing it.
+
+A port will fail to enqueue a new event if the number of events in the system
+exceeds its ``new_event_threshold`` (specified at port setup time). A port
+will also fail to enqueue if it lacks enough hardware credits; load-balanced
+credits are used to enqueue to a load-balanced queue, and directed credits
+are used to enqueue to a directed queue.
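+
+A hedged sketch of setting the per-port new_event_threshold at port setup
+time (all values are illustrative):
+
+ .. code-block:: c
+
+     #include <rte_eventdev.h>
+
+     /* Cap the number of in-flight new events this port may inject. */
+     static int
+     setup_injection_port(uint8_t dev_id, uint8_t port_id)
+     {
+         struct rte_event_port_conf conf = {
+             .new_event_threshold = 1024, /* illustrative */
+             .dequeue_depth = 32,
+             .enqueue_depth = 32,
+         };
+
+         return rte_event_port_setup(dev_id, port_id, &conf);
+     }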
+
+The out-of-credit situations are typically transient, and an eventdev
+application using the DLB ought to retry its enqueues if they fail.
+If an enqueue fails, the DLB PMD sets rte_errno as follows:
+
+- -ENOSPC: Credit exhaustion (either hardware or software).
+- -EINVAL: Invalid argument, such as port ID, queue ID, or sched_type.
+
+Depending on the pipeline the application has constructed, it's possible to
+enter a credit deadlock scenario wherein the worker thread lacks the credit
+to enqueue an event, and it must dequeue an event before it can recover the
+credit. If the worker thread retries its enqueue indefinitely, it will not
+make forward progress. Such a deadlock is possible if the application has
+event "loops", in which an event is dequeued from queue A and later enqueued
+back to queue A.
+
+Due to this, workers should stop retrying after a time, release the events
+they are attempting to enqueue, and dequeue more events. It is important that
+workers release the events rather than simply setting them aside to retry the
+enqueue again later, because the port has a limited history list size (by
+default, twice the port's dequeue_depth).
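+
+A hedged sketch of that recommendation, with an illustrative retry bound:
+
+ .. code-block:: c
+
+     #include <rte_eventdev.h>
+
+     #define ENQ_RETRIES 100 /* illustrative bound */
+
+     /* Bound the retries, then release the event rather than holding
+      * it, so the port's history list is not exhausted.
+      */
+     static void
+     enqueue_or_release(uint8_t dev_id, uint8_t port_id,
+                        struct rte_event *ev)
+     {
+         int i;
+
+         for (i = 0; i < ENQ_RETRIES; i++)
+             if (rte_event_enqueue_burst(dev_id, port_id, ev, 1) == 1)
+                 return;
+
+         /* Give up and release the event, then go dequeue more. */
+         ev->op = RTE_EVENT_OP_RELEASE;
+         rte_event_enqueue_burst(dev_id, port_id, ev, 1);
+     }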
+
+Priority
+~~~~~~~~
+
+The DLB supports event priority and per-port queue service priority, as
+described in the eventdev header file. The DLB does not support 'global' event
+queue priority established at queue creation time.
+
+DLB supports 4 event and queue service priority levels. For both priority
+types, the PMD uses the upper three bits of the priority field to determine
+the DLB priority, discarding the 5 least significant bits. The least
+significant of the three remaining bits is then effectively ignored as well,
+binning events into 4 priorities. The discarded 5 least significant event
+priority bits are not preserved when an event is enqueued.
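+
+In effect, only the top two bits of the 8-bit eventdev priority distinguish
+the four levels; a sketch of the resulting mapping (the function name is
+illustrative):
+
+ .. code-block:: c
+
+     #include <stdint.h>
+
+     /* Upper three bits are taken, then the lowest of those three is
+      * ignored, leaving four levels (0 = highest priority).
+      */
+     static inline uint8_t
+     dlb_priority_bin(uint8_t ev_priority)
+     {
+         return (ev_priority >> 5) >> 1; /* same as ev_priority >> 6 */
+     }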
+
+Note that event priority only works within the same event type.
+When atomic and ordered or unordered events are enqueued to the same QID,
+priority across the types is always equal, and both types are served in a
+round-robin manner.
+
+Reconfiguration
+~~~~~~~~~~~~~~~
+
+The Eventdev API allows one to reconfigure a device, its ports, and its queues
+by first stopping the device, calling the configuration function(s), then
+restarting the device. The DLB does not support configuring an individual queue
+or port without first reconfiguring the entire device, however, so there are
+certain reconfiguration sequences that are valid in the eventdev API but not
+supported by the PMD.
+
+Specifically, the PMD supports the following configuration sequence
+(sketched in code after the list):
+
+1. Configure and start the device
+2. Stop the device
+3. (Optional) Reconfigure the device
+4. (Optional) If step 3 is run:
+
+ a. Setup queue(s). The reconfigured queue(s) lose their previous port links.
+ b. The reconfigured port(s) lose their previous queue links.
+
+5. (Optional, only if steps 4a and 4b are run) Link port(s) to queue(s)
+6. Restart the device. If the device is reconfigured in step 3 but one or more
+ of its ports or queues are not, the PMD will apply their previous
+ configuration (including port->queue links) at this time.
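+
+A hedged sketch of the supported sequence expressed in API calls (error
+handling elided; a single queue and port, and the conf structs, are assumed
+to be prepared elsewhere):
+
+ .. code-block:: c
+
+     #include <rte_eventdev.h>
+
+     static void
+     reconfigure(uint8_t dev_id, struct rte_event_dev_config *dev_conf,
+                 struct rte_event_queue_conf *q_conf,
+                 struct rte_event_port_conf *p_conf)
+     {
+         uint8_t q = 0, p = 0;
+         uint8_t prio = RTE_EVENT_DEV_PRIORITY_NORMAL;
+
+         rte_event_dev_stop(dev_id);                   /* step 2 */
+         rte_event_dev_configure(dev_id, dev_conf);    /* step 3 */
+         rte_event_queue_setup(dev_id, q, q_conf);     /* step 4a */
+         rte_event_port_setup(dev_id, p, p_conf);      /* step 4b */
+         rte_event_port_link(dev_id, p, &q, &prio, 1); /* step 5 */
+         rte_event_dev_start(dev_id);                  /* step 6 */
+     }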
+
+The PMD does not support the following configuration sequences:
+
+1. Configure and start the device
+2. Stop the device
+3. Setup queue or setup port
+4. Start the device
+
+This sequence is not supported because the event device must be reconfigured
+before its ports or queues can be reconfigured.
+
+Atomic Inflights Allocation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the last stage prior to scheduling an atomic event to a CQ, DLB holds the
+inflight event in a temporary buffer that is divided among load-balanced
+queues. If a queue's atomic buffer storage fills up, this can result in
+head-of-line-blocking. For example:
+
+- An LDB queue is allocated N atomic buffer entries.
+- All N entries are filled with events from flow X, which is pinned to CQ 0.
+
+Until CQ 0 releases 1+ events, no other atomic flows for that LDB queue can be
+scheduled. The likelihood of this case depends on the eventdev configuration,
+traffic behavior, event processing latency, potential for a worker to be
+interrupted or otherwise delayed, etc.
+
+By default, the PMD allocates 16 buffer entries for each load-balanced queue,
+which provides an even division across all 128 queues but potentially wastes
+buffer space (e.g. if not all queues are used, or aren't used for atomic
+scheduling).
+
+The PMD provides a dev arg to override the default per-queue allocation. To
+increase per-queue atomic-inflight allocation to (for example) 64:
+
+ .. code-block:: console
+
+ --allow ea:00.0,atm_inflights=64
+
+QID Depth Threshold
+~~~~~~~~~~~~~~~~~~~
+
+DLB supports setting and tracking queue depth thresholds. Hardware tracks
+how full a queue is relative to its configured threshold, using four
+buckets:
+
+- Less than or equal to 50% of queue depth threshold
+- Greater than 50%, but less than or equal to 75% of depth threshold
+- Greater than 75%, but less than or equal to 100% of depth threshold
+- Greater than 100% of depth threshold
+
+Per queue threshold metrics are tracked in the DLB xstats, and are also
+returned in the impl_opaque field of each received event.
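+
+Those metrics can be read through the generic eventdev xstats API; a hedged
+sketch that dumps all queue-scoped xstats for one queue (the exact xstat
+names are PMD-specific and not listed here):
+
+ .. code-block:: c
+
+     #include <inttypes.h>
+     #include <stdio.h>
+     #include <rte_eventdev.h>
+
+     #define MAX_XSTATS 64 /* illustrative */
+
+     static void
+     dump_queue_xstats(uint8_t dev_id, uint8_t queue_id)
+     {
+         struct rte_event_dev_xstats_name names[MAX_XSTATS];
+         uint64_t ids[MAX_XSTATS], values[MAX_XSTATS];
+         int n, i;
+
+         n = rte_event_dev_xstats_names_get(dev_id,
+                 RTE_EVENT_DEV_XSTATS_QUEUE, queue_id,
+                 names, ids, MAX_XSTATS);
+         if (n <= 0)
+             return;
+         if (n > MAX_XSTATS)
+             n = MAX_XSTATS;
+
+         rte_event_dev_xstats_get(dev_id, RTE_EVENT_DEV_XSTATS_QUEUE,
+                 queue_id, ids, values, n);
+         for (i = 0; i < n; i++)
+             printf("%s: %" PRIu64 "\n", names[i].name, values[i]);
+     }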
+
+The per-QID threshold can be specified as part of the device args, and
+can be applied to all queues, a range of queues, or a single queue, as
+shown below.
+
+ .. code-block:: console
+
+ --allow ea:00.0,qid_depth_thresh=all:<threshold_value>
+ --allow ea:00.0,qid_depth_thresh=qidA-qidB:<threshold_value>
+ --allow ea:00.0,qid_depth_thresh=qid:<threshold_value>
+
+Class of service
+~~~~~~~~~~~~~~~~
+
+DLB supports provisioning the DLB bandwidth into 4 classes of service.
+
+- Class 4 corresponds to 40% of the DLB hardware bandwidth
+- Class 3 corresponds to 30% of the DLB hardware bandwidth
+- Class 2 corresponds to 20% of the DLB hardware bandwidth
+- Class 1 corresponds to 10% of the DLB hardware bandwidth
+- Class 0 corresponds to "don't care"
+
+The classes are applied globally to the set of ports contained in this
+scheduling domain, which is more appropriate for the bifurcated
+PMD than for the PF PMD, since the PF PMD supports just 1 scheduling
+domain.
+
+Class of service can be specified in the devargs, as follows:
+
+ .. code-block:: console
+
+ --allow ea:00.0,cos=<0..4>
+
+Use X86 Vector Instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DLB supports using x86 vector instructions to optimize the data path.
+
+The default mode of operation is to use scalar instructions, but
+the use of vector instructions can be enabled in the devargs, as
+follows:
+
+ .. code-block:: console