doc/guides/eventdevs/dlb2.rst

   1 ..  SPDX-License-Identifier: BSD-3-Clause
   2     Copyright(c) 2020 Intel Corporation.
   3
   4 Driver for the Intel® Dynamic Load Balancer (DLB)
   5 =================================================
   6
   7 The DPDK DLB poll mode driver supports the Intel® Dynamic Load Balancer,
   8 hardware versions 2.0 and 2.5.
   9
  10 Prerequisites
  11 -------------
  12
  13 Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up
  14 the basic DPDK environment.
  15
  16 Configuration
  17 -------------
  18
  19 The DLB PF PMD is a user-space PMD that uses VFIO to gain direct
  20 device access. To use this operation mode, the PCIe PF device must be bound
  21 to a DPDK-compatible VFIO driver, such as vfio-pci.
  22
  23 Eventdev API Notes
  24 ------------------
  25
  26 The DLB PMD provides the functions of a DPDK event device; specifically, it
  27 supports atomic, ordered, and parallel scheduling of events from queues to
  28 ports. However, the DLB hardware is not a perfect match to the eventdev API.
  29 Some DLB features, such as directed ports, are abstracted by the PMD.
  30
  31 In general, the DLB PMD is designed for ease-of-use and does not require a
  32 detailed understanding of the hardware, but these details are important when
  33 writing high-performance code. This section describes the places where the
  34 eventdev API and DLB misalign.
  35
  36 Scheduling Domain Configuration
  37 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  38
  39 DLB supports 32 scheduling domains.
  40 When one is configured, it allocates load-balanced and
  41 directed queues, ports, credits, and other hardware resources. Some
  42 resource allocations are user-controlled -- the number of queues, for example
  43 -- and others, like credit pools (one directed and one load-balanced pool per
  44 scheduling domain), are not.
  45
  46 The DLB is a closed system eventdev, and as such the ``nb_events_limit`` device
  47 setup argument and the per-port ``new_event_threshold`` argument apply as
  48 defined in the eventdev header file. The limit is applied to all enqueues,
  49 regardless of whether they consume a directed or load-balanced credit.
  50
  51 Load-Balanced Queues
  52 ~~~~~~~~~~~~~~~~~~~~
  53
  54 A load-balanced queue can support atomic and ordered scheduling, or atomic and
  55 unordered scheduling, but not atomic, unordered, and ordered scheduling
  56 together. A queue's scheduling types are set by the event queue configuration.
  57
  58 If the user sets the ``RTE_EVENT_QUEUE_CFG_ALL_TYPES`` flag, the
  59 ``nb_atomic_order_sequences`` field determines the supported scheduling types.
  60 With non-zero ``nb_atomic_order_sequences``, the queue is configured for atomic
  61 and ordered scheduling. In this case, ``RTE_SCHED_TYPE_PARALLEL`` scheduling is
  62 supported by scheduling those events as ordered events. Note that when the
  63 event is dequeued, its ``sched_type`` will be ``RTE_SCHED_TYPE_ORDERED``. If
  64 ``nb_atomic_order_sequences`` is zero, the queue is configured for atomic and
  65 unordered scheduling. In this case, ``RTE_SCHED_TYPE_ORDERED`` is unsupported.
  66
  67 If the ``RTE_EVENT_QUEUE_CFG_ALL_TYPES`` flag is not set, ``schedule_type``
  68 dictates the queue's scheduling type.
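
The rules above can be summarized as a small decision function. This is a
minimal sketch, not the PMD's code: the ``enum sched`` values and the function
name are hypothetical stand-ins for the eventdev ``RTE_SCHED_TYPE_*`` constants,
and only the behavior described in this section is modeled.

    .. code-block:: c

       #include <stdint.h>

       /* Hypothetical stand-ins for RTE_SCHED_TYPE_*; not the eventdev values. */
       enum sched { SCHED_ORDERED, SCHED_ATOMIC, SCHED_PARALLEL };

       #define SCHED_BIT(t) (1u << (t))

       /*
        * Returns a bitmask of the scheduling types that may be enqueued to a
        * load-balanced queue with the given configuration.
        */
       static unsigned int
       ldb_queue_sched_mask(int all_types, uint32_t nb_atomic_order_sequences,
                            enum sched schedule_type)
       {
           /* Without ALL_TYPES, schedule_type alone dictates the queue type. */
           if (!all_types)
               return SCHED_BIT(schedule_type);

           /* Atomic + ordered queue: parallel events are scheduled as ordered. */
           if (nb_atomic_order_sequences != 0)
               return SCHED_BIT(SCHED_ATOMIC) | SCHED_BIT(SCHED_ORDERED) |
                      SCHED_BIT(SCHED_PARALLEL);

           /* Atomic + unordered queue: ordered scheduling is unsupported. */
           return SCHED_BIT(SCHED_ATOMIC) | SCHED_BIT(SCHED_PARALLEL);
       }

In the second case a parallel event is accepted, but it is dequeued with an
ordered ``sched_type``, as noted above.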
  69
  70 The ``nb_atomic_order_sequences`` queue configuration field sets the ordered
  71 queue's reorder buffer size. DLB has 2 groups of ordered queues, where each
  72 group is configured to contain 1 queue with 1024 reorder entries, 2 queues
  73 with 512 reorder entries, and so on down to 32 queues with 32 entries.
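
The group layouts listed above all divide the same 1024 reorder entries. The
following sketch makes that relationship explicit. It assumes that "and so on"
means the per-queue size halves at each step (powers of two from 1024 down to
32); the function and macro names are hypothetical.

    .. code-block:: c

       #include <stdint.h>

       /* Reorder entries per sequence number group, per the text above. */
       #define GROUP_REORDER_ENTRIES 1024u

       /*
        * Returns how many ordered queues fit in one group when each queue
        * uses entries_per_queue reorder entries, or 0 if the size is not
        * one of the assumed valid sizes (a power of two in [32, 1024]).
        */
       static uint32_t
       queues_per_group(uint32_t entries_per_queue)
       {
           if (entries_per_queue < 32 ||
               entries_per_queue > GROUP_REORDER_ENTRIES)
               return 0;
           /* Reject sizes that are not a power of two. */
           if ((entries_per_queue & (entries_per_queue - 1)) != 0)
               return 0;
           return GROUP_REORDER_ENTRIES / entries_per_queue;
       }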
  74
  75 When a load-balanced queue is created, the PMD will configure a new sequence
  76 number group on-demand if the requested ``nb_atomic_order_sequences`` does not
  77 match a pre-existing group with available reorder buffer entries. If all groups
  78 are in use, no new group is created and queue configuration fails. (Note
  79 that when the PMD is used with a virtual DLB device, it cannot change the
  80 sequence number configuration.)
  81
  82 The queue's ``nb_atomic_flows`` parameter is ignored by the DLB PMD, because
  83 the DLB does not limit the number of flows a queue can track. In the DLB, all
  84 load-balanced queues can use the full 16-bit flow ID range.
  85
  86 Load-balanced and Directed Ports
  87 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  88
  89 DLB ports come in two flavors: load-balanced and directed. The eventdev API
  90 does not have the same concept, but it has a similar one: ports and queues that
  91 are singly-linked (i.e. linked to a single queue or port, respectively).
  92
  93 The ``rte_event_dev_info_get()`` function reports the number of available
  94 event ports and queues (among other things). For the DLB PMD,
  95 ``max_event_ports`` and ``max_event_queues`` report the number of available
  96 load-balanced ports and queues, and ``max_single_link_event_port_queue_pairs``
  97 reports the number of available directed ports and queues.
  98
  99 When a scheduling domain is created in ``rte_event_dev_configure()``, the user
 100 specifies ``nb_event_ports`` and ``nb_single_link_event_port_queues``, which
 101 control the total number of ports (load-balanced and directed) and the number
 102 of directed ports. Hence, the number of requested load-balanced ports is
 103 ``nb_event_ports - nb_single_link_event_port_queues``. The ``nb_event_queues``
 104 field specifies the total number of queues (load-balanced and directed). The
 105 number of directed queues comes from ``nb_single_link_event_port_queues``, since
 106 directed ports and queues come in pairs.
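
As a sketch of the arithmetic above, the following splits the requested totals
into load-balanced and directed resources. The two structs are hypothetical and
mirror only the three ``rte_event_dev_config`` fields named in this paragraph.

    .. code-block:: c

       #include <stdint.h>

       /* Hypothetical mirror of the three configuration fields used here. */
       struct dev_counts {
           uint8_t nb_event_ports;                   /* total ports */
           uint8_t nb_event_queues;                  /* total queues */
           uint8_t nb_single_link_event_port_queues; /* directed pairs */
       };

       struct split {
           int ldb_ports;
           int dir_ports;
           int ldb_queues;
           int dir_queues;
       };

       static struct split
       split_resources(const struct dev_counts *c)
       {
           struct split s;

           /* Directed ports and queues come in pairs. */
           s.dir_ports = c->nb_single_link_event_port_queues;
           s.dir_queues = c->nb_single_link_event_port_queues;
           /* The totals include the directed resources. */
           s.ldb_ports = c->nb_event_ports - s.dir_ports;
           s.ldb_queues = c->nb_event_queues - s.dir_queues;
           return s;
       }

For example, requesting 8 ports, 6 queues, and 2 single-link pairs yields 6
load-balanced ports, 4 load-balanced queues, and 2 directed port/queue pairs.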
 107
 108 When a port is set up, the ``RTE_EVENT_PORT_CFG_SINGLE_LINK`` flag determines
 109 whether it is configured as a directed port (the flag is set) or a
 110 load-balanced port (the flag is unset). Similarly, the
 111 ``RTE_EVENT_QUEUE_CFG_SINGLE_LINK`` queue configuration flag controls
 112 whether a queue is directed or load-balanced.
 113
 114 Load-balanced ports can only be linked to load-balanced queues, and directed
 115 ports can only be linked to directed queues. Furthermore, directed ports can
 116 only be linked to a single directed queue (and vice versa), and that link
 117 cannot change after the eventdev is started.
 118
 119 The eventdev API does not have a directed scheduling type. To support directed
 120 traffic, the DLB PMD detects when an event is being sent to a directed queue
 121 and overrides its scheduling type. Note that the originally selected scheduling
 122 type (atomic, ordered, or parallel) is not preserved, and an event's sched_type
 123 will be set to ``RTE_SCHED_TYPE_ATOMIC`` when it is dequeued from a directed
 124 port.
 125
 126 Finally, even though all 3 event types are supported on the same QID by
 127 converting unordered events to ordered, such use is discouraged wherever
 128 possible, since mixing types on the same queue consumes valuable reorder
 129 resources and orders events that do not require ordering.
 130
 131 Flow ID
 132 ~~~~~~~
 133
 134 The flow ID field is preserved in the event when it is scheduled in the
 135 DLB.
 136
 137 Hardware Credits
 138 ~~~~~~~~~~~~~~~~
 139
 140 DLB uses a hardware credit scheme to prevent software from overflowing hardware
 141 event storage, with each unit of storage represented by a credit. A port spends
 142 a credit to enqueue an event, and hardware refills the ports with credits as the
 143 events are scheduled to ports. Refills come from credit pools.
 144
 145 For DLB v2.5, there is a single credit pool used for both load-balanced and
 146 directed traffic.
 147
 148 For DLB v2.0, each port is a member of both a load-balanced credit pool and a
 149 directed credit pool. The load-balanced credits are used to enqueue to
 150 load-balanced queues, and directed credits are used for directed queues.
 151 These pools' sizes are controlled by the ``nb_events_limit`` field in
 152 ``struct rte_event_dev_config``. The load-balanced pool is sized to contain
 153 ``nb_events_limit`` credits, and the directed pool is sized to contain
 154 ``nb_events_limit / 4`` credits. The directed pool size can be overridden with
 155 the ``num_dir_credits`` devargs argument, like so:
 156
 157     .. code-block:: console
 158
 159        --allow ea:00.0,num_dir_credits=<value>
 160
 161 This can be used if the default allocation is too low or too high for the
 162 specific application needs. The PMD also supports a devarg that limits the
 163 ``max_num_events`` reported by ``rte_event_dev_info_get()``:
 164
 165     .. code-block:: console
 166
 167        --allow ea:00.0,max_num_events=<value>
 168
 169 By default, ``max_num_events`` is reported as the total available load-balanced
 170 credits. If multiple DLB-based applications are being used, it may be desirable
 171 to control how many load-balanced credits each application uses, particularly
 172 when applications are written to configure ``nb_events_limit`` equal to the
 173 reported ``max_num_events``.
 174
 175 Each port is a member of both credit pools. A port's credit allocation is
 176 defined by its low watermark, high watermark, and refill quanta. These three
 177 parameters are calculated by the DLB PMD like so:
 178
 179 - The load-balanced high watermark is set to the port's enqueue_depth.
 180   The directed high watermark is set to the minimum of the enqueue_depth and
 181   the directed pool size divided by the total number of ports.
 182 - The refill quanta is set to half the high watermark.
 183 - The low watermark is set to the minimum of 16 and the refill quanta.
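
The three rules above can be sketched as follows. This is an illustration
rather than the PMD's implementation: the struct and function names are
hypothetical, and integer division is assumed wherever the text says "half" or
"divided by".

    .. code-block:: c

       #include <stdint.h>

       struct credit_cfg {
           uint32_t high_wm;
           uint32_t refill_quanta;
           uint32_t low_wm;
       };

       static uint32_t
       min_u32(uint32_t a, uint32_t b)
       {
           return a < b ? a : b;
       }

       /* Derives the refill quanta and low watermark from a high watermark. */
       static struct credit_cfg
       credit_cfg_from_high_wm(uint32_t high_wm)
       {
           struct credit_cfg c;

           c.high_wm = high_wm;
           c.refill_quanta = high_wm / 2;
           c.low_wm = min_u32(16, c.refill_quanta);
           return c;
       }

       /* Load-balanced credits: the high watermark is the enqueue depth. */
       static struct credit_cfg
       ldb_credit_cfg(uint32_t enqueue_depth)
       {
           return credit_cfg_from_high_wm(enqueue_depth);
       }

       /*
        * Directed credits: the high watermark is capped by an even share of
        * the directed pool. num_ports must be non-zero.
        */
       static struct credit_cfg
       dir_credit_cfg(uint32_t enqueue_depth, uint32_t dir_pool_size,
                      uint32_t num_ports)
       {
           uint32_t share = dir_pool_size / num_ports;

           return credit_cfg_from_high_wm(min_u32(enqueue_depth, share));
       }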
 184
 185 When the eventdev is started, each port is pre-allocated a high watermark's
 186 worth of credits. For example, if an eventdev contains four ports with enqueue
 187 depths of 32 and a load-balanced credit pool size of 4096, each port will start
 188 with 32 load-balanced credits, and there will be 3968 credits available to
 189 replenish the ports. Thus, a single port is not capable of enqueueing up to the
 190 nb_events_limit (without any events being dequeued), since the other ports are
 191 retaining their initial credit allocation; in short, all ports must enqueue in
 192 order to reach the limit.
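
The arithmetic in this example can be checked with a short sketch. The function
names are hypothetical, and the pool is assumed to be at least as large as the
combined initial allocations.

    .. code-block:: c

       #include <stdint.h>

       /* Credits left in the pool once every port holds its high watermark. */
       static uint32_t
       pool_credits_after_start(uint32_t pool_size, uint32_t num_ports,
                                uint32_t high_wm)
       {
           return pool_size - num_ports * high_wm;
       }

       /*
        * Upper bound on what one port can enqueue with no dequeues: its own
        * initial credits plus everything left in the pool. The credits held
        * by the other ports stay out of reach.
        */
       static uint32_t
       max_single_port_enqueues(uint32_t pool_size, uint32_t num_ports,
                                uint32_t high_wm)
       {
           return high_wm +
                  pool_credits_after_start(pool_size, num_ports, high_wm);
       }

With a pool of 4096, four ports, and a high watermark of 32, the pool retains
3968 credits and a single port can enqueue at most 4000 events, which is 96
short of the limit.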
 193
 194 If a port attempts to enqueue and has no credits available, the enqueue
 195 operation will fail and the application must retry the enqueue. Credits are
 196 replenished asynchronously by the DLB hardware.
 197
 198 Software Credits
 199 ~~~~~~~~~~~~~~~~
 200
 201 The DLB is a "closed system" eventdev, and the DLB PMD layers a software
 202 credit scheme on top of the hardware credit scheme in order to comply with
 203 the per-port backpressure described in the eventdev API.
 204
 205 The DLB's hardware scheme is local to a queue/pipeline stage: a port spends a
 206 credit when it enqueues to a queue, and credits are later replenished after the
 207 events are dequeued and released.
 208
 209 In the software credit scheme, a credit is consumed when a new (.op =
 210 RTE_EVENT_OP_NEW) event is injected into the system, and the credit is
 211 replenished when the event is released from the system (either explicitly with
 212 RTE_EVENT_OP_RELEASE or implicitly in dequeue_burst()).
 213
 214 In this model, an event is "in the system" from its first enqueue into eventdev
 215 until it is last dequeued. If the event goes through multiple event queues, it
 216 is still considered "in the system" while a worker thread is processing it.
 217
 218 A port will fail to enqueue if the number of events in the system exceeds its
 219 ``new_event_threshold`` (specified at port setup time). A port will also fail
 220 to enqueue if it lacks enough hardware credits to enqueue; load-balanced
 221 credits are used to enqueue to a load-balanced queue, and directed credits are
 222 used to enqueue to a directed queue.
 223
 224 The out-of-credit situations are typically transient, and an eventdev
 225 application using the DLB ought to retry its enqueues if they fail.
 226 If an enqueue fails, the DLB PMD sets ``rte_errno`` as follows:
 227
 228 - -ENOSPC: Credit exhaustion (either hardware or software)
 229 - -EINVAL: Invalid argument, such as port ID, queue ID, or sched_type.
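
An application can use these two values to decide whether a retry is
worthwhile. The helper below is a hypothetical sketch: it takes the
``rte_errno`` value as a plain integer, using the negative encoding listed
above, so that it does not depend on DPDK headers.

    .. code-block:: c

       #include <errno.h>
       #include <stdbool.h>

       /* err is the rte_errno value observed after a failed enqueue. */
       static bool
       enqueue_error_is_transient(int err)
       {
           switch (err) {
           case -ENOSPC:
               /* Credit exhaustion (hardware or software): retry later. */
               return true;
           case -EINVAL:
               /* Bad port ID, queue ID, or sched_type: a retry cannot help. */
               return false;
           default:
               /* Values not listed above are treated as non-retryable. */
               return false;
           }
       }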
 230
 231 Depending on the pipeline the application has constructed, it's possible to
 232 enter a credit deadlock scenario wherein the worker thread lacks the credit
 233 to enqueue an event, and it must dequeue an event before it can recover the
 234 credit. If the worker thread retries its enqueue indefinitely, it will not
 235 make forward progress. Such deadlock is possible if the application has event
 236 "loops", in which an event is dequeued from queue A and later enqueued back to
 237 queue A.
 238
 239 Due to this, a worker should stop retrying after a time, release the events it
 240 is attempting to enqueue, and dequeue more events. It is important that the
 241 worker releases the events and does not simply set them aside to retry the
 242 enqueue later, because the port has a limited history list size (by default,
 243 twice the port's dequeue_depth).
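
The bounded-retry pattern recommended above might look like the following
sketch. To keep it self-contained, the enqueue itself is abstracted behind a
hypothetical callback; ``succeed_after()`` is a stub used only to exercise the
loop.

    .. code-block:: c

       #include <stdbool.h>

       /* Hypothetical callback: tries one enqueue, returns true on success. */
       typedef bool (*try_enqueue_fn)(void *ctx);

       enum enq_result {
           ENQ_DONE,    /* the event was enqueued */
           ENQ_GIVE_UP  /* caller should release the event, then dequeue */
       };

       /* Retries the enqueue at most max_attempts times, then gives up. */
       static enum enq_result
       enqueue_bounded(try_enqueue_fn try_enqueue, void *ctx,
                       unsigned int max_attempts)
       {
           for (unsigned int i = 0; i < max_attempts; i++) {
               if (try_enqueue(ctx))
                   return ENQ_DONE;
           }
           return ENQ_GIVE_UP;
       }

       /* Stub: fails until the counter pointed to by ctx reaches zero. */
       static bool
       succeed_after(void *ctx)
       {
           unsigned int *remaining_failures = ctx;

           if (*remaining_failures == 0)
               return true;
           (*remaining_failures)--;
           return false;
       }

In a real worker the callback would wrap ``rte_event_enqueue_burst()``, and on
``ENQ_GIVE_UP`` the worker would enqueue the event with ``RTE_EVENT_OP_RELEASE``
before dequeuing again.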
 244
 245 Priority
 246 ~~~~~~~~
 247
 248 The DLB supports event priority and per-port queue service priority, as
 249 described in the eventdev header file. The DLB does not support 'global' event
 250 queue priority established at queue creation time.
 251
 252 DLB supports 4 event and queue service priority levels. For both priority types,
 253 the PMD uses the upper three bits of the priority field to determine the DLB
 254 priority, discarding the 5 least significant bits. However, the least
 255 significant of those three bits is effectively ignored when binning into 4
 256 priorities. The discarded 5 least significant event priority bits are not
 257 preserved when an event is enqueued.
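
The bit manipulation described above can be sketched as two shifts. The
function names are hypothetical, and the second function models the binning as
the text describes it rather than any documented hardware encoding.

    .. code-block:: c

       #include <stdint.h>

       /* Keeps the upper three bits of the 8-bit eventdev priority (0..7). */
       static uint8_t
       dlb_priority(uint8_t eventdev_priority)
       {
           return eventdev_priority >> 5;
       }

       /* Drops the lowest of those three bits, leaving 4 bins (0..3). */
       static uint8_t
       dlb_priority_bin(uint8_t eventdev_priority)
       {
           return dlb_priority(eventdev_priority) >> 1;
       }

Under this model, eventdev priorities 0 through 63 share one bin, 64 through
127 the next, and so on, so two priorities that differ only in their low 6 bits
are scheduled identically.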
 258
 259 Note that event priority only works within the same event type. When atomic
 260 and ordered or unordered events are enqueued to the same QID, priority across
 261 the types is always equal, and both types are served in a round-robin manner.
 262
 263 Reconfiguration
 264 ~~~~~~~~~~~~~~~
 265
 266 The Eventdev API allows one to reconfigure a device, its ports, and its queues
 267 by first stopping the device, calling the configuration function(s), then
 268 restarting the device. The DLB does not support configuring an individual queue
 269 or port without first reconfiguring the entire device, however, so there are
 270 certain reconfiguration sequences that are valid in the eventdev API but not
 271 supported by the PMD.
 272
 273 Specifically, the PMD supports the following configuration sequence:
 274 1. Configure and start the device
 275 2. Stop the device
 276 3. (Optional) Reconfigure the device
 277 4. (Optional) If step 3 is run:
 278
 279    a. Set up queue(s). The reconfigured queue(s) lose their previous port links.
 280    b. Set up port(s). The reconfigured port(s) lose their previous queue links.
 281
 282 5. (Optional, only if steps 4a and 4b are run) Link port(s) to queue(s)
 283 6. Restart the device. If the device is reconfigured in step 3 but one or more
 284    of its ports or queues are not, the PMD will apply their previous
 285    configuration (including port->queue links) at this time.
 286
 287 The PMD does not support the following configuration sequence:
 288 1. Configure and start the device
 289 2. Stop the device
 290 3. Set up a queue or a port
 291 4. Start the device
 292
 293 This sequence is not supported because the event device must be reconfigured
 294 before its ports or queues can be.
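
The constraint can be modeled as a small state machine. This is a sketch of
the rule stated above, with hypothetical state and function names; it is not
how the PMD tracks its state.

    .. code-block:: c

       #include <stdbool.h>

       enum dev_state { DEV_CONFIGURED, DEV_STARTED, DEV_STOPPED };

       /* Configuring is only meaningful while the device is not running. */
       static enum dev_state
       on_configure(enum dev_state s)
       {
           return s == DEV_STARTED ? s : DEV_CONFIGURED;
       }

       /* Starting is allowed after a configure or after a plain stop. */
       static enum dev_state
       on_start(enum dev_state s)
       {
           (void)s;
           return DEV_STARTED;
       }

       static enum dev_state
       on_stop(enum dev_state s)
       {
           return s == DEV_STARTED ? DEV_STOPPED : s;
       }

       /*
        * Queue or port setup is accepted only if the device has been
        * (re)configured since it was last stopped.
        */
       static bool
       setup_allowed(enum dev_state s)
       {
           return s == DEV_CONFIGURED;
       }

Walking the unsupported sequence through this model (configure, start, stop,
then set up a queue) ends in ``DEV_STOPPED``, where ``setup_allowed()`` returns
false until the device is configured again.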
 295
 296 Atomic Inflights Allocation
 297 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 298
 299 In the last stage prior to scheduling an atomic event to a CQ, DLB holds the
 300 inflight event in a temporary buffer that is divided among load-balanced
 301 queues. If a queue's atomic buffer storage fills up, this can result in
 302 head-of-line-blocking. For example:
 303
 304 - An LDB queue is allocated N atomic buffer entries.
 305 - All N entries are filled with events from flow X, which is pinned to CQ 0.
 306
 307 Until CQ 0 releases at least one event, no other atomic flows for that LDB
 308 queue can be scheduled. The likelihood of this case depends on the eventdev
 309 configuration, traffic behavior, event processing latency, the potential for a
 310 worker to be interrupted or otherwise delayed, etc.
 311
 312 By default, the PMD allocates 16 buffer entries for each load-balanced queue,
 313 which provides an even division across all 128 queues but potentially wastes
 314 buffer space (e.g. if not all queues are used, or if some are not used for
 315 atomic scheduling).
 316
 317 The PMD provides a devarg to override the default per-queue allocation. To
 318 increase per-queue atomic-inflight allocation to (for example) 64:
 319
 320     .. code-block:: console
 321
 322        --allow ea:00.0,atm_inflights=64
 323
 324 QID Depth Threshold
 325 ~~~~~~~~~~~~~~~~~~~
 326
 327 DLB supports setting and tracking queue depth thresholds. Hardware uses
 328 the thresholds to track how full a queue is compared to its threshold.
 329 Four buckets are used:
 330
 331 - Less than or equal to 50% of queue depth threshold
 332 - Greater than 50%, but less than or equal to 75% of depth threshold
 333 - Greater than 75%, but less than or equal to 100% of depth threshold
 334 - Greater than 100% of depth threshold
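
The four buckets can be expressed with integer comparisons, as in the sketch
below. The function name and the 0 to 3 bucket numbering are assumptions made
for illustration; they are not taken from the hardware encoding.

    .. code-block:: c

       #include <stdint.h>

       /* Classifies a queue depth against its threshold into buckets 0..3. */
       static unsigned int
       qid_depth_bucket(uint32_t depth, uint32_t threshold)
       {
           /* Widen to 64 bits so the scaled comparisons cannot overflow. */
           uint64_t d = depth;
           uint64_t t = threshold;

           if (2 * d <= t)
               return 0; /* <= 50% of the threshold */
           if (4 * d <= 3 * t)
               return 1; /* > 50% and <= 75% */
           if (d <= t)
               return 2; /* > 75% and <= 100% */
           return 3;     /* > 100% */
       }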
 335
 336 Per-queue threshold metrics are tracked in the DLB xstats, and are also
 337 returned in the ``impl_opaque`` field of each received event.
 338
 339 The per-QID threshold can be specified as part of the device args, and
 340 can be applied to all queues, a range of queues, or a single queue, as
 341 shown below.
 342
 343     .. code-block:: console
 344
 345        --allow ea:00.0,qid_depth_thresh=all:<threshold_value>
 346        --allow ea:00.0,qid_depth_thresh=qidA-qidB:<threshold_value>
 347        --allow ea:00.0,qid_depth_thresh=qid:<threshold_value>
 348
 349 Class of service
 350 ~~~~~~~~~~~~~~~~
 351
 352 DLB supports provisioning the DLB bandwidth into 4 classes of service (1-4).
 353
 354 - Class 4 corresponds to 40% of the DLB hardware bandwidth
 355 - Class 3 corresponds to 30% of the DLB hardware bandwidth
 356 - Class 2 corresponds to 20% of the DLB hardware bandwidth
 357 - Class 1 corresponds to 10% of the DLB hardware bandwidth
 358 - Class 0 corresponds to don't care
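
The mapping in this list is linear, which the following hypothetical helper
makes explicit. Returning 0 for class 0 is a convention of this sketch meaning
"no specific share", and -1 marks a value outside the listed range.

    .. code-block:: c

       /*
        * Returns the bandwidth percentage for classes 1..4, 0 for the
        * "don't care" class, or -1 for a value outside 0..4.
        */
       static int
       cos_bandwidth_percent(int cos)
       {
           if (cos < 0 || cos > 4)
               return -1;
           return cos * 10;
       }

Classes 1 through 4 together account for 10 + 20 + 30 + 40 = 100 percent of
the bandwidth.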
 359
 360 The classes are applied globally to the set of ports contained in this
 361 scheduling domain, which is more appropriate for the bifurcated
 362 PMD than for the PF PMD, since the PF PMD supports just 1 scheduling
 363 domain.
 364
 365 Class of service can be specified in the devargs, as follows:
 366
 367     .. code-block:: console
 368
 369        --allow ea:00.0,cos=<0..4>
 370
 371 Use X86 Vector Instructions
 372 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 373
 374 DLB supports using x86 vector instructions to optimize the data path.
 375
 376 The default mode of operation is to use scalar instructions, but
 377 the use of vector instructions can be enabled in the devargs, as
 378 follows:
 379
 380     .. code-block:: console
 381
 382        --allow ea:00.0,vector_opts_enabled=<y/Y>