..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2020 Intel Corporation.

Driver for the Intel® Dynamic Load Balancer (DLB2)
==================================================

The DPDK dlb2 poll mode driver supports the Intel® Dynamic Load Balancer.

Prerequisites
-------------

Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up
the basic DPDK environment.

The DLB2 PF PMD is a user-space PMD that uses VFIO to gain direct
device access. To use this operation mode, the PCIe PF device must be bound
to a DPDK-compatible VFIO driver, such as vfio-pci.
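
For example, assuming the vfio-pci module is available, the device can be
bound with the standard DPDK bind script (the PCI address below is a
placeholder for the DLB2 device's address on the target system, and the script
path may differ per installation):

.. code-block:: console

   modprobe vfio-pci
   ./usertools/dpdk-devbind.py --bind=vfio-pci <DLB2 PCI address>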

Eventdev API Notes
------------------

The DLB2 provides the functions of a DPDK event device; specifically, it
supports atomic, ordered, and parallel scheduling of events from queues to
ports. However, the DLB2 hardware is not a perfect match to the eventdev API.
Some DLB2 features are abstracted by the PMD, such as directed ports.

In general, the dlb2 PMD is designed for ease of use and does not require a
detailed understanding of the hardware, but these details are important when
writing high-performance code. This section describes the places where the
eventdev API and DLB2 misalign.

Scheduling Domain Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are 32 scheduling domains in the DLB2.
When one is configured, it allocates load-balanced and
directed queues, ports, credits, and other hardware resources. Some
resource allocations are user-controlled -- the number of queues, for example
-- and others, like credit pools (one directed and one load-balanced pool per
scheduling domain), are not.

The DLB2 is a closed system eventdev, and as such the ``nb_events_limit``
device setup argument and the per-port ``new_event_threshold`` argument apply
as defined in the eventdev header file. The limit is applied to all enqueues,
regardless of whether the enqueued event will consume a directed or
load-balanced credit.
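
As an illustration, the sketch below configures an eventdev with an overall
event limit and sets a per-port ``new_event_threshold``. The device ID, counts,
and thresholds are placeholders chosen for the example, not recommendations.

.. code-block:: c

   #include <rte_eventdev.h>

   /* Illustrative sizes only: limit the domain to 4096 in-flight events and
    * back-pressure NEW-event producers on this port at 2048.
    */
   static int
   setup_event_limits(uint8_t dev_id)
   {
       struct rte_event_dev_config dev_conf = {
           .nb_events_limit = 4096,
           .nb_event_queues = 4,
           .nb_event_ports = 4,
           .nb_event_queue_flows = 1024,
           .nb_event_port_dequeue_depth = 32,
           .nb_event_port_enqueue_depth = 32,
       };
       struct rte_event_port_conf port_conf = {
           .new_event_threshold = 2048,
           .dequeue_depth = 32,
           .enqueue_depth = 32,
       };

       if (rte_event_dev_configure(dev_id, &dev_conf) < 0)
           return -1;
       /* Port 0 shown; the remaining ports are set up the same way. */
       return rte_event_port_setup(dev_id, 0, &port_conf);
   }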

Load-Balanced Queues
~~~~~~~~~~~~~~~~~~~~

A load-balanced queue can support atomic and ordered scheduling, or atomic and
unordered scheduling, but not atomic, unordered, and ordered scheduling
together. A queue's scheduling types are controlled by the event queue
configuration.

If the user sets the ``RTE_EVENT_QUEUE_CFG_ALL_TYPES`` flag, the
``nb_atomic_order_sequences`` field determines the supported scheduling types.
With a non-zero ``nb_atomic_order_sequences``, the queue is configured for
atomic and ordered scheduling. In this case, ``RTE_SCHED_TYPE_PARALLEL``
scheduling is supported by scheduling those events as ordered events. Note that
when such an event is dequeued, its sched_type will be
``RTE_SCHED_TYPE_ORDERED``. If instead ``nb_atomic_order_sequences`` is zero,
the queue is configured for atomic and unordered scheduling, and
``RTE_SCHED_TYPE_ORDERED`` is unsupported.

If the ``RTE_EVENT_QUEUE_CFG_ALL_TYPES`` flag is not set, the ``schedule_type``
field dictates the queue's scheduling type.
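
The two cases can be sketched with hypothetical queue configurations (queue
IDs and sizes are placeholders):

.. code-block:: c

   #include <rte_eventdev.h>

   /* Queue 0: ALL_TYPES with non-zero nb_atomic_order_sequences, so the queue
    * supports atomic and ordered scheduling (parallel events are scheduled as
    * ordered events).
    */
   struct rte_event_queue_conf q0_conf = {
       .event_queue_cfg = RTE_EVENT_QUEUE_CFG_ALL_TYPES,
       .nb_atomic_order_sequences = 1024,
       .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
   };

   /* Queue 1: ALL_TYPES not set, so schedule_type alone selects the queue's
    * scheduling type (atomic in this example).
    */
   struct rte_event_queue_conf q1_conf = {
       .schedule_type = RTE_SCHED_TYPE_ATOMIC,
       .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
   };

   /* Later: rte_event_queue_setup(dev_id, 0, &q0_conf); etc. */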

The ``nb_atomic_order_sequences`` queue configuration field sets the ordered
queue's reorder buffer size. DLB2 has 4 groups of ordered queues, where each
group can be configured to contain 1 queue with 1024 reorder entries, 2 queues
with 512 reorder entries, and so on, down to 32 queues with 32 entries each.

When a load-balanced queue is created, the PMD will configure a new sequence
number group on-demand if ``num_sequence_numbers`` does not match a pre-existing
group with available reorder buffer entries. If all sequence number groups are
in use, no new group will be created and queue configuration will fail. (Note
that when the PMD is used with a virtual DLB2 device, it cannot change the
sequence number configuration.)

The queue's ``nb_atomic_flows`` parameter is ignored by the DLB2 PMD, because
the DLB2 does not limit the number of flows a queue can track. In the DLB2, all
load-balanced queues can use the full 16-bit flow ID range.

Load-balanced and Directed Ports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DLB2 ports come in two flavors: load-balanced and directed. The eventdev API
does not have the same concept, but it has a similar one: ports and queues that
are singly-linked (i.e. linked to a single queue or port, respectively).

The ``rte_event_dev_info_get()`` function reports the number of available event
ports and queues (among other things). For the DLB2 PMD, ``max_event_ports``
and ``max_event_queues`` report the number of available load-balanced ports and
queues, and ``max_single_link_event_port_queue_pairs`` reports the number of
available directed ports and queues.
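
A minimal sketch of querying these counts (the device ID is a placeholder):

.. code-block:: c

   #include <stdio.h>
   #include <rte_eventdev.h>

   /* Print the DLB2 eventdev's load-balanced vs. directed resource split. */
   static void
   print_dlb2_resources(uint8_t dev_id)
   {
       struct rte_event_dev_info info;

       rte_event_dev_info_get(dev_id, &info);
       printf("load-balanced ports/queues: %d/%d, directed pairs: %d\n",
              info.max_event_ports, info.max_event_queues,
              info.max_single_link_event_port_queue_pairs);
   }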

When a scheduling domain is created in ``rte_event_dev_configure()``, the user
specifies ``nb_event_ports`` and ``nb_single_link_event_port_queues``, which
control the total number of ports (load-balanced and directed) and the number
of directed ports. Hence, the number of requested load-balanced ports is
``nb_event_ports - nb_single_link_event_port_queues``. The ``nb_event_queues``
field specifies the total number of queues (load-balanced and directed). The
number of directed queues comes from ``nb_single_link_event_port_queues``,
since directed ports and queues come in pairs.

When a port is set up, the ``RTE_EVENT_PORT_CFG_SINGLE_LINK`` flag determines
whether it should be configured as a directed (the flag is set) or a
load-balanced (the flag is unset) port. Similarly, the
``RTE_EVENT_QUEUE_CFG_SINGLE_LINK`` queue configuration flag controls
whether it is a directed or load-balanced queue.
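
For instance, a hypothetical domain with four load-balanced ports and one
directed port/queue pair could be set up as follows (all counts and IDs are
placeholders chosen for the example):

.. code-block:: c

   #include <rte_eventdev.h>

   static int
   setup_ports_and_queues(uint8_t dev_id)
   {
       /* 5 total ports, 1 of them directed -> 4 load-balanced ports.
        * 4 total queues, 1 of them directed -> 3 load-balanced queues.
        */
       struct rte_event_dev_config dev_conf = {
           .nb_events_limit = 4096,
           .nb_event_ports = 5,
           .nb_event_queues = 4,
           .nb_single_link_event_port_queues = 1,
           .nb_event_queue_flows = 1024,
           .nb_event_port_dequeue_depth = 32,
           .nb_event_port_enqueue_depth = 32,
       };
       struct rte_event_port_conf dir_port_conf = {
           .new_event_threshold = 2048,
           .dequeue_depth = 32,
           .enqueue_depth = 32,
           .event_port_cfg = RTE_EVENT_PORT_CFG_SINGLE_LINK,
       };
       struct rte_event_queue_conf dir_queue_conf = {
           .event_queue_cfg = RTE_EVENT_QUEUE_CFG_SINGLE_LINK,
           .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
       };
       uint8_t dir_queue = 3; /* the last queue ID serves as the directed queue */

       if (rte_event_dev_configure(dev_id, &dev_conf) < 0)
           return -1;
       /* Port 4 is the directed port; it links to exactly one directed queue. */
       if (rte_event_port_setup(dev_id, 4, &dir_port_conf) < 0 ||
           rte_event_queue_setup(dev_id, dir_queue, &dir_queue_conf) < 0)
           return -1;
       if (rte_event_port_link(dev_id, 4, &dir_queue, NULL, 1) != 1)
           return -1;
       /* Load-balanced ports 0-3 and queues 0-2 are set up without the
        * SINGLE_LINK flags.
        */
       return 0;
   }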

Load-balanced ports can only be linked to load-balanced queues, and directed
ports can only be linked to directed queues. Furthermore, directed ports can
only be linked to a single directed queue (and vice versa), and that link
cannot change after the eventdev is started.

The eventdev API does not have a directed scheduling type. To support directed
traffic, the dlb2 PMD detects when an event is being sent to a directed queue
and overrides its scheduling type. Note that the originally selected scheduling
type (atomic, ordered, or parallel) is not preserved, and an event's sched_type
will be set to ``RTE_SCHED_TYPE_ATOMIC`` when it is dequeued from a directed
port.

Flow ID
~~~~~~~

The flow ID field is preserved in the event when it is scheduled in the DLB2.

Hardware Credits
~~~~~~~~~~~~~~~~

DLB2 uses a hardware credit scheme to prevent software from overflowing
hardware event storage, with each unit of storage represented by a credit. A
port spends a credit to enqueue an event, and hardware refills the ports with
credits as the events are scheduled to ports. Refills come from credit pools,
and each port is a member of a load-balanced credit pool and a directed credit
pool. The load-balanced credits are used to enqueue to load-balanced queues,
and directed credits are used for directed queues.

A DLB2 eventdev contains one load-balanced and one directed credit pool. These
pools' sizes are controlled by the ``nb_events_limit`` field in struct
``rte_event_dev_config``. The load-balanced pool is sized to contain
``nb_events_limit`` credits, and the directed pool is sized to contain
``nb_events_limit/4`` credits. The directed pool size can be overridden with
the ``num_dir_credits`` vdev argument, like so:

.. code-block:: console

   --vdev=dlb2_event,num_dir_credits=<value>

This can be used if the default allocation is too low or too high for the
specific application needs. The PMD also supports a vdev argument that limits
the ``max_num_events`` reported by ``rte_event_dev_info_get()``:

.. code-block:: console

   --vdev=dlb2_event,max_num_events=<value>

By default, ``max_num_events`` is reported as the total available load-balanced
credits. If multiple DLB2-based applications are being used, it may be desirable
to control how many load-balanced credits each application uses, particularly
when application(s) are written to configure ``nb_events_limit`` equal to the
reported ``max_num_events``.

Each port is a member of both credit pools. A port's credit allocation is
defined by its low watermark, high watermark, and refill quanta. These three
parameters are calculated by the dlb2 PMD like so:

- The load-balanced high watermark is set to the port's enqueue_depth.
  The directed high watermark is set to the minimum of the enqueue_depth and
  the directed pool size divided by the total number of ports.
- The refill quanta is set to half the high watermark.
- The low watermark is set to the minimum of 16 and the refill quanta.

When the eventdev is started, each port is pre-allocated a high watermark's
worth of credits. For example, if an eventdev contains four ports with enqueue
depths of 32 and a load-balanced credit pool size of 4096, each port will start
with 32 load-balanced credits, and there will be 3968 credits available to
replenish the ports. Thus, a single port is not capable of enqueueing up to the
nb_events_limit (without any events being dequeued), since the other ports are
retaining their initial credit allocation; in short, all ports must enqueue in
order to reach the limit.

If a port attempts to enqueue and has no credits available, the enqueue
operation will fail and the application must retry the enqueue. Credits are
replenished asynchronously by the DLB2 hardware.
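
Because such credit shortfalls are transient, a common pattern is to retry the
unsent portion of a burst, as in the sketch below (function and variable names
are illustrative; a real application would bound the retries, as discussed in
the software credit section):

.. code-block:: c

   #include <rte_eventdev.h>

   /* Enqueue a burst, retrying the unsent tail while hardware replenishes the
    * port's credits.
    */
   static void
   enqueue_with_retry(uint8_t dev_id, uint8_t port_id,
                      struct rte_event *evs, uint16_t n)
   {
       uint16_t sent = 0;

       while (sent < n)
           sent += rte_event_enqueue_burst(dev_id, port_id,
                                           &evs[sent], n - sent);
   }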

Software Credits
~~~~~~~~~~~~~~~~

The DLB2 is a "closed system" event dev, and the DLB2 PMD layers a software
credit scheme on top of the hardware credit scheme in order to comply with
the per-port backpressure described in the eventdev API.

The DLB2's hardware scheme is local to a queue/pipeline stage: a port spends a
credit when it enqueues to a queue, and credits are later replenished after the
events are dequeued and released.

In the software credit scheme, a credit is consumed when a new (.op =
RTE_EVENT_OP_NEW) event is injected into the system, and the credit is
replenished when the event is released from the system (either explicitly with
RTE_EVENT_OP_RELEASE or implicitly in dequeue_burst()).

In this model, an event is "in the system" from its first enqueue into eventdev
until it is last dequeued. If the event goes through multiple event queues, it
is still considered "in the system" while a worker thread is processing it.
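
As an illustrative sketch (queue IDs and the burst size are placeholders), a
worker keeps an event "in the system" by forwarding it with
``RTE_EVENT_OP_FORWARD``, and returns its software credit by releasing it with
``RTE_EVENT_OP_RELEASE``:

.. code-block:: c

   #include <rte_eventdev.h>

   #define NEXT_STAGE_QUEUE 1 /* placeholder queue ID */

   /* One iteration of a two-stage worker loop. */
   static void
   worker_iteration(uint8_t dev_id, uint8_t port_id)
   {
       struct rte_event evs[32];
       uint16_t n, i;

       n = rte_event_dequeue_burst(dev_id, port_id, evs, 32, 0);
       for (i = 0; i < n; i++) {
           if (evs[i].queue_id == 0) {
               /* First stage: forward to the next queue; the event stays
                * "in the system".
                */
               evs[i].op = RTE_EVENT_OP_FORWARD;
               evs[i].queue_id = NEXT_STAGE_QUEUE;
           } else {
               /* Final stage: release the event from the system. */
               evs[i].op = RTE_EVENT_OP_RELEASE;
           }
       }
       rte_event_enqueue_burst(dev_id, port_id, evs, n);
   }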

A port will fail to enqueue if the number of events in the system exceeds its
``new_event_threshold`` (specified at port setup time). A port will also fail
to enqueue if it lacks enough hardware credits to enqueue; load-balanced
credits are used to enqueue to a load-balanced queue, and directed credits are
used to enqueue to a directed queue.

The out-of-credit situations are typically transient, and an eventdev
application using the DLB2 ought to retry its enqueues if they fail.
If an enqueue fails, the DLB2 PMD sets rte_errno as follows:

- -ENOSPC: Credit exhaustion (either hardware or software)
- -EINVAL: Invalid argument, such as port ID, queue ID, or sched_type.

Depending on the pipeline the application has constructed, it's possible to
enter a credit deadlock scenario wherein the worker thread lacks the credit
to enqueue an event, and it must dequeue an event before it can recover the
credit. If the worker thread retries its enqueue indefinitely, it will not
make forward progress. Such deadlock is possible if the application has event
"loops", in which an event is dequeued from queue A and later enqueued back to
queue A.

Due to this, workers should stop retrying after a time, release the events they
are attempting to enqueue, and dequeue more events. It is important that the
worker release the events and not simply set them aside to retry the enqueue
again later, because the port has a limited history list size (by default,
twice the port's dequeue_depth).
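
One possible shape for such a worker is sketched below (the retry bound is an
arbitrary placeholder): after a bounded number of attempts, the remaining
events are released rather than held.

.. code-block:: c

   #include <rte_eventdev.h>

   #define ENQ_RETRIES 100 /* arbitrary bound; tune for the application */

   /* Try to forward a burst; on persistent credit exhaustion, release the
    * unsent events so their software credits and history list entries are
    * returned, then go back to dequeueing.
    */
   static void
   forward_or_release(uint8_t dev_id, uint8_t port_id,
                      struct rte_event *evs, uint16_t n)
   {
       uint16_t sent = 0, i;
       int tries;

       for (tries = 0; tries < ENQ_RETRIES && sent < n; tries++)
           sent += rte_event_enqueue_burst(dev_id, port_id,
                                           &evs[sent], n - sent);

       if (sent < n) {
           /* Give up on forwarding the remainder; release them instead. */
           for (i = sent; i < n; i++)
               evs[i].op = RTE_EVENT_OP_RELEASE;
           rte_event_enqueue_burst(dev_id, port_id, &evs[sent], n - sent);
       }
   }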

Priority
~~~~~~~~

The DLB2 supports event priority and per-port queue service priority, as
described in the eventdev header file. The DLB2 does not support 'global' event
queue priority established at queue creation time.

DLB2 supports 8 event and queue service priority levels. For both priority
types, the PMD uses the upper three bits of the priority field to determine the
DLB2 priority, discarding the 5 least significant bits. The 5 least significant
event priority bits are not preserved when an event is enqueued.
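
In other words, only ``priority >> 5`` is significant, as the illustrative
values below show:

.. code-block:: c

   #include <rte_eventdev.h>

   /* Two events whose priorities differ only in the low five bits map to the
    * same DLB2 priority level (0x40 >> 5 and 0x5f >> 5 are both 2).
    */
   struct rte_event ev_a = { .priority = 0x40 };
   struct rte_event ev_b = { .priority = 0x5f };

   /* The eventdev priority constants likewise collapse onto the 8 levels:
    * RTE_EVENT_DEV_PRIORITY_HIGHEST (0)   -> level 0
    * RTE_EVENT_DEV_PRIORITY_NORMAL  (128) -> level 4
    * RTE_EVENT_DEV_PRIORITY_LOWEST  (255) -> level 7
    */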

Reconfiguration
~~~~~~~~~~~~~~~

The Eventdev API allows one to reconfigure a device, its ports, and its queues
by first stopping the device, calling the configuration function(s), then
restarting the device. The DLB2 does not support configuring an individual
queue or port without first reconfiguring the entire device, however, so there
are certain reconfiguration sequences that are valid in the eventdev API but
not supported by the PMD.

Specifically, the PMD supports the following configuration sequence (sketched
in code after the list):

1. Configure and start the device
2. Stop the device
3. (Optional) Reconfigure the device
4. (Optional) If step 3 is run:

   a. Setup queue(s). The reconfigured queue(s) lose their previous port links.
   b. The reconfigured port(s) lose their previous queue links.

5. (Optional, only if steps 4a and 4b are run) Link port(s) to queue(s)
6. Restart the device. If the device is reconfigured in step 3 but one or more
   of its ports or queues are not, the PMD will apply their previous
   configuration (including port->queue links) at this time.
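
A sketch of this supported sequence, assuming configuration structures like
those in the earlier examples and a single port and queue (IDs and error
handling are placeholders):

.. code-block:: c

   #include <rte_eventdev.h>

   static int
   reconfigure_device(uint8_t dev_id,
                      const struct rte_event_dev_config *dev_conf,
                      const struct rte_event_port_conf *port_conf,
                      const struct rte_event_queue_conf *queue_conf)
   {
       uint8_t queue0 = 0;

       rte_event_dev_stop(dev_id);                                /* step 2 */
       if (rte_event_dev_configure(dev_id, dev_conf) < 0)         /* step 3 */
           return -1;
       if (rte_event_queue_setup(dev_id, 0, queue_conf) < 0 ||    /* step 4 */
           rte_event_port_setup(dev_id, 0, port_conf) < 0)
           return -1;
       if (rte_event_port_link(dev_id, 0, &queue0, NULL, 1) != 1) /* step 5 */
           return -1;
       return rte_event_dev_start(dev_id);                        /* step 6 */
   }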

The PMD does not support the following configuration sequence:

1. Configure and start the device
2. Stop the device
3. Setup queue or setup port
4. Start the device

This sequence is not supported because the event device must be reconfigured
before its ports or queues can be.

Deferred Scheduling
~~~~~~~~~~~~~~~~~~~

The DLB2 PMD's default behavior for managing a CQ is to "pop" the CQ once per
dequeued event before returning from rte_event_dequeue_burst(). This frees the
corresponding entries in the CQ, which enables the DLB2 to schedule more events
to it.

To support applications seeking finer-grained scheduling control -- for example
deferring scheduling to get the best possible priority scheduling and
load-balancing -- the PMD supports a deferred scheduling mode. In this mode,
the CQ entry is not popped until the *subsequent* rte_event_dequeue_burst()
call. This mode only applies to load-balanced event ports with a dequeue depth
of 1.

To enable deferred scheduling, use the defer_sched vdev argument like so:

.. code-block:: console

   --vdev=dlb2_event,defer_sched=on

Atomic Inflights Allocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the last stage prior to scheduling an atomic event to a CQ, DLB2 holds the
inflight event in a temporary buffer that is divided among load-balanced
queues. If a queue's atomic buffer storage fills up, this can result in
head-of-line blocking. For example:

- An LDB queue is allocated N atomic buffer entries.
- All N entries are filled with events from flow X, which is pinned to CQ 0.

Until CQ 0 releases 1+ events, no other atomic flows for that LDB queue can be
scheduled. The likelihood of this case depends on the eventdev configuration,
traffic behavior, event processing latency, potential for a worker to be
interrupted or otherwise delayed, etc.

By default, the PMD allocates 16 buffer entries for each load-balanced queue,
which provides an even division across all 128 queues but potentially wastes
buffer space (e.g. if not all queues are used, or aren't used for atomic
scheduling).

The PMD provides a dev arg to override the default per-queue allocation. To
increase a vdev's per-queue atomic-inflight allocation to (for example) 64:

.. code-block:: console

   --vdev=dlb2_event,atm_inflights=64

Queue Depth Thresholds
~~~~~~~~~~~~~~~~~~~~~~

DLB2 supports setting and tracking queue depth thresholds. Hardware uses
the thresholds to track how full a queue is compared to its threshold.
Four buckets are used:

- Less than or equal to 50% of queue depth threshold
- Greater than 50%, but less than or equal to 75% of depth threshold
- Greater than 75%, but less than or equal to 100% of depth threshold
- Greater than 100% of depth threshold

Per-queue threshold metrics are tracked in the DLB2 xstats, and are also
returned in the impl_opaque field of each received event.

The per-qid threshold can be specified as part of the device args, and can be
applied to all queues, a range of queues, or a single queue, as shown below.

.. code-block:: console

   --vdev=dlb2_event,qid_depth_thresh=all:<threshold_value>
   --vdev=dlb2_event,qid_depth_thresh=qidA-qidB:<threshold_value>
   --vdev=dlb2_event,qid_depth_thresh=qid:<threshold_value>

Class of Service
~~~~~~~~~~~~~~~~

DLB2 supports provisioning the DLB2 bandwidth into 4 classes of service.

- Class 4 corresponds to 40% of the DLB2 hardware bandwidth
- Class 3 corresponds to 30% of the DLB2 hardware bandwidth
- Class 2 corresponds to 20% of the DLB2 hardware bandwidth
- Class 1 corresponds to 10% of the DLB2 hardware bandwidth
- Class 0 corresponds to "don't care"

The classes are applied globally to the set of ports contained in this
scheduling domain, which is more appropriate for the bifurcated PMD than for
the PF PMD, since the PF PMD supports just one scheduling domain.

Class of service can be specified in the devargs, as follows:

.. code-block:: console

   --vdev=dlb2_event,cos=<0..4>