..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019 Intel Corporation.

.. include:: <isonum.txt>

The ``ioat`` rawdev driver provides a poll-mode driver (PMD) for Intel\ |reg|
Data Streaming Accelerator `(Intel DSA)
<https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator>`_ and for Intel\ |reg|
QuickData Technology, part of Intel\ |reg| I/O Acceleration Technology
`(Intel I/OAT)
<https://www.intel.com/content/www/us/en/wireless-network/accel-technology.html>`_.
This PMD, when used on supported hardware, allows data copies, such as
cloning packet data, to be accelerated by that hardware rather than being
done in software, freeing up CPU cycles for other tasks.

Hardware Requirements
----------------------

The ``dpdk-devbind.py`` script, included with DPDK,
can be used to show the presence of supported hardware.
Running ``dpdk-devbind.py --status-dev misc`` will show all the miscellaneous,
or rawdev-based, devices on the system.
Intel\ |reg| QuickData Technology devices are often listed as "Crystal Beach DMA".
Intel\ |reg| DSA devices currently (at the time of writing) appear as devices with type "0b25",
because the pci-id database does not yet contain entries for them.

Compilation
------------

For builds using ``meson`` and ``ninja``, the driver will be built when the target platform is x86-based.
No additional compilation steps are necessary.

Device Setup
-------------

Depending on support provided by the PMD, HW devices can either use the kernel configured driver
or be bound to a user-space IO driver for use.
For example, Intel\ |reg| DSA devices can use the IDXD kernel driver or DPDK-supported drivers,
such as ``vfio-pci``.

Intel\ |reg| DSA devices using idxd kernel driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use an Intel\ |reg| DSA device bound to the IDXD kernel driver, the device must first be configured.
The `accel-config <https://github.com/intel/idxd-config>`_ utility library can be used for configuration.

The device configuration can also be done by directly interacting with the sysfs nodes.
An example of how this may be done can be seen in the script ``dpdk_idxd_cfg.py``
included in the driver source directory.

Some configuration steps are mandatory before a device can be used by an application.
The internal engines, which do the copies or other operations,
and the work-queues, which are used by applications to assign work to the device,
need to be assigned to groups, and the various other configuration options,
such as priority or queue depth, need to be set for each queue.

To assign an engine to a group::

    $ accel-config config-engine dsa0/engine0.0 --group-id=0
    $ accel-config config-engine dsa0/engine0.1 --group-id=1

A similar ``accel-config`` command assigns work queues to groups so that they can pass descriptors to the engines.
However, the work queues also need to be configured depending on the use case.
Some configuration options include:

* mode (Dedicated/Shared): Indicates whether a WQ may accept jobs from multiple queues simultaneously.
* priority: WQ priority between 1 and 15. A larger value means a higher priority.
* wq-size: the size of the WQ. The sum of all WQ sizes must be less than the total-size defined by the device.
* type: WQ type (kernel/mdev/user). Determines how the device is presented.
* name: identifier given to the WQ.

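As an illustration of how the options above fit together, the following C
sketch checks the documented priority range and assembles the matching
``accel-config config-wq`` command line. The ``wq_config`` structure and the
``wq_format_command()`` helper are hypothetical names introduced for this
example; they are not part of accel-config or DPDK, and only the flags shown
in this guide are used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one work queue; not an accel-config type. */
struct wq_config {
	const char *wq_path;   /* e.g. "dsa0/wq0.0" */
	unsigned int group_id;
	const char *mode;      /* "dedicated" or "shared" */
	unsigned int priority; /* documented range: 1 to 15 */
	unsigned int wq_size;
	const char *type;      /* "kernel", "mdev" or "user" */
	const char *name;
};

/*
 * Write the accel-config command for cfg into buf.
 * Returns 0 on success, or -1 if the priority is outside 1..15, the mode
 * is not recognised, or the buffer is too small.
 */
int
wq_format_command(const struct wq_config *cfg, char *buf, size_t len)
{
	int n;

	assert(cfg != NULL && buf != NULL);
	if (cfg->priority < 1 || cfg->priority > 15)
		return -1;
	if (strcmp(cfg->mode, "dedicated") != 0 &&
			strcmp(cfg->mode, "shared") != 0)
		return -1;

	n = snprintf(buf, len,
			"accel-config config-wq %s --group-id=%u --mode=%s "
			"--priority=%u --wq-size=%u --type=%s --name=%s",
			cfg->wq_path, cfg->group_id, cfg->mode,
			cfg->priority, cfg->wq_size, cfg->type, cfg->name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Rejecting an out-of-range priority before the command is run gives an earlier
and clearer error than waiting for the tool to refuse it.
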
Example configuration for a work queue::

    $ accel-config config-wq dsa0/wq0.0 --group-id=0 \
       --mode=dedicated --priority=10 --wq-size=8 \
       --type=user --name=app1

Once the devices have been configured, they need to be enabled::

    $ accel-config enable-device dsa0
    $ accel-config enable-wq dsa0/wq0.0

Check the device configuration::

    $ accel-config list

Devices using VFIO/UIO drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The HW devices to be used will need to be bound to a user-space IO driver for use.
The ``dpdk-devbind.py`` script can be used to view the state of the devices
and to bind them to a suitable DPDK-supported driver, such as ``vfio-pci``.
For example::

    $ dpdk-devbind.py -b vfio-pci 00:04.0 00:04.1

Device Probing and Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For devices bound to a suitable DPDK-supported VFIO/UIO driver, the HW devices will
be found as part of the device scan done at application initialization time without
the need to pass parameters to the application.

If the device is bound to the IDXD kernel driver (and previously configured with sysfs),
then a specific work queue needs to be passed to the application via a vdev parameter.
This vdev parameter takes the driver name and work queue name as parameters.
For example, to use work queue 0 on Intel\ |reg| DSA instance 0::

    $ dpdk-test --no-pci --vdev=rawdev_idxd,wq=0.0

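An application that assembles its EAL arguments programmatically could build
the vdev string from the device instance and work queue numbers. The sketch
below assumes the ``rawdev_idxd,wq=<instance>.<queue>`` layout seen in the
example above; ``idxd_vdev_arg()`` is a hypothetical helper, not a DPDK
function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "--vdev=rawdev_idxd,wq=<dev>.<wq>" in buf.
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if buf is too small for the whole string.
 */
int
idxd_vdev_arg(char *buf, size_t len, unsigned int dev, unsigned int wq)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "--vdev=rawdev_idxd,wq=%u.%u", dev, wq);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling ``idxd_vdev_arg(arg, sizeof(arg), 0, 0)`` produces the same argument
as the command line shown above.
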
Once probed successfully, the device will appear as a ``rawdev``, that is a
"raw device type" inside DPDK, and can be accessed using APIs from the
``rte_rawdev`` library.

Using IOAT Rawdev Devices
--------------------------

To use the devices from an application, the rawdev API can be used, along
with definitions taken from the device-specific header file
``rte_ioat_rawdev.h``. This header is needed to get the definition of
structure parameters used by some of the rawdev APIs for IOAT rawdev
devices, as well as providing key functions for using the device for memory
copies.

Getting Device Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Basic information about each rawdev device can be queried using the
``rte_rawdev_info_get()`` API. For most applications, this API will be
needed to verify that the rawdev in question is of the expected type. For
example, the following code snippet can be used to identify an IOAT
rawdev device for use by an application:

.. code-block:: C

    for (i = 0; i < count && !found; i++) {
        struct rte_rawdev_info info = { .dev_private = NULL };

        /* A match needs a successful query and the IOAT driver name. */
        found = (rte_rawdev_info_get(i, &info, 0) == 0 &&
                strcmp(info.driver_name,
                        IOAT_PMD_RAWDEV_NAME_STR) == 0);
    }

When calling the ``rte_rawdev_info_get()`` API for an IOAT rawdev device,
the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
be NULL, or else be set to point to a structure of type
``rte_ioat_rawdev_config``, in which case the size of the configured device
input ring will be returned in that structure.

Device Configuration
~~~~~~~~~~~~~~~~~~~~~

Configuring an IOAT rawdev device is done using the
``rte_rawdev_configure()`` API, which takes the same structure parameters
as the previously referenced ``rte_rawdev_info_get()`` API. The main
difference is that, because the parameter is used as input rather than
output, the ``dev_private`` structure element cannot be NULL, and must
point to a valid ``rte_ioat_rawdev_config`` structure, containing the ring
size to be used by the device. The ring size must be a power of two.

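An application that derives the ring size from user input may want to check
the power-of-two requirement before calling ``rte_rawdev_configure()``. The
helpers below are a minimal, self-contained sketch with hypothetical names;
DPDK applications can typically rely on ``rte_is_power_of_2()`` from
``rte_common.h`` for the same check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a non-zero power of two, i.e. exactly one bit is set. */
bool
ring_size_is_pow2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Round n up to the next power of two; a value that is already a power
 * of two is returned unchanged. n must be in the range 1..2^31.
 */
uint32_t
ring_size_round_up(uint32_t n)
{
	uint32_t p = 1;

	assert(n != 0 && n <= (UINT32_C(1) << 31));
	while (p < n)
		p <<= 1;
	return p;
}
```

For example, a requested size of 500 fails ``ring_size_is_pow2()`` and is
rounded up to 512 by ``ring_size_round_up()``.
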
If it is not needed, the tracking by the driver of user-provided completion
handles may be disabled by also setting the ``hdls_disable`` flag in
the configuration structure.

The following code shows how the device is configured in
``test_ioat_rawdev.c``:

.. code-block:: C

    #define IOAT_TEST_RINGSIZE 512
    struct rte_ioat_rawdev_config p = { .ring_size = -1 };
    struct rte_rawdev_info info = { .dev_private = &p };

    /* ... */

    p.ring_size = IOAT_TEST_RINGSIZE;
    if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
        printf("Error with rte_rawdev_configure()\n");
        return -1;
    }

Once configured, the device can then be made ready for use by calling the
``rte_rawdev_start()`` API.

Performing Data Copies
~~~~~~~~~~~~~~~~~~~~~~~

To perform data copies using IOAT rawdev devices, the functions
``rte_ioat_enqueue_copy()`` and ``rte_ioat_perform_ops()`` should be used.
Once copies have been completed, the completion will be reported back when
the application calls ``rte_ioat_completed_ops()``.

The ``rte_ioat_enqueue_copy()`` function enqueues a single copy to the
device ring for copying at a later point. The parameters to that function
include the IOVA addresses of both the source and destination buffers,
as well as two "handles" to be returned to the user when the copy is
completed. These handles can be arbitrary values, but two are provided so
that the library can track handles for both source and destination on
behalf of the user, e.g. virtual addresses for the buffers, or mbuf
pointers if packet data is being copied.

While the ``rte_ioat_enqueue_copy()`` function enqueues a copy operation on
the device ring, the copy will not actually be performed until after the
application calls the ``rte_ioat_perform_ops()`` function. This function
informs the device hardware of the elements enqueued on the ring, and the
device will begin to process them. It is expected that, for efficiency
reasons, a burst of operations will be enqueued to the device via multiple
enqueue calls between calls to the ``rte_ioat_perform_ops()`` function.

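The split between enqueueing and submitting can be pictured with a small
software model. The sketch below is a toy ring, not the driver: every
``toy_*`` name is hypothetical, and the "hardware" is simulated by doing the
copies synchronously inside ``toy_perform_ops()``, whereas a real device
processes them asynchronously.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SIZE 8 /* power of two, so indexes can be masked */

struct toy_op {
	void *dst;
	const void *src;
	size_t len;
	uintptr_t src_hdl, dst_hdl;
};

struct toy_ring {
	struct toy_op ops[TOY_RING_SIZE];
	unsigned int next_write; /* next slot to fill on enqueue */
	unsigned int next_issue; /* first op not yet handed to "hardware" */
	unsigned int next_read;  /* first op not yet reported as completed */
};

/* Queue one copy; returns 1 on success, 0 if the ring is full. */
int
toy_enqueue_copy(struct toy_ring *r, void *dst, const void *src, size_t len,
		uintptr_t src_hdl, uintptr_t dst_hdl)
{
	struct toy_op *op;

	assert(r != NULL);
	if (r->next_write - r->next_read == TOY_RING_SIZE)
		return 0;
	op = &r->ops[r->next_write++ & (TOY_RING_SIZE - 1)];
	*op = (struct toy_op){ dst, src, len, src_hdl, dst_hdl };
	return 1;
}

/* "Ring the doorbell": only now are the queued copies carried out. */
void
toy_perform_ops(struct toy_ring *r)
{
	assert(r != NULL);
	while (r->next_issue != r->next_write) {
		struct toy_op *op =
			&r->ops[r->next_issue++ & (TOY_RING_SIZE - 1)];
		memcpy(op->dst, op->src, op->len);
	}
}

/* Return up to max handle pairs for finished copies, oldest first. */
int
toy_completed_ops(struct toy_ring *r, int max,
		uintptr_t *src_hdls, uintptr_t *dst_hdls)
{
	int n = 0;

	assert(r != NULL);
	while (n < max && r->next_read != r->next_issue) {
		struct toy_op *op =
			&r->ops[r->next_read++ & (TOY_RING_SIZE - 1)];
		src_hdls[n] = op->src_hdl;
		dst_hdls[n] = op->dst_hdl;
		n++;
	}
	return n;
}
```

In this model, a destination buffer is unchanged after ``toy_enqueue_copy()``
and only receives the data once ``toy_perform_ops()`` has run, which mirrors
why a burst of enqueues followed by a single submit call is the expected
usage pattern.
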
The following code from ``test_ioat_rawdev.c`` demonstrates how to enqueue
a burst of copies to the device and start the hardware processing of them:

.. code-block:: C

    struct rte_mbuf *srcs[32], *dsts[32];
    unsigned int j;

    for (i = 0; i < RTE_DIM(srcs); i++) {
        char *src_data;

        srcs[i] = rte_pktmbuf_alloc(pool);
        dsts[i] = rte_pktmbuf_alloc(pool);
        srcs[i]->data_len = srcs[i]->pkt_len = length;
        dsts[i]->data_len = dsts[i]->pkt_len = length;
        src_data = rte_pktmbuf_mtod(srcs[i], char *);

        /* Fill the source buffer with random data to copy. */
        for (j = 0; j < length; j++)
            src_data[j] = rand() & 0xFF;

        /* The mbuf pointers are passed as the completion handles. */
        if (rte_ioat_enqueue_copy(dev_id,
                srcs[i]->buf_iova + srcs[i]->data_off,
                dsts[i]->buf_iova + dsts[i]->data_off,
                length,
                (uintptr_t)srcs[i],
                (uintptr_t)dsts[i]) != 1) {
            printf("Error with rte_ioat_enqueue_copy for buffer %u\n",
                    i);
            return -1;
        }
    }
    /* Notify the hardware of the enqueued copies. */
    rte_ioat_perform_ops(dev_id);

To retrieve information about completed copies, the API
``rte_ioat_completed_ops()`` should be used. This API will return to the
application a set of completion handles passed in when the relevant copies
were enqueued.

The following code from ``test_ioat_rawdev.c`` shows the test code
retrieving information about the completed copies and validating the data
is correct before freeing the data buffers using the returned handles:

.. code-block:: C

    struct rte_mbuf *completed_src[64];
    struct rte_mbuf *completed_dst[64];

    if (rte_ioat_completed_ops(dev_id, 64, (void *)completed_src,
            (void *)completed_dst) != RTE_DIM(srcs)) {
        printf("Error with rte_ioat_completed_ops\n");
        return -1;
    }
    for (i = 0; i < RTE_DIM(srcs); i++) {
        char *src_data, *dst_data;

        if (completed_src[i] != srcs[i]) {
            printf("Error with source pointer %u\n", i);
            return -1;
        }
        if (completed_dst[i] != dsts[i]) {
            printf("Error with dest pointer %u\n", i);
            return -1;
        }

        src_data = rte_pktmbuf_mtod(srcs[i], char *);
        dst_data = rte_pktmbuf_mtod(dsts[i], char *);
        for (j = 0; j < length; j++)
            if (src_data[j] != dst_data[j]) {
                printf("Error with copy of packet %u, byte %u\n",
                        i, j);
                return -1;
            }
        rte_pktmbuf_free(srcs[i]);
        rte_pktmbuf_free(dsts[i]);
    }

Filling an Area of Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~

The IOAT driver also has support for the ``fill`` operation, where an area
of memory is overwritten, or filled, with a short pattern of data.
Fill operations can be performed in much the same way as the copy operations
described above, just using the ``rte_ioat_enqueue_fill()`` function rather
than the ``rte_ioat_enqueue_copy()`` function.

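When testing fill operations, the result can be compared against a plain
software version of the same operation. The function below is such a
reference and is not driver code: ``reference_fill()`` is a hypothetical
helper, and the 8-byte pattern width and native byte order are assumptions
made for this sketch rather than statements about the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Software reference for a pattern fill: repeat the bytes of pattern, in
 * the host's native byte order, across len bytes starting at dst. A final
 * partial repetition is truncated.
 */
void
reference_fill(void *dst, size_t len, uint64_t pattern)
{
	uint8_t *p = dst;
	size_t chunk;

	assert(dst != NULL || len == 0);
	while (len > 0) {
		chunk = len < sizeof(pattern) ? len : sizeof(pattern);
		memcpy(p, &pattern, chunk);
		p += chunk;
		len -= chunk;
	}
}
```

A test could run the device fill into one buffer, call ``reference_fill()``
on a second buffer of the same length, and compare the two with ``memcmp()``.
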
Querying Device Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The statistics from the IOAT rawdev device can be retrieved via the xstats
functions in the ``rte_rawdev`` library, i.e.
``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and
``rte_rawdev_xstats_by_name_get()``. The statistics returned for each device
include:

* ``failed_enqueues``
* ``successful_enqueues``
* ``copies_completed``