doc/guides/gpus/cuda.rst

.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
There are two ways to get these headers:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You need to tell meson where the CUDA header files are through the C compiler flags.
There are three ways to do this:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Add CFLAGS to the meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Pass ``-Dc_args`` on the meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. implement the ``rte_gpu_mem_cpu_map`` function),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

  $ git clone https://github.com/NVIDIA/gdrcopy.git
  $ cd gdrcopy
  $ make
  $ # make install to install GDRCopy library system wide
  $ # Launch gdrdrv kernel module on the system
  $ sudo ./insmod.sh

As with the CUDA headers, you need to tell meson where the GDRCopy header files are.
For example:

.. code-block:: console

  $ meson build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability
and returns an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time because of a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, the CUDA library does not need to be installed
to build the CUDA driver library.

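The runtime loading described above can be illustrated with a minimal sketch.
It is not the driver's actual code: the helper names ``resolve_lib_path`` and
``load_cuda_symbol`` are hypothetical, and the sketch assumes that ``CUDA_PATH_L``
holds the directory containing **libcuda.so**.

.. code-block:: c

  #include <dlfcn.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical helper: build the path handed to dlopen().
   * With no directory, the bare library name is used, so the
   * dynamic linker searches its default locations. */
  static int
  resolve_lib_path(const char *dir, const char *lib, char *out, size_t len)
  {
      int n;

      if (dir == NULL || dir[0] == '\0')
          n = snprintf(out, len, "%s", lib);
      else
          n = snprintf(out, len, "%s/%s", dir, lib);

      /* snprintf signals truncation by returning a value >= len. */
      return (n < 0 || (size_t)n >= len) ? -1 : 0;
  }

  /* Hypothetical helper: open libcuda.so, honouring CUDA_PATH_L,
   * and look up one Driver API symbol by name.
   * Returns NULL if the library or the symbol is unavailable. */
  static void *
  load_cuda_symbol(const char *symbol)
  {
      char path[1024];
      void *lib;

      if (resolve_lib_path(getenv("CUDA_PATH_L"), "libcuda.so",
                           path, sizeof(path)) != 0)
          return NULL;

      lib = dlopen(path, RTLD_LAZY);
      if (lib == NULL) {
          fprintf(stderr, "dlopen failed: %s\n", dlerror());
          return NULL;
      }

      /* The handle stays open on purpose: the returned symbol is
       * only valid while the library remains loaded. */
      return dlsym(lib, symbol);
  }

A caller would cast the returned pointer to the matching function pointer type
before using it, and treat ``NULL`` as "CUDA not available on this system".
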
CPU map GPU memory
~~~~~~~~~~~~~~~~~~

Similarly to the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. ``/usr/local/lib``),
you can use the environment variable ``GDRCOPY_PATH_L`` to specify where it is.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

  $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

  $ lsmod | egrep gdrdrv
  gdrdrv                 20480  0
  nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
present on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)`` call
in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of allocated GPU memory addresses
and registered CPU memory addresses, each associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address,
the CUDA driver library returns an error if the address is not in the table.

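The bookkeeping described above can be sketched as a small address table.
This is only an illustration under assumptions, not the driver's data structure:
the ``mem_entry`` and ``mem_table`` types, the function names and the
``-ENOENT`` error code are all hypothetical.

.. code-block:: c

  #include <errno.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Hypothetical entry: one tracked address and the CUDA context
   * it was allocated or registered under. */
  struct mem_entry {
      uintptr_t addr;
      uint64_t ctx;
      struct mem_entry *next;
  };

  /* Hypothetical table: a singly linked list of entries. */
  struct mem_table {
      struct mem_entry *head;
  };

  /* Record an address after a successful allocation or registration. */
  static int
  mem_table_add(struct mem_table *t, uintptr_t addr, uint64_t ctx)
  {
      struct mem_entry *e = malloc(sizeof(*e));

      if (e == NULL)
          return -ENOMEM;
      e->addr = addr;
      e->ctx = ctx;
      e->next = t->head;
      t->head = e;
      return 0;
  }

  /* Drop an address on free or unregister.
   * An address that was never recorded is reported as an error. */
  static int
  mem_table_remove(struct mem_table *t, uintptr_t addr)
  {
      struct mem_entry **pp;

      for (pp = &t->head; *pp != NULL; pp = &(*pp)->next) {
          if ((*pp)->addr == addr) {
              struct mem_entry *e = *pp;

              *pp = e->next;
              free(e);
              return 0;
          }
      }
      return -ENOENT;
  }

With such a table, releasing the same address twice fails on the second attempt
instead of reaching the CUDA Driver API with a stale pointer.
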
Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

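An application that wants to check the version requirement itself can rely on
the fact that ``cuDriverGetVersion`` reports the version as a single integer
encoded as ``1000 * major + 10 * minor``, so CUDA 11.4 is ``11040``.
The helpers below are a hypothetical sketch of that check and are not part of
the driver library.

.. code-block:: c

  #include <stdbool.h>

  /* Minimum stated in this guide: CUDA 11.4 = 1000 * 11 + 10 * 4. */
  #define CUDA_MIN_DRIVER_VERSION 11040

  /* Hypothetical helpers decoding the integer version value. */
  static int
  cuda_version_major(int version)
  {
      return version / 1000;
  }

  static int
  cuda_version_minor(int version)
  {
      return (version % 1000) / 10;
  }

  /* True if the reported version satisfies the minimal requirement. */
  static bool
  cuda_driver_version_ok(int version)
  {
      return version >= CUDA_MIN_DRIVER_VERSION;
  }
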
`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to directly send and receive packets
using GPU memory, avoiding additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- Mellanox network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the l2fwd-nv application that can be found `here <https://github.com/NVIDIA/l2fwd-nv>`_.

The application is based on the vanilla DPDK example l2fwd.
It is enhanced with GPU memory managed through the gpudev library
and uses CUDA to launch a workload on the GPU that swaps the packets' MAC addresses.

l2fwd-nv is not intended to be used for performance measurement
(testpmd is a better candidate for this).
The goal is to show several use cases of how a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.
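
For reference, the per-packet operation behind the MAC swap workload can be
sketched in plain C on the CPU side. This is not code from l2fwd-nv, which runs
its workload on the GPU; the function name ``swap_mac_addresses`` is hypothetical,
and the sketch assumes the buffer starts with a standard Ethernet header
(6-byte destination MAC followed by 6-byte source MAC).

.. code-block:: c

  #include <stdint.h>
  #include <string.h>

  #define ETH_ADDR_LEN 6

  /* Swap the destination and source MAC addresses in place.
   * The rest of the frame (EtherType, payload) is left untouched. */
  static void
  swap_mac_addresses(uint8_t *frame)
  {
      uint8_t tmp[ETH_ADDR_LEN];

      memcpy(tmp, frame, ETH_ADDR_LEN);
      memcpy(frame, frame + ETH_ADDR_LEN, ETH_ADDR_LEN);
      memcpy(frame + ETH_ADDR_LEN, tmp, ETH_ADDR_LEN);
  }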