doc/guides/gpus/cuda.rst

.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
There are two ways to obtain these headers:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either a regular or a stubs installation).
- Download these two headers from the `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

Meson must be told where the CUDA header files are located,
either through the ``CFLAGS`` variable or through the ``c_args`` option.
There are three ways to do this:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Set ``CFLAGS`` on the meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Pass ``-Dc_args`` on the meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.

CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time. This works around a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.
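
The run-time loading described above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the driver's actual code:
the helper names ``load_libcuda`` and ``resolve_cu_init`` are hypothetical,
``int`` stands in for the ``CUresult`` return type of ``cuInit``,
and ``CUDA_PATH_L`` is assumed to name the directory that contains **libcuda.so**.

.. code-block:: c

   #include <dlfcn.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Shape of cuInit() in the CUDA Driver API; int stands in for CUresult. */
   typedef int (*cu_init_fn)(unsigned int flags);

   /*
    * Hypothetical helper: open libcuda.so at run time.
    * If CUDA_PATH_L is set, it is assumed to be the directory holding the
    * library; otherwise the dynamic loader's default search path is used.
    * Returns NULL when the library cannot be opened.
    */
   static void *
   load_libcuda(void)
   {
       char path[1024];
       const char *dir = getenv("CUDA_PATH_L");

       if (dir != NULL)
           snprintf(path, sizeof(path), "%s/libcuda.so", dir);
       else
           snprintf(path, sizeof(path), "libcuda.so");

       return dlopen(path, RTLD_LAZY | RTLD_LOCAL);
   }

   /*
    * Hypothetical helper: resolve the cuInit symbol from an open handle.
    * Returns NULL when the handle is NULL or the symbol is missing.
    */
   static cu_init_fn
   resolve_cu_init(void *handle)
   {
       if (handle == NULL)
           return NULL;
       return (cu_init_fn)dlsym(handle, "cuInit");
   }

A caller would check both return values before invoking the resolved pointer,
and could use ``dlerror()`` to report why loading failed.
On older glibc versions, such a program may need to be linked with ``-ldl``.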

All CUDA API symbols are loaded at runtime as well.
For this reason, the CUDA library does not need to be installed
in order to build the CUDA driver library.

Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (the CUDA Runtime API is not needed).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, its scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
through the CUDA driver library is straightforward
and does not create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
present on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)`` call
in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered for the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address
that is not in the table, the CUDA driver library returns an error.
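
The bookkeeping described above can be illustrated with a small address table.
This is a simplified sketch, not the driver's actual data structure:
all names (``mem_table``, ``mem_table_add``, ``mem_table_remove``) are hypothetical,
the table has a fixed size, and a single table stands in for the per-context state.

.. code-block:: c

   #include <errno.h>
   #include <stddef.h>
   #include <stdint.h>

   #define MEM_TABLE_MAX 64 /* arbitrary capacity chosen for this sketch */

   /* Kind of address tracked by an entry; MEM_FREE marks an unused slot. */
   enum mem_kind { MEM_FREE = 0, MEM_GPU, MEM_CPU_REGISTERED };

   struct mem_entry {
       uintptr_t addr;
       size_t size;
       enum mem_kind kind;
   };

   struct mem_table {
       struct mem_entry entries[MEM_TABLE_MAX];
   };

   /* Record an allocated (GPU) or registered (CPU) address in the table. */
   static int
   mem_table_add(struct mem_table *t, void *addr, size_t size, enum mem_kind kind)
   {
       if (kind == MEM_FREE)
           return -EINVAL;
       for (size_t i = 0; i < MEM_TABLE_MAX; i++) {
           if (t->entries[i].kind == MEM_FREE) {
               t->entries[i].addr = (uintptr_t)addr;
               t->entries[i].size = size;
               t->entries[i].kind = kind;
               return 0;
           }
       }
       return -ENOMEM; /* table is full */
   }

   /* Drop an address; fail if it was never recorded with that kind. */
   static int
   mem_table_remove(struct mem_table *t, void *addr, enum mem_kind kind)
   {
       if (kind == MEM_FREE)
           return -EINVAL;
       for (size_t i = 0; i < MEM_TABLE_MAX; i++) {
           struct mem_entry *e = &t->entries[i];

           if (e->kind == kind && e->addr == (uintptr_t)addr) {
               e->kind = MEM_FREE;
               return 0;
           }
       }
       return -EINVAL; /* unknown address: report an error */
   }

In this sketch, freeing an address twice, or deregistering an address that was
allocated rather than registered, fails with ``-EINVAL`` instead of being passed on.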

Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

The minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to send and receive packets
directly in GPU memory, without additional memory copies through the CPU system memory.
To enable this technology, the system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- Mellanox network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the `l2fwd-nv <https://github.com/NVIDIA/l2fwd-nv>`_ application.

The application is based on the vanilla DPDK example l2fwd.
It is enhanced with GPU memory managed through the gpudev library
and uses CUDA to launch on the GPU a workload that swaps the MAC addresses of packets.

l2fwd-nv is not intended to be used for performance measurements
(testpmd is a better candidate for this).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on a GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.