The VFIO kernel module is usually present by default in all distributions;
however, please consult your distribution's documentation to make sure that is the case.
+For DMA mapping of either external memory or hugepages, the VFIO interface is used.
+VFIO does not support partial unmap of a previously created mapping. Hence, DPDK's
+memory is mapped at hugepage granularity or at system page granularity. The number
+of DMA mappings of system/hugepage memory is limited by the kernel through the
+locked memory limit of the process (rlimit). Another per-container overall limit,
+applicable to both external memory and system memory, was added in kernel 5.1 and
+is defined by the VFIO module parameter ``dma_entry_limit``, with a default value
+of 64K. When the application runs out of DMA entries, these limits need to be
+raised to increase the number of allowed mappings.
+
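As a rough, illustrative sketch (not part of the patch above), a process can raise its
locked-memory soft limit up to its hard limit before calling ``rte_eal_init()``;
the helper name below is hypothetical, and raising the hard limit itself still requires
administrator configuration (e.g. ``/etc/security/limits.conf``).
The ``dma_entry_limit`` parameter belongs to the ``vfio_iommu_type1`` kernel module
and is typically set when that module is loaded.

.. code-block:: c

   #include <stdio.h>
   #include <sys/resource.h>

   /* Hypothetical helper: raise the soft RLIMIT_MEMLOCK of the calling
    * process to its hard limit, so that locking memory for VFIO DMA
    * mappings is less likely to fail once many hugepages are in use.
    */
   static int
   raise_memlock_limit(void)
   {
           struct rlimit rl;

           if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
                   perror("getrlimit(RLIMIT_MEMLOCK)");
                   return -1;
           }
           rl.rlim_cur = rl.rlim_max;
           if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
                   perror("setrlimit(RLIMIT_MEMLOCK)");
                   return -1;
           }
           return 0;
   }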
Since Linux version 5.7,
the ``vfio-pci`` module supports the creation of virtual functions.
After the PF is bound to the ``vfio-pci`` module,
the VFs can be created through the ``sysfs`` interface,
and they will be bound to the ``vfio-pci`` module automatically.
vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
void *arg __rte_unused)
{
- rte_iova_t iova_start, iova_expected;
struct rte_memseg_list *msl;
struct rte_memseg *ms;
size_t cur_len = 0;
- uint64_t va_start;
msl = rte_mem_virt2memseg_list(addr);
/* memsegs are contiguous in memory */
ms = rte_mem_virt2memseg(addr, msl);
-
- /*
- * This memory is not guaranteed to be contiguous, but it still could
- * be, or it could have some small contiguous chunks. Since the number
- * of VFIO mappings is limited, and VFIO appears to not concatenate
- * adjacent mappings, we have to do this ourselves.
- *
- * So, find contiguous chunks, then map them.
- */
- va_start = ms->addr_64;
- iova_start = iova_expected = ms->iova;
while (cur_len < len) {
- bool new_contig_area = ms->iova != iova_expected;
- bool last_seg = (len - cur_len) == ms->len;
- bool skip_last = false;
-
- /* only do mappings when current contiguous area ends */
- if (new_contig_area) {
- if (type == RTE_MEM_EVENT_ALLOC)
- vfio_dma_mem_map(default_vfio_cfg, va_start,
- iova_start,
- iova_expected - iova_start, 1);
- else
- vfio_dma_mem_map(default_vfio_cfg, va_start,
- iova_start,
- iova_expected - iova_start, 0);
- va_start = ms->addr_64;
- iova_start = ms->iova;
- }
/* some memory segments may have invalid IOVA */
if (ms->iova == RTE_BAD_IOVA) {
RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
ms->addr);
- skip_last = true;
+ goto next;
}
- iova_expected = ms->iova + ms->len;
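+		/* map or unmap this segment on its own: VFIO cannot
+		 * partially unmap a previously coalesced mapping
+		 */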
+ if (type == RTE_MEM_EVENT_ALLOC)
+ vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+ ms->iova, ms->len, 1);
+ else
+ vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+ ms->iova, ms->len, 0);
+next:
cur_len += ms->len;
++ms;
-
- /*
- * don't count previous segment, and don't attempt to
- * dereference a potentially invalid pointer.
- */
- if (skip_last && !last_seg) {
- iova_expected = iova_start = ms->iova;
- va_start = ms->addr_64;
- } else if (!skip_last && last_seg) {
- /* this is the last segment and we're not skipping */
- if (type == RTE_MEM_EVENT_ALLOC)
- vfio_dma_mem_map(default_vfio_cfg, va_start,
- iova_start,
- iova_expected - iova_start, 1);
- else
- vfio_dma_mem_map(default_vfio_cfg, va_start,
- iova_start,
- iova_expected - iova_start, 0);
- }
}
}
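For context (this is not part of the patch), applications and drivers can hook the same
allocation/free events through the public EAL API. The callback and function names below
are hypothetical; only ``rte_mem_event_callback_register()`` and the ``rte_mem_event``
types are taken from the DPDK API.

.. code-block:: c

   #include <rte_common.h>
   #include <rte_memory.h>
   #include <rte_log.h>

   /* Illustrative callback: log the ranges reported by the EAL on
    * allocation and free, mirroring what the VFIO callback above does
    * with vfio_dma_mem_map().
    */
   static void
   demo_mem_event_cb(enum rte_mem_event type, const void *addr, size_t len,
                   void *arg __rte_unused)
   {
           if (type == RTE_MEM_EVENT_ALLOC)
                   RTE_LOG(DEBUG, EAL, "alloc: %zu bytes at %p\n", len, addr);
           else
                   RTE_LOG(DEBUG, EAL, "free: %zu bytes at %p\n", len, addr);
   }

   /* Register once after rte_eal_init(). */
   static int
   demo_register_mem_event_cb(void)
   {
           return rte_mem_event_callback_register("demo-mem-event",
                           demo_mem_event_cb, NULL);
   }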