X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fprog_guide%2Fenv_abstraction_layer.rst;h=6e59faeea9f3168eb1bdfc90d5fcd6e7530b6d34;hb=a2aafb9aa6517160a2621e2140e36d67326190d5;hp=fa8afdb3a349e27bfe7c9f3da53d46e37166e507;hpb=e75bc77f98bcf1e9772022b6f833da588b59c8e1;p=dpdk.git

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index fa8afdb3a3..6e59faeea9 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -147,6 +147,14 @@ A default validator callback is provided by EAL, which can be enabled with a
 ``--socket-limit`` command-line option, for a simple way to limit maximum amount
 of memory that can be used by DPDK application.
 
+.. warning::
+    Memory subsystem uses DPDK IPC internally, so memory allocations/callbacks
+    and IPC must not be mixed: it is not safe to allocate/free memory inside
+    memory-related or IPC callbacks, and it is not safe to use IPC inside
+    memory-related callbacks. See chapter
+    :ref:`Multi-process Support ` for more details about
+    DPDK IPC.
+
 + Legacy memory mode
 
     This mode is enabled by specifying ``--legacy-mem`` command-line switch to the
@@ -411,6 +419,62 @@ Misc Functions
 
 Locks and atomic operations are per-architecture (i686 and x86_64).
 
+IOVA Mode Detection
+~~~~~~~~~~~~~~~~~~~
+
+IOVA Mode is selected by considering what the current usable Devices on the
+system require and/or support.
+
+On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
+detected based on a 2-step heuristic detailed below.
+
+For the first step, EAL asks each bus its requirement in terms of IOVA mode
+and decides on a preferred IOVA mode.
+
+- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
+- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
+- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
+  preferred mode is RTE_IOVA_DC,
+- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
+  RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
+  check on Physical Addresses availability),
+
+If the buses have expressed no preference on which IOVA mode to pick, then a
+default is selected using the following logic:
+
+- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
+- otherwise, RTE_IOVA_PA mode is used
+
+In the case when the buses had disagreed on their preferred IOVA mode, part of
+the buses won't work because of this decision.
+
+The second step checks if the preferred mode complies with the Physical
+Addresses availability since those are only available to root user in recent
+kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
+Physical Addresses, then EAL init fails early, since later probing of the
+devices would fail anyway.
+
+.. note::
+
+    The RTE_IOVA_VA mode is preferred as the default in most cases for the
+    following reasons:
+
+    - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
+      physical address availability.
+    - By default, the mempool, first asks for IOVA-contiguous memory using
+      ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and it may
+      affect the application boot time.
+    - It is easy to enable large amount of IOVA-contiguous memory use-cases
+      with IOVA in VA mode.
+
+    It is expected that all PCI drivers work in both RTE_IOVA_PA and
+    RTE_IOVA_VA modes.
+
+    If a PCI driver does not support RTE_IOVA_PA mode, the
+    ``RTE_PCI_DRV_NEED_IOVA_AS_VA`` flag is used to dictate that this PCI
+    driver can only work in RTE_IOVA_VA mode.
+
 IOVA Mode Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -584,8 +648,8 @@ Known Issues
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
 
-  - It is currently limited to the x86_64 platform, because it uses an
-    instruction (16-byte compare-and-swap) that is not yet available on other
+  - It is currently limited to the aarch64 and x86_64 platforms, because it uses
+    an instruction (16-byte compare-and-swap) that is not yet available on other
     platforms.
   - It has worse average-case performance than the non-preemptive rte_ring, but
     software caching (e.g. the mempool cache) can mitigate this by reducing the
@@ -733,7 +797,7 @@ The most important fields in the structure and how they are used are described b
 
 Malloc heap is a doubly-linked list, where each element keeps track of its
 previous and next elements. Due to the fact that hugepage memory can come and
-go, neighbouring malloc elements may not necessarily be adjacent in memory.
+go, neighboring malloc elements may not necessarily be adjacent in memory.
 Also, since a malloc element may span multiple pages, its contents may not
 necessarily be IOVA-contiguous either - each malloc element is only guaranteed
 to be virtually contiguous.
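
The two-step Linux heuristic described in the "IOVA Mode Detection" hunk above can be sketched roughly as follows. This is an illustrative model only, not the EAL implementation: the ``enum iova_mode`` type, the ``preferred_from_buses`` and ``select_iova_mode`` functions, and the two boolean inputs are hypothetical names chosen to mirror the prose. The sketch also assumes that a bus reporting "don't care" simply abstains when other buses express a preference, which the text does not state explicitly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RTE_IOVA_DC / RTE_IOVA_PA / RTE_IOVA_VA
 * modes named in the text; these are not the DPDK definitions. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Step 1: fold the per-bus requirements into one preferred mode.
 * Assumption: a bus reporting IOVA_DC abstains from the vote. */
static enum iova_mode
preferred_from_buses(const enum iova_mode *bus, size_t n)
{
    bool want_pa = false, want_va = false;

    for (size_t i = 0; i < n; i++) {
        if (bus[i] == IOVA_PA)
            want_pa = true;
        else if (bus[i] == IOVA_VA)
            want_va = true;
    }
    if (want_pa && !want_va)
        return IOVA_PA;
    if (want_va && !want_pa)
        return IOVA_VA;
    /* Either nobody expressed a preference or the buses disagree. */
    return IOVA_DC;
}

/* Returns 0 and stores the selected mode in *out, or -1 where the text
 * says EAL init fails early. */
static int
select_iova_mode(const enum iova_mode *bus, size_t n,
                 bool phys_addrs_available, bool iommu_groups_present,
                 enum iova_mode *out)
{
    enum iova_mode mode = preferred_from_buses(bus, n);

    if (mode == IOVA_DC) {
        /* Default selection when no single preference emerged. */
        if (!phys_addrs_available)
            mode = IOVA_VA;
        else if (iommu_groups_present)
            mode = IOVA_VA;
        else
            mode = IOVA_PA;
    }

    /* Step 2: PA mode is unusable without access to physical addresses. */
    if (mode == IOVA_PA && !phys_addrs_available)
        return -1;

    *out = mode;
    return 0;
}
```

Under these assumptions, two buses that both require PA on a system without access to physical addresses yield ``-1``, while buses that disagree fall through to the default logic and end up in VA mode whenever an IOMMU group is present.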