Then, the main() function is called. The core initialization and launch is done in rte_eal_init() (see the API documentation).
It consist of calls to the pthread library (more specifically, pthread_self(), pthread_create(), and pthread_setaffinity_np()).
-.. _figure_linuxapp_launch:
+.. _figure_linux_launch:
.. figure:: img/linuxapp_launch.*
Multi-process Support
~~~~~~~~~~~~~~~~~~~~~
-The Linuxapp EAL allows a multi-process as well as a multi-threaded (pthread) deployment model.
+The Linux EAL allows a multi-process as well as a multi-threaded (pthread) deployment model.
See chapter
:ref:`Multi-process Support <Multi-process_Support>` for more details.
``--socket-limit`` command-line option, for a simple way to limit maximum amount
of memory that can be used by DPDK application.
+.. warning::
+ Memory subsystem uses DPDK IPC internally, so memory allocations/callbacks
+ and IPC must not be mixed: it is not safe to allocate/free memory inside
+ memory-related or IPC callbacks, and it is not safe to use IPC inside
+ memory-related callbacks. See chapter
+ :ref:`Multi-process Support <Multi-process_Support>` for more details about
+ DPDK IPC.
+
+ Legacy memory mode
This mode is enabled by specifying ``--legacy-mem`` command-line switch to the
can later be mapped into that preallocated VA space (if dynamic memory mode
is enabled), and can optionally be mapped into it at startup.
++ Segment file descriptors
+
+On Linux, in most cases, EAL will store segment file descriptors in EAL. This
+can become a problem when using smaller page sizes due to underlying limitations
+of ``glibc`` library. For example, Linux API calls such as ``select()`` may not
+work correctly because ``glibc`` does not support more than certain number of
+file descriptors.
+
+There are two possible solutions for this problem. The recommended solution is
+to use ``--single-file-segments`` mode, as that mode will not use a file
+descriptor per each page, and it will keep compatibility with Virtio with
+vhost-user backend. This option is not available when using ``--legacy-mem``
+mode.
+
+Another option is to use bigger page sizes. Since fewer pages are required to
+cover the same memory area, fewer file descriptors will be stored internally
+by EAL.
+
Support for Externally Allocated Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- If IOVA table is not specified, IOVA addresses will be assumed to be
unavailable
- Other processes must attach to the memory area before they can use it
-* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Perform DMA mapping with ``rte_dev_dma_map`` if needed
* Use the memory area in your application
* If memory area is no longer needed, it can be unregistered
- If the area was mapped for DMA, unmapping must be performed before
Locks and atomic operations are per-architecture (i686 and x86_64).
+IOVA Mode Detection
+~~~~~~~~~~~~~~~~~~~
+
+IOVA Mode is selected by considering what the current usable Devices on the
+system require and/or support.
+
+On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
+detected based on a 2-step heuristic detailed below.
+
+For the first step, EAL asks each bus its requirement in terms of IOVA mode
+and decides on a preferred IOVA mode.
+
+- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
+- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
+- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
+ preferred mode is RTE_IOVA_DC,
+- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
+ RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
+ check on Physical Addresses availability),
+
+If the buses have expressed no preference on which IOVA mode to pick, then a
+default is selected using the following logic:
+
+- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
+- otherwise, RTE_IOVA_PA mode is used
+
+In the case when the buses had disagreed on their preferred IOVA mode, part of
+the buses won't work because of this decision.
+
+The second step checks if the preferred mode complies with the Physical
+Addresses availability since those are only available to root user in recent
+kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
+Physical Addresses, then EAL init fails early, since later probing of the
+devices would fail anyway.
+
+.. note::
+
+ The RTE_IOVA_VA mode is preferred as the default in most cases for the
+ following reasons:
+
+ - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
+ physical address availability.
+ - By default, the mempool, first asks for IOVA-contiguous memory using
+ ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and it may
+ affect the application boot time.
+ - It is easy to enable large amount of IOVA-contiguous memory use-cases
+ with IOVA in VA mode.
+
+ It is expected that all PCI drivers work in both RTE_IOVA_PA and
+ RTE_IOVA_VA modes.
+
+ If a PCI driver does not support RTE_IOVA_PA mode, the
+ ``RTE_PCI_DRV_NEED_IOVA_AS_VA`` flag is used to dictate that this PCI
+ driver can only work in RTE_IOVA_VA mode.
+
IOVA Mode Configuration
~~~~~~~~~~~~~~~~~~~~~~~
5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+ Alternatively, applications can use the lock-free stack mempool handler. When
+ considering this handler, note that:
+
+ - It is currently limited to the aarch64 and x86_64 platforms, because it uses
+ an instruction (16-byte compare-and-swap) that is not yet available on other
+ platforms.
+ - It has worse average-case performance than the non-preemptive rte_ring, but
+ software caching (e.g. the mempool cache) can mitigate this by reducing the
+ number of stack accesses.
+
+ rte_timer
Running ``rte_timer_manage()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
Malloc heap is a doubly-linked list, where each element keeps track of its
previous and next elements. Due to the fact that hugepage memory can come and
-go, neighbouring malloc elements may not necessarily be adjacent in memory.
+go, neighboring malloc elements may not necessarily be adjacent in memory.
Also, since a malloc element may span multiple pages, its contents may not
necessarily be IOVA-contiguous either - each malloc element is only guaranteed
to be virtually contiguous.