~~~~~~~~~~~~~~~~~~~~~
The Linuxapp EAL allows a multi-process as well as a multi-threaded (pthread) deployment model.
-See chapter 2.20
+See chapter
:ref:`Multi-process Support <Multi-process_Support>` for more details.
Memory Mapping Discovery and Memory Reservation
.. note::
- Memory reservations done using the APIs provided by the rte_malloc library are also backed by pages from the hugetlbfs filesystem.
-
-Xen Dom0 support without hugetbls
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The existing memory management implementation is based on the Linux kernel hugepage mechanism.
-However, Xen Dom0 does not support hugepages, so a new Linux kernel module rte_dom0_mm is added to workaround this limitation.
-
-The EAL uses IOCTL interface to notify the Linux kernel module rte_dom0_mm to allocate memory of specified size,
-and get all memory segments information from the module,
-and the EAL uses MMAP interface to map the allocated memory.
-For each memory segment, the physical addresses are contiguous within it but actual hardware addresses are contiguous within 2MB.
+ Memory reservations done using the APIs provided by rte_malloc are also backed by pages from the hugetlbfs filesystem.
PCI Access
~~~~~~~~~~
CPU Feature Identification
~~~~~~~~~~~~~~~~~~~~~~~~~~
-The EAL can query the CPU at runtime (using the rte_cpu_get_feature() function) to determine which CPU features are available.
+The EAL can query the CPU at runtime (using the rte_cpu_get_features() function) to determine which CPU features are available.
+
+User Space Interrupt Event
+~~~~~~~~~~~~~~~~~~~~~~~~~~
-User Space Interrupt and Alarm Handling
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
++ User Space Interrupt and Alarm Handling in Host Thread
The EAL creates a host thread to poll the UIO device file descriptors to detect the interrupts.
Callbacks can be registered or unregistered by the EAL functions for a specific interrupt event
.. note::
- The only interrupts supported by the DPDK Poll-Mode Drivers are those for link status change,
- i.e. link up and link down notification.
+ In DPDK PMD, the only interrupts handled by the dedicated host thread are those for link status change
+ (link up and link down notification) and for sudden device removal.
+
+
++ RX Interrupt Event
+
+The receive and transmit routines provided by each PMD are not limited to executing in polling thread mode.
+To avoid idle polling when throughput is very low, it is useful to pause polling and wait until a wake-up event occurs.
+The RX interrupt is the first choice for such a wake-up event, but probably won't be the only one.
+
+EAL provides the event APIs for this event-driven thread mode.
+Taking linuxapp as an example, the implementation relies on epoll. Each thread can monitor an epoll instance
+in which all the wake-up events' file descriptors are added. The event file descriptors are created and mapped to
+the interrupt vectors according to the UIO/VFIO spec.
+For bsdapp, kqueue is the alternative mechanism, but it is not implemented yet.
+
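The epoll-based wake-up described above can be sketched in plain C. This is a minimal illustration only, not the EAL implementation: it assumes a Linux host, uses an ``eventfd`` as a stand-in for the event file descriptor that UIO/VFIO would provide, and the helper names (``wakeup_epoll_create``, ``wakeup_wait``, ``wakeup_demo``) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance that watches one event fd.
 * Returns the epoll fd, or -1 on error. */
static int wakeup_epoll_create(int event_fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = event_fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, event_fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Hypothetical helper: sleep until a wake-up event or until timeout_ms expires.
 * Returns 1 if woken by an event, 0 on timeout, -1 on error. */
static int wakeup_wait(int epfd, int timeout_ms)
{
    assert(epfd >= 0);

    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0)
        return n;

    /* Drain the eventfd counter so the next wait blocks again. */
    uint64_t count;
    if (read(ev.data.fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return 1;
}

/* Demo: with no event pending the wait times out; after the fd is
 * signalled (here by a plain write standing in for an interrupt) it wakes.
 * Returns 0 if both observations hold, -1 otherwise. */
static int wakeup_demo(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    int epfd = wakeup_epoll_create(efd);
    if (epfd < 0) {
        close(efd);
        return -1;
    }

    int before = wakeup_wait(epfd, 0);

    uint64_t one = 1;
    ssize_t written = write(efd, &one, sizeof(one));
    int after = (written == (ssize_t)sizeof(one)) ? wakeup_wait(epfd, 100) : -1;

    close(epfd);
    close(efd);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

In a real application the polling thread would call something like ``wakeup_wait()`` only after several empty polls, then resume polling once it returns 1.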
+EAL initializes the mapping between event file descriptors and interrupt vectors, while each device initializes the mapping
+between interrupt vectors and queues. In this way, EAL actually is unaware of the interrupt cause on the specific vector.
+The eth_dev driver takes responsibility to program the latter mapping.
+
+.. note::
+
+ Per-queue RX interrupt events are only allowed in VFIO, which supports multiple MSI-X vectors. In UIO, the RX interrupt
+ shares the same vector with the other interrupt causes. In this case, when the RX interrupt and the LSC (link status change)
+ interrupt are both enabled (intr_conf.lsc == 1 && intr_conf.rxq == 1), only the former is functional.
+
+RX interrupts are controlled, enabled and disabled through the ethdev APIs 'rte_eth_dev_rx_intr_*'. They return failure if the PMD
+does not support them yet. The intr_conf.rxq flag is used to turn on the RX interrupt capability per device.
+
++ Device Removal Event
+
+This event is triggered by a device being removed at a bus level. Its
+underlying resources may have been made unavailable (e.g. PCI mappings
+unmapped). The PMD must make sure that on such occurrence, the application can
+still safely use its callbacks.
+
+This event can be subscribed to in the same way one would subscribe to a link
+status change event. The execution context is thus the same, i.e. it is the
+dedicated interrupt host thread.
+
+Considering this, it is likely that an application would want to close a
+device that has emitted a Device Removal Event. In such a case, calling
+``rte_eth_dev_close()`` can trigger it to unregister its own Device Removal Event
+callback. Care must be taken not to close the device from the interrupt handler
+context. It is necessary to reschedule such a closing operation.
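One possible way to reschedule the close is to have the removal callback only record the event and let the application's own thread perform the close later. The sketch below shows that pattern in standalone C11. All names are hypothetical, and ``close_device_stub()`` is merely a stand-in for the real close call such as ``rte_eth_dev_close()``; the actual callback registration is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set from the interrupt thread, consumed by the application thread. */
static atomic_bool removal_pending;

/* Counts calls to the stand-in close routine (for illustration only). */
static int close_calls;

/* Hypothetical stand-in for the real close call, e.g. rte_eth_dev_close(). */
static void close_device_stub(void)
{
    close_calls++;
}

/* Hypothetical removal callback: it runs in the interrupt host thread,
 * so it only records the event and never closes the device itself. */
static void on_device_removed(void)
{
    atomic_store(&removal_pending, true);
}

/* Called from the application's own loop: performs the deferred close.
 * Returns true if a pending removal was handled on this call. */
static bool service_deferred_close(void)
{
    if (!atomic_exchange(&removal_pending, false))
        return false;
    close_device_stub();
    return true;
}

/* Demo: the callback alone does not close; the first service call does,
 * and a second service call is a no-op. Returns the number of closes. */
static int deferred_close_demo(void)
{
    on_device_removed();
    assert(close_calls == 0);
    service_deferred_close();
    service_deferred_close();
    return close_calls;
}
```

The atomic exchange ensures the close runs once per recorded removal, even if the application loop checks the flag repeatedly.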
Blacklisting
~~~~~~~~~~~~
Public Thread API
~~~~~~~~~~~~~~~~~
-There are two public APIs ``rte_thread_set_affinity()`` and ``rte_pthread_get_affinity()`` introduced for threads.
+There are two public APIs ``rte_thread_set_affinity()`` and ``rte_thread_get_affinity()`` introduced for threads.
When they're used in any pthread context, the Thread Local Storage(TLS) will be set/get.
Those TLS include *_cpuset* and *_socket_id*:
The rte_mempool uses a per-lcore cache inside the mempool.
For non-EAL pthreads, ``rte_lcore_id()`` will not return a valid number.
- So for now, when rte_mempool is used with non-EAL pthreads, the put/get operations will bypass the mempool cache and there is a performance penalty because of this bypass.
- Support for non-EAL mempool cache is currently being enabled.
+ So for now, when rte_mempool is used with non-EAL pthreads, the put/get operations will bypass the default mempool cache and there is a performance penalty because of this bypass.
+ Only user-owned external caches can be used in a non-EAL context in conjunction with ``rte_mempool_generic_put()`` and ``rte_mempool_generic_get()`` that accept an explicit cache parameter.
+ rte_ring
be preempted by another pthread doing a multi-consumer dequeue on
the same ring.
- Bypassing this constraint it may cause the 2nd pthread to spin until the 1st one is scheduled again.
+ Bypassing this constraint may cause the 2nd pthread to spin until the 1st one is scheduled again.
Moreover, if the 1st pthread is preempted by a context that has an higher priority, it may even cause a dead lock.
This does not mean it cannot be used, simply, there is a need to narrow down the situation when it is used by multi-pthread on the same core.
3. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
- ``RTE_RING_PAUSE_REP_COUNT`` is defined for rte_ring to reduce contention. It's mainly for case 2, a yield is issued after number of times pause repeat.
-
- It adds a sched_yield() syscall if the thread spins for too long while waiting on the other thread to finish its operations on the ring.
- This gives the preempted thread a chance to proceed and finish with the ring enqueue/dequeue operation.
-
+ rte_timer
Running ``rte_timer_manager()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
Memory Allocation
^^^^^^^^^^^^^^^^^
-On EAL initialisation, all memsegs are setup as part of the malloc heap.
+On EAL initialization, all memsegs are set up as part of the malloc heap.
This setup involves placing a dummy structure at the end with ``BUSY`` state,
which may contain a sentinel value if ``CONFIG_RTE_MALLOC_DEBUG`` is enabled,
and a proper :ref:`element header<malloc_elem>` with ``FREE`` at the start