X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fprog_guide%2Fmulti_proc_support.rst;h=a84083b96c8afdc0b062c8933403cb79d5140b70;hb=c94366cfc641c6ae43d01c2ac4c6b8993817b356;hp=badd102ea2886c6a151bda89ae0ce12f93df12b8;hpb=29e30cbcc18ccabd134ee27b25323b468206451e;p=dpdk.git diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst index badd102ea2..a84083b96c 100644 --- a/doc/guides/prog_guide/multi_proc_support.rst +++ b/doc/guides/prog_guide/multi_proc_support.rst @@ -1,32 +1,5 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2010-2014 Intel Corporation. .. _Multi-process_Support: @@ -35,7 +8,7 @@ Multi-process Support In the DPDK, multi-process support is designed to allow a group of DPDK processes to work together in a simple transparent manner to perform packet processing, -or other workloads, on Intel® architecture hardware. +or other workloads. To support this functionality, a number of additions have been made to the core DPDK Environment Abstraction Layer (EAL). @@ -52,6 +25,13 @@ Standalone DPDK processes are primary processes, while secondary processes can only run alongside a primary process or after a primary process has already configured the hugepage shared memory for them. +.. note:: + + Secondary processes should run alongside primary process with same DPDK version. + + Secondary processes which requires access to physical devices in Primary process, must + be passed with the same whitelist and blacklist options. + To support these two process types, and other multi-process setups described later, two additional command-line parameters are available to the EAL: @@ -83,6 +63,10 @@ and point to the same objects, in both processes. Refer to `Multi-process Limitations`_ for details of how Linux kernel Address-Space Layout Randomization (ASLR) can affect memory sharing. 
+ If the primary process was run with ``--legacy-mem`` or + ``--single-file-segments`` switch, secondary processes must be run with the + same switch specified. Otherwise, memory corruption may occur. + .. _figure_multi_process_memory: .. figure:: img/multi_process_memory.* @@ -136,8 +120,13 @@ The rte part of the filenames of each of the above is configurable using the fil In addition to specifying the file-prefix parameter, any DPDK applications that are to be run side-by-side must explicitly limit their memory use. -This is done by passing the -m flag to each process to specify how much hugepage memory, in megabytes, -each process can use (or passing ``--socket-mem`` to specify how much hugepage memory on each socket each process can use). +This is less of a problem on Linux, as by default, applications will not +allocate more memory than they need. However if ``--legacy-mem`` is used, DPDK +will attempt to preallocate all memory it can get to, and memory use must be +explicitly limited. This is done by passing the ``-m`` flag to each process to +specify how much hugepage memory, in megabytes, each process can use (or passing +``--socket-mem`` to specify how much hugepage memory on each socket each process +can use). .. note:: @@ -164,8 +153,10 @@ There are a number of limitations to what can be done when running DPDK multi-pr Some of these are documented below: * The multi-process feature requires that the exact same hugepage memory mappings be present in all applications. - The Linux security feature - Address-Space Layout Randomization (ASLR) can interfere with this mapping, - so it may be necessary to disable this feature in order to reliably run multi-process applications. + This makes secondary process startup process generally unreliable. 
Disabling + Linux security feature - Address-Space Layout Randomization (ASLR) may + help getting more consistent mappings, but not necessarily more reliable - + if the mappings are wrong, they will be consistently wrong! .. warning:: @@ -173,7 +164,7 @@ Some of these are documented below: so it is recommended that it be disabled only when absolutely necessary, and only when the implications of this change have been understood. -* All DPDK processes running as a single application and using shared memory must have distinct coremask arguments. +* All DPDK processes running as a single application and using shared memory must have distinct coremask/corelist arguments. It is not possible to have a primary and secondary instance, or two secondary instances, using any of the same logical cores. Attempting to do so can cause corruption of memory pool caches, among other issues. @@ -185,7 +176,7 @@ Some of these are documented below: * The use of function pointers between multiple processes running based of different compiled binaries is not supported, since the location of a given function in one process may be different to its location in a second. - This prevents the librte_hash library from behaving properly as in a multi-threaded instance, + This prevents the librte_hash library from behaving properly as in a multi-process instance, since it uses a pointer to the hash function internally. To work around this issue, it is recommended that multi-process applications perform the hash calculations by directly calling @@ -198,3 +189,165 @@ instead of the functions which do the hashing internally, such as rte_hash_add() which means that only the first, primary DPDK process instance can open and mmap /dev/hpet. If the number of required DPDK processes exceeds that of the number of available HPET comparators, the TSC (which is the default timer in this release) must be used as a time source across all processes instead of the HPET. 
+
+Communication between multiple processes
+----------------------------------------
+
+While there are multiple ways one can approach inter-process communication in
+DPDK, there is also a native DPDK IPC API available. It is not intended to be
+performance-critical, but rather is intended to be a convenient, general
+purpose API to exchange short messages between primary and secondary processes.
+
+DPDK IPC API supports the following communication modes:
+
+* Unicast message from secondary to primary
+* Broadcast message from primary to all secondaries
+
+In other words, any IPC message sent in a primary process will be delivered to
+all secondaries, while any IPC message sent in a secondary process will only be
+delivered to primary process. Unicast from primary to secondary or from
+secondary to secondary is not supported.
+
+There are three types of communications that are available within DPDK IPC API:
+
+* Message
+* Synchronous request
+* Asynchronous request
+
+A "message" type does not expect a response and is meant to be a best-effort
+notification mechanism, while the two types of "requests" are meant to be a two
+way communication mechanism, with the requester expecting a response from the
+other side.
+
+Both messages and requests will trigger a named callback on the receiver side.
+These callbacks will be called from within a dedicated IPC or interrupt thread
+that is not part of the EAL lcore threads.
+
+Registering for incoming messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Before any messages can be received, a callback will need to be registered.
+This is accomplished by calling ``rte_mp_action_register()`` function. This
+function accepts a unique callback name, and a function pointer to a callback
+that will be called when a message or a request matching this callback name
+arrives.
+
+If the application is no longer willing to receive messages intended for a
+specific callback function, ``rte_mp_action_unregister()`` function can be
+called to ensure that callback will not be triggered again.
+
+Sending messages
+~~~~~~~~~~~~~~~~
+
+To send a message, a ``rte_mp_msg`` descriptor must be populated first. The list
+of fields to be populated is as follows:
+
+* ``name`` - message name. This name must match receivers' callback name.
+* ``param`` - message data (up to 256 bytes).
+* ``len_param`` - length of message data.
+* ``fds`` - file descriptors to pass along with the data (up to 8 fd's).
+* ``num_fds`` - number of file descriptors to send.
+
+Once the structure is populated, calling ``rte_mp_sendmsg()`` will send the
+descriptor either to all secondary processes (if sent from primary process), or
+to primary process (if sent from secondary process). The function will return
+a value indicating whether sending the message succeeded or not.
+
+Sending requests
+~~~~~~~~~~~~~~~~
+
+Sending requests involves waiting for the other side to reply, so they can block
+for a relatively long time.
+
+To send a request, a message descriptor ``rte_mp_msg`` must be populated.
+Additionally, a ``timespec`` value must be specified as a timeout, after which
+IPC will stop waiting and return.
+
+For synchronous requests, the ``rte_mp_reply`` descriptor must also be created.
+This is where the responses will be stored.
+The list of fields that will be populated by IPC is as follows:
+
+* ``nb_sent`` - number indicating how many requests were sent (i.e. how many
+  peer processes were active at the time of the request).
+* ``nb_received`` - number indicating how many responses were received (i.e. of
+  those peer processes that were active at the time of request, how many have
+  replied).
+* ``msgs`` - pointer to where all of the responses are stored. The order in
+  which responses appear is undefined. When doing synchronous requests, this
+  memory must be freed by the requester after the request completes.
+
+For asynchronous requests, a function pointer to the callback function must be
+provided instead. This callback will be called when the request either has timed
+out, or has received responses to all the messages that were sent.
+
+.. warning::
+
+    When an asynchronous request times out, the callback will be called not by
+    a dedicated IPC thread, but rather from EAL interrupt thread. Because of
+    this, it may not be possible for DPDK to trigger another interrupt-based
+    event (such as an alarm) while handling asynchronous IPC callback.
+
+When the callback is called, the original request descriptor will be provided
+(so that it would be possible to determine for which sent message this is a
+callback to), along with a response descriptor like the one described above.
+When doing asynchronous requests, there is no need to free the resulting
+``rte_mp_reply`` descriptor.
+
+Receiving and responding to messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To receive a message, a named callback must be registered using the
+``rte_mp_action_register()`` function. The name of the callback must match the
+``name`` field in sender's ``rte_mp_msg`` message descriptor in order for this
+message to be delivered and for the callback to be triggered.
+
+The callback's definition is ``rte_mp_t``, and consists of the incoming message
+pointer ``msg``, and an opaque pointer ``peer``. Contents of ``msg`` will be
+identical to ones sent by the sender.
+
+If a response is required, a new ``rte_mp_msg`` message descriptor must be
+constructed and sent via ``rte_mp_reply()`` function, along with ``peer``
+pointer. The resulting response will then be delivered to the correct requester.
+
+.. warning::
+    Simply returning a value when processing a request callback will not send a
+    response to the request - it must always be explicitly sent even in case
+    of errors. Failing to send a response will cause the requester to time out
+    while waiting on it. Implementation of error signalling rests with the
+    application; there is no built-in way to indicate success or error for a
+    request.
+
+Misc considerations
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Due to the underlying IPC implementation being single-threaded, recursive
+requests (i.e. sending a request while responding to another request) are not
+supported. However, since sending messages (not requests) does not involve an
+IPC thread, sending messages while processing another message or request is
+supported.
+
+Since the memory subsystem uses IPC internally, memory allocations and IPC must
+not be mixed: it is not safe to use IPC inside a memory-related callback, nor is
+it safe to allocate/free memory inside IPC callbacks. Attempting to do so may
+lead to a deadlock.
+
+Asynchronous request callbacks may be triggered either from IPC thread or from
+interrupt thread, depending on whether the request has timed out. It is
+therefore suggested to avoid waiting for interrupt-based events (such as alarms)
+inside asynchronous IPC request callbacks. This limitation does not apply to
+messages or synchronous requests.
+
+If callbacks spend a long time processing the incoming requests, the requester
+might time out, so setting the right timeout value on the requester side is
+imperative.
+
+If some of the messages timed out, ``nb_sent`` and ``nb_received`` fields in the
+``rte_mp_reply`` descriptor will not have matching values. This is not treated
+as error by the IPC API, and it is expected that the user will be responsible
+for deciding how to handle such cases.
+
+If a callback has been registered, IPC will assume that it is safe to call it.
+This is important when registering callbacks during DPDK initialization.
+During initialization, IPC will consider the receiving side as non-existing if
+the callback has not been registered yet. However, once the callback has been
+registered, it is expected that IPC should be safe to trigger it, even if the
+rest of the DPDK initialization hasn't finished yet.