X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Fprog_guide%2Fmulti_proc_support.rst;h=a84083b96c8afdc0b062c8933403cb79d5140b70;hb=c94366cfc641c6ae43d01c2ac4c6b8993817b356;hp=badd102ea2886c6a151bda89ae0ce12f93df12b8;hpb=29e30cbcc18ccabd134ee27b25323b468206451e;p=dpdk.git

diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst
index badd102ea2..a84083b96c 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -1,32 +1,5 @@
-..  BSD LICENSE
-    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-    All rights reserved.
-
-    Redistribution and use in source and binary forms, with or without
-    modification, are permitted provided that the following conditions
-    are met:
-
-    * Redistributions of source code must retain the above copyright
-    notice, this list of conditions and the following disclaimer.
-    * Redistributions in binary form must reproduce the above copyright
-    notice, this list of conditions and the following disclaimer in
-    the documentation and/or other materials provided with the
-    distribution.
-    * Neither the name of Intel Corporation nor the names of its
-    contributors may be used to endorse or promote products derived
-    from this software without specific prior written permission.
-
-    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2010-2014 Intel Corporation.
 
 .. _Multi-process_Support:
 
@@ -35,7 +8,7 @@ Multi-process Support
 
 In the DPDK, multi-process support is designed to allow a group of DPDK processes
 to work together in a simple transparent manner to perform packet processing,
-or other workloads, on IntelÂ® architecture hardware.
+or other workloads.
 To support this functionality,
 a number of additions have been made to the core DPDK Environment Abstraction Layer (EAL).
 
@@ -52,6 +25,13 @@ Standalone DPDK processes are primary processes,
 while secondary processes can only run alongside a primary process or
 after a primary process has already configured the hugepage shared memory for them.
 
+.. note::
+
+    Secondary processes should run alongside primary process with same DPDK version.
+
+    Secondary processes which requires access to physical devices in Primary process, must
+    be passed with the same whitelist and blacklist options.
+
 To support these two process types, and other multi-process setups described later,
 two additional command-line parameters are available to the EAL:
 
@@ -83,6 +63,10 @@ and point to the same objects, in both processes.
     Refer to `Multi-process Limitations`_ for details of
     how Linux kernel Address-Space Layout Randomization (ASLR) can affect memory sharing.
 
+    If the primary process was run with ``--legacy-mem`` or
+    ``--single-file-segments`` switch, secondary processes must be run with the
+    same switch specified. Otherwise, memory corruption may occur.
+
 .. _figure_multi_process_memory:
 
 .. figure:: img/multi_process_memory.*
@@ -136,8 +120,13 @@ The rte part of the filenames of each of the above is configurable using the fil
 
 In addition to specifying the file-prefix parameter,
 any DPDK applications that are to be run side-by-side must explicitly limit their memory use.
-This is done by passing the -m flag to each process to specify how much hugepage memory, in megabytes,
-each process can use (or passing ``--socket-mem`` to specify how much hugepage memory on each socket each process can use).
+This is less of a problem on Linux, as by default, applications will not
+allocate more memory than they need. However if ``--legacy-mem`` is used, DPDK
+will attempt to preallocate all memory it can get to, and memory use must be
+explicitly limited. This is done by passing the ``-m`` flag to each process to
+specify how much hugepage memory, in megabytes, each process can use (or passing
+``--socket-mem`` to specify how much hugepage memory on each socket each process
+can use).
 
 .. note::
 
@@ -164,8 +153,10 @@ There are a number of limitations to what can be done when running DPDK multi-pr
 Some of these are documented below:
 
 *   The multi-process feature requires that the exact same hugepage memory mappings be present in all applications.
-    The Linux security feature - Address-Space Layout Randomization (ASLR) can interfere with this mapping,
-    so it may be necessary to disable this feature in order to reliably run multi-process applications.
+    This makes secondary process startup process generally unreliable. Disabling
+    Linux security feature - Address-Space Layout Randomization (ASLR) may
+    help getting more consistent mappings, but not necessarily more reliable -
+    if the mappings are wrong, they will be consistently wrong!
 
 .. warning::
 
@@ -173,7 +164,7 @@ Some of these are documented below:
     so it is recommended that it be disabled only when absolutely necessary,
     and only when the implications of this change have been understood.
 
-*   All DPDK processes running as a single application and using shared memory must have distinct coremask arguments.
+*   All DPDK processes running as a single application and using shared memory must have distinct coremask/corelist arguments.
     It is not possible to have a primary and secondary instance, or two secondary instances,
     using any of the same logical cores.
     Attempting to do so can cause corruption of memory pool caches, among other issues.
@@ -185,7 +176,7 @@ Some of these are documented below:
 
 *   The use of function pointers between multiple processes running based of different compiled binaries is not supported,
     since the location of a given function in one process may be different to its location in a second.
-    This prevents the librte_hash library from behaving properly as in a multi-threaded instance,
+    This prevents the librte_hash library from behaving properly as in a multi-process instance,
     since it uses a pointer to the hash function internally.
 
 To work around this issue, it is recommended that multi-process applications perform the hash calculations by directly calling
@@ -198,3 +189,165 @@ instead of the functions which do the hashing internally, such as rte_hash_add()
     which means that only the first, primary DPDK process instance can open and mmap  /dev/hpet.
     If the number of required DPDK processes exceeds that of the number of available HPET comparators,
     the TSC (which is the default timer in this release) must be used as a time source across all processes instead of the HPET.
+
+Communication between multiple processes
+----------------------------------------
+
+While there are multiple ways one can approach inter-process communication in
+DPDK, there is also a native DPDK IPC API available. It is not intended to be
+performance-critical, but rather is intended to be a convenient, general
+purpose API to exchange short messages between primary and secondary processes.
+
+DPDK IPC API supports the following communication modes:
+
+* Unicast message from secondary to primary
+* Broadcast message from primary to all secondaries
+
+In other words, any IPC message sent in a primary process will be delivered to
+all secondaries, while any IPC message sent in a secondary process will only be
+delivered to primary process. Unicast from primary to secondary or from
+secondary to secondary is not supported.
+
+There are three types of communications that are available within DPDK IPC API:
+
+* Message
+* Synchronous request
+* Asynchronous request
+
+A "message" type does not expect a response and is meant to be a best-effort
+notification mechanism, while the two types of "requests" are meant to be a two
+way communication mechanism, with the requester expecting a response from the
+other side.
+
+Both messages and requests will trigger a named callback on the receiver side.
+These callbacks will be called from within a dedicated IPC or interrupt thread
+that are not part of EAL lcore threads.
+
+Registering for incoming messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Before any messages can be received, a callback will need to be registered.
+This is accomplished by calling ``rte_mp_action_register()`` function. This
+function accepts a unique callback name, and a function pointer to a callback
+that will be called when a message or a request matching this callback name
+arrives.
+
+If the application is no longer willing to receive messages intended for a
+specific callback function, ``rte_mp_action_unregister()`` function can be
+called to ensure that callback will not be triggered again.
+
+Sending messages
+~~~~~~~~~~~~~~~~
+
+To send a message, a ``rte_mp_msg`` descriptor must be populated first. The list
+of fields to be populated are as follows:
+
+* ``name`` - message name. This name must match receivers' callback name.
+* ``param`` - message data (up to 256 bytes).
+* ``len_param`` - length of message data.
+* ``fds`` - file descriptors to pass long with the data (up to 8 fd's).
+* ``num_fds`` - number of file descriptors to send.
+
+Once the structure is populated, calling ``rte_mp_sendmsg()`` will send the
+descriptor either to all secondary processes (if sent from primary process), or
+to primary process (if sent from secondary process). The function will return
+a value indicating whether sending the message succeeded or not.
+
+Sending requests
+~~~~~~~~~~~~~~~~
+
+Sending requests involves waiting for the other side to reply, so they can block
+for a relatively long time.
+
+To send a request, a message descriptor ``rte_mp_msg`` must be populated.
+Additionally, a ``timespec`` value must be specified as a timeout, after which
+IPC will stop waiting and return.
+
+For synchronous requests, the ``rte_mp_reply`` descriptor must also be created.
+This is where the responses will be stored.
+The list of fields that will be populated by IPC are as follows:
+
+* ``nb_sent`` - number indicating how many requests were sent (i.e. how many
+  peer processes were active at the time of the request).
+* ``nb_received`` - number indicating how many responses were received (i.e. of
+  those peer processes that were active at the time of request, how many have
+  replied)
+* ``msgs`` - pointer to where all of the responses are stored. The order in
+  which responses appear is undefined. When doing synchronous requests, this
+  memory must be freed by the requestor after request completes!
+
+For asynchronous requests, a function pointer to the callback function must be
+provided instead. This callback will be called when the request either has timed
+out, or will have received a response to all the messages that were sent.
+
+.. warning::
+
+    When an asynchronous request times out, the callback will be called not by
+    a dedicated IPC thread, but rather from EAL interrupt thread. Because of
+    this, it may not be possible for DPDK to trigger another interrupt-based
+    event (such as an alarm) while handling asynchronous IPC callback.
+
+When the callback is called, the original request descriptor will be provided
+(so that it would be possible to determine for which sent message this is a
+callback to), along with a response descriptor like the one described above.
+When doing asynchronous requests, there is no need to free the resulting
+``rte_mp_reply`` descriptor.
+
+Receiving and responding to messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To receive a message, a name callback must be registered using the
+``rte_mp_action_register()`` function. The name of the callback must match the
+``name`` field in sender's ``rte_mp_msg`` message descriptor in order for this
+message to be delivered and for the callback to be trigger.
+
+The callback's definition is ``rte_mp_t``, and consists of the incoming message
+pointer ``msg``, and an opaque pointer ``peer``. Contents of ``msg`` will be
+identical to ones sent by the sender.
+
+If a response is required, a new ``rte_mp_msg`` message descriptor must be
+constructed and sent via ``rte_mp_reply()`` function, along with ``peer``
+pointer. The resulting response will then be delivered to the correct requestor.
+
+.. warning::
+    Simply returning a value when processing a request callback will not send a
+    response to the request - it must always be explicitly sent even in case
+    of errors. Implementation of error signalling rests with the application,
+    there is no built-in way to indicate success or error for a request. Failing
+    to do so will cause the requestor to time out while waiting on a response.
+
+Misc considerations
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Due to the underlying IPC implementation being single-threaded, recursive
+requests (i.e. sending a request while responding to another request) is not
+supported. However, since sending messages (not requests) does not involve an
+IPC thread, sending messages while processing another message or request is
+supported.
+
+Since the memory sybsystem uses IPC internally, memory allocations and IPC must
+not be mixed: it is not safe to use IPC inside a memory-related callback, nor is
+it safe to allocate/free memory inside IPC callbacks. Attempting to do so may
+lead to a deadlock.
+
+Asynchronous request callbacks may be triggered either from IPC thread or from
+interrupt thread, depending on whether the request has timed out. It is
+therefore suggested to avoid waiting for interrupt-based events (such as alarms)
+inside asynchronous IPC request callbacks. This limitation does not apply to
+messages or synchronous requests.
+
+If callbacks spend a long time processing the incoming requests, the requestor
+might time out, so setting the right timeout value on the requestor side is
+imperative.
+
+If some of the messages timed out, ``nb_sent`` and ``nb_received`` fields in the
+``rte_mp_reply`` descriptor will not have matching values. This is not treated
+as error by the IPC API, and it is expected that the user will be responsible
+for deciding how to handle such cases.
+
+If a callback has been registered, IPC will assume that it is safe to call it.
+This is important when registering callbacks during DPDK initialization.
+During initialization, IPC will consider the receiving side as non-existing if
+the callback has not been registered yet. However, once the callback has been
+registered, it is expected that IPC should be safe to trigger it, even if the
+rest of the DPDK initialization hasn't finished yet.