doc/guides/prog_guide/multi_proc_support.rst

..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

.. _Multi-process_Support:

Multi-process Support
=====================

In the DPDK, multi-process support is designed to allow a group of DPDK processes
to work together in a simple transparent manner to perform packet processing,
or other workloads.
To support this functionality,
a number of additions have been made to the core DPDK Environment Abstraction Layer (EAL).

The EAL has been modified to allow different types of DPDK processes to be spawned,
each with different permissions on the hugepage memory used by the applications.
For now, there are two types of process specified:

*   primary processes, which can initialize shared memory and have full permissions on it

*   secondary processes, which cannot initialize shared memory,
    but can attach to pre-initialized shared memory and create objects in it.

Standalone DPDK processes are primary processes,
while secondary processes can only run alongside a primary process or
after a primary process has already configured the hugepage shared memory for them.

.. note::

    Secondary processes should run alongside a primary process that uses the same DPDK version.

    Secondary processes that require access to physical devices in the primary process must
    be passed the same whitelist and blacklist options.

To support these two process types, and other multi-process setups described later,
two additional command-line parameters are available to the EAL:

*   ``--proc-type``: for specifying a given process instance as the primary or secondary DPDK instance

*   ``--file-prefix``: to allow processes that do not want to co-operate to have different memory regions

A number of example applications are provided that demonstrate how multiple DPDK processes can be used together.
These are more fully documented in the "Multi-process Sample Application" chapter
in the *DPDK Sample Applications User Guide*.

Memory Sharing
--------------

The key element in getting a multi-process application working using the DPDK is to ensure that
memory resources are properly shared among the processes making up the multi-process application.
Once there are blocks of shared memory available that can be accessed by multiple processes,
then issues such as inter-process communication (IPC) become much simpler.

On application start-up in a primary or standalone process,
the DPDK records to memory-mapped files the details of the memory configuration it is using: the hugepages in use,
the virtual addresses they are mapped at, the number of memory channels present, etc.
When a secondary process is started, these files are read and the EAL recreates the same memory configuration
in the secondary process so that all memory zones are shared between processes and all pointers to that memory are valid,
and point to the same objects, in both processes.

.. note::

    Refer to `Multi-process Limitations`_ for details of
    how Linux kernel Address-Space Layout Randomization (ASLR) can affect memory sharing.

.. _figure_multi_process_memory:

.. figure:: img/multi_process_memory.*

   Memory Sharing in the DPDK Multi-process Sample Application


The EAL also supports an auto-detection mode (set by the EAL ``--proc-type=auto`` flag),
whereby a DPDK process is started as a secondary instance if a primary instance is already running.

Deployment Models
-----------------

Symmetric/Peer Processes
~~~~~~~~~~~~~~~~~~~~~~~~

DPDK multi-process support can be used to create a set of peer processes where each process performs the same workload.
This model is equivalent to having multiple threads each running the same main-loop function,
as is done in most of the supplied DPDK sample applications.
In this model, the first of the processes spawned should be spawned using the ``--proc-type=primary`` EAL flag,
while all subsequent instances should be spawned using the ``--proc-type=secondary`` flag.

The simple_mp and symmetric_mp sample applications demonstrate this usage model.
They are described in the "Multi-process Sample Application" chapter in the *DPDK Sample Applications User Guide*.

Asymmetric/Non-Peer Processes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An alternative deployment model that can be used for multi-process applications
is to have a single primary process instance that acts as a load-balancer or
server distributing received packets among workers or clients, which are run as secondary processes.
This model makes extensive use of rte_ring objects, which are located in shared hugepage memory.

The client_server_mp sample application shows this usage model.
It is described in the "Multi-process Sample Application" chapter in the *DPDK Sample Applications User Guide*.

Running Multiple Independent DPDK Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the above scenarios involving multiple DPDK processes working together,
it is possible to run multiple DPDK processes side-by-side,
where those processes are all working independently.
Support for this usage scenario is provided using the ``--file-prefix`` parameter to the EAL.

By default, the EAL creates hugepage files on each hugetlbfs filesystem using the ``rtemap_X`` filename,
where X is in the range 0 to the maximum number of hugepages minus 1.
Similarly, it creates shared configuration files, memory mapped in each process,
using the ``/var/run/.rte_config`` filename when run as root, or ``$HOME/.rte_config`` when run as a non-root user
(if filesystem and device permissions are set up to allow this).
The ``rte`` part of each of these filenames is configurable using the ``--file-prefix`` parameter.
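
The naming scheme above can be illustrated with a minimal C sketch.
It only restates the patterns given in this section (the root-user configuration path and the hugepage file name);
the helper names ``config_path()`` and ``hugepage_name()`` are hypothetical and are not part of the EAL,
and the exact paths may differ between DPDK releases.

```c
#include <stdio.h>

/* Build the shared-configuration path for a given file prefix, following
 * the root-user pattern described above: /var/run/.<prefix>_config.
 * Returns 0 on success, -1 if the buffer is too small. */
static int config_path(char *buf, size_t len, const char *prefix)
{
    int n = snprintf(buf, len, "/var/run/.%s_config", prefix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the name of hugepage file number idx: <prefix>map_<idx>.
 * Returns 0 on success, -1 if the buffer is too small. */
static int hugepage_name(char *buf, size_t len, const char *prefix,
                         unsigned int idx)
{
    int n = snprintf(buf, len, "%smap_%u", prefix, idx);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the default prefix ``rte`` these helpers yield the default names quoted above;
two applications started with different prefixes therefore end up with distinct files.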

In addition to specifying the ``--file-prefix`` parameter,
any DPDK applications that are to be run side-by-side must explicitly limit their memory use.
This is done by passing the ``-m`` flag to each process to specify how much hugepage memory, in megabytes,
each process can use (or passing ``--socket-mem`` to specify how much hugepage memory on each socket each process can use).

.. note::

    Independent DPDK instances running side-by-side on a single machine cannot share any network ports.
    Any network ports being used by one process should be blacklisted in every other process.

Running Multiple Independent Groups of DPDK Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the same way that it is possible to run independent DPDK applications side-by-side on a single system,
this can be trivially extended to multi-process groups of DPDK applications running side-by-side.
In this case, the secondary processes must use the same ``--file-prefix`` parameter
as the primary process whose shared memory they are connecting to.

.. note::

    All restrictions and issues with multiple independent DPDK processes running side-by-side
    also apply in this usage scenario.

Multi-process Limitations
-------------------------

There are a number of limitations to what can be done when running DPDK multi-process applications.
Some of these are documented below:

*   The multi-process feature requires that the exact same hugepage memory mappings be present in all applications.
    The Linux security feature Address-Space Layout Randomization (ASLR) can interfere with this mapping,
    so it may be necessary to disable this feature in order to reliably run multi-process applications.

.. warning::

    Disabling Address-Space Layout Randomization (ASLR) may have security implications,
    so it is recommended that it be disabled only when absolutely necessary,
    and only when the implications of this change have been understood.

*   All DPDK processes running as a single application and using shared memory must have distinct coremask/corelist arguments.
    It is not possible to have a primary and secondary instance, or two secondary instances,
    using any of the same logical cores.
    Attempting to do so can cause corruption of memory pool caches, among other issues.

*   The delivery of interrupts, such as Ethernet device link status interrupts, does not work in secondary processes.
    All interrupts are triggered inside the primary process only.
    Any application needing interrupt notification in multiple processes should provide its own mechanism
    to transfer the interrupt information from the primary process to any secondary process that needs the information.

*   The use of function pointers between multiple processes running from different compiled binaries is not supported,
    since the location of a given function in one process may be different to its location in a second.
    This prevents the librte_hash library from behaving as it does in a multi-threaded instance,
    since it uses a pointer to the hash function internally.

To work around this issue, it is recommended that multi-process applications perform the hash calculations by directly calling
the hashing function from the code and then using the ``rte_hash_add_key_with_hash()``/``rte_hash_lookup_with_hash()`` functions
instead of the functions which do the hashing internally, such as ``rte_hash_add_key()``/``rte_hash_lookup()``.

*   Depending upon the hardware in use, and the number of DPDK processes used,
    it may not be possible to have HPET timers available in each DPDK instance.
    The minimum number of HPET comparators available to Linux userspace can be just a single comparator,
    which means that only the first, primary DPDK process instance can open and mmap ``/dev/hpet``.
    If the number of required DPDK processes exceeds the number of available HPET comparators,
    the TSC (which is the default timer in this release) must be used as a time source across all processes instead of the HPET.
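
The requirement for distinct coremask arguments can be checked before the processes are launched.
The following is a minimal sketch of such a check, assuming each process's logical cores are expressed as a 64-bit mask
in which bit N stands for lcore N; the helper ``coremasks_disjoint()`` is hypothetical and is not a DPDK function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if no logical core appears in more than one mask.
 * Bit N of a mask is set when that process uses lcore N
 * (this sketch is limited to 64 lcores). */
static bool coremasks_disjoint(const uint64_t *masks, size_t n)
{
    uint64_t seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (seen & masks[i])
            return false;   /* an lcore is claimed twice */
        seen |= masks[i];
    }
    return true;
}
```

For example, the masks ``0x3``, ``0xc`` and ``0x30`` can be used together,
whereas ``0x3`` and ``0x6`` both claim lcore 1 and would be rejected.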

Communication between multiple processes
----------------------------------------

While there are multiple ways one can approach inter-process communication in
DPDK, there is also a native DPDK IPC API available. It is not intended to be
performance-critical, but rather is intended to be a convenient, general
purpose API to exchange short messages between primary and secondary processes.

The DPDK IPC API supports the following communication modes:

* Unicast message from secondary to primary
* Broadcast message from primary to all secondaries

In other words, any IPC message sent in a primary process will be delivered to
all secondaries, while any IPC message sent in a secondary process will only be
delivered to the primary process. Unicast from primary to secondary or from
secondary to secondary is not supported.

There are three types of communication available within the DPDK IPC API:

* Message
* Synchronous request
* Asynchronous request

A "message" type does not expect a response and is meant to be a best-effort
notification mechanism, while the two types of "requests" are meant to be a
two-way communication mechanism, with the requester expecting a response from
the other side.

Both messages and requests will trigger a named callback on the receiver side.
These callbacks will be called from within a dedicated IPC thread that is not
one of the EAL lcore threads.

Registering for incoming messages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before any messages can be received, a callback needs to be registered.
This is accomplished by calling the ``rte_mp_action_register()`` function. This
function accepts a unique callback name, and a function pointer to a callback
that will be called when a message or a request matching this callback name
arrives.

If the application is no longer willing to receive messages intended for a
specific callback function, the ``rte_mp_action_unregister()`` function can be
called to ensure that the callback will not be triggered again.

Sending messages
~~~~~~~~~~~~~~~~

To send a message, a ``rte_mp_msg`` descriptor must be populated first. The
fields to be populated are as follows:

* ``name`` - message name. This name must match the receiver's callback name.
* ``param`` - message data (up to 256 bytes).
* ``len_param`` - length of message data.
* ``fds`` - file descriptors to pass along with the data (up to 8 fds).
* ``num_fds`` - number of file descriptors to send.

Once the structure is populated, calling ``rte_mp_sendmsg()`` will send the
descriptor either to all secondary processes (if sent from the primary process), or
to the primary process (if sent from a secondary process). The function will return
a value indicating whether sending the message succeeded or not.
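
The size limits listed above can be illustrated with a small, self-contained sketch.
To keep it compilable without the DPDK headers, it uses a stand-in structure, ``struct mp_msg_sketch``,
that only mirrors the fields named in this section; the stand-in, the ``SKETCH_*`` constants
(including the assumed 64-byte name buffer) and the helper ``fill_msg()`` are all hypothetical.
Real code would populate a ``struct rte_mp_msg`` and pass it to ``rte_mp_sendmsg()``.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_NAME_LEN  64   /* assumed size of the name buffer */
#define SKETCH_MAX_PARAM_LEN 256  /* payload limit stated above */
#define SKETCH_MAX_FD_NUM    8    /* file descriptor limit stated above */

/* Stand-in for the message descriptor, holding the fields listed above. */
struct mp_msg_sketch {
    char name[SKETCH_MAX_NAME_LEN];
    int len_param;
    int num_fds;
    uint8_t param[SKETCH_MAX_PARAM_LEN];
    int fds[SKETCH_MAX_FD_NUM];
};

/* Fill a descriptor with a name and a payload, and no file descriptors.
 * Returns 0 on success, -1 if the name or the payload does not fit. */
static int fill_msg(struct mp_msg_sketch *msg, const char *name,
                    const void *data, size_t len)
{
    if (strlen(name) >= sizeof(msg->name) || len > sizeof(msg->param))
        return -1;

    memset(msg, 0, sizeof(*msg));
    strcpy(msg->name, name);            /* length was checked above */
    if (len > 0)
        memcpy(msg->param, data, len);
    msg->len_param = (int)len;
    msg->num_fds = 0;
    return 0;
}
```

Zeroing the descriptor before filling it keeps unused payload bytes and file descriptor slots in a known state.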

Sending requests
~~~~~~~~~~~~~~~~

Sending a request involves waiting for the other side to reply, so requests can block
for a relatively long time.

To send a request, a message descriptor ``rte_mp_msg`` must be populated.
Additionally, a ``timespec`` value must be specified as a timeout, after which
IPC will stop waiting and return.
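
Applications often keep timeouts in milliseconds, so a small conversion helper can be convenient.
The sketch below assumes the timeout is expressed as a non-negative duration rather than an absolute time;
``timeout_from_ms()`` is a hypothetical helper, not part of the IPC API.

```c
#include <time.h>

/* Convert a non-negative timeout in milliseconds
 * to a struct timespec duration. */
static struct timespec timeout_from_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;                 /* whole seconds */
    ts.tv_nsec = (ms % 1000) * 1000000L;   /* remainder in nanoseconds */
    return ts;
}
```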

For synchronous requests, the ``rte_mp_reply`` descriptor must also
be created. This is where the responses will be stored. The fields that
will be populated by IPC are as follows:

* ``nb_sent`` - number indicating how many requests were sent (i.e. how many
  peer processes were active at the time of the request).
* ``nb_received`` - number indicating how many responses were received (i.e. of
  those peer processes that were active at the time of the request, how many
  have replied).
* ``msgs`` - pointer to where all of the responses are stored. The order in
  which responses appear is undefined. When doing synchronous requests, this
  memory must be freed by the requestor after the request completes.

For asynchronous requests, a function pointer to the callback function must be
provided instead. This callback will be called when the request has either timed
out or received a response to all the messages that were sent.

When the callback is called, the original request descriptor will be provided
(so that it is possible to determine which sent message the callback
belongs to), along with a response descriptor like the one described above.
When doing asynchronous requests, there is no need to free the resulting
``rte_mp_reply`` descriptor.

Receiving and responding to messages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To receive a message, a named callback must be registered using the
``rte_mp_action_register()`` function. The name of the callback must match the
``name`` field in the sender's ``rte_mp_msg`` message descriptor in order for this
message to be delivered and for the callback to be triggered.

The callback's type is ``rte_mp_t``. It takes the incoming message
pointer ``msg`` and an opaque pointer ``peer``. The contents of ``msg`` will be
identical to those sent by the sender.

If a response is required, a new ``rte_mp_msg`` message descriptor must be
constructed and sent via the ``rte_mp_reply()`` function, along with the ``peer``
pointer. The resulting response will then be delivered to the correct requestor.

Misc considerations
~~~~~~~~~~~~~~~~~~~

Due to the underlying IPC implementation being single-threaded, recursive
requests (i.e. sending a request while responding to another request) are not
supported. However, since sending messages (not requests) does not involve an
IPC thread, sending messages while processing another message or request is
supported.

If callbacks spend a long time processing the incoming requests, the requestor
might time out, so setting the right timeout value on the requestor side is
imperative.

If some of the messages timed out, the ``nb_sent`` and ``nb_received`` fields in the
``rte_mp_reply`` descriptor will not have matching values. This is not treated
as an error by the IPC API, and it is expected that the user will be responsible
for deciding how to handle such cases.
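
One possible way for the requestor to handle such cases is sketched below.
The structure ``struct mp_reply_counts`` is a hypothetical stand-in that only mirrors the two counters of
the ``rte_mp_reply`` descriptor, so that the example compiles without the DPDK headers;
the enum and the helper ``classify_reply()`` are likewise illustrative.

```c
/* Stand-in for the two counters of the reply descriptor. */
struct mp_reply_counts {
    int nb_sent;      /* requests sent to active peers */
    int nb_received;  /* responses that arrived before the timeout */
};

enum reply_outcome {
    REPLY_NO_PEERS,   /* nobody was there to answer */
    REPLY_COMPLETE,   /* every peer answered */
    REPLY_PARTIAL     /* at least one peer timed out */
};

/* Decide how to treat a finished request based on its counters. */
static enum reply_outcome classify_reply(const struct mp_reply_counts *r)
{
    if (r->nb_sent == 0)
        return REPLY_NO_PEERS;
    if (r->nb_received == r->nb_sent)
        return REPLY_COMPLETE;
    return REPLY_PARTIAL;
}
```

Whether a partial result is acceptable depends on the application:
a statistics query might proceed with the responses it has, while a configuration change might need to be retried.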

If a callback has been registered, IPC will assume that it is safe to call it.
This is important when registering callbacks during DPDK initialization.
During initialization, IPC will treat the receiving side as non-existent if
the callback has not been registered yet. However, once the callback has been
registered, IPC expects that it is safe to trigger it, even if the
rest of the DPDK initialization has not finished yet.