From: Sergio Gonzalez Monroy Date: Thu, 16 Jul 2015 07:37:11 +0000 (+0100) Subject: doc: update malloc guide X-Git-Tag: spdx-start~8732 X-Git-Url: http://git.droids-corp.org/?p=dpdk.git;a=commitdiff_plain;h=56297061dfad19a045754ec29de30e5be369bf10 doc: update malloc guide Update malloc documentation to reflect new implementation details. Signed-off-by: Sergio Gonzalez Monroy --- diff --git a/MAINTAINERS b/MAINTAINERS index 68d9857fb9..71e01bb10e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -103,7 +103,7 @@ F: lib/librte_eal/common/*malloc* F: lib/librte_eal/common/eal_common_mem* F: lib/librte_eal/common/eal_hugepages.h F: lib/librte_malloc/ -F: doc/guides/prog_guide/malloc_lib.rst +F: doc/guides/prog_guide/env_abstraction_layer.rst F: app/test/test_func_reentrancy.c F: app/test/test_malloc.c F: app/test/test_memory.c diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 25eb281fda..cd4d66686f 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -116,7 +116,6 @@ The physical address of the reserved memory for that memory zone is also returne .. note:: Memory reservations done using the APIs provided by the rte_malloc library are also backed by pages from the hugetlbfs filesystem. - However, physical address information is not available for the blocks of memory allocated in this way. Xen Dom0 support without hugetbls ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -366,3 +365,222 @@ We expect only 50% of CPU spend on packet IO. echo 50000 > pkt_io/cpu.cfs_quota_us +Malloc +------ + +The EAL provides a malloc API to allocate any-sized memory. + +The objective of this API is to provide malloc-like functions to allow +allocation from hugepage memory and to facilitate application porting. +The *DPDK API Reference* manual describes the available functions. + +Typically, these kinds of allocations should not be done in data plane +processing because they are slower than pool-based allocation and make +use of locks within the allocation and free paths. +However, they can be used in configuration code. + +Refer to the rte_malloc() function description in the *DPDK API Reference* +manual for more information. + +Cookies +~~~~~~~ + +When CONFIG_RTE_MALLOC_DEBUG is enabled, the allocated memory contains +overwrite protection fields to help identify buffer overflows. + +Alignment and NUMA Constraints +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The rte_malloc() takes an align argument that can be used to request a memory +area that is aligned on a multiple of this value (which must be a power of two). + +On systems with NUMA support, a call to the rte_malloc() function will return +memory that has been allocated on the NUMA socket of the core which made the call. +A set of APIs is also provided, to allow memory to be explicitly allocated on a +NUMA socket directly, or by allocated on the NUMA socket where another core is +located, in the case where the memory is to be used by a logical core other than +on the one doing the memory allocation. + +Use Cases +~~~~~~~~~ + +This API is meant to be used by an application that requires malloc-like +functions at initialization time. + +For allocating/freeing data at runtime, in the fast-path of an application, +the memory pool library should be used instead. + +Internal Implementation +~~~~~~~~~~~~~~~~~~~~~~~ + +Data Structures +^^^^^^^^^^^^^^^ + +There are two data structure types used internally in the malloc library: + +* struct malloc_heap - used to track free space on a per-socket basis + +* struct malloc_elem - the basic element of allocation and free-space + tracking inside the library. + +Structure: malloc_heap +"""""""""""""""""""""" + +The malloc_heap structure is used to manage free space on a per-socket basis. +Internally, there is one heap structure per NUMA node, which allows us to +allocate memory to a thread based on the NUMA node on which this thread runs. +While this does not guarantee that the memory will be used on that NUMA node, +it is no worse than a scheme where the memory is always allocated on a fixed +or random node. + +The key fields of the heap structure and their function are described below +(see also diagram above): + +* lock - the lock field is needed to synchronize access to the heap. + Given that the free space in the heap is tracked using a linked list, + we need a lock to prevent two threads manipulating the list at the same time. + +* free_head - this points to the first element in the list of free nodes for + this malloc heap. + +.. note:: + + The malloc_heap structure does not keep track of in-use blocks of memory, + since these are never touched except when they are to be freed again - + at which point the pointer to the block is an input to the free() function. + +.. _figure_malloc_heap: + +.. figure:: img/malloc_heap.* + + Example of a malloc heap and malloc elements within the malloc library + + +.. _malloc_elem: + +Structure: malloc_elem +"""""""""""""""""""""" + +The malloc_elem structure is used as a generic header structure for various +blocks of memory. +It is used in three different ways - all shown in the diagram above: + +#. As a header on a block of free or allocated memory - normal case + +#. As a padding header inside a block of memory + +#. As an end-of-memseg marker + +The most important fields in the structure and how they are used are described below. + +.. note:: + + If the usage of a particular field in one of the above three usages is not + described, the field can be assumed to have an undefined value in that + situation, for example, for padding headers only the "state" and "pad" + fields have valid values. + +* heap - this pointer is a reference back to the heap structure from which + this block was allocated. + It is used for normal memory blocks when they are being freed, to add the + newly-freed block to the heap's free-list. + +* prev - this pointer points to the header element/block in the memseg + immediately behind the current one. When freeing a block, this pointer is + used to reference the previous block to check if that block is also free. + If so, then the two free blocks are merged to form a single larger block. + +* next_free - this pointer is used to chain the free-list of unallocated + memory blocks together. + It is only used in normal memory blocks; on ``malloc()`` to find a suitable + free block to allocate and on ``free()`` to add the newly freed element to + the free-list. + +* state - This field can have one of three values: ``FREE``, ``BUSY`` or + ``PAD``. + The former two are to indicate the allocation state of a normal memory block + and the latter is to indicate that the element structure is a dummy structure + at the end of the start-of-block padding, i.e. where the start of the data + within a block is not at the start of the block itself, due to alignment + constraints. + In that case, the pad header is used to locate the actual malloc element + header for the block. + For the end-of-memseg structure, this is always a ``BUSY`` value, which + ensures that no element, on being freed, searches beyond the end of the + memseg for other blocks to merge with into a larger free area. + +* pad - this holds the length of the padding present at the start of the block. + In the case of a normal block header, it is added to the address of the end + of the header to give the address of the start of the data area, i.e. the + value passed back to the application on a malloc. + Within a dummy header inside the padding, this same value is stored, and is + subtracted from the address of the dummy header to yield the address of the + actual block header. + +* size - the size of the data block, including the header itself. + For end-of-memseg structures, this size is given as zero, though it is never + actually checked. + For normal blocks which are being freed, this size value is used in place of + a "next" pointer to identify the location of the next block of memory that + in the case of being ``FREE``, the two free blocks can be merged into one. + +Memory Allocation +^^^^^^^^^^^^^^^^^ + +On EAL initialisation, all memsegs are setup as part of the malloc heap. +This setup involves placing a dummy structure at the end with ``BUSY`` state, +which may contain a sentinel value if ``CONFIG_RTE_MALLOC_DEBUG`` is enabled, +and a proper :ref:`element header` with ``FREE`` at the start +for each memseg. +The ``FREE`` element is then added to the ``free_list`` for the malloc heap. + +When an application makes a call to a malloc-like function, the malloc function +will first index the ``lcore_config`` structure for the calling thread, and +determine the NUMA node of that thread. +The NUMA node is used to index the array of ``malloc_heap`` structures which is +passed as a parameter to the ``heap_alloc()`` function, along with the +requested size, type, alignment and boundary parameters. + +The ``heap_alloc()`` function will scan the free_list of the heap, and attempt +to find a free block suitable for storing data of the requested size, with the +requested alignment and boundary constraints. + +When a suitable free element has been identified, the pointer to be returned +to the user is calculated. +The cache-line of memory immediately preceding this pointer is filled with a +struct malloc_elem header. +Because of alignment and boundary constraints, there could be free space at +the start and/or end of the element, resulting in the following behavior: + +#. Check for trailing space. + If the trailing space is big enough, i.e. > 128 bytes, then the free element + is split. + If it is not, then we just ignore it (wasted space). + +#. Check for space at the start of the element. + If the space at the start is small, i.e. <=128 bytes, then a pad header is + used, and the remaining space is wasted. + If, however, the remaining space is greater, then the free element is split. + +The advantage of allocating the memory from the end of the existing element is +that no adjustment of the free list needs to take place - the existing element +on the free list just has its size pointer adjusted, and the following element +has its "prev" pointer redirected to the newly created element. + +Freeing Memory +^^^^^^^^^^^^^^ + +To free an area of memory, the pointer to the start of the data area is passed +to the free function. +The size of the ``malloc_elem`` structure is subtracted from this pointer to get +the element header for the block. +If this header is of type ``PAD`` then the pad length is further subtracted from +the pointer to get the proper element header for the entire block. + +From this element header, we get pointers to the heap from which the block was +allocated and to where it must be freed, as well as the pointer to the previous +element, and via the size field, we can calculate the pointer to the next element. +These next and previous elements are then checked to see if they are also +``FREE``, and if so, they are merged with the current element. +This means that we can never have two ``FREE`` memory blocks adjacent to one +another, as they are always merged into a single block. diff --git a/doc/guides/prog_guide/img/malloc_heap.png b/doc/guides/prog_guide/img/malloc_heap.png deleted file mode 100644 index 4449fda0dd..0000000000 Binary files a/doc/guides/prog_guide/img/malloc_heap.png and /dev/null differ diff --git a/doc/guides/prog_guide/img/malloc_heap.svg b/doc/guides/prog_guide/img/malloc_heap.svg new file mode 100644 index 0000000000..d6bcc84899 --- /dev/null +++ b/doc/guides/prog_guide/img/malloc_heap.svg @@ -0,0 +1,1052 @@ + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + struct malloc_heap + prev + prev + prev + prev + prev + prev + Free element header(struct malloc_elem, state = FREE) + Used element header(struct malloc_elem, state = BUSY) + size + Memseg 0 + Memseg 1 + prev + prev + next_free + next_free + free_head + prev + Dummy Elements:Size = 0State = BUSY + pad + Pad element header(struct malloc_elem, state = PAD) + Generic element pointers + Malloc element header:state = BUSYsize = <size>pad = <padsize> + size + Free / Unallocated data space + Pad element header:state = PADpad = padsize + Used / allocated data space + Padding / unavailable space + + diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst index 036640cc57..176f2c2fad 100644 --- a/doc/guides/prog_guide/index.rst +++ b/doc/guides/prog_guide/index.rst @@ -43,7 +43,6 @@ Programmer's Guide intro overview env_abstraction_layer - malloc_lib ring_lib mempool_lib mbuf_lib diff --git a/doc/guides/prog_guide/malloc_lib.rst b/doc/guides/prog_guide/malloc_lib.rst deleted file mode 100644 index 6418fab06c..0000000000 --- a/doc/guides/prog_guide/malloc_lib.rst +++ /dev/null @@ -1,233 +0,0 @@ -.. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of Intel Corporation nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -.. _Malloc_Library: - -Malloc Library -============== - -The librte_malloc library provides an API to allocate any-sized memory. - -The objective of this library is to provide malloc-like functions to allow allocation from hugepage memory -and to facilitate application porting. -The *DPDK API Reference* manual describes the available functions. - -Typically, these kinds of allocations should not be done in data plane processing -because they are slower than pool-based allocation and make use of locks within the allocation -and free paths. -However, they can be used in configuration code. - -Refer to the rte_malloc() function description in the *DPDK API Reference* manual for more information. - -Cookies -------- - -When CONFIG_RTE_MALLOC_DEBUG is enabled, the allocated memory contains overwrite protection fields -to help identify buffer overflows. - -Alignment and NUMA Constraints ------------------------------- - -The rte_malloc() takes an align argument that can be used to request a memory area -that is aligned on a multiple of this value (which must be a power of two). - -On systems with NUMA support, a call to the rte_malloc() function will return memory -that has been allocated on the NUMA socket of the core which made the call. -A set of APIs is also provided, to allow memory to be explicitly allocated on a NUMA socket directly, -or by allocated on the NUMA socket where another core is located, -in the case where the memory is to be used by a logical core other than on the one doing the memory allocation. - -Use Cases ---------- - -This library is needed by an application that requires malloc-like functions at initialization time, -and does not require the physical address information for the individual memory blocks. - -For allocating/freeing data at runtime, in the fast-path of an application, -the memory pool library should be used instead. - -If a block of memory with a known physical address is needed, -e.g. for use by a hardware device, a memory zone should be used. - -Internal Implementation ------------------------ - -Data Structures -~~~~~~~~~~~~~~~ - -There are two data structure types used internally in the malloc library: - -* struct malloc_heap - used to track free space on a per-socket basis - -* struct malloc_elem - the basic element of allocation and free-space tracking inside the library. - -Structure: malloc_heap -^^^^^^^^^^^^^^^^^^^^^^ - -The malloc_heap structure is used in the library to manage free space on a per-socket basis. -Internally in the library, there is one heap structure per NUMA node, -which allows us to allocate memory to a thread based on the NUMA node on which this thread runs. -While this does not guarantee that the memory will be used on that NUMA node, -it is no worse than a scheme where the memory is always allocated on a fixed or random node. - -The key fields of the heap structure and their function are described below (see also diagram above): - -* mz_count - field to count the number of memory zones which have been allocated for heap memory on this NUMA node. - The sole use of this value is, in combination with the numa_socket value, - to generate a suitable, unique name for each memory zone. - -* lock - the lock field is needed to synchronize access to the heap. - Given that the free space in the heap is tracked using a linked list, - we need a lock to prevent two threads manipulating the list at the same time. - -* free_head - this points to the first element in the list of free nodes for this malloc heap. - -.. note:: - - The malloc_heap structure does not keep track of either the memzones allocated, - since there is little point as they cannot be freed. - Neither does it track the in-use blocks of memory, - since these are never touched except when they are to be freed again - - at which point the pointer to the block is an input to the free() function. - -.. _figure_malloc_heap: - -.. figure:: img/malloc_heap.* - - Example of a malloc heap and malloc elements within the malloc library - - -Structure: malloc_elem -^^^^^^^^^^^^^^^^^^^^^^ -The malloc_elem structure is used as a generic header structure for various blocks of memory in a memzone. -It is used in three different ways - all shown in the diagram above: - -#. As a header on a block of free or allocated memory - normal case - -#. As a padding header inside a block of memory - -#. As an end-of-memzone marker - -The most important fields in the structure and how they are used are described below. - -.. note:: - - If the usage of a particular field in one of the above three usages is not described, - the field can be assumed to have an undefined value in that situation, for example, - for padding headers only the "state" and "pad" fields have valid values. - -* heap - this pointer is a reference back to the heap structure from which this block was allocated. - It is used for normal memory blocks when they are being freed, - to add the newly-freed block to the heap's free-list. - -* prev - this pointer points to the header element/block in the memzone immediately behind the current one. - When freeing a block, this pointer is used to reference the previous block to check if that block is also free. - If so, then the two free blocks are merged to form a single larger block. - -* next_free - this pointer is used to chain the free-list of unallocated memory blocks together. - Again, it is only used in normal memory blocks - on malloc() to find a suitable free block to allocate, - and on free() to add the newly freed element to the free-list. - -* state - This field can have one of three values: "Free", "Busy" or "Pad". - The former two, are to indicate the allocation state of a normal memory block, - and the latter is to indicate that the element structure is a dummy structure at the end of the start-of-block padding - (i.e. where the start of the data within a block is not at the start of the block itself, due to alignment constraints). - In this case, the pad header is used to locate the actual malloc element header for the block. - For the end-of-memzone structure, this is always a "busy" value, which ensures that no element, - on being freed, searches beyond the end of the memzone for other blocks to merge with into a larger free area. - -* pad - this holds the length of the padding present at the start of the block. - In the case of a normal block header, it is added to the address of the end of the header - to give the address of the start of the data area i.e. - the value passed back to the application on a malloc. - Within a dummy header inside the padding, this same value is stored, - and is subtracted from the address of the dummy header to yield the address of the actual block header. - -* size - the size of the data block, including the header itself. - For end-of-memzone structures, this size is given as zero, though it is never actually checked. - For normal blocks which are being freed, - this size value is used in place of a "next" pointer to identify the location of the next block of memory - (so that if it too is free, the two free blocks can be merged into one). - -Memory Allocation -~~~~~~~~~~~~~~~~~ - -When an application makes a call to a malloc-like function, -the malloc function will first index the lcore_config structure for the calling thread, -and determine the NUMA node idea of that thread. -That is used to index the array of malloc_heap structures, -and the heap_alloc () function is called with that heap as parameter, -along with the requested size, type and alignment parameters. - -The heap_alloc() function will scan the free_list for the heap, -and attempt to find a free block suitable for storing data of the requested size, -with the requested alignment constraints. -If no suitable block is found - for example, the first time malloc is called for a node, -and the free-list is NULL - a new memzone is reserved and set up as heap elements. -The setup involves placing a dummy structure at the end of the memzone -to act as a sentinel to prevent accesses beyond the end -(as the sentinel is marked as BUSY, the malloc library code will never attempt to reference it further), -and a proper element header at the start of the memzone. -This latter header identifies all space in the memzone, bar the sentinel value at the end, -as a single free heap element, and it is then added to the free_list for the heap. - -Once the new memzone has been set up, the scan of the free-list for the heap is redone, -and on this occasion should find the newly created, -suitable element as the size of memory reserved in the memzone is set to be -at least the size of the requested data block plus the alignment - -subject to a minimum size specified in the DPDK compile-time configuration. - -When a suitable, free element has been identified, the pointer to be returned to the user is calculated, -with the space to be provided to the user being at the end of the free block. -The cache-line of memory immediately preceding this space is filled with a struct malloc_elem header: -if the remaining space within the block is small e.g. <=128 bytes, -then a pad header is used, and the remaining space is wasted. -If, however, the remaining space is greater than this, then the single free element block is split into two, -and a new, proper, malloc_elem header is put before the returned data space. -[The advantage of allocating the memory from the end of the existing element is that -in this case no adjustment of the free list needs to take place - -the existing element on the free list just has its size pointer adjusted, -and the following element has its "prev" pointer redirected to the newly created element]. - -Freeing Memory -~~~~~~~~~~~~~~ - -To free an area of memory, the pointer to the start of the data area is passed to the free function. -The size of the malloc_elem structure is subtracted from this pointer to get the element header for the block. -If this header is of type "PAD" then the pad length is further subtracted from the pointer -to get the proper element header for the entire block. - -From this element header, we get pointers to the heap from which the block came -- and to where it must be freed, -as well as the pointer to the previous element, and, via the size field, -we can calculate the pointer to the next element. -These next and previous elements are then checked to see if they too are free, -and if so, they are merged with the current elements. -This means that we can never have two free memory blocks adjacent to one another, -they are always merged into a single block. diff --git a/doc/guides/prog_guide/overview.rst b/doc/guides/prog_guide/overview.rst index cef6ca76b5..5d378e50ea 100644 --- a/doc/guides/prog_guide/overview.rst +++ b/doc/guides/prog_guide/overview.rst @@ -112,6 +112,8 @@ The services provided by the EAL are: * Alarm operations +* Memory managenent (malloc) + The EAL is fully described in :ref:`Environment Abstraction Layer `. Core Components @@ -127,15 +129,6 @@ for high-performance packet processing applications. Core Components Architecture -Memory Manager (librte_malloc) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The librte_malloc library provides an API to allocate memory from the memzones created from the hugepages instead of the heap. -This helps when allocating large numbers of items that may become susceptible to TLB misses -when using typical 4k heap pages in the Linux user space environment. - -This memory allocator is fully described in :ref:`Malloc Library `. - Ring Manager (librte_ring) ~~~~~~~~~~~~~~~~~~~~~~~~~~