dpdk.git
2 years ago  test/mbuf: fix access to freed memory  tmp_20211029
Olivier Matz [Fri, 29 Oct 2021 11:47:37 +0000 (13:47 +0200)]
test/mbuf: fix access to freed memory

Seen by ASan.

In the external buffer mbuf test, we check that the buffer is freed
by checking that its refcount is 0. This is not a valid condition,
because it accesses an already freed area.

Fix this by setting a boolean flag in the callback when rte_free()
is actually called, and check this flag instead.

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
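
To illustrate the pattern described above, here is a minimal sketch in C (names are illustrative, not the actual test code): instead of reading the refcount of memory that may already be released, the free callback records the release in a flag that the test can check safely.

    #include <stdbool.h>
    #include <rte_malloc.h>

    static bool ext_buf_freed; /* set by the callback, read by the test */

    /* free callback attached to the external buffer shared info */
    static void
    ext_buf_free_cb(void *addr, void *opaque)
    {
        (void)opaque;
        rte_free(addr);        /* actually release the buffer... */
        ext_buf_freed = true;  /* ...and record that it happened */
    }

    /* test side: check the flag instead of the refcount of freed memory */
    static int
    check_ext_buf_released(void)
    {
        return ext_buf_freed ? 0 : -1;
    }
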
2 years ago  eal/memory: fix unused SIGBUS handler
Olivier Matz [Fri, 24 Sep 2021 12:33:26 +0000 (14:33 +0200)]
eal/memory: fix unused SIGBUS handler

Since its introduction in 2018, the SIGBUS handler was never registered,
and all related functions were unused.

A SIGBUS can be received by the application when accessing hugepages
even if mmap() was successful. This happens especially when running
inside containers when there are not enough hugepages. In this case, we
need to recover. A similar scheme can be found in eal_memory.c.

Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
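
For illustration, a hedged sketch of such a recovery scheme (plain signal handling, not the actual EAL code): a SIGBUS handler is registered around the first write to a mapped hugepage, and the code jumps back and reports failure if the fault fires.

    #include <setjmp.h>
    #include <signal.h>
    #include <string.h>

    static sigjmp_buf huge_jmpenv;

    static void
    huge_sigbus_handler(int signo)
    {
        (void)signo;
        siglongjmp(huge_jmpenv, 1); /* jump back to the probe point */
    }

    /* returns 0 if the page is really backed, -1 if touching it raised SIGBUS */
    static int
    probe_hugepage(volatile char *page)
    {
        struct sigaction sa, old_sa;
        int ret = 0;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = huge_sigbus_handler;
        sigaction(SIGBUS, &sa, &old_sa);   /* register the handler */

        if (sigsetjmp(huge_jmpenv, 1) == 0)
            *page = 0;                     /* may fault if hugepages ran out */
        else
            ret = -1;                      /* SIGBUS received: recover */

        sigaction(SIGBUS, &old_sa, NULL);  /* restore the previous handler */
        return ret;
    }
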
2 years ago  eal: fix mem alloc from control thread if socket 0 is unused
Ilyes Ben Hamouda [Fri, 29 Oct 2021 09:06:36 +0000 (11:06 +0200)]
eal: fix mem alloc from control thread if socket 0 is unused

When using rte_malloc() from a control thread, the used heap is the one
from numa socket 0, which may not have available memory.

Fix this by selecting the first socket which has available memory.

Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.

Fixes: b94580d6887e ("malloc: avoid unknown socket id")
Cc: stable@dpdk.org
Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
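
A rough sketch of the selection logic described above (illustrative only, not the actual malloc_get_numa_socket() implementation):

    #include <rte_lcore.h>
    #include <rte_malloc.h>

    /* return the first NUMA socket that actually has heap memory */
    static int
    first_socket_with_memory(void)
    {
        struct rte_malloc_socket_stats stats;
        unsigned int idx;

        for (idx = 0; idx < rte_socket_count(); idx++) {
            int s = rte_socket_id_by_idx(idx);

            if (rte_malloc_get_socket_stats(s, &stats) == 0 &&
                    stats.heap_totalsz_bytes > 0)
                return s;
        }
        return 0; /* fall back to socket 0 */
    }
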
2 years ago  doc: clarify SRIOV activation with built-in VFIO
Anatoly Burakov [Wed, 27 Oct 2021 15:37:14 +0000 (15:37 +0000)]
doc: clarify SRIOV activation with built-in VFIO

Currently, the documentation only contains instructions for enabling
SRIOV support for VFIO compiled as a module, but doesn't have any
instructions on how to do the same for cases where VFIO is built-in.
Add these instructions.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
2 years ago  vfio: fix partial unmap
Anatoly Burakov [Tue, 26 Oct 2021 13:26:44 +0000 (13:26 +0000)]
vfio: fix partial unmap

Partial unmap support was introduced in commit c13ca4e81cac
("vfio: fix DMA mapping granularity for IOVA as VA"), and with it
was added a check that dereferenced the IOMMU type to determine whether
partial ummapping is supported for currently configured IOMMU type. In
certain circumstances (such as when VFIO is supported, but no devices
were bound to the VFIO driver), the IOMMU type pointer can be NULL.

However, dereferencing of IOMMU type was guarded by access to the user
maps list - that is, we were always checking the user map list first,
and then, if we found a memory region that encloses the one we're trying
to unmap, we would have performed the IOMMU type check.

This ensured that the IOMMU type check will not cause any NULL pointer
dereferences, because in order for an IOMMU type check to have been
performed, there necessarily must have been at least one memory region
that was previously mapped successfully, and that implies having a
defined IOMMU type.

When commit 56259f7fc010 ("vfio: allow partially unmapping adjacent
memory") was introduced, the IOMMU type check was moved to
before we were traversing the user mem maps list, thereby introducing a
potential NULL dereference, because the IOMMU type access was no longer
guarded by the user mem maps list traversal.

Fix the issue by moving the IOMMU type check to after the user mem maps
traversal, thereby ensuring that by the time the check happens, the
IOMMU type is always valid.

Fixes: 56259f7fc010 ("vfio: allow partially unmapping adjacent memory")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Xuan Ding <xuan.ding@intel.com>
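
A simplified sketch of the reordering described above (illustrative types and names, not the real EAL VFIO code): the IOMMU type is only dereferenced after an enclosing user mem map has been found.

    struct vfio_iommu_type { int partial_unmap_supported; };
    struct user_mem_map { unsigned long addr, len; };

    struct vfio_cfg {
        const struct vfio_iommu_type *iommu_type; /* NULL if no device was bound */
        struct user_mem_map maps[16];
        int n_maps;
    };

    static int
    dma_unmap_chunk(struct vfio_cfg *cfg, unsigned long addr, unsigned long len)
    {
        struct user_mem_map *found = NULL;
        int i;

        /* 1. look for a previously created mapping enclosing the request */
        for (i = 0; i < cfg->n_maps; i++) {
            struct user_mem_map *m = &cfg->maps[i];

            if (addr >= m->addr && addr + len <= m->addr + m->len) {
                found = m;
                break;
            }
        }
        if (found == NULL)
            return -1; /* nothing was mapped: iommu_type is never touched */

        /* 2. only now is it safe to dereference the IOMMU type: a successful
         * mapping implies the type has already been selected */
        if ((found->addr != addr || found->len != len) &&
                !cfg->iommu_type->partial_unmap_supported)
            return -1;

        /* ... perform the actual unmap ... */
        return 0;
    }
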
2 years ago  dma/idxd: fix truncated error code in status check
Kevin Laatz [Tue, 26 Oct 2021 14:20:45 +0000 (14:20 +0000)]
dma/idxd: fix truncated error code in status check

When checking if the DMA device is active, the result of the operation will
always be zero, since err_code is truncated to 8 bits, which makes
checking the 31st bit impossible.

This is fixed by changing the type of err_code to uint32_t so that it is
not truncated.

Coverity issue: 373657
Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
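
The truncation is easy to see in a small sketch (the bit position follows the description above; names are illustrative):

    #include <stdint.h>

    #define DEVICE_ACTIVE_BIT 31

    /* buggy: a uint8_t can never carry bit 31, so this is always false */
    static int
    is_active_buggy(uint8_t err_code)
    {
        return (err_code & (UINT32_C(1) << DEVICE_ACTIVE_BIT)) != 0;
    }

    /* fixed: keep the full 32-bit status so bit 31 is preserved */
    static int
    is_active_fixed(uint32_t err_code)
    {
        return (err_code & (UINT32_C(1) << DEVICE_ACTIVE_BIT)) != 0;
    }
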
2 years ago  examples/dma: rename ioat application example
Kevin Laatz [Tue, 26 Oct 2021 13:14:32 +0000 (13:14 +0000)]
examples/dma: rename ioat application example

Since the APIs have been updated from rawdev to dmadev, the application
should also be renamed to match. This patch also includes the documentation
updates for the renaming.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
2 years ago  examples/ioat: update naming to match change to dmadev
Kevin Laatz [Tue, 26 Oct 2021 13:14:31 +0000 (13:14 +0000)]
examples/ioat: update naming to match change to dmadev

Existing functions, structures, defines etc need to be updated to reflect
the change to using the dmadev APIs.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
2 years ago  examples/ioat: port application to dmadev API
Kevin Laatz [Tue, 26 Oct 2021 13:14:30 +0000 (13:14 +0000)]
examples/ioat: port application to dmadev API

The dmadev library abstraction allows applications to use the same APIs for
all DMA device drivers in DPDK. This patch updates the ioatfwd application
to make use of the new dmadev APIs, in turn making it a generic application
which can be used with any of the DMA device drivers.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
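
As a rough sketch of what the generic dmadev data path looks like (assuming the device and vchan were already set up with rte_dma_configure() and rte_dma_vchan_setup(); error handling trimmed and names illustrative):

    #include <stdbool.h>
    #include <rte_dmadev.h>
    #include <rte_mbuf.h>

    static int
    dma_copy_burst(int16_t dev_id, uint16_t vchan,
            struct rte_mbuf *src[], struct rte_mbuf *dst[], uint16_t nb)
    {
        bool has_error = false;
        uint16_t i;

        for (i = 0; i < nb; i++) {
            /* enqueue one copy descriptor per packet */
            if (rte_dma_copy(dev_id, vchan,
                    rte_pktmbuf_iova(src[i]),
                    rte_pktmbuf_iova(dst[i]),
                    rte_pktmbuf_data_len(src[i]), 0) < 0)
                break;
        }
        rte_dma_submit(dev_id, vchan); /* kick the hardware */

        /* later, on the same lcore, reap the completions */
        return rte_dma_completed(dev_id, vchan, i, NULL, &has_error);
    }
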
2 years ago  examples/ioat: add signal-triggered device dump
Kevin Laatz [Tue, 26 Oct 2021 13:14:29 +0000 (13:14 +0000)]
examples/ioat: add signal-triggered device dump

Enable dumping device info via the signal handler. With this change, when a
SIGUSR1 is issued, the application will print a dump of all devices being
used by the application.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
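
A hedged sketch of such a handler (the device list and names are illustrative; the real example tracks the dmadevs it configured):

    #include <signal.h>
    #include <stdio.h>
    #include <rte_dmadev.h>

    static int16_t used_dma_ids[8]; /* filled during application setup */
    static uint16_t nb_used_dma;

    static void
    dump_on_sigusr1(int signum)
    {
        uint16_t i;

        if (signum != SIGUSR1)
            return;
        for (i = 0; i < nb_used_dma; i++)
            rte_dma_dump(used_dma_ids[i], stdout); /* device state and stats */
    }

    /* in main(): signal(SIGUSR1, dump_on_sigusr1); */
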
2 years ago  examples/ioat: add option to control stats print interval
Kevin Laatz [Tue, 26 Oct 2021 13:14:28 +0000 (13:14 +0000)]
examples/ioat: add option to control stats print interval

Add a command line option to control the interval between stats prints.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
2 years ago  examples/ioat: add option to control maximum frame size
Konstantin Ananyev [Tue, 26 Oct 2021 13:14:27 +0000 (13:14 +0000)]
examples/ioat: add option to control maximum frame size

Add command line option for setting the max frame size.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
2 years ago  examples/ioat: add option to control DMA batch size
Konstantin Ananyev [Tue, 26 Oct 2021 13:14:26 +0000 (13:14 +0000)]
examples/ioat: add option to control DMA batch size

Add a command line option to control the HW copy batch size in the
application.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
2 years ago  examples/ioat: use always same lcore for enqueue/dequeue
Konstantin Ananyev [Tue, 26 Oct 2021 13:14:25 +0000 (13:14 +0000)]
examples/ioat: use always same lcore for enqueue/dequeue

A few changes in ioat sample behaviour:
- Always do SW copy for packet metadata (mbuf fields)
- Always use the same lcore for both DMA request enqueue and dequeue

Main reasons for that:
a) it is safer, as the idxd PMD doesn't support MT safe enqueue/dequeue (yet).
b) it gives more of an apples-to-apples comparison with the SW copy.
c) from my testing, things are faster that way.

Documentation updates to reflect these changes are also included.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
2 years ago  test: fix ring PMD initialisation
Konstantin Ananyev [Tue, 26 Oct 2021 11:19:43 +0000 (12:19 +0100)]
test: fix ring PMD initialisation

(bitratestats_autotest|latencystats_autotest|pdump_autotest) tests
generate a lot of error messages like this:

test_packet_forward() line 104: Error sending packet to port 0
Send pkts Failed

These tests use the app/test/sample_packet_forward.* code.
This code creates a port from a ring, but doesn't properly
configure/start it.
The fix adds code to configure/start the given port before usage.

Fixes: 7a0935239b9e ("ethdev: make fast-path functions to use new flat array")
Fixes: a52966cd48fd ("test: add helpers using ring PMD Rx/Tx")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Marchand <david.marchand@redhat.com>
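
A minimal sketch of the added initialisation, assuming a port created with rte_eth_from_rings() (queue setup needed by other PMDs is omitted; names are illustrative):

    #include <string.h>
    #include <rte_ethdev.h>

    static int
    start_ring_port(uint16_t port_id)
    {
        struct rte_eth_conf conf;
        int ret;

        memset(&conf, 0, sizeof(conf));
        ret = rte_eth_dev_configure(port_id, 1, 1, &conf); /* 1 Rx / 1 Tx queue */
        if (ret < 0)
            return ret;
        return rte_eth_dev_start(port_id); /* now rte_eth_tx_burst() can be used */
    }
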
2 years ago  version: 21.11-rc1
Thomas Monjalon [Mon, 25 Oct 2021 20:42:47 +0000 (22:42 +0200)]
version: 21.11-rc1

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2 years ago  maintainers: update for octeontx2 regex
Liron Himi [Mon, 11 Oct 2021 18:23:43 +0000 (21:23 +0300)]
maintainers: update for octeontx2 regex

Removing Guy Kaneti
Adding Liron Himi

Signed-off-by: Liron Himi <lironh@marvell.com>
2 years ago  maintainers: update for NTB
Junfeng Guo [Mon, 18 Oct 2021 06:32:44 +0000 (14:32 +0800)]
maintainers: update for NTB

Remove Xiaoyun and add Junfeng.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
2 years ago  usertools/devbind: conform to PEP8 recommended style
Stephen Hemminger [Wed, 1 Sep 2021 21:27:07 +0000 (14:27 -0700)]
usertools/devbind: conform to PEP8 recommended style

This fixes most of the warnings from the Flake8 style checker.
The ones remaining are long lines (we allow > 79 characters)
and a line break warning.  The line break style changed in later
versions of PEP 8 and the tool is not updated.

https://www.flake8rules.com/rules/W503.html

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
2 years ago  examples/l2fwd: add promiscuous mode option
Sarosh Arif [Wed, 13 Oct 2021 07:23:03 +0000 (12:23 +0500)]
examples/l2fwd: add promiscuous mode option

The default behaviour of l2fwd is to exit if we are unable to turn
promiscuous mode on. On some AWS instances turning promiscuous mode
on is not permitted. In such cases there should be a way to run the
application without promiscuous mode.

This patch allows the user to turn promiscuous mode on via a command line
parameter. l3fwd has a similar option available.

Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2 years ago  app/flow-perf: export some config as runtime options
Wisam Jaddo [Mon, 4 Oct 2021 12:55:13 +0000 (15:55 +0300)]
app/flow-perf: export some config as runtime options

Some options are often needed at runtime, so leaving
them as compile-time settings is not correct. As a result some options
have been exported as command line options to be used at run
time.

The options exported are:
--txq=N
--rxq=N
--txd=N
--rxd=N
--mbuf-size=N
--mbuf-cache-size=N
--total-mbuf-count=N

Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
2 years ago  test: test control thread creation
Honnappa Nagarahalli [Thu, 21 Oct 2021 21:32:21 +0000 (16:32 -0500)]
test: test control thread creation

Add a testcase to test launching of control threads.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2 years ago  eal: simplify control thread creation
Honnappa Nagarahalli [Thu, 21 Oct 2021 21:32:20 +0000 (16:32 -0500)]
eal: simplify control thread creation

Remove the usage of pthread barrier and replace it with
synchronization using atomic variable.
This also removes the use of reference count required to synchronize
freeing the memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
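
A simplified sketch of the new synchronisation (illustrative names, not the actual EAL code): the new thread publishes its state with a release store and the creator waits with acquire loads, so no barrier or reference count is needed.

    #include <sched.h>

    enum ctrl_state { CTRL_LAUNCHING, CTRL_RUNNING, CTRL_ERROR };

    struct ctrl_params {
        int state;                      /* written by the new thread */
        void *(*start_routine)(void *);
        void *arg;
    };

    static void *
    ctrl_thread_start(void *arg)
    {
        struct ctrl_params *p = arg;

        /* ... set affinity/name here; store CTRL_ERROR on failure ... */
        __atomic_store_n(&p->state, CTRL_RUNNING, __ATOMIC_RELEASE);
        return p->start_routine(p->arg);
    }

    /* creator side: spin until the new thread publishes its state */
    static int
    wait_ctrl_thread_ready(struct ctrl_params *p)
    {
        int state;

        while ((state = __atomic_load_n(&p->state, __ATOMIC_ACQUIRE)) ==
                CTRL_LAUNCHING)
            sched_yield();
        return state == CTRL_RUNNING ? 0 : -1;
    }
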
2 years ago  interrupts: extend event list
Harman Kalra [Fri, 22 Oct 2021 20:49:32 +0000 (02:19 +0530)]
interrupts: extend event list

Dynamically allocate the efds and elist arrays of the intr_handle
structure, based on the size provided by the user. E.g. the size can be
the number of MSI-X interrupts supported by a PCI device.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  interrupts: rename device specific file descriptor
Harman Kalra [Fri, 22 Oct 2021 20:49:32 +0000 (02:19 +0530)]
interrupts: rename device specific file descriptor

VFIO and UIO are mutually exclusive, so storing the file descriptor in a
single field is enough.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  interrupts: make interrupt handle structure opaque
Harman Kalra [Fri, 22 Oct 2021 20:49:33 +0000 (02:19 +0530)]
interrupts: make interrupt handle structure opaque

Move the interrupt handle structure definition inside an EAL private
header to make its fields totally opaque to the outside world.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  drivers: remove direct access to interrupt handle
Harman Kalra [Fri, 22 Oct 2021 20:49:32 +0000 (02:19 +0530)]
drivers: remove direct access to interrupt handle

Remove direct access to the interrupt handle structure fields and
use the respective get/set APIs instead.
All drivers that access the interrupt handle fields are updated.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  lib: remove direct access to interrupt handle
Harman Kalra [Fri, 22 Oct 2021 20:49:32 +0000 (02:19 +0530)]
lib: remove direct access to interrupt handle

Remove direct access to the interrupt handle structure fields and
use the respective get/set APIs instead.
All libraries that access the interrupt handle fields are updated.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  alarm: remove direct access to interrupt handle
Harman Kalra [Fri, 22 Oct 2021 20:49:32 +0000 (02:19 +0530)]
alarm: remove direct access to interrupt handle

Remove direct access to the interrupt handle structure fields and
use the respective get/set APIs instead.
All libraries that access the interrupt handle fields are updated.

Implementing alarm cleanup routine, where the memory allocated
for interrupt instance can be freed.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  test/interrupts: remove direct access to interrupt handle
Harman Kalra [Fri, 22 Oct 2021 20:49:31 +0000 (02:19 +0530)]
test/interrupts: remove direct access to interrupt handle

Updating the interrupt testsuite to make use of interrupt
handle get set APIs.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  interrupts: remove direct access to interrupt handle
Harman Kalra [Fri, 22 Oct 2021 20:49:30 +0000 (02:19 +0530)]
interrupts: remove direct access to interrupt handle

Making changes to the interrupt framework to use interrupt handle
APIs to get/set any field.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  interrupts: add allocator and accessors
Harman Kalra [Fri, 22 Oct 2021 20:49:29 +0000 (02:19 +0530)]
interrupts: add allocator and accessors

Prototype/implement get/set APIs for the interrupt handle fields.
Users won't be able to access any of the interrupt handle fields
directly and should instead use these get/set APIs to access/manipulate
them.

Internal interrupt header i.e. rte_eal_interrupt.h is rearranged,
as APIs defined are moved to rte_interrupts.h and epoll specific
definitions are moved to a new header rte_epoll.h.
Later in the series rte_eal_interrupt.h will be removed.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
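
For illustration, a short usage sketch of the allocator and accessors added here (assuming the 21.11 rte_interrupts.h names; error handling trimmed):

    #include <rte_interrupts.h>

    static struct rte_intr_handle *
    make_intr_handle(int fd)
    {
        struct rte_intr_handle *h;

        h = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_PRIVATE);
        if (h == NULL)
            return NULL;
        /* fields are no longer touched directly; use the set APIs instead */
        if (rte_intr_fd_set(h, fd) != 0 ||
                rte_intr_type_set(h, RTE_INTR_HANDLE_VFIO_MSIX) != 0) {
            rte_intr_instance_free(h);
            return NULL;
        }
        return h;
    }
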
2 years ago  eal/windows: fix IOVA mode detection and handling
Dmitry Kozlyuk [Mon, 25 Oct 2021 12:20:52 +0000 (15:20 +0300)]
eal/windows: fix IOVA mode detection and handling

Windows EAL did not detect IOVA mode and worked incorrectly
if physical addresses could not be obtained
(if virt2phys driver was missing or inaccessible).
In this case, rte_mem_virt2iova() reported RTE_BAD_IOVA for any address.
Inability to obtain IOVA, be it PA or VA, should cause a failure
for the DPDK allocator, but it was hidden by the implementation,
so allocations did not fail when they should.
The mode when DPDK cannot obtain PA but can work is IOVA-as-VA mode.
However, rte_eal_iova_mode() always returned RTE_IOVA_DC
(while it should only ever return RTE_IOVA_PA or RTE_IOVA_VA),
because IOVA mode detection was not implemented.

Implement IOVA mode detection:
1. Always allow to force --iova-mode=va.
2. Allow to force --iova-mode=pa only if virt2phys is available.
3. If no mode is forced and virt2phys is available,
   select the mode according to bus requests, default to PA.
4. If no mode is forced but virt2phys is unavailable, default to VA.
Fix rte_mem_virt2iova() by returning VA when using IOVA-as-VA.
Fix rte_eal_iova_mode() by returning the selected mode.

Fixes: 2a5d547a4a9b ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org
Reported-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2 years ago  mem: add telemetry infos
Harman Kalra [Fri, 8 Oct 2021 12:44:07 +0000 (18:14 +0530)]
mem: add telemetry infos

Register new telemetry callbacks to list named (memzone) and
unnamed (malloc) memory reservations and return information
based on the arguments provided by the user.

Example:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384}
Connected to application: "dpdk-testpmd"
-->
--> /eal/memzone_list
{"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}
-->
-->
--> /eal/memzone_info,0
{"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data",    \
"Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \
"Hugepage_size": 536870912, "Hugepage_base": "0x120000000",   \
"Hugepage_used": 1}}
-->
-->
--> /eal/memzone_info,6
{"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0",  \
"Length": 669918336, "Address": "0x15811db80", "Socket": 0,  \
"Flags": 0, "Hugepage_size": 536870912, "Hugepage_base": "0x140000000", \
"Hugepage_used": 2}}
-->
-->
--> /eal/memzone_info,14
{"/eal/memzone_info": null}
-->
-->
--> /eal/heap_list
{"/eal/heap_list": [0]}
-->
-->
--> /eal/heap_info,0
{"/eal/heap_info": {"Head id": 0, "Name": "socket_0",     \
"Heap_size": 1610612736, "Free_size": 927645952,          \
"Alloc_size": 682966784, "Greatest_free_size": 529153152, \
"Alloc_count": 482, "Free_count": 2}}

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2 years ago  rib: fix IPv6 depth mask
Vladimir Medvedkin [Mon, 6 Sep 2021 15:54:32 +0000 (16:54 +0100)]
rib: fix IPv6 depth mask

Fixes: 03b8372a9a73 ("rib: fix max depth IPv6 lookup")
Cc: stable@dpdk.org
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2 years ago  lpm6: fix buffer overflow
Vladimir Medvedkin [Thu, 21 Oct 2021 17:15:49 +0000 (18:15 +0100)]
lpm6: fix buffer overflow

This patch fixes buffer overflow reported by ASAN,
please reference https://bugs.dpdk.org/show_bug.cgi?id=819

The rte_lpm6 library keeps routing information for control plane purposes
inside an rte_hash table, which uses rte_jhash() as the hash function.
From the rte_jhash() documentation: If input key is not aligned to
four byte boundaries or a multiple of four bytes in length,
the memory region just after may be read (but not used in the
computation).
rte_lpm6 uses 17-byte keys consisting of the IPv6 address (16 bytes) +
the depth (1 byte).

This patch increases the size of the depth field to uint32_t
and sets the key alignment to 4 bytes.

Bugzilla ID: 819
Fixes: 86b3b21952a8 ("lpm6: store rules in hash table")
Cc: stable@dpdk.org
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
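
An illustrative before/after of the hash key layout described above (struct names are made up for the example):

    #include <stdint.h>
    #include <rte_common.h>

    /* before: 17 bytes; rte_jhash() may read past the end of the key,
     * up to the next 4-byte boundary */
    struct lpm6_rule_key_old {
        uint8_t ip[16];
        uint8_t depth;
    };

    /* after: depth widened to 32 bits and the key aligned to 4 bytes,
     * so the 20-byte key is a multiple of 4 and no overflow read occurs */
    struct lpm6_rule_key_new {
        uint8_t ip[16];
        uint32_t depth;
    } __rte_aligned(4);
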
2 years ago  hash: fix Doxygen comment of Toeplitz file
Vladimir Medvedkin [Mon, 6 Sep 2021 16:02:43 +0000 (17:02 +0100)]
hash: fix Doxygen comment of Toeplitz file

Fixes: 7574c3ef7428 ("hash: add toeplitz algorithm used by RSS")
Cc: stable@dpdk.org
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2 years ago  test/ring: relax memory ordering for stress test
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:37 +0000 (23:52 -0500)]
test/ring: relax memory ordering for stress test

wrk_cmd variable is used to signal the worker thread to start
or stop the stress test loop. Relaxed barriers are used
to achieve the same.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2 years ago  eal: fix memory ordering around lcore task accesses
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:36 +0000 (23:52 -0500)]
eal: fix memory ordering around lcore task accesses

Ensure that the memory operations before the call to
rte_eal_remote_launch are visible to the worker thread.
Use the function pointer to execute in worker thread
as the guard variable.

Ensure that the memory operations in worker thread, that happen
before it returns the status of the assigned function, are
visible to the main thread. Use the variable containing the
lcore's state as the guard variable.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
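
A simplified sketch of the acquire/release pairing described above (illustrative, not the actual launch code):

    #include <stddef.h>

    typedef int (*task_fn_t)(void *);

    struct lcore_slot {
        task_fn_t f;  /* guard: release-stored by main, acquire-loaded by worker */
        void *arg;
        int ret;
        int state;    /* guard for 'ret': release-stored by the worker */
    };

    /* main thread: write arg first, then publish it via the guard variable */
    static void
    remote_launch(struct lcore_slot *s, task_fn_t f, void *arg)
    {
        s->arg = arg;
        __atomic_store_n(&s->f, f, __ATOMIC_RELEASE);
    }

    /* worker thread: acquire-load the guard, then arg is safe to read */
    static void
    worker_loop(struct lcore_slot *s)
    {
        task_fn_t f;

        while ((f = __atomic_load_n(&s->f, __ATOMIC_ACQUIRE)) == NULL)
            ;
        s->ret = f(s->arg);
        /* publish the return value before flagging completion */
        __atomic_store_n(&s->state, 1, __ATOMIC_RELEASE);
    }
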
2 years ago  eal: remove FINISHED lcore state
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:35 +0000 (23:52 -0500)]
eal: remove FINISHED lcore state

FINISHED state seems to be used to indicate that the worker's update
of the 'state' is not visible to other threads. There seems to be no
requirement to have such a state.

Since the FINISHED state is removed, the API rte_eal_wait_lcore
is updated to always return the status of the last function that
ran in the worker core.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2 years ago  eal: reset lcore task callback and argument
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:34 +0000 (23:52 -0500)]
eal: reset lcore task callback and argument

In the rte_eal_remote_launch function, the lcore function
pointer is checked for NULL. However, the pointer is never
reset to NULL. Reset the lcore function pointer and argument
after the worker has completed executing the lcore function.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2 years ago  config: add option for atomic mbuf reference counting
Kefu Chai [Wed, 13 Oct 2021 20:54:18 +0000 (04:54 +0800)]
config: add option for atomic mbuf reference counting

Atomic mbuf reference counting (RTE_MBUF_REFCNT_ATOMIC) is not necessary
for applications like Seastar, where it's safe to assume that the mbuf
refcnt is updated by a single core only.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2 years ago  ci: update Meson option for generic build
Juraj Linkeš [Mon, 11 Oct 2021 13:40:41 +0000 (15:40 +0200)]
ci: update Meson option for generic build

The way we're building DPDK in CI, with -Dmachine=default, has not been
updated when the option got replaced to preserve a backwards-compatible
build call to facilitate ABI verification between DPDK versions. Update
the call to use -Dplatform=generic, which is the most up to date way to
execute the same build which is now present in all DPDK versions the ABI
check verifies.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Acked-by: Aaron Conole <aconole@redhat.com>
2 years ago  eal/x86: avoid cast-align warning in memcpy functions
Eli Britstein [Thu, 21 Oct 2021 08:51:32 +0000 (11:51 +0300)]
eal/x86: avoid cast-align warning in memcpy functions

Functions and macros in x86 rte_memcpy.h may cause cast-align warnings,
when using strict cast align flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

For example:
In file included from main.c:24:
/dpdk/build/include/rte_memcpy.h: In function 'rte_mov16':
/dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases
required alignment of target type [-Wcast-align]
  306 |  xmm0 = _mm_loadu_si128((const __m128i *)src);
      |                         ^

As the code assumes correct alignment, first add a (void *) or (const
void *) cast to avoid the warnings.

Fixes: 9484092baad3 ("eal/x86: optimize memcpy for AVX512 platforms")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
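
The pattern of the fix, shown on a small standalone sketch (illustrative, not the actual rte_memcpy.h code):

    #include <stdint.h>
    #include <emmintrin.h>

    /* warns with -Wcast-align=strict: uint8_t * cast straight to __m128i * */
    static inline __m128i
    load16_warning(const uint8_t *src)
    {
        return _mm_loadu_si128((const __m128i *)src);
    }

    /* fixed: go through (const void *) first, as the unaligned load is safe */
    static inline __m128i
    load16_clean(const uint8_t *src)
    {
        return _mm_loadu_si128((const __m128i *)(const void *)src);
    }
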
2 years ago  mbuf: avoid cast-align warning in data offset macro
Eli Britstein [Thu, 21 Oct 2021 08:51:31 +0000 (11:51 +0300)]
mbuf: avoid cast-align warning in data offset macro

In rte_pktmbuf_mtod_offset macro, there is a casting from char * to type
't', which may cause cast-align warning when using strict cast align
flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

main.c: In function 'l2fwd_mac_updating':
/dpdk/build/include/rte_mbuf_core.h:719:3: warning: cast increases
required alignment of target type [-Wcast-align]
  719 |  ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
      |   ^
/dpdk/build/include/rte_mbuf_core.h:733:32: note: in expansion of macro
'rte_pktmbuf_mtod_offset'
  733 | #define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
      |                                ^~~~~~~~~~~~~~~~~~~~~~~

As the code assumes correct alignment, first add a (void *) cast to
avoid the warning.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2 years ago  net: avoid cast-align warning in VLAN insert function
Eli Britstein [Thu, 21 Oct 2021 08:51:30 +0000 (11:51 +0300)]
net: avoid cast-align warning in VLAN insert function

In rte_vlan_insert there is a casting of rte_pktmbuf_prepend returned
value to (struct rte_ether_hdr *), which causes cast-align warning when
using strict cast align flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

In file included from main.c:35:
/dpdk/build/include/rte_ether.h:370:7: warning: cast increases required
alignment of target type [-Wcast-align]
  370 |  nh = (struct rte_ether_hdr *)
      |       ^

As the code assumes correct alignment, first add a (void *) cast to
avoid the warning.

Fixes: c974021a5949 ("ether: add soft vlan encap/decap")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2 years ago  doc: fix default mempool option in guides
David Marchand [Fri, 15 Oct 2021 08:39:41 +0000 (10:39 +0200)]
doc: fix default mempool option in guides

This option should be prefixed with -- for consistency with others.

Fixes: a103a97e7191 ("eal: allow user to override default mempool driver")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2 years ago  usertools/pmdinfo: fix plugin auto scan
David Marchand [Tue, 19 Oct 2021 12:52:30 +0000 (14:52 +0200)]
usertools/pmdinfo: fix plugin auto scan

Migration to argparse was incomplete.

$ dpdk-pmdinfo.py -p $(which dpdk-testpmd)
Traceback (most recent call last):
  File "/usr/bin/dpdk-pmdinfo.py", line 626, in <module>
    main()
  File "/usr/bin/dpdk-pmdinfo.py", line 596, in main
    exit(scan_for_autoload_pmds(args[0]))
TypeError: 'Namespace' object does not support indexing

Fixes: 81255f27c65c ("usertools: replace optparse with argparse")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Robin Jarry <robin.jarry@6wind.com>
2 years ago  mempool: fix non-IO flag inference
Dmitry Kozlyuk [Fri, 22 Oct 2021 21:09:19 +0000 (00:09 +0300)]
mempool: fix non-IO flag inference

When a mempool had been created with the RTE_MEMPOOL_F_NO_IOVA_CONTIG flag
but was later populated with a valid IOVA, RTE_MEMPOOL_F_NON_IO was unset,
while it should be kept. The unit test did not catch this
because the rte_mempool_populate_default() it used was populating
with RTE_BAD_IOVA.

Keep setting RTE_MEMPOOL_F_NON_IO at empty mempool creation
and add an assert for it in the unit test (remove the separate case).
Do not reset the flag if RTE_MEMPOOL_F_NO_IOVA_CONTIG is set.

Fixes: 11541c5c81dd ("mempool: add non-IO flag")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2 years ago  kni: fix build for SLES15-SP3
Aman Singh [Tue, 19 Oct 2021 10:48:41 +0000 (11:48 +0100)]
kni: fix build for SLES15-SP3

The SUSE version numbering is inconsistent, so it cannot be used to determine
the Linux kernel API to be used. In this patch we check the parameters of the
'ndo_tx_timeout' API directly from the kernel source. This is done only for
SUSE builds.

Bugzilla ID: 812
Cc: stable@dpdk.org
Signed-off-by: Aman Singh <aman.deep.singh@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Longfeng Liang <longfengx.liang@intel.com>
2 years ago  sched: promote a function as stable
Jasvinder Singh [Wed, 1 Sep 2021 12:19:20 +0000 (13:19 +0100)]
sched: promote a function as stable

This API was introduced in 18.05, therefore remove the
experimental tag to promote it to stable state.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2 years ago  pipeline: support action annotations
Yogesh Jangra [Mon, 18 Oct 2021 01:22:53 +0000 (21:22 -0400)]
pipeline: support action annotations

Enable restricting the scope of an action to regular table entries or
to the table default entry in order to support the P4 language
tableonly or defaultonly annotations.

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2 years ago  port: configure loop count for source port
Yogesh Jangra [Fri, 17 Sep 2021 10:32:05 +0000 (06:32 -0400)]
port: configure loop count for source port

Add support for configurable number of loops through the input PCAP
file for the source port. Added an additional parameter to source
port CLI command.

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2 years ago  pipeline: fix instruction label check
Yogesh Jangra [Thu, 21 Oct 2021 03:23:32 +0000 (23:23 -0400)]
pipeline: fix instruction label check

The instruction_data array was incorrectly indexed, which resulted in
the array index going out of bounds and sometimes a segfault.

Fixes: a1711f ("pipeline: add SWX Rx and extract instructions")
Cc: stable@dpdk.org
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2 years ago  test/event: fix timer adapter creation test
Shijith Thotton [Mon, 30 Aug 2021 20:12:59 +0000 (01:42 +0530)]
test/event: fix timer adapter creation test

Removed freeing of unallocated mempool in event timer adapter create
unit test.

Fixes: d1f3385d0076 ("test: add event timer adapter auto-test")
Cc: stable@dpdk.org
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2 years ago  test/devargs: fix memory leak
Xueming Li [Sat, 23 Oct 2021 12:17:55 +0000 (20:17 +0800)]
test/devargs: fix memory leak

In the layer argument test function, kvargs are parsed and checked without
being freed. This patch calls the rte_kvargs_free() function to avoid a memory leak.

Coverity issue: 373631
Fixes: a4975cd20dca ("test: add devargs test cases")

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years ago  net: fix build with pedantic for L2TPv2 definitions
David Marchand [Sun, 24 Oct 2021 10:04:11 +0000 (12:04 +0200)]
net: fix build with pedantic for L2TPv2 definitions

Build is broken on RHEL7 following introduction of this new protocol.

Fixes: 3a929df1f286 ("ethdev: support L2TPv2 and PPP procotol")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2 years ago  mbuf: add namespace to offload flags
Olivier Matz [Fri, 15 Oct 2021 19:24:08 +0000 (21:24 +0200)]
mbuf: add namespace to offload flags

Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
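
An illustrative use of the renamed flags (the old PKT_* names still compile but now trigger a deprecation warning):

    #include <rte_mbuf.h>

    static void
    request_tx_ip_cksum(struct rte_mbuf *m)
    {
        /* old, deprecated: m->ol_flags |= PKT_TX_IP_CKSUM | PKT_TX_IPV4; */
        m->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_IPV4;
    }
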
2 years ago  devtools: add cocci script to rename mbuf offload flags
Olivier Matz [Fri, 15 Oct 2021 19:24:07 +0000 (21:24 +0200)]
devtools: add cocci script to rename mbuf offload flags

The mbuf offload flags do not match the DPDK namespace (they are not
prefixed by RTE_). This coccinelle script is used in the next commit to
do the replacement in the code.

A draft script was initially submitted [1] in commit d7595795b760 ("doc:
announce renaming of mbuf offload flags"), but dropped by mistake at
commit.

1: http://inbox.dpdk.org/dev/20210730155700.32574-1-olivier.matz@6wind.com

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
2 years ago  mbuf: mark old VLAN offload flags as deprecated
Olivier Matz [Fri, 15 Oct 2021 19:24:06 +0000 (21:24 +0200)]
mbuf: mark old VLAN offload flags as deprecated

The flags PKT_TX_VLAN_PKT and PKT_TX_QINQ_PKT have been
marked as deprecated since commit 380a7aab1ae2 ("mbuf: rename deprecated
VLAN flags") (2017). But they were not using the RTE_DEPRECATED
macro, because it did not exist at that time. Add it, and replace
usage of these flags.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2 years ago  mbuf: remove duplicate definition of cksum offload flags
Olivier Matz [Fri, 15 Oct 2021 19:24:05 +0000 (21:24 +0200)]
mbuf: remove duplicate definition of cksum offload flags

The flags PKT_RX_L4_CKSUM_BAD and PKT_RX_IP_CKSUM_BAD are defined
twice with the same value. Remove one of the occurrences, which was
marked as "deprecated".

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2 years ago  compress/mlx5: support partial transformation
Raja Zidane [Wed, 15 Sep 2021 00:12:23 +0000 (00:12 +0000)]
compress/mlx5: support partial transformation

Currently compress, decompress and DMA are allowed
only when all 3 capabilities are on.
A case where the user wants the decompress offload, and the
decompress capability is on but one of compress or
DMA is off, is not allowed.
Split the compress/decompress/DMA support check to allow
partial transformations.

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years ago  crypto/cnxk: allow different cores in pending queue
Anoob Joseph [Mon, 18 Oct 2021 07:51:40 +0000 (13:21 +0530)]
crypto/cnxk: allow different cores in pending queue

Rework pending queue to allow producer and consumer cores to be
different.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
2 years ago  common/cnxk: align CPT queue depth to power of 2
Anoob Joseph [Mon, 18 Oct 2021 07:51:39 +0000 (13:21 +0530)]
common/cnxk: align CPT queue depth to power of 2

Use CPT LF queue depth as power of 2 to aid in masked checks for pending
queue.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2 years ago  ipsec: fix telemetry text
Radu Nicolau [Tue, 19 Oct 2021 15:15:28 +0000 (16:15 +0100)]
ipsec: fix telemetry text

Set correct tunnel type telemetry text - tunnel type
was wrongly set as IPv4-UDP for all types.

Fixes: bf5b65a8e781 ("ipsec: support SA telemetry")

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  cryptodev: move device-specific structures
Akhil Goyal [Wed, 20 Oct 2021 11:27:54 +0000 (16:57 +0530)]
cryptodev: move device-specific structures

The device specific structures - rte_cryptodev
and rte_cryptodev_data are moved to cryptodev_pmd.h
to hide it from the applications.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2 years ago  cryptodev: use new flat array in fast path API
Akhil Goyal [Wed, 20 Oct 2021 11:27:53 +0000 (16:57 +0530)]
cryptodev: use new flat array in fast path API

Rework fast-path cryptodev functions to use rte_crypto_fp_ops[].
While it is an API/ABI breakage, this change is intended to be
transparent for both users (no changes in user app is required) and
PMD developers (no changes in PMD is required).

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2 years ago  drivers/crypto: invoke probing finish function
Akhil Goyal [Wed, 20 Oct 2021 11:27:52 +0000 (16:57 +0530)]
drivers/crypto: invoke probing finish function

Invoke the rte_cryptodev_pmd_probing_finish() function at the end of probing;
this function sets the function pointers in the fp_ops flat array
in the case of a secondary process.
For the primary process, fp_ops is updated in rte_cryptodev_start().

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years ago  cryptodev: add device probing finish function
Akhil Goyal [Wed, 20 Oct 2021 11:27:51 +0000 (16:57 +0530)]
cryptodev: add device probing finish function

Added a rte_cryptodev_pmd_probing_finish API which
needs to be called by the PMD after the device is initialized
completely. This will set the fast path function pointers
in the flat array for the secondary process. For the primary process,
these are set in rte_cryptodev_start.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
2 years ago  crypto/scheduler: use proper API for device start/stop
Akhil Goyal [Wed, 20 Oct 2021 11:27:50 +0000 (16:57 +0530)]
crypto/scheduler: use proper API for device start/stop

The worker PMDs were using direct device start/stop
functions rather than rte_cryptodev_start(),
so rte_crypto_fp_ops never got set. This patch calls
the rte_cryptodev_start and stop APIs, which start and
stop devices properly so that fp_ops gets set.

Reported-by: Ciara Power <ciara.power@intel.com>
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2 years ago  cryptodev: move inline APIs into separate structure
Akhil Goyal [Wed, 20 Oct 2021 11:27:49 +0000 (16:57 +0530)]
cryptodev: move inline APIs into separate structure

Move fastpath inline function pointers from rte_cryptodev into a
separate structure accessed via a flat array.
The intention is to make rte_cryptodev and related structures private
to avoid future API/ABI breakages.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2 years ago  cryptodev: allocate max space for internal queue array
Akhil Goyal [Wed, 20 Oct 2021 11:27:48 +0000 (16:57 +0530)]
cryptodev: allocate max space for internal queue array

At the queue_pair config stage, allocate memory for the maximum
number of queue pair pointers that a device can support.

This will allow fast path APIs (enqueue_burst/dequeue_burst) to
refer to the internal QP data through a pointer without checking the
currently configured QPs.
This is required to hide the rte_cryptodev and rte_cryptodev_data
structures from the user.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2 years ago  cryptodev: separate out internal structures
Akhil Goyal [Wed, 20 Oct 2021 11:27:47 +0000 (16:57 +0530)]
cryptodev: separate out internal structures

A new header file rte_cryptodev_core.h is added and all
internal data structures which need not be exposed directly to
application are moved to this file. These structures are mostly
used by drivers, but they need to be in the public header file
as they are accessed by datapath inline functions for
performance reasons.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2 years ago  test/crypto: enable chacha_poly PMD
Kai Ji [Fri, 15 Oct 2021 14:39:57 +0000 (14:39 +0000)]
test/crypto: enable chacha_poly PMD

An autotest is added for the new chacha20_poly1305 PMD.
A new test case is also added for SGL test.

Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: add chacha_poly PMD
Kai Ji [Fri, 15 Oct 2021 14:39:56 +0000 (14:39 +0000)]
crypto/ipsec_mb: add chacha_poly PMD

Add in new chacha20_poly1305 PMD to the ipsec_mb framework.

Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: move zuc PMD
Piotr Bronowski [Fri, 15 Oct 2021 14:39:55 +0000 (14:39 +0000)]
crypto/ipsec_mb: move zuc PMD

This patch removes the crypto/zuc folder and gathers all zuc PMD
implementation specific details into two files,
pmd_zuc.c and pmd_zuc_priv.h in the crypto/ipsec_mb folder.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: support snow3g digest appended ops
Piotr Bronowski [Fri, 15 Oct 2021 14:39:54 +0000 (14:39 +0000)]
crypto/ipsec_mb: support snow3g digest appended ops

This patch enables out-of-place auth-cipher operations where
digest should be encrypted along with the rest of raw data.
It also adds support for partially encrypted digest when using
auth-cipher operations.

Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: move snow3g PMD
Piotr Bronowski [Fri, 15 Oct 2021 14:39:53 +0000 (14:39 +0000)]
crypto/ipsec_mb: move snow3g PMD

This patch removes the crypto/snow3g folder and gathers all snow3g PMD
implementation specific details into a single file,
pmd_snow3g.c in the crypto/ipsec_mb folder.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: move kasumi PMD
Piotr Bronowski [Fri, 15 Oct 2021 14:39:52 +0000 (14:39 +0000)]
crypto/ipsec_mb: move kasumi PMD

This patch removes the crypto/kasumi folder and gathers all kasumi PMD
implementation specific details into a single file,
pmd_kasumi.c in the crypto/ipsec_mb folder.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: move aesni_gcm PMD
Piotr Bronowski [Fri, 15 Oct 2021 14:39:51 +0000 (14:39 +0000)]
crypto/ipsec_mb: move aesni_gcm PMD

This patch removes the crypto/aesni_gcm folder and gathers all
aesni-gcm PMD implementation specific details into a single file,
pmd_aesni_gcm.c in the crypto/ipsec_mb folder.
A redundant check for iv length is removed.

GCM ops are stored in the queue pair for multi process support, they
are updated during queue pair setup for both primary and secondary
processes.

GCM ops are also set per lcore for the CPU crypto mode.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  test/crypto: add ZUC-256 vectors
Pablo de Lara [Fri, 15 Oct 2021 14:39:50 +0000 (14:39 +0000)]
test/crypto: add ZUC-256 vectors

Add extra ZUC-EIA3-256 and ZUC-EEA3-256 test vectors.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  test/crypto: check auth parameters
Pablo de Lara [Fri, 15 Oct 2021 14:39:49 +0000 (14:39 +0000)]
test/crypto: check auth parameters

Check for auth parameters in the transform to verify if a test case is
supported by the crypto device under test.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  test/crypto: check cipher parameters
Pablo de Lara [Fri, 15 Oct 2021 14:39:48 +0000 (14:39 +0000)]
test/crypto: check cipher parameters

Check for cipher parameters in the transform to verify if a test case
is supported by the crypto device under test.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: support ZUC-256 for aesni_mb
Pablo de Lara [Fri, 15 Oct 2021 14:39:47 +0000 (14:39 +0000)]
crypto/ipsec_mb: support ZUC-256 for aesni_mb

Add support for ZUC-EEA3-256 and ZUC-EIA3-256.
Only 4-byte tags supported for now.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: move aesni_mb PMD
Piotr Bronowski [Fri, 15 Oct 2021 14:39:46 +0000 (14:39 +0000)]
crypto/ipsec_mb: move aesni_mb PMD

This patch removes the crypto/aesni_mb folder and gathers all
aesni-mb PMD implementation specific details into a single file,
pmd_aesni_mb.c in crypto/ipsec_mb.

Now that intel-ipsec-mb v1.0 is the minimum supported version, old
macros can be replaced with the newer macros supported by this version.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: support multi-process
Ciara Power [Fri, 15 Oct 2021 14:39:45 +0000 (14:39 +0000)]
crypto/ipsec_mb: support multi-process

The ipsec_mb SW PMD now has multiprocess support.
The queue-pair IMB_MGR is stored in a memzone instead of being allocated
externally by the Intel IPSec MB library, when v1.1 is used.
If v1.0 is used, multi process is not supported, and allocation is
done as before.
The secondary process needs to reconfigure the queue-pair to allow the
IMB_MGR function pointers to be updated.

Intel IPsec MB library version 1.1 is required for this support.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  crypto/ipsec_mb: introduce IPsec_mb framework
Fan Zhang [Fri, 15 Oct 2021 14:39:44 +0000 (14:39 +0000)]
crypto/ipsec_mb: introduce IPsec_mb framework

This patch introduces the new framework to share common code between
the SW crypto PMDs that depend on the intel-ipsec-mb library.
This change helps to reduce future effort on the code maintenance and
feature updates.

The PMDs that will be added to this framework in subsequent patches are:
  - AESNI MB
  - AESNI GCM
  - CHACHA20_POLY1305
  - KASUMI
  - SNOW3G
  - ZUC

The use of these PMDs will not change, they will still be supported for
x86, and will use the same EAL args as before.

The minimum required version for the intel-ipsec-mb library is now v1.0.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2 years ago  ethdev: avoid usage of ULL for 64-bit unsigned constants
Andrew Rybchenko [Fri, 22 Oct 2021 11:20:16 +0000 (14:20 +0300)]
ethdev: avoid usage of ULL for 64-bit unsigned constants

Use UINT64_C() macro instead.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years ago  ethdev: replace single bit masks with macros
Andrew Rybchenko [Fri, 22 Oct 2021 07:30:02 +0000 (10:30 +0300)]
ethdev: replace single bit masks with macros

The macros RTE_BIT32 and RTE_BIT64 are used to replace single bit masks.

Do not switch VLAN offload flags since type is not fixed size.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
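
An example of the substitution (the flag values are illustrative, not actual ethdev definitions):

    #include <rte_bitops.h>

    #define EXAMPLE_FLAG_OLD (1ULL << 33)  /* open-coded single-bit mask */
    #define EXAMPLE_FLAG_NEW RTE_BIT64(33) /* same value via the macro */
    #define EXAMPLE_FLAG32   RTE_BIT32(7)  /* 32-bit variant */
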
2 years ago  ethdev: add namespace
Ferruh Yigit [Fri, 22 Oct 2021 11:03:12 +0000 (12:03 +0100)]
ethdev: add namespace

Add 'RTE_ETH' namespace to all enums & macros in a backward compatible
way. The macros for backward compatibility can be removed in next LTS.
Also updated some struct names to have 'rte_eth' prefix.

All internal components switched to using new names.

Syntax fixed on lines that this patch touches.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years ago  test/bonding: fix after hiding ethdev internal structures
Konstantin Ananyev [Fri, 22 Oct 2021 13:26:42 +0000 (14:26 +0100)]
test/bonding: fix after hiding ethdev internal structures

The link bonding auto-test internally creates an emulated ethdev.
Some tests change Rx/Tx functions of this emulated device on the fly,
by directly modifying rte_eth_dev fields and without doing stop/start
for these devices.
As ethdev now uses rte_eth_fp_ops[] for fast-path functions, these
direct changes don't have the expected effect.
Fix the problem by guarding fast-path function changes with
rte_eth_dev_stop()/rte_eth_dev_start().

Fixes: 7a0935239b9e ("ethdev: make fast-path functions to use new flat array")

Reported-by: Lewei Yang <leweix.yang@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years ago  drivers/net: fix removing jumbo offload flag
Ferruh Yigit [Fri, 22 Oct 2021 12:57:15 +0000 (13:57 +0100)]
drivers/net: fix removing jumbo offload flag

After the DEV_RX_OFFLOAD_JUMBO_FRAME flag was removed, drivers make jumbo frame
decisions based on MTU value checks, but some of the checks were wrong
by mistake, causing device initialization to fail; fix them.

Fixes: b563c1421282 ("ethdev: remove jumbo offload flag")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Yu Jiang <yux.jiang@intel.com>
2 years ago  doc: remove jumbo offload feature
Ferruh Yigit [Fri, 22 Oct 2021 12:57:14 +0000 (13:57 +0100)]
doc: remove jumbo offload feature

Jumbo offload is no longer announced as a capability, and the
'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag is removed.

This patch also removes the 'Jumbo frame' feature from the documentation.

Fixes: b563c1421282 ("ethdev: remove jumbo offload flag")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2 years ago  net/af_xdp: fix max Rx packet length
Ciara Loftus [Fri, 22 Oct 2021 14:07:17 +0000 (14:07 +0000)]
net/af_xdp: fix max Rx packet length

Commit 1bb4a528c41f ("ethdev: fix max Rx packet length") clarified the
expected usage of the max_rx_pktlen and max_mtu values and implemented
some extra checks on these values to ensure they are sane. After this,
the AF_XDP PMD fails to initialise. The value for max_rx_pktlen which
represents the max size of the Ethernet frame was set to ETH_FRAME_LEN
(1514) and the max_mtu which represents the size of the payload was set
to the max size of the Ethernet frame. This did not make sense, as
naturally the maximum frame size should be greater than the payload
size.

Fix this by setting the max_rx_pktlen equal to the max size of the
Ethernet frame as expected, and the max MTU equal to the max_rx_pktlen
less the overhead which is set to the size of an Ethernet header plus
CRC.

Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length")

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years ago  ethdev: forbid MTU set before device configure
Ivan Ilchenko [Fri, 22 Oct 2021 10:18:28 +0000 (13:18 +0300)]
ethdev: forbid MTU set before device configure

rte_eth_dev_configure() always sets the MTU to either dev_conf.rxmode.mtu
or RTE_ETHER_MTU if the application doesn't provide the value.
So, there is no point in allowing rte_eth_dev_set_mtu() before that, since
the set value will be overwritten on configure anyway.

Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length")

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years ago  ethdev: remove unused L2 tunnel mask defines
Andrew Rybchenko [Fri, 22 Oct 2021 07:22:14 +0000 (10:22 +0300)]
ethdev: remove unused L2 tunnel mask defines

Fixes: cf47acc0f9ba ("ethdev: remove L2 tunnel offload control API")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years ago  app/testpmd: fix packet burst spreading stats
Eli Britstein [Thu, 21 Oct 2021 13:20:02 +0000 (16:20 +0300)]
app/testpmd: fix packet burst spreading stats

RX/TX functions (rte_eth_rx_burst/rte_eth_tx_burst) take an 'nb_pkts'
argument, which specifies the maximum number of packets to receive/transmit.
The handled count can be 0..nb_pkts, meaning nb_pkts+1 possible values.
Testpmd can provide statistics of the burst sizes ('set
record-burst-stats on') by incrementing an array cell of index
<burst-size>. This array is mistakenly of size [MAX_PKT_BURST]. Receiving
the maximum burst will cause an out-of-bounds write.
Enlarge the spread stats array by one cell to fix it.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
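
A sketch of the off-by-one (the struct follows the description above; it is not the exact testpmd code):

    #include <stdint.h>

    #define MAX_PKT_BURST 512 /* testpmd's compile-time maximum burst size */

    /* a burst can return 0..MAX_PKT_BURST packets, i.e. MAX_PKT_BURST + 1
     * distinct values, so the histogram needs one extra cell */
    struct pkt_burst_stats {
        uint64_t pkt_burst_spread[MAX_PKT_BURST + 1];
    };

    static inline void
    record_burst(struct pkt_burst_stats *s, uint16_t nb_rx)
    {
        s->pkt_burst_spread[nb_rx]++; /* nb_rx == MAX_PKT_BURST no longer overflows */
    }
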
2 years ago  net/hns3: add runtime config for mailbox limit time
Chengchang Tang [Fri, 22 Oct 2021 01:38:40 +0000 (09:38 +0800)]
net/hns3: add runtime config for mailbox limit time

Currently, the max waiting time for an MBX response is 500ms, but in
some scenarios it is not enough, since it depends on the response
of the kernel mode driver, and its response time is related to the
scheduling of the system. In this special scenario, most of the
cores are isolated, and only a few cores are used for system
scheduling. When a large number of services are started, the
scheduling of the system will be very busy, and the reply to the
mbx message will time out, which will cause the PMD initialization
to fail.

This patch adds a runtime config to set the max wait time. For the
above scenarios, users can adjust the waiting time to a suitable value
by themselves.

Fixes: 463e748964f5 ("net/hns3: support mailbox")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2 years ago  app/testpmd: add forwarding engine for shared Rx queue
Xueming Li [Thu, 21 Oct 2021 10:41:42 +0000 (18:41 +0800)]
app/testpmd: add forwarding engine for shared Rx queue

To support shared Rx queue, this patch introduces a dedicated forwarding
engine. The engine groups received packets by mbuf->port into sub-groups,
updates stream statistics and simply frees the packets.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2 years ago  app/testpmd: force shared Rx queue polled on same core
Xueming Li [Thu, 21 Oct 2021 10:41:41 +0000 (18:41 +0800)]
app/testpmd: force shared Rx queue polled on same core

A shared Rx queue must be polled on the same core. This patch checks and stops
forwarding if a shared RxQ is being scheduled on multiple
cores.

It's suggested to use the same number of Rx queues and polling cores.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>