git.droids-corp.org - dpdk.git/log

examples/pipeline: support hash functions

Add example for hash function operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

pipeline: support hash functions

Add support for hash functions that compute a signature for an array
of bytes read from a packet header or meta-data. Useful for flow
affinity-based load balancing.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

examples/pipeline: improve learner table timers

Added the rearm counter to the statistics. Updated the learner table
example to the new learner table timer operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

pipeline: improve learner table timers

Enable the pipeline to use the improved learner table timer operation
through the new "rearm" instruction.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

table: improve learner table timers

Previously, on lookup hit, the hit key had its timer automatically
rearmed with the same timeout in order to prevent its expiration. Now,
a broader set of actions is available on lookup hit, which has to be
managed explicitly: the key can have its timer rearmed with the same
or with a different timeout, or the key timer can be left unmodified.
The latter option allows the key to expire naturally when the timer
eventually runs out, unless the key is hit again and its timer rearmed
at that point. Needed by the TCP connection tracking state machine.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

examples/pipeline: add packet recirculation example

Add example program to illustrate packet recirculation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

pipeline: support packet recirculation

Add support for packet recirculation. The current packet is flagged
for recirculation using the new "recirculate" instruction; on TX, this
flag causes the packet to execute the full pipeline again as if it was
a new packet, except the packet meta-data is preserved. The new
"recircid" instruction can be used to read the pass number in case the
packet goes several times through the pipeline.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

examples/pipeline: add packet mirroring example

Add example program to illustrate packet mirroring.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

examples/pipeline: support packet mirroring

Add CLI commands for packet mirroring.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Signed-off-by: Kamalakannan R <kamalakannan.r@intel.com>

pipeline: support packet mirroring

The packet mirroring is configured through slots and sessions, with
the number of slots and sessions set at init.

The new "mirror" instruction assigns one of the existing sessions to a
specific slot, which results in scheduling a mirror operation for the
current packet to be executed later at the time the packet is either
transmitted or dropped.

Several copies of the same input packet can be mirrored to different
output ports by using multiple slots.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Kamalakannan R <kamalakannan.r@intel.com>

port: support packet mirroring

Add packet clone operation to the output ports in order to support
packet mirroring.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Kamalakannan R <kamalakannan.r@intel.com>

pipeline: support default action arguments

Add support for arguments to the default action of regular and learner
tables at initialization time. Until now, only default actions with no
arguments were accepted in the .spec file.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>

pipeline: fix emit instruction for invalid headers

Fix the emit instruction for the pathological case of all headers to
be emitted being invalid. In this case, the for loop was essentially
skipped and the last emitted header (or an invalid memory location)
getting corrupted by setting its size to 0 through the assignment to
ho->n_bytes right after the for loop.

Fixes: d60dbdc88a3e ("pipeline: create inline functions for emit instruction")
Cc: stable@dpdk.org
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>

devtools: fix null test for NUMA systems

On NUMA systems, default cores (0 and 1) might be on different memory
nodes. Double the amount of memory.

Fixes: 9e6b36c34ce9 ("app/testpmd: reduce memory consumption")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>

doc: rewrite shell scripts in Python

Shell used in documentation generation could not run on Windows.
Rewrite scripts in Python.
New scripts use proper path separators and handle paths with spaces.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>

doc: fix API index Markdown syntax

API documentation index had spaces between link caption and URL,
which may be unsupported by some Markdown implementations.
That is, "[caption](URL)" is valid but "[caption] (URL)" is not.
The problematic behavior is observed with Doxygen on Windows.
Remove the spaces.
Unfortunately, Markdown syntax is not formally specified.

Fixes: 9bf486e606b0 ("doc: generate HTML for API with doxygen")
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

doc: simplify CSS customization for Doxygen

CSS for API documentation was customized by a shell script
modifying the file that Doxygen produces.
This way CSS code is kept in a script and an extra build step is added.
Move custom style to a plain CSS file.
Use Doxygen capability to attach this extra stylesheet.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

mbuf: dump outer VLAN

Enable printing of the outer VLAN if flags indicate it is present.

Cc: stable@dpdk.org
Signed-off-by: Ben Magistro <koncept1@gmail.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

rib: mark error checks with unlikely

Also mark some conditional functions as const.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

rib: fix traversal with /32 route

If a /32 route is entered in the RIB the code to traverse
will not see end of the tree. This is due to trying
to do a negative shift which is an undefined in C.

Fix by checking for max depth as is already done in rib6.

Fixes: 5a5793a5ffa2 ("rib: add RIB library")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

ip_frag: add IPv4 options fragment

According to RFC791,the options may appear or not in datagrams.
They must be implemented by all IP modules (host and gateways).
What is optional is their transmission in any particular datagram,
not their implementation. So we have to deal with it during the
fragmenting process.
Add some test data for the IPv4 header optional field fragmenting.

Signed-off-by: Huichao Cai <chcchc88@163.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

sched: enable traffic class oversubscription conditionally

Added new API for flag to enable or disable TC oversubscription
for best effort traffic class at subport level.

By default TC OV is enabled.

Signed-off-by: Marcin Danilewicz <marcinx.danilewicz@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

maintainers: update for nfp

Add Chaoyong as nfp maintainer.

Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

dma/dpaa2: support statistics

This patch support DMA read and reset statistics operations.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

dma/dpaa2: support DMA operations

This patch support copy, submit, completed and
completed status functionality of DMA driver.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

dma/dpaa2: add driver-specific configuration API

Add additional PMD APIs for DPAA2 QDMA driver for configuring
RBP, Ultra Short format, and Scatter Gather support

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

dma/dpaa2: support basic operations

This patch support basic DMA operations which includes
device capability and channel setup.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

dma/dpaa2: introduce driver skeleton

The DPAA2 DMA driver is an implementation of the dmadev APIs,
that provide means to initiate a DMA transaction from CPU.
Earlier this was part of RAW driver, but with DMA drivers
added as separate flavor of drivers, this driver is being
moved to DMA drivers.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

raw/dpaa2_qdma: remove driver

With DMA devices supported as a separate flavor of devices,
the DPAA2 QDMA driver is moved in the DMA devices.

This change removes the DPAA2 QDMA driver from raw devices.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

app/acl: support different formats for IPv6 address

Within ACL rule IPv6 address can be represented in different ways:
either as 4x4B fields, or as 2x8B fields.
Till now, only first format was supported.
Extend test-acl to support both formats, mainly for testing and
demonstrating purposes.
To control desired behavior '--ipv6' command-line option is extended
to accept an optional argument:
To be more precise:
'--ipv6' - use 4x4B fields format (default behavior)
'--ipv6=4B' - use 4x4B fields format (default behavior)
'--ipv6=8B' - use 2x8B fields format

Also replaced home brewed IPv4/IPv6 address parsing with inet_pton() calls.

Signed-off-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

acl: fix rules with 8-byte field size

In theory ACL library allows fields with 8B long.
Though in practice they usually not used, not tested,
and as was revealed by Ido, this functionality is not working properly.
There are few places inside ACL build code-path that need to be addressed.

Bugzilla ID: 673
Fixes: dc276b5780c2 ("acl: new library")
Cc: stable@dpdk.org
Reported-by: Ido Goshen <ido@cgstowernetworks.com>
Signed-off-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Tested-by: Ido Goshen <ido@cgstowernetworks.com>

avoid AltiVec keyword vector

The AltiVec header file is defining "vector", except in C++ build.
The keyword "vector" may conflict easily.
As a rule, it is better to use the alternative keyword "__vector",
so we will be able to #undef vector after including AltiVec header.

Later it may become possible to #undef vector in rte_altivec.h
with a compatibility breakage.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>

test: avoid hang if queues are full and Tx fails

Current pmd_perf_autotest() in continuous mode tries
to enqueue MAX_TRAFFIC_BURST completely before starting
the test. Some drivers cannot accept complete
MAX_TRAFFIC_BURST even though rx+tx desc count can fit it.
This patch changes behaviour to stop enqueuing after few
retries.

Fixes: 002ade70e933 ("app/test: measure cycles per packet in Rx/Tx")
Cc: stable@dpdk.org
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>

gpu/cuda: unmap GPU memory while freeing

Enable GPU_REGISTERED flag in gpu/cuda driver in the memory list.
If a GPU memory address CPU mapped is freed before being
unmapped, CUDA driver unmaps it before freeing the memory.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>

eal/freebsd: fix use of newer cpuset macros

FreeBSD has updated its CPU macros to align more with the definitions
used on Linux[1]. Unfortunately, while this makes compatibility better
in future, it means we need to have both legacy and newer definition
support. Use a meson check to determine which set of macros are used.

[1] https://cgit.freebsd.org/src/commit/?id=e2650af157bc

Bugzilla ID: 1014
Fixes: c3568ea37670 ("eal: restrict control threads to startup CPU affinity")
Fixes: b6be16acfeb1 ("eal: fix control thread affinity with --lcores")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Daxue Gao <daxuex.gao@intel.com>

test/ring: remove excessive inlining

Forcing inlining in test_ring_enqueue and test_ring_dequeue can cause
the compiled code to grow extensively when compiled with no optimization
(-O0 or -Og). This is default in the meson's debug configuration. This
can collide with compiler bugs and cause issues during linking of unit
tests where the api_type or esize are non-const variables causing
inlining cascade. In perf tests this is not the case in perf-tests as
esize and api_type are const values.

One such case was discovered when porting DPDK to RISC-V. GCC 11.2 (and
no fix still in 12.1) is generating a short relative jump instruction
(J <offset>) for goto and for loops. When loop body grows extensively in
ring test, the target offset goes beyond supported offfset of +/- 1MB
from PC. This is an obvious bug in the GCC as RISC-V has a
two-instruction construct to jump to any absolute address (AUIPC+JALR).

However there is no reason to force inlining as the test code works
perfectly fine without it.

GCC has a bug report for a similar case (with conditionals):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93062

Fixes: a9fe152363e2 ("test/ring: add custom element size functional tests")
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

examples/l3fwd: fix scalar LPM

The lpm_process_event_pkt() can either process a packet using an
architecture specific (defined for X86/SSE, ARM/Neon and PPC64/Altivec)
path or a scalar one. The choice is however done using an ifdef
pre-processor macro. Because of that the scalar version was apparently
not widely exercised/compiled.
Due to some copy/paste errors, the scalar logic in
lpm_process_event_pkt() retained a "continue" statement where it should
utilize rfc1812_process() and return the port/BAD_PORT.

Fixes: 99fc91d18082 ("examples/l3fwd: add event lpm main loop")
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

devargs: fix leak on hotplug failure

Caught by ASan, if a secondary process tried to attach a device with an
incorrect driver name, devargs was leaked.

Fixes: 64051bb1f144 ("devargs: unify scratch buffer storage")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>

eal/x86: fix unaligned access for small memcpy

Calls to rte_memcpy for 1 < n < 16 could result in unaligned
loads/stores, which is undefined behaviour according to the C
standard, and strict aliasing violations.

The code was changed to use a packed structure that allows aliasing
(using the __may_alias__ attribute) to perform the load/store
operations. This results in code that has the same performance as the
original code and that is also C standards-compliant.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

test/threads: add unit test

Establish unit test for testing thread api. Initial unit tests
for rte_thread_{get,set}_affinity_by_id().

Signed-off-by: Narcisa Vasile <navasile@linux.microsoft.com>
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

eal: get/set thread affinity per thread identifier

Implement functions for getting/setting thread affinity.
Threads can be pinned to specific cores by setting their
affinity attribute.

Windows error codes are translated to errno-style error codes.
The possible return values are chosen so that we have as
much semantic compatibility between platforms as possible.

note: convert_cpuset_to_affinity has a limitation that all cpus of
the set belong to the same processor group.

Signed-off-by: Narcisa Vasile <navasile@linux.microsoft.com>
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

eal: provide current thread identifier

Provide a portable type-safe thread identifier.
Provide rte_thread_self for obtaining current thread identifier.

Signed-off-by: Narcisa Vasile <navasile@linux.microsoft.com>
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

hash: unify CRC32 selection for x86 and Arm

Merge CRC32 hash calculation public API implementation for x86 and Arm.
Select the best available CRC32 algorithm when unsupported algorithm
on a given CPU architecture is requested by an application.

Previously, if an application directly includes `rte_crc_arm64.h`
without including `rte_hash_crc.h` it will fail to compile.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

hash: split x86 and SW hash CRC intrinsics

Split x86 and SW hash crc intrinsics into separate files.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>

event/cnxk: flush event queues over multiple pass

If an event queue flush does not complete after a fixed number of tries,
remaining queues are flushed before retrying the one with incomplete
flush.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>

event/cnxk: support setting queue attributes at runtime

Added API to set queue attributes at runtime and API to get weight and
affinity.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: lock when accessing mbox of SSO

Since mailbox is now accessed from multiple threads, use lock to
synchronize access.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>

test/event: set queue attributes at runtime

Added test cases to test changing of queue QoS attributes priority,
weight and affinity at runtime.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

eventdev: add weight and affinity to queue attributes

Extended eventdev queue QoS attributes to support weight and affinity.
If queues are of the same priority, events from the queue with highest
weight will be scheduled first. Affinity indicates the number of times,
the subsequent schedule calls from an event port will use the same event
queue. Schedule call selects another queue if current queue goes empty
or schedule count reaches affinity count.

To avoid ABI break, weight and affinity attributes are not yet added to
queue config structure and rely on PMD for managing it. New eventdev op
queue_attr_get can be used to get it from the PMD.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

eventdev: support setting queue attributes at runtime

Added a new eventdev API rte_event_queue_attr_set(), to set event queue
attributes at runtime from the values set during initialization using
rte_event_queue_setup(). PMD's supporting this feature should expose the
capability RTE_EVENT_DEV_CAP_RUNTIME_QUEUE_ATTR.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

event/cnxk: implement event port quiesce function

Implement event port quiesce function to clean up any lcore
resources used.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

examples: use event port quiescing

Quiesce event ports used by the workers core on exit to free up
any outstanding resources.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

app/eventdev: use port quiescing

Quiesce event ports used by the workers core on exit to free up
any outstanding resources.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

eventdev: quiesce an event port

Add function to quiesce any core specific resources consumed by
the event port.

When the application decides to migrate the event port to another lcore
or teardown the current lcore it may to call `rte_event_port_quiesce`
to make sure that all the data associated with the event port are released
from the lcore, this might also include any prefetched events.

While releasing the event port from the lcore, this function calls the
user-provided flush callback once per event.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

examples/ipsec-secgw: cleanup worker state before exit

Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit as the
next call to rte_event_dequeue_burst() is never made.
This might lead to deadlock based on the worker exit timing and when
there are very less number of flows.

Add a cleanup function to release any scheduling contexts held by the
worker by using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

examples/l2fwd-event: clean up worker state before exit

Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to deadlock based on the worker exit timing and when
there are very less number of flows.

Add clean up function to release any scheduling contexts held by the
worker by using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

examples/l3fwd: clean up worker state before exit

Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to deadlock based on the worker exit timing and when
there are very less number of flows.

Add clean up function to release any scheduling contexts held by the
worker by using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

examples/eventdev: clean up worker state before exit

Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to deadlock based on the worker exit timing and when
there are very less number of flows.

Add clean up function to release any scheduling contexts held by the
worker by using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

app/eventdev: clean up worker state before exit

Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to deadlock based on the worker exit timing and when
there are very less number of flows.

Add clean up function to release any scheduling contexts held by the
worker by using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

app/eventdev: simplify signal handling and teardown

Remove rte_*_dev calls from signal handler callback as signal handlers
are supposed to be light weight.

Split ethdev teardown into Rx and Tx sections, wait for
workers to finish processing after disabling Rx to allow workers
to complete processing currently held packets.

Verified SW event device on ARM64 using the following command:

./build/app/dpdk-test-eventdev -l 7-23 -s 0xf00 --vdev=event_sw0
-a 0002:02:00.0 -- --prod_type_ethdev --nb_pkts=0 --verbose 2
--test=pipeline_queue --stlist=o --wlcores 16-23

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

event/cnxk: move post-processing to separate function

Move event post-processing to a separate function.
Do complete event post-processing in tear-down functions to prevent
incorrect memory free.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

event/cnxk: add checks in release operation

Add additional checks while performing RTE_EVENT_OP_RELEASE to
ensure that there are no pending SWTAGs and FLUSHEs in flight.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

eventdev/timer: add telemetry

Adds telemetry support to get timer adapter info and timer adapter
statistics.

Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>

event/cnxk: fix out of bounds access in test

Fix out of bounds array access reported in coverity scan.

Coverity issue: 375817
Fixes: 2351506401e ("event/cnxk: add SSO selftest and dump")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

event/dlb2: allow CQ depths up to 1024

Updated to allow overriding the default CQ depth of 32. Since there are
only 2048 DLB history list entries, increasing the CQ depth decreases
the number of available LDB ports to 2048/max_cq_depth. Resource query
will take this into account and return the correct maximum number of
LDB ports.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>

event/cnxk: fix base pointer for SSO head wait

Function roc_sso_hws_head_wait() expects a base as input pointer, and it
will itself get tag_op from the base. By passing tag_op instead of base
pointer to this function will add SSOW_LF_GWS_TAG register offset twice,
which will lead to accessing wrong register.

Fixes: 1f5b3d55c041 ("event/cnxk: store and reuse workslot status")
Cc: stable@dpdk.org
Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

eventdev/eth_rx: fix telemetry Rx stats reset

Caught by covscan:

1. dpdk-21.11/lib/eventdev/rte_event_eth_rx_adapter.c:3279:
logical_vs_bitwise: "~(*__ctype_b_loc()[(int)*params] & 2048 /*
(unsigned short)_ISdigit */)" is always 1/true regardless of the values
of its operand. This occurs as the logical second operand of "||".
2. dpdk-21.11/lib/eventdev/rte_event_eth_rx_adapter.c:3279: remediation:
Did you intend to use "!" rather than "~"?

While isdigit return value should be compared as an int to 0,
prefer ! since all of this file uses this convention.

Fixes: 814d01709328 ("eventdev/eth_rx: support telemetry")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>

doc: fix build with sphinx 4.5

Latest Sphinx checks C language syntax more aggressively.
Fix the following warning by correcting C language syntax.

doc/guides/prog_guide/event_ethernet_rx_adapter.rst:243:
WARNING: Could not lex literal_block as "c". Highlighting skipped.

Fixes: 3c838062b91f ("eventdev: introduce event vector Rx capability")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>

net/mlx5: support ESP SPI match and RSS hash

In packets with ESP header, the inner IP will be encrypted, and
its fields cannot be used for RSS hashing. So, ESP packets
can be hashed only by the outer IP layer.
So, when using RSS on ESP packets, hashing may not be efficient,
because the fields used by the hash functions are only the outer IPs,
causing all traffic belonging to all tunnels between a given
pair of GWs to land on one core.
Adding the SPI hash field can extend the spreading of IPsec packets.

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix no-green metering with RSS

When a meter with RSS action being used, there might be several
sub-flows using different sub-policies in the flow splitting stage.
If there's no green action, there's an error that will always use the
same sub-policy for all sub-flows, some resources will be
overwritten and cannot be released, leading assert during port close.

This patch fixes this issue by checking both green and yellow queue
index during getting a blank sub-policy, to avoid the incorrect
resource overwrite.

Fixes: b38a12272b3a ("net/mlx5: split meter color policy handling")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/bnxt: skip wait for link up on port start

Invoking bnxt_link_update_op() with wait_for_completion set would
result in the driver waiting for 10s in case the port link is down to
complete port initialization (dev_start_op()).
Change it by not waiting for the completion when invoking it in
dev_start_op()

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix freeing VNIC filters

In bnxt_free_all_filters(), all the filters attached to a vnic are removed.
But each of these filters hold a backreference ptr to the vnic and they
need to be reset to NULL now. Otherwise, during a normal testpmd quit, as
part of dev_close_op(), first bnxt_free_all_filters() is invoked in
dev_stop, followed by bnxt_free_filter_mem() from bnxt_uninit_resources(),
which finds a filter with a vnic back reference ptr and now
bnxt_hwrm_clean_up_l2_filter() also tries to remove the filter from the
vnic's filter list which was already done as part of
bnxt_free_all_filters().

Fixes: f0f6b5e6cf94 ("net/bnxt: fix reusing L2 filter")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: recheck FW readiness if in reset process

If Firmware is still in reset process and returns the error
HWRM_ERR_CODE_HOT_RESET_PROGRESS, retry VER_GET command.
We have to do it in bnxt_handle_if_change_status().

Fixes: 0b533591238f ("net/bnxt: inform firmware about IF state changes")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix link status when port is stopped

Driver forces link down during port stop. But device is not obliged
link down in certain scenarios, even when forced. In that case,
subsequent link queries returns link as up.
Fixed to return link status as down when port is stopped.
Driver is already doing that for VF/NPAR/MH functions.

Fixes: c09f57b49c13 ("net/bnxt: add start/stop/link update operations")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: force PHY update on certain configurations

Device is not obliged link down in certain scenarios, even
when forced. When FW does not allow any user other than the BMC
to shutdown the port, bnxt_get_hwrm_link_config() call always
returns link up. Force phy update always in that case,
else user configuration for speed/autoneg would not get applied
correctly.

Fixes: 7bc8e9a227cc ("net/bnxt: support async link notification")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: fix speed autonegotiation

The "active_fec_signal_mode" in HWRM_PORT_PHY_QCFG response
does not return correct value till the link is up. Driver cannot
rely on active_fec_signal_mode while setting autoneg speed.

While setting autoneg speed, driver is currently checking only
"auto_link_speed_mask". Fixed to check "auto_pam4_link_speed_mask"
as well. Also, while setting auto mode and setting speed mask,
driver will have to set both NRZ and PAM4 mask.

Fixes: c23f9ded0391 ("net/bnxt: support 200G PAM4 link")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: avoid unnecessary endianness conversion

The "active_fec_signal_mode" in HWRM_PORT_PHY_QCFG response is uint8_t.
So no need of endianness conversion while parsing response.
Also, signal_mode is the first 4bits of "active_fec_signal_mode".

Fixes: c23f9ded0391 ("net/bnxt: support 200G PAM4 link")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: handle queue stop during RSS flow create

The programming of the RSS table was not taking into account if
any of the queues in the set were stopped prior to the flow
creation, hence leading to a vnic RSS config cmd failure thrown by
the FW.
Fix by programming only the active queues in the RSS action queue
set.

Fixes: 239695f754cb ("net/bnxt: enhance RSS action support")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

net/bnxt: check duplicate queue IDs

Currently driver does not have a check for duplicate queue ids.
User must either specify all Rx queues created or no queues in the
flow create command. Repeating the queue index is invalid.

Also, moved the check for invalid queue to the beginning of the function.

Fixes: 239695f754cb ("net/bnxt: enhance RSS action support")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: fix ring group on Rx restart

When an Rx queue is stopped and restarted, as part of that workflow,
for cards that have ring groups, we free and reallocate the ring group.
This new ring group is not communicated to the VNIC though via
HWRM_VNIC_CFG cmd.
Fix to issue HWRM_VNIC_CFG cmd on all adapters now in this scenario.

Fixes: ed0ae3502fc9 ("net/bnxt: update ring group after ring stop start")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

net/bnxt: fix RSS action

Specifying a subset of Rx queues created by the application in
the "flow create" command is invalid.
User must either specify all Rx queues created or no queues.

Also removed a wrong comment as RSS action will not be supported
if user or application specifies MARK or COUNT action.

Fixes: 239695f754cb ("net/bnxt: enhance RSS action support")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: remove count action support

'Count' action was never really implemented in the legacy/AFM model.
But there was some place holder code, remove it so that the user
will see a failure when a flow with 'count' action is being
created.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix tunnel stateless offloads

The HW only supports tunnel header parsing globally for supported tunnel
types. When a function uses one default VNIC to receive both the tunnel
and non-tunnel packets, applying the same stateless offload operation to
both tunnel and non-tunnel packets can cause problems in certain scenarios.
To workaround these problems, the firmware advertises no tunnel header
parsing capabilities to the driver using the HWRM_FUNC_QCAPS.
The driver must check this flag setting and accordingly not advertise
tunnel packet stateless offload capabilities to the stack.

If the device supports VXLAN, GRE, IPIP and GENEVE tunnel parsing,
then reports RX_OFFLOAD_OUTER_IPV4_CKSUM, RX_OFFLOAD_OUTER_UDP_CKSUM
and TX_OFFLOAD_OUTER_IPV4_CKSUM in the Rx/Tx offload capabilities of
the device.
Also, advertise tunnel TSO capabilities based on FW support.

Fixes: 0a6d2a720078 ("net/bnxt: get device infos")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: fix Rx configuration

We are currently not handling RX/RSS modes correctly.
After launching testpmd with multiple RXQs, if the user tries to set
the number of RXQs to 1, driver is not updating the "hash_type"
and "hash_mode" values of the VNICs. As a result, driver issues
bnxt_vnic_rss_configure() unnecessarily and the FW command fails.

Fixed bnxt_mq_rx_configure() to update VNIC RSS fields unconditionally.

Fixes: 4191bc8f79a8 ("net/bnxt: handle multi queue mode properly")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: remove unused macro

BNXT_FLAG_UPDATE_HASH is redundant now, remove it.

Fixes: 1ebb765090a6 ("net/bnxt: fix config RSS update")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: fix device capability reporting

1. Added two functions bnxt_get_tx_port_offloads() and
   bnxt_get_rx_port_offloads() to report the device
   tx/rx offload capabilities to the application.
2. This avoids few duplicate code in the driver and make
   VF-rep capability the same as VF.
3. This will help in selectively reporting offload capabilities
   based on FW support.

Fixes: 0a6d2a720078 ("net/bnxt: get device infos")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: update HWRM structures

Brought in the latest hsi_struct_def_dpdk.h.
HWRM API is now updated to version "1.10.2.83".

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: fix reordering in NEON Rx

Rx descriptor contains a valid bit which indicates readiness of the rest
of descriptor words. Hence, the word contains valid bit must be read
prior to other words.

In NEON vector path, two contiguous 8B descriptor are loaded to a single
NEON register. Given vector load ensures no 16B atomicity, read of the
word that includes valid bit could be reordered after read of other words.
In this case, data could be invalid.

Reloaded lower 64b after read barrier. This ensures what fetched is
correct.

Also fixed comments that not pertains to Arm platform architecture.

Fixes: deae85145c64 ("net/bnxt: handle multiple packets per loop in vector Rx")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: remove redundant ifdefs

NEON vector path is built only when Arm platform is 64-bit.
The ifdefs in NEON path are of no use, hence remove.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: defer completion index update

When no packet is received, there is no need to update completion raw cons.
Moved update down to remove unnecessary store in this case.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

common/cnxk: support per-port RQ in inline device

Add support for per port RQ in inline device thereby using
Aura/Pool attributes from that port specific first RQ.
When inline device is used with channel masking, it will
fallback to single RQ for all ethdev ports.

Also remove clamping up of CQ size for LBK ethdev when
inline inbound is enabled as now backpressure is supported
even on LBK ethdevs.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: fix hotplug detach for first device

Fix hotplug detach sequence to handle case where first PCI
device that is hosting NPA LF is being destroyed while in use.

Fixes: 5a4341c84979 ("net/cnxk: add platform specific probe and remove")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: fix multi-segment extraction in vwqe path

Fix multi-seg extraction in vwqe path to avoid updating mbuf[]
array until it is used via cq0 path.

Fixes: 7fbbc981d54f ("event/cnxk: support vectorized Rx event fast path")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: perform early MTU setup for event mode

Perform early MTU setup for event mode path in order
to update the Rx/Tx offload flags before Rx adapter setup
starts.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: support flow control for outbound inline

Add support for flow control in outbound inline path using
FC updates from CPT.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: support security statistics

Enabled rte_security stats operation based on the configuration
of SA options set while creating session.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: add capabilities for IPsec options

Added supported capabilities for various IPsec SA options.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: add capabilities for IPsec crypto algos

Added supported crypto algorithms for inline IPsec
offload.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: update L3/L4 checksum offload in mbuf

When the packet is processed with inline IPsec offload,
the ol_flags were updated only with RTE_MBUF_F_RX_SEC_OFFLOAD.

But the hardware can also update the L3/L4 csum offload flags.
Hence, ol_flags are updated with RTE_MBUF_F_RX_IP_CKSUM_GOOD,
RTE_MBUF_F_RX_L4_CKSUM_GOOD, etc based on the microcode completion
codes.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/cnxk: optimize Rx fast path for security offload

Optimize Rx fast path for security packets by preprocessing
most of the operations such as sa pointer compute,
inner WQE pointer fetch and microcode completion translation
before the pkt is characterized as inbound inline pkt.

Preprocessed info will be discarded if packet is not
found to be security pkt. Also fix fetching of CQ word5
for vector mode. Get ucode completion code from CPT parse
header and RLEN from IP4v/IPv6 decrypted packet as it is
in same 64B cacheline as CPT parse header in most of
the cases. By this method, we avoid accessing an extra
cacheline

Fixes: c062f5726f61 ("net/cnxk: support IP reassembly")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>