git.droids-corp.org - dpdk.git/log

vfio: use static window sizing for sPAPR IOMMU

The SPAPR IOMMU requires that a DMA window size be defined before memory
can be mapped for DMA. Current code dynamically modifies the DMA window
size in response to every new memory allocation which is potentially
dangerous because all existing mappings need to be unmapped/remapped in
order to resize the DMA window, leaving hardware holding IOVA addresses
that are temporarily unmapped. The new SPAPR code statically assigns
the DMA window size on first use, using the largest physical memory
address when IOVA=PA and the highest existing memseg virtual
address when IOVA=VA.

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

devtools: fix directory filter in forbidden token check

checkpatches.sh currently complains on a patch [1] adding
ALLOW_EXPERIMENTAL_API in an example while this check is for app, lib
and drivers directories:

Warning in examples/ethtool/ethtool-app/Makefile:
Using experimental build flag for in-tree compilation

The regexp that selects the files concerned by this filter is incorrect.
In the [1] case, the file full name is matched against "app" rather than
"+++ b/app".
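
The difference can be illustrated with a minimal Python sketch. The actual
check lives in the devtools scripts and is not written in Python; the two
patterns and the helper name below are assumptions made for illustration
only. An unanchored match on "app" also selects the example path, whereas a
pattern anchored on the "+++ b/" diff header does not:

```python
import re

# Hypothetical patterns; the expressions used by the real script may differ.
LOOSE = re.compile(r"app|lib|drivers")
ANCHORED = re.compile(r"^\+\+\+ b/(app|lib|drivers)/")


def is_filtered(diff_header, pattern):
    """Return True when the diff header line is selected by the pattern."""
    return pattern.search(diff_header) is not None


# The path from the warning above, and an illustrative in-tree path.
example = "+++ b/examples/ethtool/ethtool-app/Makefile"
in_tree = "+++ b/app/test-pmd/meson.build"

print(is_filtered(example, LOOSE))     # selected because of "ethtool-app"
print(is_filtered(example, ANCHORED))  # not selected
print(is_filtered(in_tree, ANCHORED))  # selected
```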

1: https://patchwork.dpdk.org/patch/83902/

Fixes: 7413e7f2aeb3 ("devtools: alert on new calls to exit from libs")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>

examples: stop processing meson file if build impossible

Once it has been determined that an example cannot be built, there is
little point in continuing to process the meson.build file for that
example, so we can use subdir_done() to return to the calling file.
This can potentially prevent problems where later statements in the file
may cause an error on systems where the app cannot be built, e.g. on
Windows or FreeBSD.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

examples/l2fwd-keepalive: skip meson build if no librt

When librt is not present on a system, processing the meson.build file
for this example application causes an error. Make the library
non-mandatory and just mark the example as unbuildable if it is
not present.

Fixes: 89f0711f9ddf ("examples: build some samples with meson")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

examples: fix flattening directory layout on install

By installing the examples one-by-one in a loop in the examples
meson.build file we effectively flattened out the structure of the examples
folder and omitted some common and shared subfolders that were never
directly built. Instead, we can remove the loop and just have the whole
"examples" folder installed as-is in a single statement, preserving its
directory structure, and thereby fixing the build of a number of the
examples.

Fixes: 2daf565f91b5 ("examples: install as part of ninja install")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

mbuf: move pool pointer in first half

According to the Technical Board decision
(http://mails.dpdk.org/archives/dev/2020-November/191859.html),
the mempool pointer in the mbuf struct is moved
from the second to the first half.
It may increase performance in some cases
on systems having 64-byte cache line, i.e. mbuf split in two cache lines.

Due to this change, all fields after "pool" are moved up.
Hopefully no vector data path is impacted.

Moving this field gives more space to dynfield1
while dropping the temporary dynfield0.

This is what the mbuf layout looks like (pahole-style):

word  type                              name                byte  size
 0    void *                            buf_addr;         /*   0 +  8 */
 1    rte_iova_t                        buf_iova;         /*   8 +  8 */
      /* --- RTE_MARKER64               rearm_data;                   */
 2    uint16_t                          data_off;         /*  16 +  2 */
      uint16_t                          refcnt;           /*  18 +  2 */
      uint16_t                          nb_segs;          /*  20 +  2 */
      uint16_t                          port;             /*  22 +  2 */
 3    uint64_t                          ol_flags;         /*  24 +  8 */
      /* --- RTE_MARKER                 rx_descriptor_fields1;        */
 4    uint32_t             union        packet_type;      /*  32 +  4 */
      uint32_t                          pkt_len;          /*  36 +  4 */
 5    uint16_t                          data_len;         /*  40 +  2 */
      uint16_t                          vlan_tci;         /*  42 +  2 */
 5.5  uint64_t             union        hash;             /*  44 +  8 */
 6.5  uint16_t                          vlan_tci_outer;   /*  52 +  2 */
      uint16_t                          buf_len;          /*  54 +  2 */
 7    struct rte_mempool *              pool;             /*  56 +  8 */
      /* --- RTE_MARKER                 cacheline1;                   */
 8    struct rte_mbuf *                 next;             /*  64 +  8 */
 9    uint64_t             union        tx_offload;       /*  72 +  8 */
10    struct rte_mbuf_ext_shared_info * shinfo;           /*  80 +  8 */
11    uint16_t                          priv_size;        /*  88 +  2 */
      uint16_t                          timesync;         /*  90 +  2 */
11.5  uint32_t                          dynfield1[9];     /*  92 + 36 */
16    /* --- END                                             128      */
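
As a rough cross-check of the offsets listed above, the layout can be modeled
with Python's ctypes. This is only a sketch under stated assumptions: the
class name is made up, pointers are modeled as 64-bit integers, and the unions
are flattened into plain fields of the same size and alignment. It is not the
real struct rte_mbuf definition.

```python
import ctypes


class MbufLayout(ctypes.Structure):
    """Simplified model of the layout table (not the real rte_mbuf)."""
    _fields_ = [
        ("buf_addr", ctypes.c_uint64),        # pointer, modeled as 8 bytes
        ("buf_iova", ctypes.c_uint64),
        ("data_off", ctypes.c_uint16),
        ("refcnt", ctypes.c_uint16),
        ("nb_segs", ctypes.c_uint16),
        ("port", ctypes.c_uint16),
        ("ol_flags", ctypes.c_uint64),
        ("packet_type", ctypes.c_uint32),
        ("pkt_len", ctypes.c_uint32),
        ("data_len", ctypes.c_uint16),
        ("vlan_tci", ctypes.c_uint16),
        ("hash", ctypes.c_uint32 * 2),        # 8-byte union, 4-byte aligned
        ("vlan_tci_outer", ctypes.c_uint16),
        ("buf_len", ctypes.c_uint16),
        ("pool", ctypes.c_uint64),            # pointer, modeled as 8 bytes
        ("next", ctypes.c_uint64),
        ("tx_offload", ctypes.c_uint64),
        ("shinfo", ctypes.c_uint64),
        ("priv_size", ctypes.c_uint16),
        ("timesync", ctypes.c_uint16),
        ("dynfield1", ctypes.c_uint32 * 9),
    ]


def cache_line(name, line_size=64):
    """Index of the cache line holding the first byte of the named field."""
    return getattr(MbufLayout, name).offset // line_size


print(MbufLayout.pool.offset, cache_line("pool"))
print(MbufLayout.next.offset, cache_line("next"))
print(ctypes.sizeof(MbufLayout))
```

With a 64-byte cache line, "pool" falls in the first line and "next" starts
the second, which matches the cacheline1 marker in the table.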

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

drivers: disable OCTEON TX2 in 32-bit build

The drivers for OCTEON TX2 are not supported in 32-bit mode.

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Jerin Jacob <jerinj@marvell.com>

devtools: allow custom set of examples in build test

To test the installation process of DPDK using "ninja install"
test-meson-builds.sh builds a subset of the examples using "make". To allow
more flexibility for people testing, allow the set of examples chosen for
this make test to be overridden using variable "DPDK_BUILD_TEST_EXAMPLES"
in the environment.

Since a number of example apps link against drivers directly even for
shared builds, we need to ensure that LD_LIBRARY_PATH points to the main
DPDK lib folder so any dependencies of those drivers can be found e.g. that
the PCI/vdev bus driver .so is found. [All drivers are symlinked from
drivers dir back to lib dir on install, so only one dir rather than two is
needed in the path.]

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

devtools: fix x86-default build test install env

The x86-default environment was loaded after installing this target.
I did not see any problem with it, yet we should load corresponding
environment before installing a target.

Fixes: bd253daa7717 ("devtools: fix test of ninja install")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>

devtools: reduce build test verbosity

By default, test-meson-builds.sh is quiet.
In order to better apply the verbosity policy, some file descriptors
are opened and redirected to stdout or /dev/null accordingly.

The target variable and meson/ninja commands are printed in verbose modes.
The installation commands are printed only in very verbose mode.
The examples build commands are printed only in very verbose mode.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

devtools: fix build test config inheritance from env

The variables DPDK_MESON_OPTIONS, PATH, PKG_CONFIG_PATH,
CPPFLAGS, CFLAGS and LDFLAGS can be customized in the config file
loaded by devtools/load-devel-config at each build.
The configuration can be adjusted per target thanks to the value set
in the DPDK_TARGET variable.

PKG_CONFIG_PATH is specific to each target, so it must be empty
before configuring each build from the file according to DPDK_TARGET.
Inheriting a default PKG_CONFIG_PATH for all targets does not make sense
and is prone to confusion.

DPDK_MESON_OPTIONS might take a global initial value from environment
to customize a build test from the shell. Example:
DPDK_MESON_OPTIONS="b_lto=true"
Some target-specific options can be added in the configuration file:
DPDK_MESON_OPTIONS="$DPDK_MESON_OPTIONS kernel_dir=$MYKERNEL"

Fixes: 272236741258 ("devtools: load target-specific compilation environment")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: David Marchand <david.marchand@redhat.com>

usertools: fix pmdinfo parsing

This script inspects an ELF file (binary or shared library) and its
linked dependencies by following DT_NEEDED tags.
So far a simple librte_pmd prefix was used as a filter to only parse
DPDK drivers dependencies.
While the reason is not clear from the commitlog of the patch that
introduced this filter, it was probably added for performance reasons,
since going through all dependencies can be quite long.
Testing with a DPDK built before the driver name changes:
- running the script takes ~0.3s with the filter,
- running the script takes ~9s without the filter,

Now that we changed the driver library names, it becomes more difficult
to identify only DPDK drivers, but we can just filter on the librte_
prefix to identify DPDK libraries: the script later checks for the
PMD_INFO_STRING string in .rodata and it is enough to differentiate the
DPDK drivers from the other DPDK libraries.
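
The effect of the two prefixes can be sketched as follows. This is not the
code of the script itself; the helper name and the library names in the list
are illustrative assumptions.

```python
def is_dpdk_library(soname, prefix="librte_"):
    """Return True when a DT_NEEDED entry looks like a DPDK library."""
    return soname.startswith(prefix)


# Hypothetical DT_NEEDED entries of a binary built after the rename.
needed = [
    "libc.so.6",
    "librte_eal.so.21",
    "librte_net_ixgbe.so.21",
    "libnuma.so.1",
]

# The old prefix no longer matches any renamed driver.
old_filter = [n for n in needed if is_dpdk_library(n, "librte_pmd")]
# The broader prefix keeps all DPDK libraries and skips system ones.
new_filter = [n for n in needed if is_dpdk_library(n)]

print(old_filter)
print(new_filter)
```

The broader filter also selects non-driver libraries such as EAL; telling them
apart is left to the later PMD_INFO_STRING check described above.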

Running the script with this patch takes ~0.5s.

A debug message was logged for each inspected file. It gave no useful
information and is removed.

Fixes: a20b2c01a7a1 ("build: standardize component names and defines")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Robin Jarry <robin.jarry@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

doc: add instructions for building 32-bit DPDK

For users with 32-bit applications who wish to use DPDK we need to provide
instructions on creating a 32-bit build of DPDK with meson. Therefore add a
section with this information to the GSG.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>

devtools: test 32-bit build

It's reasonably common for patches to have issues when built for 32-bit, so
to catch this, we can add a 32-bit build (if supported) to the
"test-meson-builds.sh" script. The tricky bit is using a valid
PKG_CONFIG_LIBDIR, so for now we use two common possibilities for where that
should point to in order to get a successful build.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

version: 20.11-rc3

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

app/testpmd: revert max Rx packet length adjustment

The fix of max_rx_pkt_len for allowing VLAN packets in all cases
was breaking configuration of some drivers. Example with virtio:

Ethdev port_id=0 max_rx_pkt_len 11229 > max valid value 9728
Fail to configure port 0

Trying to fix the logic revealed other issues in some drivers.
That is why it was decided to revert the fix.

The workaround for the original issue would be
to set the MTU explicitly from the application
with rte_eth_dev_set_mtu().
See RFC: https://patches.dpdk.org/patch/83756/

Fixes: f6870a7ed6b3 ("app/testpmd: fix max Rx packet length for VLAN packet")
Cc: stable@dpdk.org
Reported-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

mbuf: clean up comments and prefix

The mbuf header files had some commenting style errors that affected the
API documentation.
Also, the RTE_ prefix was missing on a macro and a definition.

Note: This patch does not touch the offload and attachment flags that are
also missing the RTE_ prefix.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

cmdline: avoid name clash with Windows system types

cmdline_numtype member names clash with Windows system identifiers.
Add RTE_ prefix to cmdline constants to avoid this and possible
future conflicts.

Suggested-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Jie Zhou <jizh@microsoft.com>
Tested-by: Jie Zhou <jizh@microsoft.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

test/lpm: avoid code duplication in RCU perf tests

Avoid code duplication by combining single and multi threaded tests.

Also, enable support for more than 2 writers.

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

test/lpm: remove unneeded checks in RCU perf tests

Remove redundant error checking for reader threads
since they never return error.

Fixes: eff30b59cc2e ("test/lpm: add RCU performance tests")
Cc: stable@dpdk.org
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

test/lpm: report errors in RCU perf tests

Return error if Add/Delete fail in multiwriter perf test
Return error if single or multi writer test fails

Fixes: eff30b59cc2e ("test/lpm: add RCU performance tests")
Cc: stable@dpdk.org
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

test/lpm: fix cycle calculation in RCU perf tests

Fix incorrect calculations for LPM adds, LPM deletes,
and average cycles in RCU QSBR perf tests

Since the RCU QSBR tests run for 'RCU_ITERATIONS' and not
'ITERATIONS', replace 'ITERATIONS' with 'RCU_ITERATIONS'
for calculating adds, deletes, and cycles.

Also, for multi-writer perf test, each writer only writes
half of NUM_LDEPTH_ROUTE_ENTRIES.
For 2 writers, total adds (or deletes) should be
(RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES) instead of
(2 * RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES).

Since, for both the single and multi writer tests, total adds/deletes
is equal to (RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES),
this has been replaced with a macro 'TOTAL_WRITES' and furthermore,
'g_writes' has been removed since it is always a fixed value
equal to TOTAL_WRITES.
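
The arithmetic can be checked with a short sketch. The numeric values below
are placeholders chosen for illustration, and splitting the entries evenly
across an arbitrary number of writers is an assumption extrapolated from the
two-writer case described above.

```python
RCU_ITERATIONS = 10              # placeholder value
NUM_LDEPTH_ROUTE_ENTRIES = 1000  # placeholder value

# Fixed total, identical for the single and multi writer tests.
TOTAL_WRITES = RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES


def total_writes(num_writers):
    """Total adds (or deletes) when the entries are split across writers."""
    per_writer = NUM_LDEPTH_ROUTE_ENTRIES // num_writers
    return num_writers * RCU_ITERATIONS * per_writer


def average_cycles(total_cycles):
    """Average cycles per add (or delete) over the whole test."""
    return total_cycles / TOTAL_WRITES


print(total_writes(1), total_writes(2), TOTAL_WRITES)
print(average_cycles(2500000))
```

Both the single-writer and two-writer totals equal TOTAL_WRITES, so a separate
runtime counter adds no information.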

Fixes: eff30b59cc2e ("test/lpm: add RCU performance tests")
Cc: stable@dpdk.org
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

eal: fix MCS lock and ticketlock headers install

Add missing arch-specific headers in meson.build.

Fixes: 2173f3333b61 ("mcslock: add MCS queued lock implementation")
Fixes: ca49b92079df ("ticketlock: enable generic ticketlock on all arch")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>

version: 20.11-rc2

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

test/telemetry: fix typo at beginning of line

A "+" symbol was incorrectly placed at the beginning of a line,
this is now removed.

Fixes: 52af6ccb2b39 ("telemetry: add utility functions for creating JSON")
Cc: stable@dpdk.org
Signed-off-by: Ciara Power <ciara.power@intel.com>

app/flow-perf: configure rule batches

Currently, flow-perf measures the performance of
rule installation/deletion operations by breaking
down the entire number of operations into windows
of fixed size (i.e., 100000 operations per window).
Then, flow-perf measures the total time per window
and computes an average time across all windows.

This commit allows flow-perf users to configure
the number of rules per window instead of using
a fixed pre-compiled value. To do so, users must
pass --rules-batch=N, where N is the number of
rules per window (or batch).
For consistency reasons, flow_count variable is
now renamed to rules_count. This variable is the
total number of rules to be installed/deleted.

For example, if a user wants to measure how much
time it takes to install 1M rules in a certain NIC,
he/she can input:
--rules-count=1000000
This way flow-perf will break down 1M flow rules into
10 batches of 100k flow rules each (this is the default
batch size) and compute an average across the 10
measurements.
Now, if the user modifies the number of rules per
batch as follows:
--rules-count=1000000 --rules-batch=500000
then flow-perf will break down 1M flow rules into
2 batches of 500k flow rules each and compute the
average across the 2 measurements.
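
The batching arithmetic described above can be sketched as follows. The
function and constant names are hypothetical and do not come from the
flow-perf sources, and the timings in the usage lines are made-up numbers.

```python
DEFAULT_RULES_BATCH = 100000  # default batch size stated above


def number_of_batches(rules_count, rules_batch=DEFAULT_RULES_BATCH):
    """Number of full batches (windows) the rules are split into."""
    if rules_batch <= 0:
        raise ValueError("rules_batch must be positive")
    return rules_count // rules_batch


def average_batch_time(batch_times):
    """Average of the per-batch measurements."""
    return sum(batch_times) / len(batch_times)


print(number_of_batches(1000000))          # --rules-count=1000000
print(number_of_batches(1000000, 500000))  # ... --rules-batch=500000
print(average_batch_time([1.0, 3.0]))      # two made-up measurements
```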

Finally, this commit also adds default variables
to the usage function instead of hardcoded values.

Signed-off-by: Georgios Katsikas <katsikas.gp@gmail.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>

doc: remove obsolete deprecation notice for power library

Remove notice announcing an already-implemented change.

In 19.05, rte_power_set_env was changed to return -1 in cases where
the environment was already set up, and for the same release, a
deprecation notice was added.
This patch removes that notice.

The API change was tested by calling rte_power_set_env twice. The first
call succeeded, and the second call failed, as expected.

Fixes: 5a5f3178d4a8 ("power: return error when environment already set")
Cc: stable@dpdk.org
Signed-off-by: David Hunt <david.hunt@intel.com>

doc: fix typo in KNI guide

The typo "withe" should have been "with the". This is now fixed.

Fixes: 89397a01ce4a ("kni: set default carrier state of interface")
Cc: stable@dpdk.org
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>

mbuf: fix dynamic fields and flags with multiprocess

The dynamic flag management is broken if rte_mbuf_dynflag_lookup()
is called in a secondary process because the local pointer to
the memzone is never initialized.

Fix it by using the same checks as dynfield_register(),
i.e. if the shared memory zone has not been looked up already,
then discover it.

Fixes: 4958ca3a443a ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

license: remove dual prefix

This patch removes the dual keyword from dual license
definitions to avoid confusion, as the word *dual* is
not required in an SPDX license identifier.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>

fix spellings that Lintian complains about

Fixes: 103809d032cd ("app/test-fib: add test application for FIB")
Fixes: 1265b5372d9d ("net/hns3: add some definitions for data structure and macro")
Fixes: a85e378cc606 ("net/ixgbe/base: add debug traces")
Fixes: 4861cde46116 ("i40e: new poll mode driver")
Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Fixes: 86a2265e59d7 ("qede: add SRIOV support")
Fixes: 1db4d2330bc8 ("net/virtio-user: check negotiated features before set")
Cc: stable@dpdk.org
Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>

common/mlx5: split PCI relaxed ordering for read and write

The current DevX implementation of the relaxed ordering feature is
enabling relaxed ordering usage only if both relaxed ordering read AND
write are supported. In that case both relaxed ordering read and write
are activated.

This commit will optimize the usage of relaxed ordering by enabling it
when the read OR write features are supported. Each relaxed ordering
type will be activated according to its own capability bit.

This will align the DevX flow with the verbs implementation of
ibv_reg_mr when using the flag IBV_ACCESS_RELAXED_ORDERING.
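
The change in enabling logic can be summarized with a small sketch. The
function names are hypothetical and the booleans stand in for the device
capability bits; this is not the driver code.

```python
def relaxed_ordering_before(read_cap, write_cap):
    """Old behavior: all or nothing, both capabilities required."""
    both = read_cap and write_cap
    return both, both  # (read enabled, write enabled)


def relaxed_ordering_after(read_cap, write_cap):
    """New behavior: each type follows its own capability bit."""
    return read_cap, write_cap


# A device that supports relaxed ordering for reads only.
print(relaxed_ordering_before(True, False))
print(relaxed_ordering_after(True, False))
```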

Fixes: 53ac93f71ad1 ("net/mlx5: create relaxed ordering memory regions")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/hinic/base: fix log info for PF command channel

When the PF command channel reports an error, the variables used in the
log message have already been cleared before they are printed.

Fixes: 214164a6bf7f ("net/hinic/base: remove unused function parameters")
Cc: stable@dpdk.org
Signed-off-by: Guoyang Zhou <zhouguoyang@huawei.com>

net/hinic/base: support two or more AEQS for chip

At device initialization, the driver previously supported only four
AEQs. Now the driver supports two or more AEQs, as given by the chip
config file.

Fixes: 611faa5f46cc ("fix various typos found by Lintian")
Cc: stable@dpdk.org
Signed-off-by: Guoyang Zhou <zhouguoyang@huawei.com>

ethdev: fix data type for port id

The ethdev port id is 16 bits now. This patch fixes the data type
of the 'pid' variable, changing it from uint32_t to uint16_t.

RTE_MAX_ETHPORTS is the maximum number of ports, which is customized by
the user. To avoid 16-bit unsigned integer overflow, the valid value
of RTE_MAX_ETHPORTS should be set from 0 to UINT16_MAX, and it is
safer to cut one more port from that range.

So we use RTE_BUILD_BUG_ON() to ensure that RTE_MAX_ETHPORTS is less
than UINT16_MAX.
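
The overflow concern can be illustrated with a sketch that emulates a 16-bit
counter. The helper names are made up for illustration, and the validity
check only mirrors the intent of the build-time assertion rather than
reproducing it.

```python
UINT16_MAX = 0xFFFF


def next_port_id(pid):
    """Increment a port id the way a uint16_t would, wrapping at 16 bits."""
    return (pid + 1) & UINT16_MAX


def max_ethports_is_valid(max_ethports):
    """Intent of the build-time check: strictly below UINT16_MAX."""
    return max_ethports < UINT16_MAX


# A 16-bit port id wraps to 0 after UINT16_MAX, so an iteration bound that
# is not below UINT16_MAX could never be reached by incrementing it.
print(next_port_id(UINT16_MAX))
print(max_ethports_is_valid(32))
print(max_ethports_is_valid(UINT16_MAX))
```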

Fixes: 5b7ba31148a8 ("ethdev: add port ownership")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

doc: update release notes for iavf

Update release notes with feature of outer IP hash for GTPC and GTPU.

Fixes: 6cd2d6adc783 ("net/iavf: support outer IP hash for GTPC")
Fixes: 262100a34a38 ("net/iavf: support outer IP hash for no inner GTPU")
Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/iavf: fix protocol field for RSS hash

Add the PROT field to the IPv4 and IPv6 protocol headers for RSS hash.

Fixes: 91f27b2e39ab ("net/iavf: refactor RSS")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

ethdev: fix using Rx split config before null check

Coverity flags that 'rx_conf' variable is used before
it's checked for NULL. This patch fixes this issue.

Coverity issue: 363570
Fixes: 4ff702b5dfa9 ("ethdev: introduce Rx buffer split")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

app/testpmd: fix protocol size for copy

The rte_flow_item_eth and rte_flow_item_vlan items are refined.
The structs do not exactly represent the packet bits captured on the
wire anymore, so the "set raw_encap/decap" commands should copy only the
real header instead of the whole struct.

Replace the rte_flow_item_* with the existing corresponding rte_*_hdr.

Fixes: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN items")
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/mlx5: fix tunnel flow destroy

The flow destructor tried to access flow-related resources after the
flow object memory was already released, which crashed the DPDK process.

The patch moves flow memory release to the end of destructor.

Fixes: 4ec6360de37d ("net/mlx5: implement tunnel offload")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix CQE decompression for Arm and PowerPC

The recent Rx code refactoring moved the incrementing
of the CQ completion index out of the rxq_cq_decompress_v()
function to the rxq_burst_v() function.

The advancing of the CQ completion index was removed in the SSE
version only, causing Neon and Altivec Rx bursts to stall.

Remove the incrementation of CQ completion index for all
the architectures in order to fix the stall.

Fixes: 1ded26239aa0 ("net/mlx5: refactor vectorized Rx")
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/bnxt: fix VXLAN decap offload

This patch fixes a couple of scenarios which were overlooked
by the patch which added VXLAN rte_flow offload support.

1. When a PMD application queries for flow counters, it could ask the PMD
   to reset the counters when the application itself is doing the counter
   accumulation. In this case, the PMD should not accumulate but rather
   reset the counter.

2. Some of the PMD applications may set the protocol field in the IPv4
   spec but don't set the mask. So, consider the mask in the proto
   value calculation.

4. The cached tunnel inner flow is not getting installed in the
   context of tunnel outer flow create because of the wrong
   error code check when tunnel outer flow is installed in the
   hardware.

5. When a dpdk application offloads the same tunnel inner flow on
   all the uplink ports, other than the first one the driver rejects
   the rest of them. However, the first tunnel inner flow request
   might not be of the correct physical port. This is fixed by
   caching the tunnel inner flow entry for all the ports on which
   the flow offload request has arrived on. The tunnel inner flows
   which were cached on the irrelevant ports will eventually get
   aged out as there won't be any traffic on these ports.

Fixes: 675e31d877b6 ("net/bnxt: support VXLAN decap offload")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: fix PAM4 link negotiation

In some instances link was not coming up if PAM4 signaling is enabled.
Added check to disable autoneg if FW indicates auto speeds are zero.
Use default auto speeds if PAM4 auto speeds is not set.
Added a fix for forced link setting.

Fixes: c23f9ded0391 ("net/bnxt: support 200G PAM4 link")
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

app/testpmd: support shared flow action attribute transfer

This attribute helps PMDs distinguish actions that are supposed to work
at the so-called hardware e-switch level from regular ones.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>

ethdev: introduce transfer attribute to shared action conf

In a flow rule, attribute "transfer" means operation level
at which both traffic is matched and actions are conducted.

Add the very same attribute to shared action configuration.
If a driver needs to prepare HW resources in two different
ways, depending on the operation level, in order to set up
an action, then this new attribute will indicate the level.
Also, when handling a flow rule insertion, the driver will
be able to turn down a shared action if its level is unfit.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrey Vesnovaty <andreyv@nvidia.com>

app/testpmd: fix max Rx packet length for VLAN packet

When the max Rx packet length is smaller than the sum of MTU size and
ether overhead size, it should be enlarged, otherwise the VLAN packets
will be dropped.
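
The adjustment described above can be sketched as follows. The constant and
function names are illustrative, and counting a single VLAN tag in the
overhead is an assumption; the overhead actually used may be derived
differently, for example per device.

```python
ETHER_HDR_LEN = 14  # destination MAC + source MAC + EtherType
ETHER_CRC_LEN = 4   # frame check sequence
VLAN_TAG_LEN = 4    # one 802.1Q tag (assumed to be part of the overhead)

DEFAULT_OVERHEAD = ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_LEN


def adjusted_max_rx_pkt_len(max_rx_pkt_len, mtu, overhead=DEFAULT_OVERHEAD):
    """Enlarge max_rx_pkt_len when it is smaller than MTU plus overhead."""
    return max(max_rx_pkt_len, mtu + overhead)


print(adjusted_max_rx_pkt_len(1518, 1500))  # enlarged to fit a VLAN frame
print(adjusted_max_rx_pkt_len(9000, 1500))  # already large enough
```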

Fixes: 35b2d13fd6fd ("net: add rte prefix to ether defines")
Cc: stable@dpdk.org
Signed-off-by: Steve Yang <stevex.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/hns3: cleanup includes

Some header files are already included by others, so including
them again is unnecessary. Also, some header files are not
self-contained, which triggers build warnings. Remove the
unnecessary includes and move the others to the correct
location.

Besides, remove some unused lines.

Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: check quantity limiter support before using it

If the hardware does not support QL (quantity limiter), int_ql_max
is 0. Software should confirm that ql_value is less than int_ql_max
before writing the QL register. This patch adds a check of the
int_ql_max value from firmware and deletes the unused variable
coalesce_mode.

Fixes: 27911a6e62e5 ("net/hns3: add Rx interrupts compatibility")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix configurations of port-level scheduling rate

Scheduling rate of port-level in hns3 PF driver configured to
hardware is obtained from firmware, which determines the
bandwidth capability of the port. The rate in firmware is
generally configured with the maximum value for network engine
supporting multiple rates, such as 10G and 25G. It may cause
the following issues:
1) When a 10G optical module is used on the network engine, scheduling
   rate of this port will also be configured to hardware with 25G.
   However, the MAC rate of this port is 10G. In this case, it is
   unreasonable that the port scheduling rate is different from the MAC
   rate.
2) If default speed in firmware is not the maximum value, the 25G port
   may not reach the capability of the port.

Therefore, we fix the configuration of the port-level scheduling rate
by updating it whenever the MAC link speed is updated.

Fixes: 59fad0f32135 ("net/hns3: support link update operation")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: support VXLAN-GPE TSO and checksum

Kunpeng920 supports TSO and checksum offload for VXLAN_GPE with
the next protocol id 3 (i.e., Ethernet).

Kunpeng930 supports TSO and checksum offload for VXLAN_GPE with
the next protocol ids 1, 2 and 3 (i.e., IPv4, IPv6 and Ethernet).

This patch adds support for this tunnel type.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix Tx checksum with fixed header length

Currently, the header lengths of all the layers are fixed, which
leads to a checksum error when a header length changes.

This patch fixes the above problem by using the header lengths in the mbuf
instead of fixed header lengths to perform the Tx checksum offload.

Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix Tx checksum outer header prepare

Currently, there are two mistakes in Tx checksum outer header prepare.
1) The check for whether the packet outer header is IPv4 is based on
   PKT_TX_IPV4, which is incorrect.
2) For HIP08, the outer UDP cksum cannot be offloaded, and the driver
   should ensure the outer UDP cksum field is set to 0. In the current
   code, PKT_TX_UDP_CKSUM is used to determine whether the outer layer
   of the packet is a UDP header. Actually, for tunnel TSO, the flag
   will never be set.

The first mistake is fixed by replacing PKT_TX_IPV4 with
PKT_TX_OUTER_IPV4. For the second, the protocol number in the L3 header
is used to check whether the outer L4 header is UDP.

Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations")
Fixes: 6dca716c9e1d ("net/hns3: support TSO")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: limit promiscuous mode for VF

For Kunpeng920, both Tx and Rx promiscuous modes are set when promiscuous
mode is enabled. In other words, all the ingress packets and the packets
sent from the PF and other VFs on the same physical port will be copied
to the function that enabled promiscuous mode.

Kunpeng930 supports turning off Tx unicast promiscuous mode. A limited
promiscuous mode is introduced, which turns off Tx unicast promiscuous
mode when promiscuous mode is set.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hinic: fix SCTP checksum error

For SCTP checksum offload, the PMD does not parse the payload offset
info, which may cause the hardware to fail to calculate the SCTP
checksum.

Fixes: 8c8b61234ffd ("net/hinic: refactor checksum functions")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>

net/hinic: fix outer L3 length parse

This patch fixes an outer_l3_len parse error when
PKT_TX_OUTER_IP_CKSUM is not set. The error does not affect the
checksum function; the fix just keeps the value consistent with the
mbuf meta information description.

The outer_l3_len was calculated wrongly because 'vlan_hdr' was
calculated wrongly. 'vlan_hdr' is fixed and the code is refactored.

Fixes: 8c8b61234ffd ("net/hinic: refactor checksum functions")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>

doc: fix hyperlink in igc guide

The hyperlink in the IGC documentation showed the whole link in italics
and was not clickable. This is now fixed to have a clickable label.

Fixes: 66fde1b943eb ("net/igc: add skeleton")
Cc: stable@dpdk.org
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

app/testpmd: do not allow dynamic change of core number

When the number of forwarding cores is changed at runtime, the
following issue may be encountered:
If nbcore is set lower than the current nbcore, the forwarding threads
will still be running on the extra cores. Therefore, trying to stop
forwarding will hang testpmd, since it will wait for the extra cores to
stop.

So do not allow changing nbcore while forwarding is running.

Fixes: 0c0db76f42ed ("app/testpmd: separate forward config setup from display")
Cc: stable@dpdk.org
Signed-off-by: Zhenghua Zhou <zhenghuax.zhou@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

raw/ifpga: use trusted buffer to free

In rte_fpga_do_pr, calling read() may taint the buffer argument,
which then becomes an untrusted value passed to rte_free().

Coverity issue: 279449
Fixes: ef1e8ede3da5 ("raw/ifpga: add Intel FPGA bus rawdev driver")
Cc: stable@dpdk.org
Signed-off-by: Wei Huang <wei.huang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

raw/ifpga: terminate string filled by readlink with null

readlink() does not null-terminate the string, so add a null character
at the end of the string if readlink() succeeds.
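
A minimal sketch of the pattern, assuming a hypothetical wrapper name
(sketch_readlink_str) that is not part of the driver. One byte of the
buffer is reserved so the terminator always fits.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: returns the link length, or -1 on failure. */
static ssize_t
sketch_readlink_str(const char *path, char *buf, size_t size)
{
	ssize_t len;

	if (size == 0)
		return -1;
	/* Reserve one byte: readlink() never writes a terminator itself. */
	len = readlink(path, buf, size - 1);
	if (len < 0)
		return -1;
	buf[len] = '\0';
	return len;
}
```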

Coverity issue: 362820
Fixes: 9c006c45d0c5 ("raw/ifpga: scan PCIe BDF device tree")
Cc: stable@dpdk.org
Signed-off-by: Wei Huang <wei.huang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: fix supported RSS type

When an RSS rule uses a symmetric hash function, the RSS type
shouldn't carry L3/L4 SRC/DST_ONLY. This patch adds an invalid RSS type
check for this case.

Fixes: 91f27b2e39ab ("net/iavf: refactor RSS")
Signed-off-by: Simei Su <simei.su@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/ice: delete unsupported ptypes in default hash set

Ptypes for GTPU with inner SCTP are not supported in current DDP pkg.
Thus, delete them in the default hash set config function.
Also clean up the rss vsi when calling the hash set config function.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/mlx5: allow age modes combination

ASO age action mode is not supported in group 0, while counter-based
age action mode supports group 0.

Allow using the 2 modes of age action in parallel, so group 0 flows will
use counter-based age actions and group > 0 flows will use ASO age
actions.

Currently, the counter-based age action doesn't support the shared
action API, so group 0 flows cannot share age actions.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>

net/mlx5: support shared age action

Add support for rte_flow shared action API for ASO age action.

As a first step, support validate, create, query and destroy.

Only ASO age mode is supported.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>

net/mlx5: optimize shared RSS action memory

The RSS shared action was saved in flow memory by a pointer.
It means that every flow's memory includes 8B just for the optional
shared RSS case.

Move the RSS objects to an indexed pool, which reduces this flow
handle memory to 4B.

So, now, the shared action handler is also just a 4B index.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>

net/mlx5: support flow hit action for aging

A new ASO (Advanced Steering Operation) feature was added in the latest
mlx5 adapters to support flow hit detection.

Using this new steering action, the driver can detect a flow traffic
hit and reset this indication at any time.

The ASO age action cannot support flows in table 0.

Add support for flow aging action in rte_flow using this new feature.

The counter aging mode will be taken only when the ASO feature is not
supported for the user flow groups.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>

common/mlx5: add definitions for ASO flow hit

This patch adds different PRM definitions, related to ASO flow hit
feature, in MLX5 PMD code.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: add glue function to create flow hit action

Add glue function to create the flow hit action using DV API,
if rdma-core support exists.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: add read ASO flow hit HCA capability

Read and store the device capability of FLOW_HIT_ASO general object,
using the DevX API.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: use general object type for cap index

PRM defines the general object types using positive numbers.
The same values are used as index for the relevant bit in HCA
capabilities general_obj_types bit mask.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: add DevX API to create ASO flow hit object

Add DevX API to create ASO flow hit object.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: support flow tag and packet header miniCQEs

CQE compression allows us to save the PCI bandwidth and improve
the performance by compressing several CQEs together to a miniCQE.
But the miniCQE size is only 8 bytes and this limits the ability
to successfully keep the compression session in case of various
traffic patterns.

The current miniCQE format only keeps the compression session alive
in case of uniform traffic with the Hash RSS as the only difference.
There are requests to keep the compression session in case of tagged
traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic.
Add 2 new miniCQE formats in order to achieve the best performance
for these traffic patterns: Flow Tag and Packet Header miniCQEs.

The existing rxq_cqe_comp_en devarg is modified to specify the
desired miniCQE format. Specifying 2 selects Flow Tag format
for better compression rate in case of RTE Flow Mark traffic.
Specifying 3 selects Checksum format (existing format for MPRQ).
Specifying 4 selects L3/L4 Header format for better compression
rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx: remove separate ABI version for glue libraries

The glue libraries are tightly bound to the mlx drivers of a dpdk
version and are packaged with them.

Keeping a separate ABI version prevents us from installing two versions
of dpdk.
Maintaining this separate version just adds confusion.
Align the glue library ABI version to the global ABI version.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5/linux: replace malloc and free in glue

This commit replaces mlx5_malloc and mlx5_free calls with Linux calls
malloc and free in file mlx5_glue.c.
The current mlx5_malloc calls have no flags, alignment or socket
selection, so they are equivalent to calling malloc.  Rdma-core itself
uses malloc.  When using mlx5_malloc, the glue library depends on the
common_mlx5 library, which must be compiled first.  Not doing so when
ibverbs_link=dlopen results in a build failure:
mlx5_glue.c: undefined reference to `mlx5_malloc'.
To make all of this simpler and remove the common_mlx5 dependency, this
commit does the alloc/free replacements.

Fixes: 66914d19d135 ("common/mlx5: convert control path memory to unified malloc")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: fix DevX SQ object creation

Fix wrong assignment of allow_multi_pkt_send_wqe
in mlx5_devx_cmd_create_sq.
The incorrect assignment was introduced in the initial
mlx5_devx_cmd_create_sq implementation.

sq_attr->flush_in_error_en was mistakenly assigned to both
allow_multi_pkt_send_wqe and flush_in_error_en. It was detected during
Windows PMD development.

The fix is simply to assign sq_attr->allow_multi_pkt_send_wqe to
allow_multi_pkt_send_wqe in mlx5_devx_cmd_create_sq.

Fixes: ae18a1ae9692 ("net/mlx5: support Tx hairpin queues")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: fix glue library name

The MLX5 glue library wasn't following the standard
'librte_<class>_<name>.so' naming.

Fixes: a20b2c01a7a1 ("build: standardize component names and defines")
Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx4: fix glue library name

The MLX4 library wasn't being successfully initialized with
-Dibverbs_link=dlopen because it expected a shared object file
with a different name.

Fixes: a20b2c01a7a1 ("build: standardize component names and defines")
Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/bnxt: fix pass by reference

Pass the 'eth_da' pointer to bnxt_rep_port_probe() instead of passing
the structure by value.

Coverity issue: 360841
Fixes: 322bd6e70272 ("net/bnxt: add port representor infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

net/bnxt: add a failure log

Check and log an error message if the switch domain free API fails.

Coverity issue: 362757
Fixes: 322bd6e70272 ("net/bnxt: add port representor infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

net/netvsc: control use of external mbuf on Rx

When receiving packets, netvsp puts data in a buffer mapped through UIO.
Depending on packet size, netvsc may attach the buffer as an external
mbuf. This is not a problem if this mbuf is consumed in the application,
and the application can correctly read data out of an external mbuf.

However, there are two problems with data in an external mbuf.
1. Due to the limitation of the kernel UIO implementation, physical
   address of this external buffer is not exposed to the user-mode. If
   this mbuf is passed to another driver, the other driver is unable to
   map this buffer to iova.
2. Some DPDK applications are not aware of external mbufs, and may
   misbehave when they receive an mbuf with an external buffer attached.

Introduce a driver parameter "rx_extmbuf_enable" to control whether
netvsc should use external mbufs for receiving packets. The default
value is 0 (netvsc doesn't use external mbufs; it always allocates an
mbuf and copies data into it). A non-zero value tells netvsc to attach
external buffers to mbufs on receiving packets, thus avoiding memory
copies.

Signed-off-by: Long Li <longli@microsoft.com>

net/netvsc: allow setting Rx and Tx copy break

The values for Rx and Tx copy break should be tunable rather
than hard coded constants.

The rx_copybreak sets the threshold where the driver uses an
external mbuf to avoid having to copy data. Setting 0 for copybreak
will cause driver to always create an external mbuf. Setting
a value greater than the MTU would prevent it from ever making
an external mbuf and always copy. The default value is 256 (bytes).

Likewise the tx_copybreak sets the threshold where the driver
aggregates multiple small packets into one request. If tx_copybreak
is 0 then each packet goes as a VMBus request (no copying).
If tx_copybreak is set larger than the MTU, then all packets smaller
than the chunk size of the VMBus send buffer will be copied; larger
packets always have to go as a single direct request. The default
value is 512 (bytes).
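
The thresholds described above can be sketched as two predicates. The
function names are hypothetical, and the exact comparisons (>= on Rx,
< on Tx) are assumptions chosen to match the described behavior for 0
and for values above the MTU; the driver may compare differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Rx: assume a packet at or above the threshold is attached as an
 * external mbuf. With rx_copybreak == 0 every packet qualifies.
 */
static bool
sketch_rx_use_extmbuf(uint32_t pkt_len, uint32_t rx_copybreak)
{
	return pkt_len >= rx_copybreak;
}

/*
 * Tx: assume a packet is copied (aggregated) only when it is below the
 * threshold and also fits in one chunk of the VMBus send buffer.
 * With tx_copybreak == 0 nothing is copied.
 */
static bool
sketch_tx_copy(uint32_t pkt_len, uint32_t tx_copybreak, uint32_t chunk_size)
{
	return pkt_len < tx_copybreak && pkt_len < chunk_size;
}
```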

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Long Li <longli@microsoft.com>

net/octeontx2: avoid per packet barrier with multi segment

Avoid per-pkt barrier with multi-seg with fast free
and remove mbuf update to NULL.

Fixes: ce8628c66a22 ("net/octeontx2: fix jumbo frame crash")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>

net/octeontx2: support VF base steering rule

Adds support for merging a base steering rule with
all flow rules created on a VF.

Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/thunderx: fix memory leak on rbdr desc ring failure

In nicvf_qset_rbdr_alloc(), memory is allocated for the 'rbdr'
structure but is not released when allocating the 'rbdr desc ring'
fails.

Fixes: 7413feee662d ("net/thunderx: add device start/stop and close")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/ena: upgrade driver version to v2.2.0

The v2.2.0 adds support for network interface metrics, includes some bug
fixes and updates HAL to the latest version.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>

doc: mark Armv8 as supported by ena PMD

The ARMv8 platform support was tested and works fine with the ENA PMD.

It can be used on the AWS a1.* and m6g.* instances.

The ARMv8 support in ENA is at least from v19.11, where the VFIO DPDK
driver was fixed to work with 32-bit applications compiled for arm.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>

net/ena/base: align IO CQ allocation to 4K

The latest generation HW requires IO completion queue descriptors to be
aligned to 4K in order to achieve the best performance.

Because of that, new allocation macros were added, which allow the
driver to allocate memory with a specified alignment.

The previous allocation macros are now wrappers around the macros
doing the alignment, with the alignment value equal to cacheline size.
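
A sketch of that wrapper arrangement using posix_memalign(). The macro
names and the 64-byte cacheline size are assumptions for illustration;
they are not the ENA macros.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE_SIZE 64 /* assumed cacheline size */

/*
 * Aligned variant: 'align' must be a power of two and a multiple of
 * sizeof(void *). 'ptr' is set to NULL when the allocation fails.
 */
#define SKETCH_MEM_ALLOC_ALIGNED(ptr, size, align)		\
	do {							\
		void *p_ = NULL;				\
		if (posix_memalign(&p_, (align), (size)) != 0)	\
			p_ = NULL;				\
		(ptr) = p_;					\
	} while (0)

/* The plain variant is just the aligned one with cacheline alignment. */
#define SKETCH_MEM_ALLOC(ptr, size) \
	SKETCH_MEM_ALLOC_ALIGNED(ptr, size, SKETCH_CACHE_LINE_SIZE)
```

An IO CQ would then be allocated with an alignment of 4096, while
existing callers of the plain macro keep their previous behavior.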

Fixes: b68309be44c0 ("net/ena/base: update communication layer for the ENAv2")
Cc: stable@dpdk.org
Signed-off-by: Ido Segev <idose@amazon.com>
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Amit Bernstein <amitbern@amazon.com>

net/ena: change name of supported PCI device IDs

The ID 0xEC21 is not associated with the LLQ feature of the device, so
the previous name would be misleading for the user. Because of that,
the current identifier is more precise.

Together with the code update, the documentation was changed to reflect
the current changes.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>

net/ena: fix setting Rx checksum flags in mbuf

The driver was never setting PKT_RX_*_CKSUM_GOOD flags, so the only way
of checking if the checksum was checked was by testing for the
PKT_RX_*_CKSUM_BAD. In that situation, the application couldn't detect
if the checksum was valid or unknown, as unknown flag is equal to 0.

Moreover, the l3_csum_err value is only valid if the l3_proto is
indicating IPv4, so it shouldn't be checked for other protocols.
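
A minimal sketch of the resulting three-state mapping. The flag values
and the function name are illustrative only; real code would use the
PKT_RX_IP_CKSUM_* mbuf flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only, not the DPDK definitions. */
#define SKETCH_IP_CKSUM_UNKNOWN 0u
#define SKETCH_IP_CKSUM_BAD     (1u << 0)
#define SKETCH_IP_CKSUM_GOOD    (1u << 1)

static uint32_t
sketch_rx_ip_cksum_flags(bool l3_is_ipv4, bool l3_csum_err)
{
	/* l3_csum_err is only meaningful when the L3 protocol is IPv4. */
	if (!l3_is_ipv4)
		return SKETCH_IP_CKSUM_UNKNOWN;
	/* Report GOOD explicitly so it can be told apart from UNKNOWN. */
	return l3_csum_err ? SKETCH_IP_CKSUM_BAD : SKETCH_IP_CKSUM_GOOD;
}
```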

Fixes: 1173fca25af9 ("ena: add polling-mode driver")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>

net/ena: fix getting xstats global stats offset

There was a bug in the code, which was reading the stat_offset value
from the ena_stats_rx_strings array instead of ena_stats_global_strings.

It wasn't causing real problems just because ena_stats_rx_strings was
not smaller than ena_stats_global_strings and both arrays hold the same
offsets.

Fixes: 7830e905b7c9 ("net/ena: expose extended stats")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>

net/enic: fix header sizes when copying flow patterns

Several functions use sizeof(struct rte_flow_item_eth) and
sizeof(struct rte_flow_item_ipv6) when copying headers. These sizes
used to coincide with the sizes of rte_ether_hdr and
rte_ipv6_hdr. But, with recently added fields, rte_flow_item_eth and
rte_flow_item_ipv6 have grown in size. Use sizeof(rte_ether_hdr) and
sizeof(rte_ipv6_hdr) instead.
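
The size mismatch can be illustrated with stand-in types. Both structs
below are hypothetical simplifications, not the rte_flow or rte_ether
definitions; the point is only that the copy length must come from the
header type, not from the flow item that embeds it.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 14-byte Ethernet header. */
struct sketch_ether_hdr {
	uint8_t dst[6];
	uint8_t src[6];
	uint16_t ether_type;
};

/* Stand-in for a flow item that grew an extra field after the header. */
struct sketch_flow_item_eth {
	struct sketch_ether_hdr hdr;
	uint32_t has_vlan;
};

/* Copy exactly one header; sizeof(*spec) would overrun 'dst_hdr'. */
static void
sketch_copy_eth(uint8_t *dst_hdr, const struct sketch_flow_item_eth *spec)
{
	memcpy(dst_hdr, &spec->hdr, sizeof(struct sketch_ether_hdr));
}
```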

Coverity issue: 363572, 363573
Fixes: ea7768b5bba8 ("net/enic: add flow implementation based on Flow Manager API")
Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>

net/hns3: fix enabling SVE Rx/Tx

The macro defined by the ARM SVE vector implementation is
__ARM_FEATURE_SVE, and the RTE_MACHINE_CPUFLAG macros
have been replaced by regular compiler macros.

Besides, we remove the unused macro RTE_LIBRTE_HNS3_INC_VECTOR_SVE.

Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: check setting VF PCI bus return value

Currently, hns3vf_reinit_dev only checks whether the return value of
setting the PCI bus function is not 0, while the function returns a
negative value when it fails.

Fixes: 243651cb6c8c ("net/hns3: check PCI config space reads")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix clearing HW ring after queue stop

Currently, the Rx HW ring is not cleared after queue stop.
When there are packets remaining in the HW rings and the
queues have been stopped, if the upper layer user calls the
rx_burst function at this time, an illegal memory access
will occur because the SW rings have been released.

This patch fixes this by resetting the SW ring after disabling the
queue.

Fixes: fa29fe45a7b4 ("net/hns3: support queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix data type to store queue number

Currently, a u8 type variable is used to control the release of fake
queues in the hns3_fake_rx/tx_queue_config functions. Although there is
no case in which more than 256 fake queues are created in the hns3
network engine, it is unreasonable to compare a u8 variable with a u16
variable.
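
A short sketch of why the index width matters. The function name is
hypothetical; it only counts iterations to show that a 16-bit index
reaches a bound above 255, which an 8-bit index would never do because
it wraps around first.

```c
#include <stdint.h>

/* Walk 'nb_queues' entries with an index as wide as the bound. */
static uint32_t
sketch_release_queues(uint16_t nb_queues)
{
	uint32_t released = 0;
	uint16_t i; /* a uint8_t index would wrap to 0 after 255 */

	for (i = 0; i < nb_queues; i++)
		released++;
	return released;
}
```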

Fixes: a951c1ed3ab5 ("net/hns3: support different numbers of Rx and Tx queues")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix unchecked return value

There is a Coverity defect reported as "calling
hns3_reset_all_tqps without checking return value
in hns3_do_start".

This patch fixes the warning by adding a "void" cast.
This is an exception handling path, and hns3_reset_all_tqps
prints the corresponding error message itself if it
fails, so it is not necessary to check its return
value. Here, ret is kept as the error code that
caused the exception.

Coverity issue: 363048
Fixes: fa29fe45a7b4 ("net/hns3: support queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix packet type report in Rx

Currently, hns3 supports recognizing a lot of ptypes, but most
tunnel packet types are not reported to the API
rte_eth_dev_get_supported_ptypes.

There are also some errors in L2 and L3 packet recognition. ARP and
LLDP are classified in the L3 field of the Rx descriptor, so
the ptype of LLDP and ARP packets is set twice. Because ptypes
are assigned by bitwise OR, this eventually causes the ptype
result to be incorrect.

Besides, when a packet has only an L2 header, its ptype is not
reported by the hns3 PMD. This is because the L2/L3 ptype table is not
initialized properly. In this case, the table query result is 0
by default.

This patch fixes the missing supported ptypes, the mistake in
L2/L3 packet recognition, and the unreported L2 packet ptype by
reporting the L2 type when the L3 type is unrecognized.

Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: fix RSS max queue id allowed in multi-TC

Currently, the driver uses the maximum number of queues configured by
the user as the maximum queue id that can be specified by the RSS rule
or the reta_update API. This is unreasonable and may trigger incorrect
behavior in the multi-TC scenario. The driver must ensure that the queue
id configured in the redirection table is within the range of the
number of queues allocated to a TC.
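
A minimal sketch of such a range check, assuming a hypothetical helper
that receives the redirection table entries and the number of queues
allocated to one TC. It is not the driver's function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Valid only if every entry is below the per-TC queue count. */
static bool
sketch_reta_valid(const uint16_t *reta, size_t reta_size,
		  uint16_t queues_per_tc)
{
	size_t i;

	for (i = 0; i < reta_size; i++) {
		if (reta[i] >= queues_per_tc)
			return false;
	}
	return true;
}
```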

Fixes: c37ca66f2b27 ("net/hns3: support RSS")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/hns3: get number of used descriptors of Rx queue

Implement the functions that count the available and used Rx
descriptors.

In the Kunpeng series, the NIC hardware supports reading the number of
BDs waiting to be processed from the hardware FBD (Full Buffer
Descriptor) register, and the driver maintains the number of BDs to be
written back to the hardware.
Compare the number of FBDs with the number of BDs to be written back to
the hardware.

The number of used descriptors of an Rx queue is computed as follows:
the number of FBDs read from the FBD register plus the number of BDs to
be written back to the hardware, as maintained by the driver.
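
The computation can be written down directly. The function and
parameter names are hypothetical; 'fbd_num' stands for the value read
from the FBD register and 'pending_writeback' for the driver-maintained
count of BDs still to be written back.

```c
#include <stdint.h>

/* used = FBDs reported by hardware + BDs not yet written back */
static uint32_t
sketch_rxq_used_count(uint32_t fbd_num, uint32_t pending_writeback)
{
	return fbd_num + pending_writeback;
}
```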

Signed-off-by: Lijun Ou <oulijun@huawei.com>

net/iavf: support flex desc metadata extraction

Enable metadata extraction for flexible descriptors in AVF, which
allows a network function to get metadata directly without additional
parsing, reducing the CPU cost for VFs. The metadata extraction covers
the VLAN/IPv4/IPv6/IPv6-FLOW/TCP/MPLS flexible descriptors. The VF can
negotiate the flexible descriptor capability with the PF and
correspondingly configure the specific offload on the receive queues.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>

net/ice: rename dynamic mbuf name

Rename the dynamic mbuf name to 'intel_pmd_xxx' format, so that the
Intel PMD which has the protocol extraction feature will share the
same dynamic field/flags space in mbuf.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>