Kalesh AP [Tue, 13 Jul 2021 13:34:13 +0000 (19:04 +0530)]
net/bnxt: clear cached statistics
As part of the workaround put in the commit "219842b9990c",
driver caches the last read stats values from the hardware.
But this is not cleared during the clear stats operation. This
results in showing up stale stats values while reading the stats
after the clear operation.
Fixes: 219842b9990c ("net/bnxt: workaround spurious zero stats in Thor") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Lance Richardson <lance.richardson@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Store the async event completion data1 and data2 in separate variables
at the start of the function before the switch case for the different
events so they can be used by any of the event handlers.
net/bnxt: fix missing barriers in completion handling
Ensure that Rx/Tx/Async completion entry fields are accessed
only after the completion's valid flag has been loaded and
verified. This is needed for correct operation on systems that
use relaxed memory consistency models.
Satheesh Paul [Tue, 6 Jul 2021 08:19:18 +0000 (13:49 +0530)]
net/cnxk: fix default MCAM allocation size
Preallocation of MCAM entries is not valid anymore since the
AF side MCAM allocation scheme has changed. This patch disables
preallocation by changing the default MCAM preallocation size
from 8 to 1.
Fixes: 168c59cfe42 ("net/octeontx2: add flow MCAM utility functions") Cc: stable@dpdk.org Signed-off-by: Satheesh Paul <psatheesh@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Anoob Joseph [Thu, 1 Jul 2021 09:29:29 +0000 (14:59 +0530)]
net/octeontx2: support non-ethernet L2 header
In the inline inound path, a custom header would be present at L3 which
has sequence number & SPI. L2 need to be adjusted such that the eventual
packet would have L3 after L2. Remove assumption of L2 type in this
handling.
Signed-off-by: Anoob Joseph <anoobj@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Dana Vardi [Sun, 11 Jul 2021 13:12:49 +0000 (16:12 +0300)]
net/mvpp2: fix configured state dependency
Need to set configure flag to allow create and commit mrvl tm
hierarchy tree. tm configuration depends on parameters that are
being set in port configure stage, e.g. nb_tx_queues.
This also aligned with the tm api description.
Fixes: 429c394417 ("net/mvpp2: support traffic manager") Cc: stable@dpdk.org Signed-off-by: Dana Vardi <danat@marvell.com> Reviewed-by: Liron Himi <lironh@marvell.com>
net/mlx5: fix threshold for mbuf replenishment in MPRQ
The replenishment scheme for the vectorized MPRQ Rx burst aims
to improve the cache locality by allocating new mbufs only when
there are almost no mbufs left: one burst gap between allocated
and consumed indexes.
This gap is not big enough to accommodate a corner case when we
have a very aggressive CQE compression with multiple regular CQEs
at the beginning and 64 zipped CQEs at the end.
Need to keep in mind this case and extend the replenishment
threshold by MLX5_VPMD_RX_MAX_BURST (64) to avoid mbuf overflow.
Fixes: 5fc2e5c27d6 ("net/mlx5: fix mbuf overflow in vectorized MPRQ") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xiaoyu Min [Wed, 7 Jul 2021 02:32:47 +0000 (10:32 +0800)]
net/mlx5: fix missing RSS expansion of IPv6 frag
IPV6_FRAG_EXT item is missed for RSS expansion which causes wrongly
expanded flows:
flow create 0 ingress pattern eth / ipv6 / udp dst is 250 / vxlan-gpe /
ipv6 / ipv6_frag_ext / end actions rss level 2 types ip end / end
Different from other items, IPV6_FRAG_EXT hasn't next field because HW
only support to do hash of UDP/TCP for non-fragment.
This MLX5_EXPANSION_IPV6_FRAG_EXT node in RSS expansion graph only helps
RSS expansion function to locate right node in graph from which start
to expand.
Fixes: 0e5a0d8f7556 ("net/mlx5: support match on IPv6 fragment extension") Cc: stable@dpdk.org Signed-off-by: Xiaoyu Min <jackmin@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
net/mlx5: optimize hash list table allocate on demand
Currently, all the hash list tables are allocated during start up.
Since different applications may only use dedicated limited actions,
optimized the hash list table allocate on demand will save initial
memory.
This commit optimizes hash list table allocate on demand.
Currently, hash list uses the cache list as bucket list. The list
in the buckets have the same name, ctx and callbacks. This wastes
the memory.
This commit abstracts all the name, ctx and callback members in the
list to a constant struct and others to the inconstant struct, uses
the wrapper functions to satisfy both hash list and cache list can
set the list constant and inconstant struct individually.
common/mlx5: allocate cache list memory individually
Currently, the list's local cache instance memory is allocated with
the list. As the local cache instance array size is RTE_MAX_LCORE,
most of the cases the system will only have very limited cores.
allocate the instance memory individually per core will be more
economic to the memory.
This commit changes the instance array to pointer array, allocate
the local cache memory only when the core is to be used.
When a cache entry is allocated by lcore A and is released by lcore B,
the driver should synchronize the cache list access of lcore A.
The design decision is to manage a counter per lcore cache that will be
increased atomically when the non-original lcore decreases the reference
counter of cache entry to 0.
In list register operation, before the running lcore starts a lookup in
its cache, it will check the counter in order to free invalid entries in
its cache.
When mlx5 list object is accessed by multiple cores, the list lock
counter is all the time written by all the cores what increases cache
misses in the memory caches.
In addition, when one thread accesses the list for add\remove\lookup
operation, all the other threads coming to do an operation in the list
are stuck in the lock.
Add per lcore cache to allow thread manipulations to be lockless when
the list objects are mostly reused.
Synchronization with atomic operations should be done in order to
allow threads to unregister an entry from other thread cache.
For object which wants efficient index allocate and free, local
cache will be very helpful.
Two level cache is introduced to allocate and free the index more
efficient. One as local and the other as global. The global cache
is able to save all the allocated index. That means all the allocated
index will not be freed. Once the local cache is full, the extra
index will be flushed to the global cache. Once local cache is empty,
first try to fetch more index from global, if global is still empty,
allocate new trunk with more index.
This commit adds new local cache mechanism for indexed pool.
Ruifeng Wang [Wed, 7 Jul 2021 09:03:07 +0000 (17:03 +0800)]
net/mlx5: reduce unnecessary memory access in Rx
MR btree len is a constant during Rx replenish.
Moved retrieve of the value out of loop to reduce data loads.
Slight performance uplift was measured on both N1SDP and x86.
Ruifeng Wang [Wed, 7 Jul 2021 09:03:06 +0000 (17:03 +0800)]
net/mlx5: remove redundant operations in NEON Rx
Mask of entries after the compressed CQE is covered by invalid mask of
non-compressed valid CQEs. Hence remove redundant calculation on mask.
The change showed slight performance uplift on N1SDP.
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM") Cc: stable@dpdk.org Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Rongwei Liu [Tue, 13 Jul 2021 12:09:19 +0000 (15:09 +0300)]
net/mlx5: support matching on VXLAN reserved field
This adds matching on the reserved field of VXLAN
header (the last 8-bits). The capability from rdma-core
is detected by creating a dummy matcher using misc5
when the device is probed.
For non-zero groups and FDB domain, the capability is
detected from rdma-core, meanwhile for NIC domain group
zero it's relying on the HCA_CAP from FW.
For the newly attached ports (with "port attach" command) the
default offloads settings, configured from application command
line, were not applied, causing port start failure following
the attach.
For example, if scattering offload was configured in command
line and rxpkts was configured for multiple segments, the newly
attached port start was failed due to missing scattering offload
enable in the new port settings. The missing code to apply
the offloads to the new device and its queues is added.
The new local routine init_config_port_offloads() is introduced,
embracing the shared part of port offloads initialization code.
Fixes: c9cce42876f5 ("ethdev: remove deprecated attach/detach functions") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Aman Deep Singh <aman.deep.singh@intel.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Huisong Li [Sat, 10 Jul 2021 01:58:34 +0000 (09:58 +0800)]
net/hns3: support multiple TC MAC pause
MAC PAUSE can take effect on a single TC or multiple TCs, depending on the
hardware. For example, the Kunpeng 920 supports MAC pause in a single TC,
and the Kunpeng 930 supports MAC pause in multiple TCs. This patch
supports MAC PAUSE in multiple TC for some hardware.
Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Since the HW limitation for VF, the VLAN filter is default enabled, and
is not allowed to be closed. Now, the limitation has been removed in
Kunpeng930 network engine, so this patch add support for VF to modify the
VLAN filter state.
A capabilities bit is added to differentiate between different platforms
and achieve compatibility. When the VF runs on an incomatible platform or
an incompatible kernel-mode driver version is used, the VF behavior is
the same as that before.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
There are some features of VF depend on PF, so it's necessary for VF
to know whether current PF supports. Therefore, the final capability
set of VF will be composed of the capability set of hardware and the
capability set of PF.
For compatibility reasons, the mailbox HNS3_MBX_GET_TCINFO has been
modified to obatin more basic information about the current PF, including
the communication interface version and current PF capabilities set.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
In function softnic_conn_init(), a block of memory is allocated as
connection buffer, but it is never freed in softnic_conn_free(),
which cause memory leak.
Add support for MSI-X interrupt vectors to the vmxnet3 driver.
This will allow more efficient deployments in cloud environments.
By default it will try to allocate 1 vector (0) for link
event and one MSI-X vector for each Rx queue. To simplify
things, it will only be enabled if the number of Tx and Rx
queues are equal (so that Tx/Rx share the same vector).
If for any reason vmxnet3 cannot enable intr mode, it will
fall back to the LSC only mode.
Signed-off-by: Yong Wang <yongwang@vmware.com> Signed-off-by: Jochen Behrens <jbehrens@vmware.com>
Martin Havlik [Tue, 22 Jun 2021 09:25:29 +0000 (11:25 +0200)]
net/bonding: check flow setting
Return value from bond_ethdev_8023ad_flow_set() is now checked
and appropriate message is logged on error.
Fixes: 112891cd27e5 ("net/bonding: add dedicated HW queues for LACP control") Cc: stable@dpdk.org Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz> Acked-by: Min Hu (Connor) <humin29@huawei.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Martin Havlik [Tue, 22 Jun 2021 09:25:28 +0000 (11:25 +0200)]
net/bonding: fix error message on flow verify
Return value is now saved to errval and log message on error reports
correct function name, doesn't use q_id which was out of context,
and uses up-to-date errval.
Fixes: 112891cd27e5 ("net/bonding: add dedicated HW queues for LACP control") Cc: stable@dpdk.org Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz> Acked-by: Min Hu (Connor) <humin29@huawei.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Setup MSI-X interrupt, complete PHY configuration and set device link
speed to start device. Disable interrupt, stop hardware and clear queues
to stop device.
In its current state, the API can overflow the user-passed buffer if a new
representor range appears between function calls.
In order to solve this problem, augment the representor info structure with
the numbers of allocated and initialized ranges. This way the users of this
structure can be sure they will not overrun the buffer.
Fixes: 85e1588ca72f ("ethdev: add API to get representor info") Cc: stable@dpdk.org Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Changpeng Liu [Wed, 19 May 2021 06:45:48 +0000 (14:45 +0800)]
eal: suppress error log on multi-process hotplug
This is a normal case that the primary process already
owned one device while the secondary process try to
attach it, so suppress the error log here to exclude
this case.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
The PMD Power Management scheme currently has 3 modes,
scale, monitor and pause. However, it would be nice to
have a baseline mode for easy comparison of power savings
with and without these modes.
This patch adds a 'baseline' mode were the PMD power
management is not enabled. Use --pmd-mgmt=baseline.
Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
A selector table is made up of groups of weighted members, with a
given member potentially part of several groups. The select operation
returns a member ID by first selecting a group based on an input group
ID and then selecting a member within that group based on hashing one
or several input header/meta-data fields. It is very useful for
implementing an ECMP/WCMP-enabled FIB or a load balancer. It is part
of the action selector described by the P4 Portable Switch
Architecture (PSA) specification.
For more flexibility, the single monolithic table update command is
split into table entry add, table entry delete, table default entry
add, pipeline commit and pipeline abort.
The rte_swx_pipeline_table_entry_read() function is used to read from
a character string a table entry that is to be added to the table,
deleted from the table or set as the default entry of the table.
Addition needs both the match and the part of the entry, deletion
ignores the action part, while the default set ignores the match part,
hence the need to make both the match and the action part optional.
The logic for skipping the match or the action part was broken, hence
the current fix.
Due to a typo, only 3 out of 4 keys in the bucket of the exact match
table were considered, which can result in valid keys being
incorrectly dropped from the table.
Fix build failures seen on Fedora Core 34 (GCC 11)
because of uninitialized variables.
In function ‘ulp_mapper_index_tbl_process’:
drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error:
‘*(unsigned int *)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info, resource_func))’
may be used uninitialized in this function
2252 | struct bnxt_ulp_glb_resource_info glb_res;
| ^~~~~~~
drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error:
‘glb_res.resource_type’ may be used uninitialized in this function
In function ‘dpool_defrag’:
drivers/net/bnxt/tf_core/dpool.c:95:18: error:
‘index’ may be used uninitialized in this function
95 | uint32_t index;
| ^~~~~
Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation") Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Chengwen Feng [Mon, 28 Jun 2021 02:57:51 +0000 (10:57 +0800)]
net/hns3: fix Arm SVE build with GCC 8.3
If the target machine has SVE feature (e.g. '-march=armv8.2-a+sve'),
and compiler is gcc-8.3, it will fail, the error is arm_sve.h:
no such file or directory.
The solution:
a. If RTE_HAS_SVE_ACLE defined (it means the minimum instruction set
support SVE ACLE) then compiles it.
b. Else if the compiler support SVE ACLE then compiles it.
c. Otherwise don't compile it.
Fixes: 8c25b02b082a ("net/hns3: fix enabling SVE Rx/Tx") Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx") Cc: stable@dpdk.org Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
Chengwen Feng [Mon, 28 Jun 2021 02:57:50 +0000 (10:57 +0800)]
config/arm: fix SVE build with GCC 8.3
If the target machine has SVE feature (e.g. "-march=armv8.2-a+sve'),
and the compiler is gcc-8.3, it will produce this error:
In file included from lib/eal/common/eal_common_options.c:38:
lib/eal/arm/include/rte_vect.h:13:10: fatal error:
arm_sve.h: No such file or directory
#include <arm_sve.h>
^~~~~~~~~~~
The root cause is that gcc-8.3 supports SVE (the macro
__ARM_FEATURE_SVE was 1), but it doesn't support SVE ACLE [1].
The solution:
a) Detect compiler whether support SVE ACLE, if support then define
RTE_HAS_SVE_ACLE macro.
b) Use the RTE_HAS_SVE_ACLE macro to include SVE header file.
[1] ACLE: Arm C Language Extensions, the SVE ACLE header file is
<arm_sve.h>, user should include it when writing ACLE SVE code.
Fixes: 67b68824a82d ("lpm/arm: support SVE") Cc: stable@dpdk.org Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Ruifeng Wang <ruifeng.wang@arm.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Ruifeng Wang [Wed, 7 Jul 2021 05:48:38 +0000 (13:48 +0800)]
ring: use WFE to wait for tail update on aarch64
Instead of polling for tail to be updated, use WFE instruction.
Signed-off-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>