dpdk.git
3 years agonet/mlx5: support matching on VXLAN reserved field
Rongwei Liu [Tue, 13 Jul 2021 12:09:19 +0000 (15:09 +0300)]
net/mlx5: support matching on VXLAN reserved field

This adds matching on the reserved field of VXLAN
header (the last 8-bits). The capability from rdma-core
is detected by creating a dummy matcher using misc5
when the device is probed.

For non-zero groups and FDB domain, the capability is
detected from rdma-core, meanwhile for NIC domain group
zero it's relying on the HCA_CAP from FW.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Raslan Darawsheh <rasland@nvidia.com>
3 years agoapp/testpmd: add flow matching on IPv4 version and IHL
Gregory Etelson [Tue, 13 Jul 2021 07:29:25 +0000 (10:29 +0300)]
app/testpmd: add flow matching on IPv4 version and IHL

The new flow item allows PMD to offload IPv4 IHL field for matching,
if hardware supports that operation.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agoapp/testpmd: fix offloads for newly attached port
Viacheslav Ovsiienko [Mon, 12 Jul 2021 12:40:53 +0000 (15:40 +0300)]
app/testpmd: fix offloads for newly attached port

For the newly attached ports (with "port attach" command) the
default offloads settings, configured from application command
line, were not applied, causing port start failure following
the attach.

For example, if scattering offload was configured in command
line and rxpkts was configured for multiple segments, the newly
attached port start was failed due to missing scattering offload
enable in the new port settings. The missing code to apply
the offloads to the new device and its queues is added.

The new local routine init_config_port_offloads() is introduced,
embracing the shared part of port offloads initialization code.

Fixes: c9cce42876f5 ("ethdev: remove deprecated attach/detach functions")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Aman Deep Singh <aman.deep.singh@intel.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agonet/hns3: support multiple TC MAC pause
Huisong Li [Sat, 10 Jul 2021 01:58:34 +0000 (09:58 +0800)]
net/hns3: support multiple TC MAC pause

MAC PAUSE can take effect on a single TC or multiple TCs, depending on the
hardware. For example, the Kunpeng 920 supports MAC pause in a single TC,
and the Kunpeng 930 supports MAC pause in multiple TCs. This patch
supports MAC PAUSE in multiple TC for some hardware.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: support VLAN filter state modify for VF
Chengchang Tang [Sat, 10 Jul 2021 01:58:33 +0000 (09:58 +0800)]
net/hns3: support VLAN filter state modify for VF

Since the HW limitation for VF, the VLAN filter is default enabled, and
is not allowed to be closed. Now, the limitation has been removed in
Kunpeng930 network engine, so this patch add support for VF to modify the
VLAN filter state.

A capabilities bit is added to differentiate between different platforms
and achieve compatibility. When the VF runs on an incomatible platform or
an incompatible kernel-mode driver version is used, the VF behavior is
the same as that before.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: query basic info for VF
Chengchang Tang [Sat, 10 Jul 2021 01:58:32 +0000 (09:58 +0800)]
net/hns3: query basic info for VF

There are some features of VF depend on PF, so it's necessary for VF
to know whether current PF supports. Therefore, the final capability
set of VF will be composed of the capability set of hardware and the
capability set of PF.

For compatibility reasons, the mailbox HNS3_MBX_GET_TCINFO has been
modified to obatin more basic information about the current PF, including
the communication interface version and current PF capabilities set.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/softnic: fix connection memory leak
Dapeng Yu [Fri, 9 Jul 2021 06:00:56 +0000 (14:00 +0800)]
net/softnic: fix connection memory leak

In function softnic_conn_init(), a block of memory is allocated as
connection buffer, but it is never freed in softnic_conn_free(),
which cause memory leak.

Fixes: 7709a63bf178 ("net/softnic: add connection agent")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
3 years agonet/vmxnet3: support MSI-X interrupt
Jochen Behrens [Thu, 8 Jul 2021 14:02:25 +0000 (07:02 -0700)]
net/vmxnet3: support MSI-X interrupt

Add support for MSI-X interrupt vectors to the vmxnet3 driver.
This will allow more efficient deployments in cloud environments.

By default it will try to allocate 1 vector (0) for link
event and one MSI-X vector for each Rx queue.  To simplify
things, it will only be enabled if the number of Tx and Rx
queues are equal (so that Tx/Rx share the same vector).
If for any reason vmxnet3 cannot enable intr mode, it will
fall back to the LSC only mode.

Signed-off-by: Yong Wang <yongwang@vmware.com>
Signed-off-by: Jochen Behrens <jbehrens@vmware.com>
3 years agonet/bonding: check flow setting
Martin Havlik [Tue, 22 Jun 2021 09:25:29 +0000 (11:25 +0200)]
net/bonding: check flow setting

Return value from bond_ethdev_8023ad_flow_set() is now checked
and appropriate message is logged on error.

Fixes: 112891cd27e5 ("net/bonding: add dedicated HW queues for LACP control")
Cc: stable@dpdk.org
Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
3 years agonet/bonding: fix error message on flow verify
Martin Havlik [Tue, 22 Jun 2021 09:25:28 +0000 (11:25 +0200)]
net/bonding: fix error message on flow verify

Return value is now saved to errval and log message on error reports
correct function name, doesn't use q_id which was out of context,
and uses up-to-date errval.

Fixes: 112891cd27e5 ("net/bonding: add dedicated HW queues for LACP control")
Cc: stable@dpdk.org
Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
3 years agonet/ngbe: support close and reset device
Jiawen Wu [Thu, 8 Jul 2021 09:32:39 +0000 (17:32 +0800)]
net/ngbe: support close and reset device

Support to close and reset device.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: add simple Tx flow
Jiawen Wu [Thu, 8 Jul 2021 09:32:38 +0000 (17:32 +0800)]
net/ngbe: add simple Tx flow

Initialize device with the simplest transmit functions.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: add simple Rx flow
Jiawen Wu [Thu, 8 Jul 2021 09:32:37 +0000 (17:32 +0800)]
net/ngbe: add simple Rx flow

Initialize device with the simplest receive function.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: support Rx queue start/stop
Jiawen Wu [Thu, 8 Jul 2021 09:32:36 +0000 (17:32 +0800)]
net/ngbe: support Rx queue start/stop

Initializes receive unit, support to start and stop receive unit for
specified queues.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: support Tx queue start/stop
Jiawen Wu [Thu, 8 Jul 2021 09:32:35 +0000 (17:32 +0800)]
net/ngbe: support Tx queue start/stop

Initializes transmit unit, support to start and stop transmit unit for
specified queues.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: support device start/stop
Jiawen Wu [Thu, 8 Jul 2021 09:32:34 +0000 (17:32 +0800)]
net/ngbe: support device start/stop

Setup MSI-X interrupt, complete PHY configuration and set device link
speed to start device. Disable interrupt, stop hardware and clear queues
to stop device.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: support Tx queue setup/release
Jiawen Wu [Thu, 8 Jul 2021 09:32:33 +0000 (17:32 +0800)]
net/ngbe: support Tx queue setup/release

Setup device Tx queue and release Tx queue.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: support Rx queue setup/release
Jiawen Wu [Thu, 8 Jul 2021 09:32:32 +0000 (17:32 +0800)]
net/ngbe: support Rx queue setup/release

Setup device Rx queue and release Rx queue.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: setup PHY link
Jiawen Wu [Thu, 8 Jul 2021 09:32:31 +0000 (17:32 +0800)]
net/ngbe: setup PHY link

Setup PHY, determine link and speed status from PHY.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: support link update
Jiawen Wu [Thu, 8 Jul 2021 09:32:30 +0000 (17:32 +0800)]
net/ngbe: support link update

Register to handle device interrupt.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: store MAC address
Jiawen Wu [Thu, 8 Jul 2021 09:32:29 +0000 (17:32 +0800)]
net/ngbe: store MAC address

Store MAC addresses and init receive address filters.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: identify and reset PHY
Jiawen Wu [Thu, 8 Jul 2021 09:32:28 +0000 (17:32 +0800)]
net/ngbe: identify and reset PHY

Identify PHY to get the PHY type, and perform a PHY reset.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: add HW initialization
Jiawen Wu [Thu, 8 Jul 2021 09:32:27 +0000 (17:32 +0800)]
net/ngbe: add HW initialization

Initialize the hardware by resetting the hardware in base code.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: initialize and validate EEPROM
Jiawen Wu [Thu, 8 Jul 2021 09:32:26 +0000 (17:32 +0800)]
net/ngbe: initialize and validate EEPROM

Reset swfw lock before NVM access, init EEPROM and validate the
checksum.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: set MAC type and LAN ID with initialization
Jiawen Wu [Thu, 8 Jul 2021 09:32:25 +0000 (17:32 +0800)]
net/ngbe: set MAC type and LAN ID with initialization

Add basic init and uninit function.
Map device IDs and subsystem IDs to single ID for easy operation.
Then initialize the shared code.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: define registers
Jiawen Wu [Thu, 8 Jul 2021 09:32:24 +0000 (17:32 +0800)]
net/ngbe: define registers

Define all registers that will be used.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: add log and error types
Jiawen Wu [Thu, 8 Jul 2021 09:32:23 +0000 (17:32 +0800)]
net/ngbe: add log and error types

Add log type and error type to trace functions.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: support probe and remove
Jiawen Wu [Thu, 8 Jul 2021 09:32:22 +0000 (17:32 +0800)]
net/ngbe: support probe and remove

Add device IDs for Wangxun 1Gb NICs, map device IDs to register ngbe
PMD. Add basic PCIe ethdev probe and remove.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/ngbe: add build and doc infrastructure
Jiawen Wu [Thu, 8 Jul 2021 09:32:21 +0000 (17:32 +0800)]
net/ngbe: add build and doc infrastructure

Adding bare minimum PMD library and doc build infrastructure
and claim the maintainership for ngbe PMD.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agoversion: 21.08-rc1
Thomas Monjalon [Sat, 10 Jul 2021 10:01:52 +0000 (12:01 +0200)]
version: 21.08-rc1

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
3 years agoethdev: keep count of representor ranges in API
Viacheslav Galaktionov [Mon, 5 Jul 2021 10:02:52 +0000 (13:02 +0300)]
ethdev: keep count of representor ranges in API

In its current state, the API can overflow the user-passed buffer if a new
representor range appears between function calls.

In order to solve this problem, augment the representor info structure with
the numbers of allocated and initialized ranges. This way the users of this
structure can be sure they will not overrun the buffer.

Fixes: 85e1588ca72f ("ethdev: add API to get representor info")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
3 years agoeal: suppress error log on multi-process hotplug
Changpeng Liu [Wed, 19 May 2021 06:45:48 +0000 (14:45 +0800)]
eal: suppress error log on multi-process hotplug

This is a normal case that the primary process already
owned one device while the secondary process try to
attach it, so suppress the error log here to exclude
this case.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
3 years agoexamples/l3fwd-power: add baseline PMD management mode
David Hunt [Tue, 22 Jun 2021 14:07:50 +0000 (15:07 +0100)]
examples/l3fwd-power: add baseline PMD management mode

The PMD Power Management scheme currently has 3 modes,
scale, monitor and pause. However, it would be nice to
have a baseline mode for easy comparison of power savings
with and without these modes.

This patch adds a 'baseline' mode were the PMD power
management is not enabled. Use --pmd-mgmt=baseline.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
3 years agoexamples/pipeline: add FIB example
Cristian Dumitrescu [Fri, 9 Jul 2021 17:07:00 +0000 (18:07 +0100)]
examples/pipeline: add FIB example

Add example for FIB with VRF and ECMP support.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
3 years agopipeline: support LPM lookup
Cristian Dumitrescu [Thu, 8 Jul 2021 10:11:29 +0000 (11:11 +0100)]
pipeline: support LPM lookup

Add support for the Longest Prefix Match (LPM) lookup to the SWX
pipeline.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
3 years agoexamples/pipeline: add selector example
Cristian Dumitrescu [Sat, 10 Jul 2021 00:20:36 +0000 (01:20 +0100)]
examples/pipeline: add selector example

Added the files to illustrate the selector table usage.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
3 years agoexamples/pipeline: support selector table
Cristian Dumitrescu [Sat, 10 Jul 2021 00:20:35 +0000 (01:20 +0100)]
examples/pipeline: support selector table

Add application-level support for selector tables.

Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
3 years agopipeline: support selector table
Cristian Dumitrescu [Sat, 10 Jul 2021 00:20:34 +0000 (01:20 +0100)]
pipeline: support selector table

Add pipeline-level support for selector tables.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
3 years agotable: support selector table
Cristian Dumitrescu [Fri, 2 Jul 2021 22:46:05 +0000 (23:46 +0100)]
table: support selector table

A selector table is made up of groups of weighted members, with a
given member potentially part of several groups. The select operation
returns a member ID by first selecting a group based on an input group
ID and then selecting a member within that group based on hashing one
or several input header/meta-data fields. It is very useful for
implementing an ECMP/WCMP-enabled FIB or a load balancer. It is part
of the action selector described by the P4 Portable Switch
Architecture (PSA) specification.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
3 years agoexamples/pipeline: improve table update commands
Churchill Khangar [Fri, 2 Jul 2021 22:46:04 +0000 (23:46 +0100)]
examples/pipeline: improve table update commands

For more flexibility, the single monolithic table update command is
split into table entry add, table entry delete, table default entry
add, pipeline commit and pipeline abort.

Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
3 years agopipeline: fix table entry read
Cristian Dumitrescu [Mon, 5 Jul 2021 22:56:50 +0000 (23:56 +0100)]
pipeline: fix table entry read

The rte_swx_pipeline_table_entry_read() function is used to read from
a character string a table entry that is to be added to the table,
deleted from the table or set as the default entry of the table.
Addition needs both the match and the part of the entry, deletion
ignores the action part, while the default set ignores the match part,
hence the need to make both the match and the action part optional.
The logic for skipping the match or the action part was broken, hence
the current fix.

Fixes: b32c0a2c5e4c ("pipeline: add SWX table update high level API")
Cc: stable@dpdk.org
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
3 years agotable: fix bucket empty check
Thierry Herbelot [Wed, 7 Jul 2021 11:19:05 +0000 (13:19 +0200)]
table: fix bucket empty check

Due to a typo, only 3 out of 4 keys in the bucket of the exact match
table were considered, which can result in valid keys being
incorrectly dropped from the table.

Fixes: d0a00966618ba ("table: add exact match SWX table")
Cc: stable@dpdk.org
Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
3 years agonet/bnxt: fix build
Ajit Khaparde [Thu, 8 Jul 2021 22:09:18 +0000 (15:09 -0700)]
net/bnxt: fix build

Fix build failures seen on Fedora Core 34 (GCC 11)
because of uninitialized variables.

In function ‘ulp_mapper_index_tbl_process’:
drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error:
‘*(unsigned int *)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info, resource_func))’
may be used uninitialized in this function
 2252 |         struct bnxt_ulp_glb_resource_info glb_res;
      |                                           ^~~~~~~
drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error:
‘glb_res.resource_type’ may be used uninitialized in this function

In function ‘dpool_defrag’:
drivers/net/bnxt/tf_core/dpool.c:95:18: error:
‘index’ may be used uninitialized in this function
   95 |         uint32_t index;
      |                  ^~~~~

Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation")

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/hns3: fix Arm SVE build with GCC 8.3
Chengwen Feng [Mon, 28 Jun 2021 02:57:51 +0000 (10:57 +0800)]
net/hns3: fix Arm SVE build with GCC 8.3

If the target machine has SVE feature (e.g. '-march=armv8.2-a+sve'),
and compiler is gcc-8.3, it will fail, the error is arm_sve.h:
no such file or directory.

The solution:
a. If RTE_HAS_SVE_ACLE defined (it means the minimum instruction set
support SVE ACLE) then compiles it.
b. Else if the compiler support SVE ACLE then compiles it.
c. Otherwise don't compile it.

Fixes: 8c25b02b082a ("net/hns3: fix enabling SVE Rx/Tx")
Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
3 years agoconfig/arm: fix SVE build with GCC 8.3
Chengwen Feng [Mon, 28 Jun 2021 02:57:50 +0000 (10:57 +0800)]
config/arm: fix SVE build with GCC 8.3

If the target machine has SVE feature (e.g. "-march=armv8.2-a+sve'),
and the compiler is gcc-8.3, it will produce this error:
In file included from lib/eal/common/eal_common_options.c:38:
lib/eal/arm/include/rte_vect.h:13:10: fatal error:
arm_sve.h: No such file or directory
#include <arm_sve.h>
       ^~~~~~~~~~~

The root cause is that gcc-8.3 supports SVE (the macro
__ARM_FEATURE_SVE was 1), but it doesn't support SVE ACLE [1].

The solution:
a) Detect compiler whether support SVE ACLE, if support then define
RTE_HAS_SVE_ACLE macro.
b) Use the RTE_HAS_SVE_ACLE macro to include SVE header file.

[1] ACLE:  Arm C Language Extensions, the SVE ACLE header file is
<arm_sve.h>, user should include it when writing ACLE SVE code.

Fixes: 67b68824a82d ("lpm/arm: support SVE")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
3 years agoring: use WFE to wait for tail update on aarch64
Ruifeng Wang [Wed, 7 Jul 2021 05:48:38 +0000 (13:48 +0800)]
ring: use WFE to wait for tail update on aarch64

Instead of polling for tail to be updated, use WFE instruction.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
3 years agospinlock: use WFE to reduce contention on aarch64
Gavin Hu [Wed, 7 Jul 2021 05:48:37 +0000 (13:48 +0800)]
spinlock: use WFE to reduce contention on aarch64

In acquiring a spinlock, cores repeatedly poll the lock variable.
This is replaced by rte_wait_until_equal API.

Running micro benchmarking and testpmd and l3fwd traffic tests
on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well
and no notable performance gain nor degradation was measured.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
3 years agonet/af_xdp: support power monitoring
Anatoly Burakov [Fri, 9 Jul 2021 16:08:11 +0000 (16:08 +0000)]
net/af_xdp: support power monitoring

Implement support for .get_monitor_addr in AF_XDP driver.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
3 years agoexamples/l3fwd-power: support multiqueue ethdev power management
Anatoly Burakov [Fri, 9 Jul 2021 16:08:17 +0000 (16:08 +0000)]
examples/l3fwd-power: support multiqueue ethdev power management

Currently, l3fwd-power enforces the limitation of having one queue per
lcore. This is no longer necessary, so remove the limitation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
3 years agopower: support monitoring multiple Rx queues
Anatoly Burakov [Fri, 9 Jul 2021 16:08:16 +0000 (16:08 +0000)]
power: support monitoring multiple Rx queues

Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
3 years agopower: support callbacks for multiple Rx queues
Anatoly Burakov [Fri, 9 Jul 2021 16:08:15 +0000 (16:08 +0000)]
power: support callbacks for multiple Rx queues

Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
  shared between all queues polled on a particular lcore, and is only
  activated when all queues in the list were polled and were determined
  to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
3 years agopower: make ethdev power management thread unsafe
Anatoly Burakov [Fri, 9 Jul 2021 16:08:14 +0000 (16:08 +0000)]
power: make ethdev power management thread unsafe

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
of a pressing need for thread safety we'll go the easy way and just
mandate that the API's are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
3 years agoeal: add power monitor for multiple events
Anatoly Burakov [Fri, 9 Jul 2021 16:08:13 +0000 (16:08 +0000)]
eal: add power monitor for multiple events

Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
3 years agoeal: use callbacks for power monitoring comparison
Anatoly Burakov [Fri, 9 Jul 2021 16:08:10 +0000 (16:08 +0000)]
eal: use callbacks for power monitoring comparison

Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
3 years agodoc: add power management to NIC features
Anatoly Burakov [Fri, 9 Jul 2021 16:08:12 +0000 (16:08 +0000)]
doc: add power management to NIC features

At this point, multiple different Ethernet drivers from multiple vendors
will support the PMD power management scheme. It would be useful to add
it to the NIC feature table to indicate support for it.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
3 years agodoc: add aarch32 build guidance
Phil Yang [Wed, 7 Jul 2021 13:25:43 +0000 (15:25 +0200)]
doc: add aarch32 build guidance

Add cross-compiling guidance for 32-bit aarch32 DPDK on aarch64 host.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Aaron Conole <aconole@redhat.com>
3 years agoconfig/arm: add aarch32 cross-compilation
Juraj Linkeš [Wed, 7 Jul 2021 13:25:42 +0000 (15:25 +0200)]
config/arm: add aarch32 cross-compilation

Create meson cross file arm32_armv8a_linux_gcc. Use arm-linux-gnueabihf-
toolset which comes with standard packages on most used systems, such as
Ubuntu and CentOS.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
3 years agoconfig/arm: add aarch32
Juraj Linkeš [Wed, 7 Jul 2021 13:25:41 +0000 (15:25 +0200)]
config/arm: add aarch32

Add aarch32 armv8 SoC to build config.
Also modify how arm flags are updated in meson build - for 32-bit build,
update only if cross-compiling.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
3 years agoeal/arm: update CPU flags
Juraj Linkeš [Wed, 7 Jul 2021 13:25:40 +0000 (15:25 +0200)]
eal/arm: update CPU flags

There are two execution states on armv8 architecture, aarch64 and
aarch32. Add PLATFORM_STR for the latter and update RTE_ARCH_* flags
according to e9b97392640.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
3 years agonet/virtio: fix aarch32 build
Juraj Linkeš [Wed, 7 Jul 2021 13:25:39 +0000 (15:25 +0200)]
net/virtio: fix aarch32 build

NEON vector path of the PMD needs aarch64 support. But it was
enabled for aarch32 build as well because aarch32 build had
cpu_family set to aarch64. So build for aarch32 will fail due
to unsupported intrinsics.

Fix aarch32 build by updating meson file to exclude NEON vector
implementation for aarch32.

Fixes: 749799482a72 ("net/virtio: add to meson build")
Cc: stable@dpdk.org
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agonet/bnxt: fix aarch32 build
Ruifeng Wang [Wed, 7 Jul 2021 13:25:38 +0000 (15:25 +0200)]
net/bnxt: fix aarch32 build

NEON vector path of the PMD needs aarch64 support. But it was
enabled for aarch32 build as well because aarch32 build had
cpu_family set to aarch64. So build for aarch32 will fail due
to unsupported intrinsics.

Fix aarch32 build by updating meson file to exclude NEON vector
implementation for aarch32.

Fixes: 398358341419 ("net/bnxt: support NEON")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
3 years agonet/sfc: fix aarch32 build
Ruifeng Wang [Wed, 7 Jul 2021 13:25:37 +0000 (15:25 +0200)]
net/sfc: fix aarch32 build

The sfc PMD was enabled for aarch32 which is 32-bit mode but has
cpu_family set to aarch64.
As sfc support only 64-bit system, it should be disabled for aarch32.

Updated meson file to disable sfc for aarch32 build.

Fixes: 141d2870675a ("net/sfc: support aarch64 architecture")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
3 years agomaintainers: update for ABI
Ray Kinsella [Tue, 22 Jun 2021 15:50:17 +0000 (16:50 +0100)]
maintainers: update for ABI

Update to ABI MAINTAINERS.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
3 years agomaintainers: update for Arm v8
Jerin Jacob [Fri, 25 Jun 2021 14:27:20 +0000 (19:57 +0530)]
maintainers: update for Arm v8

Resigning my maintainership for ARM v8 architecture.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
3 years agoconfig/arm: add Qualcomm Centriq 2400 part number
Thierry Herbelot [Thu, 17 Jun 2021 15:13:28 +0000 (17:13 +0200)]
config/arm: add Qualcomm Centriq 2400 part number

0xc00 is for "SoC 2.0" Qualcomm Centriq servers.
0x800 is for "SoC 1.1".

Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
3 years agokni: update link only on change
Ferruh Yigit [Thu, 24 Jun 2021 13:32:16 +0000 (14:32 +0100)]
kni: update link only on change

'rte_kni_update_link()' updates virtual KNI interface link using kernel
sysfs interface.
If the requested link status is same as interface link status, do not
update the link status but return with success.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agobuild: support drivers symlink on Windows
Nick Connolly [Mon, 26 Apr 2021 10:07:32 +0000 (11:07 +0100)]
build: support drivers symlink on Windows

The symlink-drivers-solibs.sh script was disabled as part of 'install'
for Windows because there is no support for shell scripts. However,
this means that driver related DLLs are not present in the installed
'libdir' directory. Add a python script to perform the install and use
it for Windows if the version of meson supports using an external
program with add_install_script (>= 0.55.0).

On Windows, symbolic links are somewhat problematic since the
SeCreateSymbolicLinkPrivilege is required to be able to create them.
In addition, different cross-compilation environments handle symbolic
links differently, e.g. WSL, Msys2, Cygwin. Rather than trying to
distinguish these scenarios, the python script will perform a file copy
for any Windows specific names.

On Windows, the shared library outputs have different names depending
upon which toolset has been used to build them. The script currently
handles Clang and GCC.

On Linux the functionality is unchanged, but could be replaced with the
python script once the required minimum version of meson is >= 0.55.0.

Cc: stable@dpdk.org
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Tested-by: Narcisa Vasile <navasile@linux.microsoft.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agotest/power: fix CPU frequency when turbo enabled
Richael Zhuang [Fri, 9 Jul 2021 10:55:48 +0000 (18:55 +0800)]
test/power: fix CPU frequency when turbo enabled

On arm platform, the value in "/sys/.../cpuinfo_cur_freq" may not
be exactly the same as what was set when using CPPC cpufreq driver.
For other cpufreq driver, no need to round it currently, or else
this check will fail with turbo enabled. For example, with acpi_cpufreq,
cpuinfo_cur_freq can be 2401000 which is equal to freqs[0].It should
not be rounded to 2400000.

Fixes: 606a234c6d360 ("test/power: round CPU frequency to check")
Cc: stable@dpdk.org
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
3 years agopower: support cppc_cpufreq driver
Richael Zhuang [Fri, 9 Jul 2021 10:55:47 +0000 (18:55 +0800)]
power: support cppc_cpufreq driver

Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are
supported, which are both not available on arm64 platforms. Add
support for cppc_cpufreq driver which works on most arm64 platforms.

Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
3 years agodoc: fix build on Windows with Meson 0.58
Dmitry Kozlyuk [Wed, 30 Jun 2021 16:22:35 +0000 (19:22 +0300)]
doc: fix build on Windows with Meson 0.58

The `doc` target used `echo` as its command.
On Windows, `echo` is always a shell built-in, there is no binary.
Starting from meson 0.58, `run_target()` always searches for command
executable and no longer accepts `echo` as such on Windows.
Replace plain `echo` with a Python one-liner.

Fixes: d02a2dab2dfb ("doc: support building HTML guides with meson")
Cc: stable@dpdk.org
Reported-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
3 years agobuild: use platform for generic and native builds
Juraj Linkeš [Tue, 6 Jul 2021 09:44:28 +0000 (11:44 +0200)]
build: use platform for generic and native builds

The current meson option 'machine' should only specify the ISA, which is
not sufficient for Arm, where setting ISA implies other settings as well
(and is used in Arm configuration as such).
Use the existing 'platform' meson option to differentiate the type of
the build (native/generic) and set ISA accordingly, unless the user
chooses to override it with a new option, 'cpu_instruction_set'.
The 'machine' option set the ISA in x86 builds and set native/default
'build type' in aarch64 builds. These two new variables, 'platform' and
'cpu_instruction_set', now properly set both ISA and build type for all
architectures in a uniform manner.
The 'machine' option also doesn't describe very well what it sets. The
new option, 'cpu_instruction_set', is much more descriptive. Keep
'machine' for backwards compatibility.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agonet/octeontx/base: fix debug build with clang
David Marchand [Fri, 9 Jul 2021 06:02:29 +0000 (11:32 +0530)]
net/octeontx/base: fix debug build with clang

Remove conflicting declaration of this symbol.

Fixes: d0d654986018 ("net/octeontx: support event Rx adapter")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
3 years agocrypto/cnxk: fix build with asserts
Tejasree Kondoj [Fri, 9 Jul 2021 09:41:47 +0000 (15:11 +0530)]
crypto/cnxk: fix build with asserts

Removing usage of unavailable macro.

Fixes: baee42a6beff ("crypto/cnxk: add IPsec datapath")

Reported-by: Ali Alnubani <alialnu@nvidia.com>
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Tejasree Kondoj <ktejasree@marvell.com>
3 years agocrypto/cnxk: add PCI ID for cn9k
Anoob Joseph [Fri, 9 Jul 2021 06:28:43 +0000 (11:58 +0530)]
crypto/cnxk: add PCI ID for cn9k

Add PCI ID for crypo_cn9k PMD.

To avoid conflicting PCI ID in crypto_octeontx2 and crypto_cn9k PMDs,
disable crypto_cn9k PMD when built with octeontx2 config.

The lack of PCI ID is causing debug build to fail on Ubuntu 18.04
for crypto_cn9k PMD.

Reported-by: Ali Alnubani <alialnu@nvidia.com>
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
3 years agonet/ixgbe: fix flow entry access after freeing
Dapeng Yu [Fri, 9 Jul 2021 03:14:59 +0000 (11:14 +0800)]
net/ixgbe: fix flow entry access after freeing

The original code use a heap pointer after it is freed.
This patch fix it.

Fixes: a14de8b498d1 ("net/ixgbe: destroy consistent filter")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
3 years agonet/i40e: fix descriptor scan on Arm
Joyce Kong [Tue, 6 Jul 2021 06:54:03 +0000 (01:54 -0500)]
net/i40e: fix descriptor scan on Arm

For Arm platforms, reading descs can get re-ordered, then the
status of DD bits will be discontinuous, so add the logic to
only process continuous descs by checking DD bits.

Fixes: 4861cde46116 ("i40e: new poll mode driver")
Cc: stable@dpdk.org
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
3 years agonet/ice: fix VXLAN flow director creation
Dapeng Yu [Wed, 16 Jun 2021 01:20:53 +0000 (09:20 +0800)]
net/ice: fix VXLAN flow director creation

In original implementation, error returned when creating VXLAN flow
director with SCTP or TCP as layer 4 protocol of inner segment.

There are several root causes for the error:
1. ice_fdir_input_set_hdrs() set ICE_FLOW_SEG_HDR_UDP into protocol
header flag of inner segment of VXLAN FDIR rule, even if it shall be
ICE_FLOW_SEG_HDR_TCP or ICE_FLOW_SEG_HDR_SCTP
2. ice_fdir_input_set_hdrs() set ICE_FLOW_SEG_HDR_VXLAN into protocol
header flag of segments of VXLAN FDIR rule, it not necessary, and can
be set automatically by ice_flow_set_fld() later
3. flow type: ICE_FLTR_PTYPE_NONF_IPV4_UDP_VXLAN hides the flow type of
inner segment of VXLAN FDIR rule, then further causes function:
ice_fdir_get_gen_prgm_pkt() cannot write correct protocol id into inner
segment of training packet.

This patch fixes those defects described above.

Fixes: 855d23a07b36 ("net/ice: support VXLAN VNI field in flow director")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice/base: fix VXLAN flow director creation
Dapeng Yu [Wed, 16 Jun 2021 01:20:52 +0000 (09:20 +0800)]
net/ice/base: fix VXLAN flow director creation

In original implementation, error returned when creating VXLAN flow
director with SCTP or TCP as layer 4 protocol of inner segment.

There are several root causes for the error:
1. ice_fdir_udp4_vxlan_pkt[] is not adapted to the TCP and SCTP protocol.
Its length cannot hold TCP header, only UDP protocol was supported in
original implementation
2. VXLAN VNI offset: 45 is inconsistent with IETF RFC 7348

This patch fixes those defects described above.

Fixes: 608cd0a5e283 ("net/ice/base: support VXLAN VNI field in flow director")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: support QoS bandwidth config after VF reset in DCF
Ting Xu [Thu, 8 Jul 2021 02:33:46 +0000 (10:33 +0800)]
net/ice: support QoS bandwidth config after VF reset in DCF

When VF reset happens, the QoS bandwidth configuration will be lost. If
the reset is not caused by DCB change, it is supposed to replay the
bandwidth configuration to VF by DCF. In this patch, when a vsi update
PF event is received from PF after VF reset, and it is confirmed that
DCB is not changed, bandwidth configuration will be replayed.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: fix check for QoS in DCF
Ting Xu [Thu, 8 Jul 2021 02:32:25 +0000 (10:32 +0800)]
net/ice: fix check for QoS in DCF

This patch fixed some unreasonable error check. Move all checks into one
helper function before configuring. Skip the check for DCF (VF0).

Fixes: 3a6bfc37eaf4 ("net/ice: support QoS config VF bandwidth in DCF")
Cc: stable@dpdk.org
Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/iavf: simplify flow director rules for IP fragment
Wenjun Wu [Wed, 7 Jul 2021 06:18:53 +0000 (14:18 +0800)]
net/iavf: simplify flow director rules for IP fragment

This patch simplify the pattern of flow rules of FDIR for IP fragment.

Flow rule can be created by the following command:
1. flow create 0 ingress pattern eth /
   ipv4 fragment_offset spec 0x2000 fragment_offset mask 0x2000 /
   end <actions>
2. flow create 0 ingress pattern eth / ipv6 /
   ipv6_frag_ext fragment_offset spec 0x0001 fragment_offset mask 0x0001 /
   end <actions>

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: simplify flow director rules for IP fragment
Wenjun Wu [Wed, 7 Jul 2021 06:18:13 +0000 (14:18 +0800)]
net/ice: simplify flow director rules for IP fragment

This patch simplify the pattern of flow rules of FDIR for IP fragment.

Flow rule can be created by the following command:
1. flow create 0 ingress pattern eth /
   ipv4 fragment_offset spec 0x2000 fragment_offset mask 0x2000 /
   end <actions>
2. flow create 0 ingress pattern eth / ipv6 /
   ipv6_frag_ext fragment_offset spec 0x0001 fragment_offset mask 0x0001 /
   end <actions>

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: fix memzone leak when firmware is missing
David Marchand [Tue, 6 Jul 2021 14:12:37 +0000 (16:12 +0200)]
net/ice: fix memzone leak when firmware is missing

Caught by our QE.
When the firmware is missing, memzones were not released.

$ dpdk-testpmd -c 0x1f -a 0:0:0.0 -- -i
...

testpmd> dump_memzone
...
Zone 6: name:<RTE_METRICS>, len:0x15040, virt:0x1661b24c0, socket_id:0,
flags:0
physical segments used:
  addr: 0x140000000 iova: 0x140000000 len: 0x40000000 pagesz: 0x40000000

testpmd> port attach 0000:5e:00.0
Attaching a new port...
EAL: Using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:5e:00.0 (socket 0)
ice_load_pkg(): failed to open file: /lib/firmware/intel/ice/ddp/ice.pkg

ice_dev_init(): Failed to load the DDP package,Use safe-mode-support=1 to
 enter Safe Mode
EAL: Releasing PCI mapped resource for 0000:5e:00.0
EAL: Calling pci_unmap_resource for 0000:5e:00.0 at 0x2200000000
EAL: Calling pci_unmap_resource for 0000:5e:00.0 at 0x2202000000
EAL: Driver cannot attach the device (0000:5e:00.0)
EAL: Failed to attach device on primary process
testpmd: Failed to attach port 0000:5e:00.0

testpmd> dump_memzone
...
Zone 139: name:<ice_dma_17168374657430093156>, len:0x1000,
  virt:0x1660ed800, socket_id:0, flags:0 physical segments used:
  addr: 0x140000000 iova: 0x140000000 len: 0x40000000 pagesz: 0x40000000

With 20 tries attaching a net/ice port, we would end up with:

EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:5e:00.0 (socket 0)
EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested memzone
  segments exceeds RTE_MAX_MEMZONE
ice_dev_init(): Failed to initialize HW

Fixes: a4c8c48fe3f4 ("net/ice: load OS default package")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
3 years agocommon/mlx5: fix compatibility with OFED port query API
Viacheslav Ovsiienko [Wed, 7 Jul 2021 15:54:28 +0000 (18:54 +0300)]
common/mlx5: fix compatibility with OFED port query API

The compilation flag HAVE_MLX5DV_DR_DEVX_PORT depends on presence
of mlx5dv_query_devx_port routine in rdma-core library.

The mlx5dv_query_devx_port routine exists only in OFED versions
of rdma-core library and is being planned to be removed and replaced
with Upstream compatible mlx5dv_query_port.

As mlx5dv_query_devx_port is being removed all the dependencies on
the HAVE_MLX5DV_DR_DEVX_PORT compilation flag are reconsidered.

The new compilation flag HAVE_MLX5DV_DR_CREATE_DEST_IB_PORT is for
backward compatibility with older OFED versions.

Fixes: 6cfe84fbe7b1 ("net/mlx5: fix port action for LAG")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agocommon/mlx5: use new port query API if available
Viacheslav Ovsiienko [Wed, 7 Jul 2021 15:54:27 +0000 (18:54 +0300)]
common/mlx5: use new port query API if available

In order to get E-Switch vport identifiers the mlx5 PMD relies
on two approaches:
  [a] use port query API if it is provided by rdma-core library
  [b] otherwise, deduce vport ids from the related VF index
The latter is not reliable and may not work with newer kernel
drivers and in some configurations (LAG), causing E-Switch
malfunction. Hence, engaging the port query API is highly
desirable.

Depending on rdma-core version the port query API is:
  - very old OFED versions have no query API (approach [b])
  - rdma-core OFED < 5.5 provides mlx5dv_query_devx_port,
    HAVE_MLX5DV_DR_DEVX_PORT flag is defined (approach [a])
  - rdma-core OFED >= 5.5 has mlx5dv_query_port, flag
    HAVE_MLX5DV_DR_DEVX_PORT_V35 is defined (approach [a])
  - future OFED versions might remove mlx5dv_query_devx_port
    and HAVE_MLX5DV_DR_DEVX_PORT will not be defined
  - Upstream rdma-core < v35 has no port query API (approach [b])
  - Upstream rdma-core >= v35 has  mlx5dv_query_port, flag
    HAVE_MLX5DV_DR_DEVX_PORT_V35 is defined (approach [a])

In order to support the new mlx5dv_query_port routine, the
conditional compilation flag HAVE_MLX5DV_DR_DEVX_PORT_V35
is introduced by this patch. The flag HAVE_MLX5DV_DR_DEVX_PORT
is kept for compatibility with previous rdma-core versions.

Despite this patch is not a bugfix (it follows the introduced API
variation in underlying library), it resolves the compatibility
issue and is highly desired to be ported to DPDK LTS.

Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: control flow rules with identical pattern
Jiawei Wang [Tue, 6 Jul 2021 08:12:27 +0000 (11:12 +0300)]
net/mlx5: control flow rules with identical pattern

In order to allow\disallow configuring rules with identical
patterns, the new device argument 'allow_duplicate_pattern'
is introduced.
If allow, these rules be inserted successfully and only the
first rule take affect.
If disallow, the first rule will be inserted and other rules
be rejected.

The default is to allow.
Set it to 0 if disallow, for example:
-a <PCI_BDF>,allow_duplicate_pattern=0

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: validate meter action in policy
Shun Hao [Tue, 6 Jul 2021 13:14:50 +0000 (16:14 +0300)]
net/mlx5: validate meter action in policy

This adds the validation when creating a policy with meter action.

Currently meter action is only allowed for green color in policy, and
8 meters are supported at maximum in one meter hierarchy.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: add meter hierarchy destroy and cleanup
Shun Hao [Tue, 6 Jul 2021 13:14:49 +0000 (16:14 +0300)]
net/mlx5: add meter hierarchy destroy and cleanup

When creating hierarchy meter, its color rules will increase next
meter's reference count, so when destroy the hierarchy meter, also
need to dereference the next meter's count.

During flushing all meters of a port, need to destroy all hierarchy
meters and their policies first, to dereference the last meter in
hierarchy. Then all meters have no reference and can be destroyed.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: support meter hierarchy drop count
Shun Hao [Tue, 6 Jul 2021 13:14:48 +0000 (16:14 +0300)]
net/mlx5: support meter hierarchy drop count

When using meter hierarchy with multiple meters, every meter may have
drop counter, so a packet being set red color by one meter should be
counted to that specific meter only.

To support this, add tag action in the color rule so packet going to
next new meter can have its meter id, so as to be counted to the
correct drop counter in drop table.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: support meter action in meter policy
Shun Hao [Tue, 6 Jul 2021 13:14:47 +0000 (16:14 +0300)]
net/mlx5: support meter action in meter policy

This makes the meter policy support meter action. So multiple meters
can be chained as a meter hierarchy.

Only termination meter is allowed as the last meter in a hierarchy,
and there're two cases:
1. The last meter has non-RSS policy, can directly create sub-policy
and color rules during each meter's policy creation.
2. The last meter has RSS policy, don't create sub-policy/rules when
creating meter policy. Only when a RTE flow is using the meter hierarchy,
will iterate all meters of the hierarchy and create needed sub-
policies and color rules for them.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: limit inner RSS expansion for MPLS
Xiaoyu Min [Fri, 2 Jul 2021 08:34:48 +0000 (16:34 +0800)]
net/mlx5: limit inner RSS expansion for MPLS

If user wants to do MPLS inner RSS and only provides pattern
till MPLS without inner items [1], RSS expansion will expand flows
into 13 sub-flows[2] which is too many and it impacts flow insert
rate, stack usage becomes large as well.

This expansion into 13 sub-flows seems not worthy of and it can
be significantly reduced (i.e, 7 sub-flows [3]) by user providing
at least one inner L2/L3 item [4].

[1]:
pattern eth / ipv4 / udp / mpls / end actions rss type tcp udp ip
end level 2 / end

[2]:
eth / ipv4 / udp / mpls
eth / ipv4 / udp / mpls / ipv4
eth / ipv4 / udp / mpls / ipv4 / udp
eth / ipv4 / udp / mpls / ipv4 / tcp
eth / ipv4 / udp / mpls / ipv6
eth / ipv4 / udp / mpls / ipv6 / udp
eth / ipv4 / udp / mpls / ipv6 / tcp
eth / ipv4 / udp / mpls / eth / ipv4
eth / ipv4 / udp / mpls / eth / ipv4 / udp
eth / ipv4 / udp / mpls / eth / ipv4 / tcp
eth / ipv4 / udp / mpls / eth / ipv6
eth / ipv4 / udp / mpls / eth / ipv6 / udp
eth / ipv4 / udp / mpls / eth / ipv6 / tcp

[3]:
eth / ipv4 / udp / mpls / eth
eth / ipv4 / udp / mpls / eth / ipv4 / udp
eth / ipv4 / udp / mpls / eth / ipv4 / tcp
eth / ipv4 / udp / mpls / eth / ipv6
eth / ipv4 / udp / mpls / eth / ipv6 / udp
eth / ipv4 / udp / mpls / eth / ipv6 / tcp

[4]:
pattern eth / ipv4 / udp / mpls / eth / end actions rss type tcp udp ip
level 2 / end

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: fix MPLS RSS expansion
Xiaoyu Min [Fri, 2 Jul 2021 08:34:47 +0000 (16:34 +0800)]
net/mlx5: fix MPLS RSS expansion

MPLSoUDP and MPLSoGRE are supported by PMD from
rte flow point of view.

RSS expansion doesn't support above but, instead, supports
normal MPLS over L2, which actually will be rejected by PMD.

This patch removes RSS expansion support of the MPLS over L2
and adds support of MPLSoUDP and MPLSoGRE.

In addition to above, support for eth over MPLS expansion is
added too.

Fixes: a4a5cd21d20a ("net/mlx5: add flow MPLS item")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: remove unsupported flow item MPLS over IP
Xiaoyu Min [Fri, 2 Jul 2021 08:34:46 +0000 (16:34 +0800)]
net/mlx5: remove unsupported flow item MPLS over IP

HW doesn't support match MPLS over IP traffic.

Remove related code.

Fixes: d1abe664ddde ("net/mlx5: add MPLS to Direct Verbs flow engine")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: fix offset calculation for modify field action
Alexander Kozyrev [Mon, 5 Jul 2021 09:47:04 +0000 (12:47 +0300)]
net/mlx5: fix offset calculation for modify field action

Offsets are not taken into account during MAC addresses
manipulation for the MODIFY_FIELD action. That leads to
a wrong split between 0-15 and 16-47 bits and corrupted
data being copied to/from MAC addresses. Use both source
and destination offsets to calcucate the proper modify
header action specification.

Fixes: fdd0c046f4 ("net/mlx5: fix modify field action order for MAC")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: fix L4 integrity translation
Gregory Etelson [Mon, 5 Jul 2021 09:05:00 +0000 (12:05 +0300)]
net/mlx5: fix L4 integrity translation

MLX5 PMD supports L3 and L4 integrity bits.
L4 checksum-ok bit was not translated correctly.
The patch updates the l4_csum_ok integrity bit translation.

Fixes: 79f8952783d0 ("net/mlx5: support integrity flow item")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agocommon/mlx5: fix Netlink receive message buffer size
Viacheslav Ovsiienko [Thu, 1 Jul 2021 07:31:33 +0000 (10:31 +0300)]
common/mlx5: fix Netlink receive message buffer size

If there are many VFs the Netlink message length sent by kernel
in reply to RTM_GETLINK request can be large. We should query
the size of message being received in advance and allocate
the large enough buffer to handle these large messages.

Fixes: ccdcba53a3f4 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: fix match MPLS over GRE with key
Xiaoyu Min [Thu, 1 Jul 2021 05:54:56 +0000 (13:54 +0800)]
net/mlx5: fix match MPLS over GRE with key

Currently PMD needs previous layer information in order to set
corresponding match field for MPLSoGRE or MPLSoUDP.

GRE_KEY item is missing as supported previous layer when translate
item MPLS, which causes flow[1] cannot match MPLS over GRE traffic.

According to RFC4023, MPLS over GRE tunnel with optional key
field needs to be supported too.

By adding missing GRE_KEY as supported previous layer fix problem.

[1]:
flow create 0 ingress pattern eth / ipv6 / gre k_bit is 1 / gre_key /
mpls label is 966138 / end actions queue index 1 / mark id 0xa / end

Fixes: a7a0365565a4 ("net/mlx5: match GRE key and present bits")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: fix pattern expansion in RSS flow rules
Gregory Etelson [Wed, 30 Jun 2021 07:19:52 +0000 (10:19 +0300)]
net/mlx5: fix pattern expansion in RSS flow rules

Flow rule pattern may be implicitly expanded by the PMD if the rule
has RSS flow action. The expansion adds network headers to the
original pattern. The new pattern lists all network levels that
participate in the rule RSS action.

The patch validates that buffer for expanded pattern has enough bytes
for new flow items.

Fixes: c7870bfe09dc ("ethdev: move RSS expansion code to mlx5 driver")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: add more details to flow dump
Haifei Luo [Mon, 31 May 2021 02:22:08 +0000 (05:22 +0300)]
net/mlx5: add more details to flow dump

Currently the flow dump provides few information about actions
- just the pointers. Add implementations to display details for
counter, modify_hdr and encap_decap actions.

For counter, the regular flow operation query is engaged and
the counter content information is provided, including hits
and bytes values.For modify_hdr, encap_and decap actions,
the information stored in the ipool objects is dumped.

There are the formats of information presented in the dump:
Counter: rec_type,id,hits,bytes
Modify_hdr: rec_type,id,actions_number,actions
Encap_decap: rec_type,id,buf

Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: fix r/w lock usage in DMA unmap
Feifei Wang [Thu, 27 May 2021 09:48:06 +0000 (17:48 +0800)]
net/mlx5: fix r/w lock usage in DMA unmap

For mlx5 DMA unmap, write lock should be used for rebuilding memory
region cache table rather than read lock.

Fixes: 989e999d9305 ("net/mlx5: support PCI device DMA map and unmap")
Cc: stable@dpdk.org
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>