dpdk.git
Bing Zhao [Wed, 5 May 2021 12:23:22 +0000 (15:23 +0300)]
net/mlx5: add translation of connection tracking action

When creating a flow with a CT action context, the action needs to be
translated in two steps.

First, retrieve the rte_flow action from the action context.
Second, translate it to the corresponding DR action with the traffic
direction that was specified when creating or updating it via the
rte_flow_action_handle* API.

Before using the DR action in a flow, the CT context should be
available to use in the hardware. A synchronization is done before
inserting the flow rule with CT action to check the HW availability
of this CT context.

In order to release the DR actions and reuse the context of a CT,
the reference count should also be handled when the flow rule is
destroyed.

The CT index will be recorded in the rte_flow by reusing the ASO age
index to save memory, since only one ASO action is supported in one
flow rule currently. The action context type should also be saved
for CT. When destroying a flow rule, if the context type is CT and
the index is valid (non-zero), the release process should be
handled. By default, the handling will fall back to try to release
the ASO age if any.
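
The release logic described above can be sketched as follows. This is
only an illustration: the struct, enum and function names are
hypothetical and are not the driver's own; the counters merely stand in
for the real release routines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical context types; illustrative names, not the driver's. */
enum ctx_type { CTX_TYPE_AGE = 0, CTX_TYPE_CT = 1 };

/* Hypothetical flow record: one index field shared by ASO age and CT. */
struct flow_rec {
	enum ctx_type type; /* which kind of ASO context the index refers to */
	uint32_t aso_idx;   /* 0 means no ASO context is attached */
};

/* Counters standing in for the real release routines. */
static int ct_released;
static int age_released;

static void release_ct(uint32_t idx) { assert(idx != 0); ct_released++; }
static void release_age(uint32_t idx) { assert(idx != 0); age_released++; }

/* Release whatever ASO context the flow holds when the rule is destroyed. */
static void flow_release_aso(struct flow_rec *flow)
{
	if (flow->aso_idx == 0)
		return; /* nothing attached */
	if (flow->type == CTX_TYPE_CT)
		release_ct(flow->aso_idx);
	else
		release_age(flow->aso_idx); /* default: fall back to ASO age */
	flow->aso_idx = 0;
}
```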

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:21 +0000 (15:23 +0300)]
net/mlx5: add ASO connection tracking destroy

When trying to destroy an ASO connection tracking context, the DR
action created on this context should also be destroyed. Before
inserting the related software object into the management free list,
the reference count should be checked.

Right now, the context object will not be freed to the system and
will be reused directly from the free list.
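
A minimal sketch of the destroy path, assuming a singly linked free
list and a plain (non-atomic) reference count. All names here are
hypothetical; the real driver structures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CT software object. */
struct ct_obj {
	uint32_t refcnt;     /* number of users of this context */
	void *dr_action;     /* stand-in for the DR action handle */
	struct ct_obj *next; /* free-list linkage */
};

/* Hypothetical management structure holding the free list. */
struct ct_mng {
	struct ct_obj *free_list;
};

/*
 * Drop one reference. Returns 1 if the object is still referenced,
 * 0 if it was moved to the free list for later reuse.
 */
static int ct_release(struct ct_mng *mng, struct ct_obj *ct)
{
	assert(ct->refcnt > 0);
	if (--ct->refcnt > 0)
		return 1;
	ct->dr_action = NULL; /* real code would destroy the DR action here */
	ct->next = mng->free_list; /* not freed to the system, only recycled */
	mng->free_list = ct;
	return 0;
}
```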

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:20 +0000 (15:23 +0300)]
net/mlx5: add ASO connection tracking query

After the connection tracking context is created and is being used by
flows, the HW updates the context automatically after a packet passes
the CT validation. E.g., the ACK, SEQ, window and state of the CT can
be updated by traffic in both directions.

In order to query the updated contents of this context, a WQE should
be posted to the SQ with a return buffer. The data will be filled
into the buffer, and the profile will be filled with a specific value.

The context may be updated while the query command is executing, so
the result of the query may not be the latest one.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:19 +0000 (15:23 +0300)]
net/mlx5: release connection tracking management

When freeing the IB shared context while stopping a device, the
ASO connection tracking management structure should also be cleaned
up.

All the DR actions created should be destroyed. The structures need
to be freed and the ASO CT QP should be released. Meanwhile, the
memory region allocated and registered for queries should also be
deregistered and then freed.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:18 +0000 (15:23 +0300)]
net/mlx5: add actions for connection tracking creation

Allocate a CT from the management pools and create the DR actions
for both directions by default.

If there is no available connection tracking action, a new pool will
be created with a fixed-size bulk allocation. Right now, all the
resources are controlled by a linked list.

The ASO connection tracking context associated with these actions
needs to be updated via WQE before being used for steering.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:17 +0000 (15:23 +0300)]
net/mlx5: support connection tracking modify

After the connection tracking object bulk is allocated, all the
objects' contents are filled with zero by default. Every newly
allocated object must be modified via a WQE operation before it is
used.

In order to reduce the latency of flow creation, an asynchronous
approach is used instead of busy waiting for the CQE to be generated.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:16 +0000 (15:23 +0300)]
common/mlx5: add DevX connection tracking objects creation

Add support for connection tracking ASO creation via DevX command.
Right now only bulk creation is supported.

By default, the objects are created with zero contents. Before
using a single object, it must be modified by posting a WQE to the ASO
CT SQ.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:15 +0000 (15:23 +0300)]
net/mlx5: initialize connection tracking management

The definitions of ASO connection tracking objects management
structures are added.

For performance, bulk allocation of ASO CT objects should be used.
The maximal value per bulk and the granularity can be fetched from
HCA capabilities 2. Right now, a fixed number of 64 is used for each
bulk to simplify management.

The ASO QP for CT is initialized; its SQ will be used for both
modify and query commands.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:14 +0000 (15:23 +0300)]
net/mlx5: use meter color register for connection tracking

Based on the capacity, 3 registers could be used. Due to the register
allocation, only REG_C_3, the register for meter color, can be reused
right now.

As a result, no more than one ASO action can be supported in the same
flow.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:13 +0000 (15:23 +0300)]
common/mlx5: check connection tracking offload capability

During startup, the ASO connection tracking offload capability can
be queried via the HCA_CAP_QUERY command. If the HW doesn't support ASO
CT, the value is 0 by default. In that case, the subsequent
initialization should be skipped and the creation of the CT object
should return a failure directly.

Subsequent CT creation should also check this capability. With
the old driver, a preprocessor macro should be used so that
compilation passes.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Wed, 5 May 2021 12:23:12 +0000 (15:23 +0300)]
common/mlx5: add connection tracking object

The structures of ASO connection tracking offload object are added
based on the definitions in the PRM. One CT object context will be
loaded into the cache completely in a reversed order of dwords. The
valid bit should be the MSB of the last dword. This is used for the
conntrack context creation and update, as well as for the query.

The capabilities 2 (HCA_CAP_2) layout is also added. The connection
tracking related capabilities could be queried via the HCA_CAP_2.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Wisam Jaddo [Wed, 28 Apr 2021 10:29:07 +0000 (13:29 +0300)]
net/mlx5: fix TCP flags size for modify actions

According to the RFC, the TCP flags field is 9 bits wide, while the
currently defined size is 6.
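
To illustrate the effect of the field width, the sketch below builds a
mask from a bit width. The macro and function names are hypothetical
and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_FLAGS_BITS_OLD 6u /* width before the fix */
#define TCP_FLAGS_BITS_NEW 9u /* width including the NS, CWR and ECE bits */

/* Build a mask covering the low 'width' bits of a field. */
static uint32_t field_mask(unsigned int width)
{
	assert(width > 0 && width < 32);
	return (UINT32_C(1) << width) - 1;
}
```

With a 6-bit width, the three upper flag bits are silently dropped
from any value written through the mask.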

Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Alexander Kozyrev [Thu, 29 Apr 2021 14:55:18 +0000 (17:55 +0300)]
net/mlx5: support power monitoring

Support the PMD power management API in the mlx5 driver.
The monitor policy of this API puts a CPU core to sleep until
data at a monitored memory address is changed by the NIC.
Implement the get_monitor_addr function to return the address
of a CQE owner bit to monitor the arrival of a new packet.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Michael Baum [Mon, 26 Apr 2021 12:48:10 +0000 (15:48 +0300)]
net/mlx5: workaround ASO memory region creation

Due to a kernel issue in direct MKEY creation using the DevX API for
physical memory, this patch switches ASO MR creation to the Verbs
API.

Fixes: f935ed4b645a ("net/mlx5: support flow hit action for aging")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Kalesh AP [Mon, 3 May 2021 05:21:50 +0000 (10:51 +0530)]
net/bnxt: prevent device access in error state

The driver should prevent any DMA with the device when it
detects an error. When the firmware is in a fatal state,
stop Tx/Rx by assigning them to dummy functions.

Fixes: be14720def9c ("net/bnxt: support FW reset")
Fixes: 9d0cbaecc91a ("net/bnxt: support periodic FW health monitoring")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Ajit Khaparde [Fri, 30 Apr 2021 20:14:13 +0000 (13:14 -0700)]
net/bnxt: fix ring count calculation

Fix ring count calculation for Thor. VNIC count does not have a
direct bearing on the number of rings that can be used.

Fixes: fe8dd26f86c7 ("net/bnxt: cap max Rx rings for Thor")

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Ajit Khaparde [Fri, 30 Apr 2021 20:14:12 +0000 (13:14 -0700)]
net/bnxt: fix mismatched type comparison in Rx

Fix comparison between uint16_t and uint32_t types.

Fixes: 6dc83230b43b ("net/bnxt: support port representor data path")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Ajit Khaparde [Fri, 30 Apr 2021 20:14:11 +0000 (13:14 -0700)]
net/bnxt: check PCI config read

The return value of rte_pci_read_config was not checked.
Fix it.

Coverity issue: 349919
Fixes: 9d0cbaecc91a ("net/bnxt: support periodic FW health monitoring")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Ajit Khaparde [Fri, 30 Apr 2021 20:14:10 +0000 (13:14 -0700)]
net/bnxt: fix mismatched type comparison in MAC restore

dev_info.max_mac_addrs is of type uint32_t, but the counter i is
of type uint16_t. This mismatch may cause the loop condition to
always be true. Change the loop counter variable to uint32_t.
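
The problem can be reproduced in isolation. The sketch below is not
the driver code: the function names are made up, and a safety cap is
added so that the demonstration terminates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count iterations of a loop driven by a 16-bit counter. When 'limit'
 * exceeds UINT16_MAX, the counter wraps to 0 before reaching it, so
 * only the safety cap stops the loop.
 */
static uint32_t iterations_u16(uint32_t limit, uint32_t cap)
{
	uint32_t n = 0;

	for (uint16_t i = 0; i < limit; i++) {
		if (++n == cap)
			break; /* would otherwise spin forever */
	}
	return n;
}

/* Same loop with a counter as wide as the limit: always terminates. */
static uint32_t iterations_u32(uint32_t limit)
{
	uint32_t n = 0;

	for (uint32_t i = 0; i < limit; i++)
		n++;
	return n;
}
```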

Fixes: b02f1573cd07 ("net/bnxt: restore MAC filters during reset recovery")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Kalesh AP [Thu, 29 Apr 2021 05:53:00 +0000 (11:23 +0530)]
net/bnxt: fix single PF per port check

The check BNXT_SINGLE_PF(bp) returns false for a VF. So there is no
extra check needed for VF along with BNXT_SINGLE_PF(bp).

Also make error messages more explicit.

Fixes: ff947c6ce15f ("net/bnxt: add check for multi host PF per port")
Fixes: f86febfb46da ("net/bnxt: support VF")
Fixes: 3e12fdb78e82 ("net/bnxt: support VLAN pvid")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Lance Richardson [Wed, 28 Apr 2021 22:03:44 +0000 (18:03 -0400)]
net/bnxt: fix dynamic VNIC count

Ensure that the current count of in-use VNICs is decremented
when a VNIC is freed. Don't attempt VNIC allocation when the
maximum supported number of VNICs is currently allocated.
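
The accounting rule can be sketched as below, using a hypothetical
pool structure rather than the bnxt data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VNIC accounting; field names are illustrative. */
struct vnic_pool {
	uint16_t max;    /* maximum number of VNICs supported */
	uint16_t in_use; /* VNICs currently allocated */
};

/* Returns 0 on success, -1 when every supported VNIC is already in use. */
static int vnic_alloc(struct vnic_pool *p)
{
	if (p->in_use >= p->max)
		return -1; /* do not attempt the allocation at all */
	p->in_use++;
	return 0;
}

/* Freeing a VNIC must decrement the in-use count. */
static void vnic_free(struct vnic_pool *p)
{
	assert(p->in_use > 0);
	p->in_use--;
}
```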

Fixes: 49d0709b257f ("net/bnxt: delete and flush L2 filters cleanly")
Fixes: d24610f7bfda ("net/bnxt: allow flow creation when RSS is enabled")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reported-by: Stephen Hemminger <sthemmin@microsoft.com>
Somnath Kotur [Mon, 26 Apr 2021 06:07:55 +0000 (11:37 +0530)]
net/bnxt: fix Rx timestamp when FIFO pending bit is set

Fix to clear the Rx FIFO while reading the timestamp.
If the Rx FIFO has pending bit set, keep reading to clear it
and return the last valid timestamp instead of unconditionally
returning an error.
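
A simplified model of the drain loop is shown below. The FIFO is
simulated with an array, and all names are hypothetical; the real
driver reads device registers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated timestamp FIFO; stands in for the device registers. */
struct ts_fifo {
	const uint64_t *entries;
	size_t count;
	size_t pos;
};

/* Models the "pending" bit: set while unread entries remain. */
static int fifo_pending(const struct ts_fifo *f)
{
	return f->pos < f->count;
}

static uint64_t fifo_read(struct ts_fifo *f)
{
	assert(fifo_pending(f));
	return f->entries[f->pos++];
}

/*
 * Keep reading while the pending bit is set and report the last valid
 * timestamp. Returns -1 only if no timestamp was available at all.
 */
static int read_rx_timestamp(struct ts_fifo *f, uint64_t *ts)
{
	int found = 0;

	while (fifo_pending(f)) {
		*ts = fifo_read(f);
		found = 1;
	}
	return found ? 0 : -1;
}
```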

Fixes: b11cceb83a34 ("net/bnxt: support timesync")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Somnath Kotur [Mon, 26 Apr 2021 06:07:54 +0000 (11:37 +0530)]
net/bnxt: refactor multi-queue Rx configuration

Eliminate separate codepath/handling for single queue
as the multiqueue code path takes care of it as well.
The only difference being the end_grp_id being 1
now instead of 0 for single queue, but that does not matter
for single queue and does not alter any functionality.

Fixes: 6133f207970c ("net/bnxt: add Rx queue create/destroy")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Leyi Rong [Thu, 22 Apr 2021 02:48:30 +0000 (10:48 +0800)]
net/iavf: fix VLAN extraction in AVX512 path

The new VIRTCHNL_VF_OFFLOAD_VLAN_V2 capability added support that allows
the PF to set the location of the RX VLAN tag for stripping offloads.

So the VF needs to extract the VLAN tag according to the location flags.

This patch is the fix for AVX512 path, as AVX2 is already fixed.

Fixes: 9c9aa0040344 ("net/iavf: add offload path for Rx AVX512 flex descriptor")

Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Tested-by: Qin Sun <qinx.sun@intel.com>
Jeff Guo [Tue, 13 Apr 2021 10:06:30 +0000 (18:06 +0800)]
net/ice: support flow director for IP fragment packet

New FDIR parsing is added to handle fragmented IPv4/IPv6 packets.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Jeff Guo [Tue, 13 Apr 2021 10:06:29 +0000 (18:06 +0800)]
net/ice: support RSS hash for IP fragment

New pattern and RSS hash flow parsing are added to handle fragmented
IPv4/IPv6 packets.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Qi Zhang [Thu, 29 Apr 2021 00:41:43 +0000 (08:41 +0800)]
net/ice/base: support IP fragment RSS and FDIR

Add support for IP fragment RSS hash and FDIR function. Separate IP
fragment and IP other packet types.

The patch also updates the release date in README.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Qi Zhang [Thu, 29 Apr 2021 00:41:42 +0000 (08:41 +0800)]
net/ice/base: sign external device package programming

External topology devices (e.g. PHYs) connected to 100G or to an SoC
that includes 100G IP might have a firmware engine within the device,
and the firmware is usually loaded from NVM connected to the topology
device.
The topology device NVM images can be updated using SW tools, but
such a solution poses a security risk if there is no validation of
the integrity of an image before programming it to the device NVM.
In order to prevent this security risk, the topology device NVM image
might be included as part of the 100G NVM image. When the topology
device NVM image is present in the 100G NVM image, it is authenticated
and might be loaded to the topology device at startup or on command
of SW using a dedicated AQ.
This patch provides support for this functionality.

Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Qi Zhang [Thu, 29 Apr 2021 00:41:41 +0000 (08:41 +0800)]
net/ice/base: support L3 DSCP QoS

Base code support for building configuration TLVs
in DSCP mode has not been implemented before, so
the functions to do so and the flow control to determine
if we are in VLAN or DSCP mode need to be added.

The current value for maximum number of DCB APPs
(ICE_DCBX_MAX_APPS) is not sufficient when supporting
DSCP mode.  Each DSCP->TC mapping will come in as a
single APP value.  So, there can be up to 64 APPs for
DSCP mapping.

The current DSCP to TC mapping needs to be tracked
so that TLVs can be built up to send to the FW. Add
a u8 array to hold this info.

A u64 is also needed to keep track of the DSCP values
that have had an APP submitted to map its value to a
TC.  Since it would be unwise to allow an APP to be
overwritten by subsequent APPs, reject mappings for a
DSCP value that already has a user mapped value.  This
will allow us to easily track which DSCP values have
been mapped, and when the last one has been deleted.
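
The bookkeeping described above can be sketched as follows. The
structure and function names are hypothetical, and the limit of 8
traffic classes is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DSCP_NUM 64u /* number of DSCP values */
#define TC_NUM 8u    /* assumed number of traffic classes */

/* Hypothetical tracking state: a u64 bitmap plus a u8 array. */
struct dscp_map {
	uint64_t mapped;      /* bit n set: DSCP n has a user mapping */
	uint8_t tc[DSCP_NUM]; /* DSCP to TC mapping */
};

/* Add a mapping; reject a DSCP value that is already mapped. */
static int dscp_add(struct dscp_map *m, uint8_t dscp, uint8_t tc)
{
	if (dscp >= DSCP_NUM || tc >= TC_NUM)
		return -1;
	if (m->mapped & (UINT64_C(1) << dscp))
		return -1; /* never overwrite an existing APP */
	m->mapped |= UINT64_C(1) << dscp;
	m->tc[dscp] = tc;
	return 0;
}

/*
 * Delete a mapping. Returns 1 when the last mapping was removed,
 * 0 when others remain, -1 if the DSCP value was not mapped.
 */
static int dscp_del(struct dscp_map *m, uint8_t dscp)
{
	if (dscp >= DSCP_NUM || !(m->mapped & (UINT64_C(1) << dscp)))
		return -1;
	m->mapped &= ~(UINT64_C(1) << dscp);
	m->tc[dscp] = 0;
	return m->mapped == 0;
}
```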

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Qi Zhang [Thu, 29 Apr 2021 00:41:40 +0000 (08:41 +0800)]
net/ice/base: log if DDP/FW do not support QinQ

Currently, if the driver supports QinQ, there is no message if the
DDP and/or FW don't support QinQ. Add functionality that prints a
message if the DDP and/or FW don't support QinQ when the driver
attempts to configure DVM. This will make it more obvious to users in
the field that they need to update their DDP and/or FW.

This required a small refactor so some of the existing code could be
shared and used by this new print functionality.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Qi Zhang [Thu, 29 Apr 2021 00:41:39 +0000 (08:41 +0800)]
net/ice/base: refactor post DDP download VLAN mode config

Currently it's not clear that only the first PF downloads the package
and configures the VLAN mode. When this is happening all other PFs are
blocked on the global configuration lock. Once the package is
successfully downloaded and the global configuration lock has been
released then all PFs resume initialization. This includes some post
package download VLAN mode configuration. To make this more obvious add
the new function ice_post_pkg_dwnld_vlan_mode_cfg() so any/all post
download VLAN mode configuration code can be put in here.

This also makes it more clear that all PFs will call this new function.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Qi Zhang [Thu, 29 Apr 2021 00:41:38 +0000 (08:41 +0800)]
net/ice/base: add IP fragment flags

Add the IPv6 fragment flags and the IPv4 fragment field shift.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Ting Xu [Sun, 25 Apr 2021 06:53:04 +0000 (14:53 +0800)]
common/iavf: fix order of protocol header types

The new virtchnl protocol header types for IPv4 and IPv6 fragment are
not added in order, which will break ABI. Move them to the end of the
list.
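
The ABI effect of enumerator order can be seen in a small example.
The enumerator names below are invented for the illustration and are
not the virtchnl definitions.

```c
#include <assert.h>

/* Values as already released to users of the ABI. */
enum hdr_released { HDR_ETH, HDR_IPV4, HDR_IPV6, HDR_TCP };

/* Inserting in the middle shifts every later value: ABI break. */
enum hdr_bad { BAD_ETH, BAD_IPV4, BAD_IPV4_FRAG, BAD_IPV6, BAD_TCP };

/* Appending at the end keeps all released values unchanged. */
enum hdr_good { GOOD_ETH, GOOD_IPV4, GOOD_IPV6, GOOD_TCP,
		GOOD_IPV4_FRAG, GOOD_IPV6_FRAG };

/* Compile-time checks that appended enumerators preserve old values. */
_Static_assert((int)GOOD_IPV6 == (int)HDR_IPV6, "IPv6 value must not move");
_Static_assert((int)GOOD_TCP == (int)HDR_TCP, "TCP value must not move");
```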

Fixes: e6a42fd9158b ("common/iavf: add protocol header for IP fragment")
Cc: stable@dpdk.org
Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
David Marchand [Mon, 3 May 2021 16:43:44 +0000 (18:43 +0200)]
vhost: fix offload flags in Rx path

The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices
to support every offload used (IP and TCP checksumming, TSO, ...), and
the various l2/l3/l4 lengths must be set following any processing that
happened in the application itself.

On the other hand, the received packets are not marked with respect to
the current packet l3/l4 checksumming info.

Copy virtio Rx processing to fix those offload flags, with some
differences:
- accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
- ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
  the virtio spec).

Some applications might rely on the current behavior, so it is left
untouched by default.
A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
new behavior.

The vhost example has been updated for the new behavior: TSO is applied to
any packet marked LRO.

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
David Marchand [Mon, 3 May 2021 16:43:43 +0000 (18:43 +0200)]
net/virtio: refactor Tx offload helper

Purely cosmetic but it is rather odd to have an "offload" helper that
checks if it actually must do something.
We already have the same checks in most callers, so move this branch
in them.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
David Marchand [Mon, 3 May 2021 16:43:42 +0000 (18:43 +0200)]
net/virtio: do not touch Tx offload flags

Tx offload flags are the application's responsibility.
Leave the mbuf alone and use local storage for implicit TCP checksum
offloading in case of TSO.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Matan Azrad [Sun, 2 May 2021 10:45:10 +0000 (13:45 +0300)]
vdpa/mlx5: improve interrupt management

The driver should notify the guest for each traffic burst detected by CQ
polling.

The CQ polling trigger is defined by the `event_mode` device argument:
either busy polling on all the CQs or a blocked call on the HW
completion event using the DevX channel.

Also, the polling event modes can move to a blocked call when the
traffic rate is low.

The current blocked call uses the EAL interrupt API, which suffers a
lot of overhead in API management and serves all the drivers and
libraries using only a single thread.

Use a blocking FD of the DevX channel in order to do the blocked call
directly by the DevX channel FD mechanism.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Cheng Jiang [Tue, 27 Apr 2021 08:03:34 +0000 (08:03 +0000)]
vhost: add batch datapath for async packed ring

Add batch datapath for async vhost packed ring to improve the
performance of small packet processing.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Cheng Jiang [Tue, 27 Apr 2021 08:03:33 +0000 (08:03 +0000)]
vhost: support packed ring in async datapath

For now, the async vhost data path only supports split ring. This patch
enables packed ring in the async vhost data path to make async vhost
compatible with the virtio 1.1 spec.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Cheng Jiang [Tue, 27 Apr 2021 08:03:32 +0000 (08:03 +0000)]
vhost: refactor async split ring functions

This patch moves some code of async vhost split ring into
inline functions to improve the readability. Also, it
changes the pointer index style of iterator to make the
code more concise.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Cheng Jiang [Tue, 27 Apr 2021 03:14:01 +0000 (03:14 +0000)]
examples/vhost: fix overflow in argument parsing

Change the way arguments are passed to fix a potential overflow in
argument processing.

Coverity issue: 363741
Fixes: 965b06f03582 ("examples/vhost: enhance getopt_long usage")

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xueming Li [Wed, 14 Apr 2021 14:14:04 +0000 (22:14 +0800)]
net/virtio: fix vectorized Rx queue rearm

When an Rx queue works in vectorized mode with rxd <= 512 under
high-PPS traffic, testpmd often starts, receives rxd packets, and
then receives nothing more.

Testpmd starts with an Rx queue flush, which tries to receive
MAX_PKT_BURST (512) packets and drop them. When the Rx burst size >=
the Rx queue size, all descriptors in the used queue are consumed
without rearm, so the device can't receive more packets. The next Rx
burst returns at once since no used descriptors are found, the rearm
logic is skipped, and the Rx vq stays in a starving state.

To avoid Rx vq starvation, this patch always checks the available
queue and rearms if needed, even when no used descriptor is reported
by the device.
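
The ordering issue can be modelled with a toy queue. This is only a
sketch under simplifying assumptions: descriptors are counted rather
than tracked individually, and the names and the rearm threshold are
hypothetical.

```c
#include <assert.h>

#define REARM_THRESH 32u /* assumed rearm threshold */

/* Toy Rx queue: each descriptor is posted, used, or awaiting rearm. */
struct rxq {
	unsigned int size;  /* total descriptors */
	unsigned int avail; /* posted to the device */
	unsigned int used;  /* filled by the device, not yet consumed */
};

/* Descriptors consumed by the driver but not yet reposted. */
static unsigned int rxq_free(const struct rxq *q)
{
	return q->size - q->avail - q->used;
}

static void rxq_rearm(struct rxq *q)
{
	q->avail += rxq_free(q);
}

/* Rearm check comes first, so it runs even with no used descriptors. */
static unsigned int rx_burst(struct rxq *q, unsigned int nb_pkts)
{
	unsigned int n;

	if (rxq_free(q) >= REARM_THRESH)
		rxq_rearm(q);
	if (q->used == 0)
		return 0;
	n = q->used < nb_pkts ? q->used : nb_pkts;
	q->used -= n;
	return n;
}
```

If the early return on `used == 0` came before the rearm check, the
second call in a sequence like "consume everything, then poll again"
would never repost descriptors.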

Fixes: fc3d66212fed ("virtio: add vector Rx")
Fixes: 2d7c37194ee4 ("net/virtio: add NEON based Rx handler")
Fixes: 52b5a707e6ca ("net/virtio: add Altivec Rx")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Ciara Power [Wed, 5 May 2021 15:22:48 +0000 (15:22 +0000)]
telemetry: fix race on callbacks list

The list_commands() function accessed the callbacks list,
but did not take the lock. This may have caused inconsistencies if
callbacks were being registered at the same time.
This is now fixed to lock before iterating the list,
and unlock afterwards.

Fixes: f38748736eb2 ("telemetry: add default callback commands")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Jerin Jacob [Mon, 3 May 2021 16:34:28 +0000 (22:04 +0530)]
telemetry: hide internal define

Remove TELEMETRY_MAX_CALLBACKS symbol from the public
rte_telemetry.h header file.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Stanislaw Kardach [Wed, 28 Apr 2021 14:25:53 +0000 (16:25 +0200)]
test/distributor: fix burst flush on worker quit

While working on the RISC-V port, I encountered a situation where worker
threads get stuck in the rte_distributor_return_pkt() function in the
burst test.
Investigation showed that some of the threads enter this function with
the flag RTE_DISTRIB_GET_BUF set in d->retptr64[0]. At the same time, the
main thread has already passed rte_distributor_process(), so nobody will
clear this flag and hence the workers can't return.

What I've noticed is that adding a flush just after the last _process(),
similarly to how the quit_workers() function is written in
test_distributor.c, fixes the issue.
Lukasz Wojciechowski reproduced the same issue on x86 using a VM with 32
emulated CPU cores to force some lcores not to be woken up.

Fixes: 7c3287a10535 ("test/distributor: add performance test for burst mode")
Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: David Hunt <david.hunt@intel.com>
Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Reviewed-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Stanislaw Kardach [Wed, 28 Apr 2021 14:25:52 +0000 (16:25 +0200)]
test/distributor: fix worker notification in burst mode

Because a single worker can process more than one packet from the
distributor, the final set of notifications in burst mode should be
sent one-by-one to ensure that each worker has a chance to wake up.

This fix mirrors the change done in the functional test by
commit f72bff0ec272 ("test/distributor: fix quitting workers in burst
mode").

Fixes: c3eabff124e6 ("distributor: add unit tests")
Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: David Hunt <david.hunt@intel.com>
Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Reviewed-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Hemant Agrawal [Thu, 29 Apr 2021 05:55:48 +0000 (11:25 +0530)]
ethdev: add missing buses in device iterator

This patch fixes an issue with OVS 2.15 not working on
DPAA/FSLMC based platforms due to missing support for
these buses in dev_iterate.
It adds dpaa_bus and fslmc to the dev iterator
for bus arguments.

Fixes: 214ed1acd125 ("ethdev: add iterator to match devargs input")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Chengwen Feng [Fri, 30 Apr 2021 09:04:04 +0000 (17:04 +0800)]
net/hns3: increase readability in logs

Some logs print u64 variables mostly in hexadecimal, which is not
readable.
This patch formats most u64 variables in decimal and adds a '0x' prefix
to the ones that are not adjusted.
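
As an illustration of the two styles, the sketch below formats one
u64 value in decimal and another in prefixed hexadecimal. The field
names and helper functions are made up for the example and do not
come from the hns3 driver.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Counters read more naturally in decimal. */
static int fmt_counter(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "rx_pkts: %" PRIu64, v);
}

/* Bitmaps stay in hex, with an explicit 0x prefix to avoid ambiguity. */
static int fmt_bitmap(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "rss_types: 0x%" PRIx64, v);
}
```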

Fixes: c37ca66f2b27 ("net/hns3: support RSS")
Fixes: 2790c6464725 ("net/hns3: support device reset")
Fixes: 8839c5e202f3 ("net/hns3: support device stats")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Chengwen Feng [Fri, 30 Apr 2021 09:04:03 +0000 (17:04 +0800)]
net/hns3: remove unused VMDq code

VMDq is not supported yet, so remove the unused code.

Fixes: d51867db65c1 ("net/hns3: add initialization")
Fixes: 1265b5372d9d ("net/hns3: add some definitions for data structure and macro")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: remove read when enabling TM QCN error event
Chengwen Feng [Fri, 30 Apr 2021 09:04:02 +0000 (17:04 +0800)]
net/hns3: remove read when enabling TM QCN error event

According to the HW manual, the read operation is unnecessary when
enabling TM QCN error event, so remove it.

Fixes: f53a793bb7c2 ("net/hns3: add more hardware error types")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: fix vector Rx burst limitation
Chengwen Feng [Fri, 30 Apr 2021 06:28:50 +0000 (14:28 +0800)]
net/hns3: fix vector Rx burst limitation

Currently, driver uses the macro HNS3_DEFAULT_RX_BURST whose value is
32 to limit the vector Rx burst size, as a result, the burst size
can't exceed 32.

This patch fixes this problem by supporting larger burst sizes.
Also adjust HNS3_DEFAULT_RX_BURST to 64 as it performs better than 32.

Fixes: a3d4f4d291d7 ("net/hns3: support NEON Rx")
Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: log flow director configuration
Chengwen Feng [Fri, 30 Apr 2021 06:28:49 +0000 (14:28 +0800)]
net/hns3: log flow director configuration

The rte flow interface does not support the API of the capability
set. Therefore, fdir configuration logs are added to facilitate
debugging.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: improve IO path data cache usage
Chengwen Feng [Fri, 30 Apr 2021 06:28:48 +0000 (14:28 +0800)]
net/hns3: improve IO path data cache usage

This patch improves data cache usage by:
1. Rearrange the frequently accessed rxq fields in the IO path into the
   first 128B.
2. Rearrange the frequently accessed txq fields in the IO path into the
   first 64B.
3. Make sure the ptype table is aligned to the cacheline size, which is
   128B, instead of the min cacheline size, which is 64B, because the
   L1/L2 is 64B and L3 is 128B on the Kunpeng ARM platform.
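
The layout idea in the list above can be sketched with C11 alignment support. The struct and field names below are hypothetical and much smaller than the real hns3 queue structures; the 128B line size is the value stated in the commit text.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128 /* line size assumed from the commit text */

/* Hypothetical Rx queue: fields touched per packet come first. */
struct rxq {
	/* hot: used on every burst */
	void *io_base;
	void *sw_ring;
	uint16_t next_to_use;
	uint16_t nb_desc;
	uint16_t rx_free_hold;
	/* cold: configuration and statistics start on the next line */
	alignas(CACHE_LINE) uint64_t cfg_flags;
	uint64_t err_count;
};

/* Lookup table aligned to the larger line size. */
alignas(CACHE_LINE) uint32_t ptype_table[256];

/* Compile-time checks on the layout. */
static_assert(offsetof(struct rxq, rx_free_hold) + sizeof(uint16_t) <= CACHE_LINE,
	      "hot fields exceed one cache line");
static_assert(offsetof(struct rxq, cfg_flags) == CACHE_LINE,
	      "cold fields must start on a new cache line");
```

Keeping the checks as static assertions means a later field addition that pushes hot data past the first line fails at build time rather than showing up as a performance regression.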

The performance gains are 1.5% in 64B packet macfwd scenarios.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: use existing macro to get array size
Chengwen Feng [Fri, 30 Apr 2021 06:28:47 +0000 (14:28 +0800)]
net/hns3: use existing macro to get array size

This patch uses RTE_DIM() instead of ARRAY_SIZE().
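
For readers unfamiliar with the macro, the sketch below shows the usual element-count idiom. DPDK provides RTE_DIM() in rte_common.h; the local definition here only keeps the example standalone, and the "speeds" table and speed_supported() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy for illustration only; DPDK ships RTE_DIM() in rte_common.h. */
#ifndef RTE_DIM
#define RTE_DIM(a) (sizeof(a) / sizeof((a)[0]))
#endif

/* Hypothetical table of values to search. */
const int speeds[] = { 10, 25, 40, 50, 100, 200 };

static_assert(RTE_DIM(speeds) == 6, "element count is derived from the array");

/* Return 1 when s appears in the table, 0 otherwise. */
int speed_supported(int s)
{
	size_t i;

	for (i = 0; i < RTE_DIM(speeds); i++)
		if (speeds[i] == s)
			return 1;
	return 0;
}
```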

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: refactor optimised register write
Chengwen Feng [Fri, 30 Apr 2021 06:28:46 +0000 (14:28 +0800)]
net/hns3: refactor optimised register write

This patch modifies hns3_write_reg_opt() API implementation because
the rte_write32() already uses rte_io_wmb().

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: remove some unused capabilities
Chengwen Feng [Fri, 30 Apr 2021 06:28:45 +0000 (14:28 +0800)]
net/hns3: remove some unused capabilities

This patch deletes some unused capabilities, including:
1. Delete some unused firmware capabilities definition, which are:
   UDP_GSO, ATR, INT_QL, SIMPLE_BD, TX_PUSH, FEC and PAUSE.
2. Delete some unused driver capabilities definition, which are:
   UDP_GSO, TX_PUSH.
3. Also redefine HNS3_DEV_SUPPORT_* as enum type, and change some of
   the values. Note: the HNS3_DEV_SUPPORT_* values are used only inside
   the driver, so it's safe to change the values.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/mlx5: support integrity flow item
Gregory Etelson [Thu, 29 Apr 2021 18:36:58 +0000 (21:36 +0300)]
net/mlx5: support integrity flow item

MLX5 PMD supports the following integrity filters for outer and
inner network headers:
- l3_ok
- l4_ok
- ipv4_csum_ok
- l4_csum_ok

`level` values 0 and 1 reference outer headers.
`level` values greater than 1 reference inner headers.

Flow rule items supplied by application must explicitly specify
network headers referred by integrity item. For example:
flow create 0 ingress
  pattern
    integrity level is 0 value mask l3_ok value spec l3_ok /
    eth / ipv6 / end …

or

flow create 0 ingress
  pattern
    integrity level is 0 value mask l4_ok value spec 0 /
    eth / ipv4 proto is udp / end …

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agocommon/mlx5: add PRM definitions for integrity check
Gregory Etelson [Thu, 29 Apr 2021 18:36:57 +0000 (21:36 +0300)]
common/mlx5: add PRM definitions for integrity check

Add integrity and IPv4 IHL bits to PRM file.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agoethdev: fix integrity flow item
Gregory Etelson [Thu, 29 Apr 2021 18:36:56 +0000 (21:36 +0300)]
ethdev: fix integrity flow item

Add integrity item definition to the rte_flow_desc_item array.
The new entry allows building an RTE flow item from data
stored in the rte_flow_item_integrity type.

Fixes: b10a421a1f3b ("ethdev: add packet integrity check flow rules")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
3 years agonet/hns3: fix IEEE 1588 PTP for scalar scattered Rx
Min Hu (Connor) [Thu, 29 Apr 2021 09:19:03 +0000 (17:19 +0800)]
net/hns3: fix IEEE 1588 PTP for scalar scattered Rx

When jumbo frame is enabled, Rx function will choose 'Scalar Scattered'
function which has no PTP handling.

This patch fixes it by adding PTP handling in 'Scalar Scattered'
function.

Fixes: 38b539d96eb6 ("net/hns3: support IEEE 1588 PTP")

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: fix MAC enable failure rollback
Huisong Li [Thu, 29 Apr 2021 09:03:59 +0000 (17:03 +0800)]
net/hns3: fix MAC enable failure rollback

If driver fails to enable MAC, it does not need to rollback the MAC
configuration. This patch fixes it.

Fixes: bdaf190f8235 ("net/hns3: support link speed autoneg for PF")

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agodoc: add build config option in hns3 guide
Min Hu (Connor) [Thu, 29 Apr 2021 06:12:07 +0000 (14:12 +0800)]
doc: add build config option in hns3 guide

This patch adds description of max TQP number per PF for config file
option.

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/bnxt: drop unused attribute
Kalesh AP [Fri, 23 Apr 2021 05:22:26 +0000 (10:52 +0530)]
net/bnxt: drop unused attribute

Remove "__rte_unused" instances that are wrongly marked.

Fixes: 6dc83230b43b ("net/bnxt: support port representor data path")
Fixes: 1bf01f5135f8 ("net/bnxt: prevent device access when device is in reset")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/sfc: fix mark support in EF100 native Rx datapath
Andrew Rybchenko [Wed, 28 Apr 2021 14:17:02 +0000 (17:17 +0300)]
net/sfc: fix mark support in EF100 native Rx datapath

Decouple user mark from user flag. Using the mark does not require
using the flag as well. The flag is not actually supported yet.

Fixes: 1aacc3d388d3 ("net/sfc: support user mark and flag Rx for EF100")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
3 years agonet/i40e: extend VF reset waiting time
Wenjun Wu [Thu, 29 Apr 2021 08:27:24 +0000 (16:27 +0800)]
net/i40e: extend VF reset waiting time

When starting VF, VF will issue reset command to PF, wait a fixed
amount of time, and assume VF reset is done on PF side. However,
compared with kernel PF, DPDK PF needs more time to setup. If we
run DPDK PF to support DPDK VF, the original delay will not be
enough.

When we first start VF after PF is launched, the execution
time of the statement info.msg_buf = rte_zmalloc("msg_buffer",
info.buf_len, 0); in the function i40e_dev_handle_aq_msg is more
than 200ms. It may cause VF start error.

Since iavf can hardly trigger this issue and i40evf will be replaced
by iavf in future DPDK versions, this patch provides a workaround.
We extend VF reset waiting time from 200ms to 500ms so that
VF can start normally when using DPDK PF and DPDK VF in most cases.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/i40e: fix primary MAC type when starting port
Robin Zhang [Wed, 28 Apr 2021 08:04:52 +0000 (08:04 +0000)]
net/i40e: fix primary MAC type when starting port

When starting a port, all MAC addresses will be set. We should set the MAC
type of default MAC address as VIRTCHNL_ETHER_ADDR_PRIMARY.

Fixes: 3f604ddf33cf ("net/i40e: fix lack of MAC type when set MAC address")
Cc: stable@dpdk.org
Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/iavf: fix primary MAC type when starting port
Robin Zhang [Wed, 28 Apr 2021 08:04:51 +0000 (08:04 +0000)]
net/iavf: fix primary MAC type when starting port

When starting a port, all MAC addresses will be set. We should set the MAC
type of default MAC address as VIRTCHNL_ETHER_ADDR_PRIMARY.

Fixes: b335e7203475 ("net/iavf: fix lack of MAC type when set MAC address")
Cc: stable@dpdk.org
Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agoraw/ifpga: fix device name format
Wei Huang [Thu, 29 Apr 2021 02:33:40 +0000 (22:33 -0400)]
raw/ifpga: fix device name format

The device name format used in ifpga_rawdev_create() was changed to
"IFPGA:%02x:%02x.%x", but the format used in ifpga_rawdev_destroy()
was left as "IFPGA:%x:%02x.%x", it should be changed synchronously.

To prevent further similar errors, macro "IFPGA_RAWDEV_NAME_FMT" is
defined to replace this format string.
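
A small sketch of why the shared macro helps: with one format string, the create and destroy paths cannot drift apart. The macro name and both format strings come from the commit text; the helper functions ifpga_name() and old_formats_agree() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Single source of truth for the device name, as named in the commit. */
#define IFPGA_RAWDEV_NAME_FMT "IFPGA:%02x:%02x.%x"

/* Hypothetical helper used by both the create and the destroy path. */
int ifpga_name(char *buf, size_t len, unsigned int bus, unsigned int devid,
	       unsigned int func)
{
	assert(buf != NULL);
	return snprintf(buf, len, IFPGA_RAWDEV_NAME_FMT, bus, devid, func);
}

/* Compare the new format with the stale "IFPGA:%x:%02x.%x" one
 * to show when the destroy-side lookup would have missed. */
int old_formats_agree(unsigned int bus, unsigned int devid, unsigned int func)
{
	char created[32], looked_up[32];

	ifpga_name(created, sizeof(created), bus, devid, func);
	snprintf(looked_up, sizeof(looked_up), "IFPGA:%x:%02x.%x",
		 bus, devid, func);
	return strcmp(created, looked_up) == 0;
}
```

The two formats only disagree for bus numbers below 0x10, which is why such a mismatch can go unnoticed on some systems.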

Fixes: 9c006c45d0c5 ("raw/ifpga: scan PCIe BDF device tree")
Cc: stable@dpdk.org
Signed-off-by: Wei Huang <wei.huang@intel.com>
Acked-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
3 years agonet/iavf: fix Rx function selection
Wenzhuo Lu [Thu, 29 Apr 2021 01:33:57 +0000 (09:33 +0800)]
net/iavf: fix Rx function selection

A performance drop occurs because the Rx scalar path
is selected when AVX512 is disabled and some HW offload
is enabled.
Actually, the HW offload is supported by AVX2 and SSE.
In this scenario AVX2 path should be chosen.

This patch removes the offload related check for SSE and AVX2
as SSE and AVX2 do support the offload features.
No implementation change about the data path.

Fixes: eff56a7b9f97 ("net/iavf: add offload path for Rx AVX512")

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/mlx5: use aging by counter when counter exists
Michael Baum [Thu, 29 Apr 2021 09:55:42 +0000 (12:55 +0300)]
net/mlx5: use aging by counter when counter exists

The driver supports 2 mechanisms in order to support AGE action:
1. Aging by counter - HW counter will be configured to the flow traffic,
the driver polls the counter values efficiently to detect flow timeout.
2. Aging by ASO flow hit bit - HW ASO flow-hit bit is allocated for the
flow, the driver polls the bit efficiently to detect flow timeout.

ASO bit is only single bit resource while counter is 16 bytes, hence, it
is better to use ASO instead of counter for aging.

When a non-shared COUNT action is also configured to the flow, the
driver can use the same counter also for AGE action and no need to
create more ASO action for it.

The current code always uses ASO when it is supported in the device,
change it to reuse the non-shared counter if it exists in the flow.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/mlx5: fix flow age event triggering
Michael Baum [Thu, 29 Apr 2021 09:55:41 +0000 (12:55 +0300)]
net/mlx5: fix flow age event triggering

A FLOW_AGE event should be invoked when a new aged-out flow is detected
by the PMD after the last user get-aged query calling.
The PMD manages 2 flags for this information and checks them in order to
decide if an event should be invoked:
MLX5_AGE_EVENT_NEW - a new aged-out flow was detected after the last
check.
MLX5_AGE_TRIGGER - get-aged query was called after the last aged-out
flow.
The 2 flags were unset after the event invoking.

When the user calls get-aged query from the event callback, the TRIGGER
flag was set inside the user callback and unset directly after the
callback, which may stop the event invoking forever.

Unset the TRIGGER flag before the event invoking in order to allow the
user callback to set it.

Fixes: f935ed4b645a ("net/mlx5: support flow hit action for aging")
Cc: stable@dpdk.org
Reported-by: David Bouyeure <david.bouyeure@fraudbuster.mobi>
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agoapp/testpmd: support indirect counter action query
Michael Baum [Thu, 29 Apr 2021 09:55:40 +0000 (12:55 +0300)]
app/testpmd: support indirect counter action query

Counter action query was implemented as part of flow query, but was not
implemented as part of indirect action query.

This patch adds the required implementation.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
3 years agoapp/testpmd: remove indirect RSS action query
Michael Baum [Thu, 29 Apr 2021 09:55:39 +0000 (12:55 +0300)]
app/testpmd: remove indirect RSS action query

The port_action_handle_query function supports query operation for
indirect RSS action.

No driver currently supports this operation, and this support is
unnecessary.

Remove it.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
3 years agonet/mlx5: support flow count action handle
Michael Baum [Thu, 29 Apr 2021 09:55:38 +0000 (12:55 +0300)]
net/mlx5: support flow count action handle

Existing API supports counter action to count traffic of a single flow.
The user can share the count action among different flows using the
shared flag and the same counter ID in the count action configuration.

Recent patch [1] introduced the indirect action API.
Using this API, an action can be created as indirect, unattached to any
flow rule.
Multiple flows can then be created using the same indirect action.
The new API also supports query operation of an indirect action.

The new API is more efficient because the driver gets its own handler
for the count action instead of managing a mapping between the user ID
to the driver handle.

Support create, query and destroy indirect action operations for flow
count action.

Application will use the indirect action query operation to query this
count action.

In the meantime the old sharing mechanism (with the sharing flag)
continues to be supported, and the user can choose the way he wants to
share the counter.
The new indirect action API is only supported in DevX, so sharing
counter action in Verbs can only be done through the old mechanism.

[1] https://mails.dpdk.org/archives/dev/2020-July/174110.html

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
3 years agonet/hns3: select Tx prepare based on Tx offload
Chengchang Tang [Wed, 28 Apr 2021 07:20:55 +0000 (15:20 +0800)]
net/hns3: select Tx prepare based on Tx offload

Tx prepare should be called only when necessary to reduce the impact on
performance.

For partial TX offload, users need to call rte_eth_tx_prepare() to
invoke the tx_prepare callback of PMDs. In this callback, the PMDs
adjust the packet based on the offloading used by the user. (e.g. For
some PMDs, pseudo-headers need to be calculated when the TX cksum is
offloaded.)

However, for the users, they cannot grasp all the hardware and PMDs
characteristics. As a result, users cannot decide when they need to
actually call tx_prepare. Therefore, we should assume that the user
calls rte_eth_tx_prepare() when using any Tx offloading to ensure that
related functions work properly. Whether packets need to be adjusted
should be determined by PMDs. They can make judgments in the
dev_configure or queue_setup phase. When the related function is not
used, the pointer of tx_prepare should be set to NULL to reduce the
performance loss caused by invoking rte_eth_tx_prepare().

In this patch, if tx_prepare is not required for the offloading used by
the users, the tx_prepare pointer will be set to NULL.
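
A minimal sketch of the selection step, assuming a driver that decides at configuration time from the enabled offload bits. The OFFLOAD_* flags, prep_pkts() and select_tx_prepare() are hypothetical and do not reflect the real hns3 offload set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offload bits for illustration. */
#define OFFLOAD_CKSUM          (1ull << 0)
#define OFFLOAD_TSO            (1ull << 1)
#define OFFLOAD_MBUF_FAST_FREE (1ull << 2)

typedef uint16_t (*tx_prep_fn)(void *txq, void **pkts, uint16_t nb);

/* Hypothetical prepare routine; a real one would fix up headers. */
uint16_t prep_pkts(void *txq, void **pkts, uint16_t nb)
{
	(void)txq;
	assert(nb == 0 || pkts != NULL);
	return nb;
}

/* Install a prepare callback only when an enabled offload needs it;
 * a NULL pointer lets the caller skip the step entirely. */
tx_prep_fn select_tx_prepare(uint64_t offloads)
{
	const uint64_t need_prep = OFFLOAD_CKSUM | OFFLOAD_TSO;

	return (offloads & need_prep) ? prep_pkts : NULL;
}
```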

Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: remove unused macros
Chengwen Feng [Wed, 28 Apr 2021 07:20:54 +0000 (15:20 +0800)]
net/hns3: remove unused macros

The hns3_is_csq() and cmq_ring_to_dev() macro were defined in previous
version but never used.

Fixes: 737f30e1c3ab ("net/hns3: support command interface with firmware")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: fix time delta calculation
Chengwen Feng [Wed, 28 Apr 2021 07:20:53 +0000 (15:20 +0800)]
net/hns3: fix time delta calculation

Currently, the driver uses the gettimeofday() API to get the time and
then calculates the time delta. The delta is used mainly to
judge timeouts.

But the time obtained from the gettimeofday() API isn't monotonically
increasing. The process may fail if the system time is changed.

We use the following scheme to fix it:
1. Add hns3_clock_gettime() API which will get the monotonically
   increasing time.
2. Add hns3_clock_gettime_ms() API which will get the milliseconds of
   the monotonically increasing time.
3. Add hns3_clock_calctime_ms() API which will calc the milliseconds of
   a given time.
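
The scheme can be sketched with the POSIX monotonic clock. This is an illustration only: the function names calctime_ms(), monotonic_ms() and timed_out() are hypothetical, struct timespec is used for simplicity, and the driver's own helpers may differ in types and clock choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds represented by a given timespec. */
uint64_t calctime_ms(const struct timespec *ts)
{
	assert(ts != NULL);
	return (uint64_t)ts->tv_sec * 1000u + (uint64_t)ts->tv_nsec / 1000000u;
}

/* Current monotonic time in milliseconds; not affected by
 * changes to the system wall-clock time. */
uint64_t monotonic_ms(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return calctime_ms(&ts);
}

/* Has timeout_ms elapsed since start_ms? */
int timed_out(uint64_t start_ms, uint64_t timeout_ms)
{
	return monotonic_ms() - start_ms >= timeout_ms;
}
```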

Fixes: 2790c6464725 ("net/hns3: support device reset")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: log time delta in decimal format
Chengwen Feng [Wed, 28 Apr 2021 07:20:52 +0000 (15:20 +0800)]
net/hns3: log time delta in decimal format

If the reset process cost too much time, driver will log one error
message which formats the time delta, but the formatting is using
hexadecimal which was not readable.

This patch fixes it by formatting in decimal format.

Fixes: 2790c6464725 ("net/hns3: support device reset")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: support preferred burst size and queues in VF
Chengwen Feng [Wed, 28 Apr 2021 07:20:51 +0000 (15:20 +0800)]
net/hns3: support preferred burst size and queues in VF

This patch supports getting the preferred burst size and queues when
calling the rte_eth_dev_info_get() API with a VF.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agoapp/testpmd: remove redundant forwarding initialization
Huisong Li [Wed, 28 Apr 2021 06:40:46 +0000 (14:40 +0800)]
app/testpmd: remove redundant forwarding initialization

The fwd_config_setup() is called after init_fwd_streams().
The fwd_config_setup() will reinitialize forwarding streams.
This patch removes init_fwd_streams() from init_config().

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agoapp/testpmd: add forwarding configuration to DCB config
Huisong Li [Wed, 28 Apr 2021 06:40:45 +0000 (14:40 +0800)]
app/testpmd: add forwarding configuration to DCB config

This patch adds fwd_config_setup() at the end of cmd_config_dcb_parsed()
to update "cur_fwd_config", so that the actual forwarding streams can be
queried by the "show config fwd" cmd.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agoapp/testpmd: verify DCB config during forward config
Huisong Li [Wed, 28 Apr 2021 06:40:44 +0000 (14:40 +0800)]
app/testpmd: verify DCB config during forward config

Currently, the check for doing the DCB test is done in
start_packet_forwarding(), which is called when the
"start" cmd is run. But fwd_config_setup() is used in many
scenarios, such as "port config all rxq".

This patch moves the check from start_packet_forwarding()
to fwd_config_setup().

Fixes: 7741e4cf16c0 ("app/testpmd: VMDq and DCB updates")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agoapp/testpmd: check DCB info support for configuration
Huisong Li [Wed, 28 Apr 2021 06:40:43 +0000 (14:40 +0800)]
app/testpmd: check DCB info support for configuration

Currently, '.get_dcb_info' must be supported for the port doing the DCB
test, or all information in 'rte_eth_dcb_info' is zero. This should be
prevented when the user runs cmd "port config 0 dcb vt off 4 pfc off".

This patch adds the check for support of reporting dcb info.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agoapp/testpmd: fix DCB re-configuration
Huisong Li [Wed, 28 Apr 2021 06:40:42 +0000 (14:40 +0800)]
app/testpmd: fix DCB re-configuration

After DCB mode is configured, if we decrease the number of RX and TX
queues, fwd_config_setup() will be called to setup the DCB forwarding
configuration. And forwarding streams are updated based on new queue
numbers in fwd_config_setup(), but the mapping between the TC and
queues obtained by rte_eth_dev_get_dcb_info() is still old queue
numbers (old queue numbers are greater than new queue numbers).
In this case, a segmentation fault occurs. So rte_eth_dev_configure()
should be called again to update the mapping between the TC and
queues before rte_eth_dev_get_dcb_info().

Like:
set nbcore 4
port stop all
port config 0 dcb vt off 4 pfc on
port start all
port stop all
port config all rxq 8
port config all txq 8

Fixes: 900550de04a7 ("app/testpmd: add dcb support")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agoapp/testpmd: fix DCB forwarding configuration
Huisong Li [Wed, 28 Apr 2021 06:40:41 +0000 (14:40 +0800)]
app/testpmd: fix DCB forwarding configuration

After DCB mode is configured, the operations of port stop and port start
change the value of the global variable "dcb_test". As a result, the
forwarding configuration changes from DCB to RSS mode, namely, from
"dcb_fwd_config_setup()" to "rss_fwd_config_setup()".

Currently, the 'dcb_flag' field in struct 'rte_port' indicates whether
the port is configured with DCB. And it is sufficient to have
'dcb_config' as a global variable to control the DCB test status. So
this patch deletes the "dcb_test".

In addition, set 'dcb_config' at the end of init_port_dcb_config(),
in case ports fail to enter DCB mode.

Fixes: 900550de04a7 ("app/testpmd: add dcb support")
Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
Fixes: 7741e4cf16c0 ("app/testpmd: VMDq and DCB updates")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agoapp/testpmd: fix forward lcores number for DCB
Huisong Li [Wed, 28 Apr 2021 06:40:40 +0000 (14:40 +0800)]
app/testpmd: fix forward lcores number for DCB

For the DCB forwarding test, each core is assigned to each traffic class.
Number of forwarding cores for DCB test must be equal or less than number
of total TC. Otherwise, the following problems may occur:
1/ Redundant polling threads will be created when forwarding cores number
   is greater than total TC number.
2/ Two cores would try to use a same queue on a port when Rx/Tx queue
   number is greater than the used TC number, which is not allowed.

Fixes: 900550de04a7 ("app/testpmd: add dcb support")
Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agonet/txgbe: add copyright owner
Jiawen Wu [Thu, 29 Apr 2021 10:33:35 +0000 (18:33 +0800)]
net/txgbe: add copyright owner

All rights reserved by Beijing Wangxun Technology Co., Ltd.
Part of the code references Intel.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agonet/txgbe: remove port representor
Jiawen Wu [Thu, 29 Apr 2021 10:33:34 +0000 (18:33 +0800)]
net/txgbe: remove port representor

Remove port representor in device probe process, because it is not
supported by the driver yet.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/txgbe: support VXLAN-GPE
Jiawen Wu [Thu, 29 Apr 2021 10:33:33 +0000 (18:33 +0800)]
net/txgbe: support VXLAN-GPE

Support VXLAN-GPE in UDP tunnel port add and delete.
Fix to parsing packet type to pass hardware checksum.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/txgbe: fix MTU limitation for VF
Jiawen Wu [Thu, 29 Apr 2021 10:33:32 +0000 (18:33 +0800)]
net/txgbe: fix MTU limitation for VF

When requested MTU is bigger than mbuf size and scattered Rx is not
enabled, setting MTU fails for VF.

But scattered Rx can be enabled in the next port start if required, so
allow setting an MTU bigger than the mbuf size if the device is stopped,
independent of the scattered Rx configuration.

Fixes: a2beaa4a769e ("net/txgbe: support VF MTU update")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/softnic: fix meter policies initialization
Dapeng Yu [Thu, 29 Apr 2021 07:06:14 +0000 (15:06 +0800)]
net/softnic: fix meter policies initialization

Initialize the meter policy list before use to avoid a segmentation fault.

Fixes: 0d73ddf25faa ("net/softnic: add meter profile")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
3 years agomaintainers: update for e1000/igc/ixgbe/i40e
Jeff Guo [Tue, 27 Apr 2021 07:17:41 +0000 (15:17 +0800)]
maintainers: update for e1000/igc/ixgbe/i40e

Remove Jeff Guo from the maintainers list of igc, i40e, ixgbe & e1000
PMD.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
3 years agonet/kni: warn on stop failure
Min Hu (Connor) [Tue, 27 Apr 2021 02:08:45 +0000 (10:08 +0800)]
net/kni: warn on stop failure

The return value of function 'eth_kni_dev_stop' is stored in 'ret' and
then overwritten later, which is unreasonable.

This patch fixes it.

Fixes: 62024eb82756 ("ethdev: change stop operation callback to return int")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agonet/tap: check ioctl on restore
Chengchang Tang [Tue, 27 Apr 2021 00:54:22 +0000 (08:54 +0800)]
net/tap: check ioctl on restore

After restoring the remote states, the return value of ioctl() is not
checked. Therefore, users cannot know whether the remote state is
restored successfully.

This patch adds a log for restoring failure.

Fixes: 4810d3af8343 ("net/tap: restore state of remote device when closing")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agoapp/testpmd: fix division by zero on socket memory dump
Min Hu (Connor) [Mon, 26 Apr 2021 11:57:57 +0000 (19:57 +0800)]
app/testpmd: fix division by zero on socket memory dump

The variable 'total' may be zero, which can result in a segmentation fault.

This patch fixes it.
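
A guard of the following shape avoids the division when the denominator is zero. The helper used_percent() is hypothetical and is not the testpmd function; it only shows the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: percentage of memory in use.
 * Returns 0 when nothing is allocated, instead of dividing by zero. */
double used_percent(size_t used, size_t total)
{
	assert(used <= total);
	if (total == 0)
		return 0.0;
	return (double)used * 100.0 / (double)total;
}
```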

Fixes: 9b1249d9ff69 ("app/testpmd: support dumping socket memory")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agonet/txgbe: fix null pointer check
Hongbo Zheng [Sun, 25 Apr 2021 12:54:29 +0000 (20:54 +0800)]
net/txgbe: fix null pointer check

In function cons_parse_ntuple_filter, item->spec and item->mask
should both be confirmed non-null before memcmp is used on them. The
current check (item->spec || item->mask) only confirms that one of
item->spec or item->mask is non-null, so a null pointer can still be
passed to memcmp.

This patch fixes this problem.
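
One way to make the check safe is to test each pointer right before it is dereferenced. The sketch below assumes that approach and is not the txgbe code: struct item, struct eth_hdr and eth_item_is_set() are hypothetical stand-ins for the rte_flow item and Ethernet header types.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the flow item and the Ethernet header. */
struct item {
	const void *spec;
	const void *mask;
};

struct eth_hdr {
	unsigned char bytes[14];
};

/* True when either spec or mask is present and not all-zero.
 * Each pointer is tested before memcmp() touches it. */
bool eth_item_is_set(const struct item *it)
{
	static const struct eth_hdr eth_null = { { 0 } };

	assert(it != NULL);
	return (it->spec != NULL &&
		memcmp(it->spec, &eth_null, sizeof(eth_null)) != 0) ||
	       (it->mask != NULL &&
		memcmp(it->mask, &eth_null, sizeof(eth_null)) != 0);
}
```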

Fixes: b7eeecb17556 ("net/txgbe: parse n-tuple filter")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/hns3: fix link speed when port is down
Huisong Li [Sun, 25 Apr 2021 12:06:29 +0000 (20:06 +0800)]
net/hns3: fix link speed when port is down

When the port is in link down state, it is meaningless to display the
port link speed. It should be an undefined state.

Fixes: 59fad0f32135 ("net/hns3: support link update operation")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: fix link status when port is stopped
Huisong Li [Sun, 25 Apr 2021 12:06:28 +0000 (20:06 +0800)]
net/hns3: fix link status when port is stopped

When port is stopped, link down should be reported to user. For HNS3
PF driver, link status comes from link status of hardware. If the port
supports NCSI feature, hardware MAC will not be disabled. In this case,
even if the port is stopped, the link status is still Up. So driver
should set link down when the port is stopped.

Fixes: 59fad0f32135 ("net/hns3: support link update operation")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/mlx5: support checksum offload on Windows
Tal Shnaiderman [Wed, 21 Apr 2021 16:34:41 +0000 (19:34 +0300)]
net/mlx5: support checksum offload on Windows

Support of the checksum offloading by checking
the relevant FW capability (csum_cap) for NIC support.

RX supported offloads:

DEV_RX_OFFLOAD_IPV4_CKSUM
DEV_RX_OFFLOAD_UDP_CKSUM
DEV_RX_OFFLOAD_TCP_CKSUM

TX supported offloads:

DEV_TX_OFFLOAD_IPV4_CKSUM
DEV_TX_OFFLOAD_UDP_CKSUM
DEV_TX_OFFLOAD_TCP_CKSUM

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Odi Assli <odia@nvidia.com>
3 years agocommon/mlx5: read checksum capability from DevX
Tal Shnaiderman [Wed, 21 Apr 2021 16:34:40 +0000 (19:34 +0300)]
common/mlx5: read checksum capability from DevX

mlx5 in Windows needs the hca capability csum_cap
to query the NIC for checksum offloading support.

Added the capability as part of the capabilities
queried by the PMD using DevX.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Odi Assli <odia@nvidia.com>