git.droids-corp.org - dpdk.git/log

ethdev: move L2 tunnel config structure to ixgbe driver

net/ixgbe driver is the only user of the struct rte_eth_l2_tunnel_conf.
Move it to the driver and use ixgbe_ prefix instead of rte_eth_.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove L2 tunnel offload control API

Remove rte_eth_dev_l2_tunnel_offload_set() and corresponding
ethdev driver operation.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove API to config L2 tunnel EtherType

Remove rte_eth_dev_l2_tunnel_eth_type_conf() and corresponding
ethdev driver operation.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy filter API functions

The legacy filter API, including rte_eth_dev_filter_supported() and
rte_eth_dev_filter_ctrl(), is removed. The flow API should be used
instead.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

app/testpmd: remove command to set FDIR flexible filter mask

The command uses the FDIR filter information get API, which
is no longer supported.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy FDIR filter type support

Instead of FDIR filters, the RTE flow API should be used.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy global filter configuration support

Global filter configuration request was supported by net/i40e
driver only to configure GRE key length.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy L2 tunnel filter type support

Instead of the L2 tunnel filter, the RTE flow API should be used.

Preserve RTE_ETH_FILTER_L2_TUNNEL since it is used in drivers
internally in RTE flow API support.

rte_eth_l2_tunnel_conf structure is used in other ethdev API
functions.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy HASH filter type support

Instead of the HASH filter, the RTE flow API should be used.

Preserve RTE_ETH_FILTER_HASH since it is used in drivers
internally in RTE flow API support.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy tunnel filter type support

Instead of the TUNNEL filter, the RTE flow API should be used.

Move corresponding defines and helper structure to ethdev
driver interface since it is still used by drivers internally.

Preserve RTE_ETH_FILTER_TUNNEL because of usage in drivers.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy N-tuple filter type support

Instead of the N-tuple filter, the RTE flow API should be used.

Preserve struct rte_eth_ntuple_filter in ethdev API since
the structure and related defines are used in flow classify
library and a number of drivers.

Preserve RTE_ETH_FILTER_NTUPLE because of usage in drivers.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy SYN filter type support

Instead of the SYN filter, the RTE flow API should be used.

Move the corresponding definitions to the ethdev internal driver API
since they are used by drivers internally.
Preserve RTE_ETH_FILTER_SYN for the same reason.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: move flexible filter type to e1000 driver

net/e1000 driver is the only user of the struct rte_eth_flex_filter
and helper defines. Move it to the driver and use igb_ prefix
instead of rte_eth_.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy flexible filter type support

Instead of the FLEXIBLE filter, the RTE flow API should be used.

Temporarily preserve helper defines in public interface.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy EtherType filter type support

Instead of the EtherType filter, the RTE flow API should be used.

Move the corresponding definitions to the ethdev internal driver API
since they are used by drivers internally.
Preserve RTE_ETH_FILTER_ETHERTYPE for the same reason.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: move MAC filter type to i40e driver

net/i40e driver is the only user of the enum rte_mac_filter_type.
Move the define to the driver and use i40e_ prefix instead of rte_.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: remove legacy MACVLAN filter type support

Instead of the MACVLAN filter, the RTE flow API should be used.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/vdev_netvsc: fix device probing error flow

If a device probe fails, the alarm is canceled and will no longer work
for previously probed devices.

Fix this by checking whether the alarm is necessary at the end of each
device probe. Reset the alarm if any vdev_netvsc_ctx has been created.

Fixes: e7dc5d7becc5 ("net/vdev_netvsc: implement core functionality")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/txgbe: prevent driver forcing application to exit

Replace the 'rte_panic()' with an error return.
Also change the type of the calling function.

Fixes: a6712cd029a4 ("net/txgbe: add PF module init and uninit for SRIOV")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

vdpa/mlx5: specify lag port affinity

If the TIS LAG port affinity is set to auto, the firmware assigns the
port affinity on each creation in round-robin order. With 2 PFs, if a
virtq is created, destroyed and created again, each virtq gets the same
port affinity.

To work around this firmware limitation, this patch creates the TIS
with a specified affinity for each PF.

Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: get number of ports that can be bonded

Get HCA capability: number of physical ports that can be bonded.

Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix uninitialized local variable

This patch initializes a local variable in the async data path to avoid
compiler warnings.

Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: handle hardware error

When a hardware error happens, vdpa does not get this information,
which leaves the driver in a silent state: apparently working but
giving no response.

This patch subscribes to the firmware virtq error event and tries to
recover at most 3 times in 3 seconds, stopping the virtq if the maximum
retry number is reached.

When an error happens, the PMD logs at warning level. If recovery
fails, it outputs an error log. Query the virtq statistics to get the
error counters report.

Acked-by: Matan Azrad <matan@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: add virtq attributes error fields

Add the needed fields for virtq DevX object to read the error state.

Acked-by: Matan Azrad <matan@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix guest/host physical address conversion

gpa_to_hpa() function almost always fails due to the wrong setup of
the binary tree search key. Since there has already been a similar
function gpa_to_first_hpa() available in the vhost, instead of fixing
the issue in its original logic, gpa_to_hpa() function is rewritten to
be a wrapper of the gpa_to_first_hpa() to avoid code redundancy.
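
The wrapper relationship described above can be sketched roughly as
follows. This is an illustration only, not the vhost code: the
`struct guest_page` layout, the function signatures and the linear scan
are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: one contiguous guest/host physical range. */
struct guest_page {
	uint64_t gpa;  /* guest physical start address */
	uint64_t hpa;  /* host physical start address */
	uint64_t size; /* length of the range in bytes */
};

/*
 * Translate gpa and report in *got how many of the requested bytes are
 * contiguous in host physical memory. Returns 0 and *got == 0 if no
 * range contains gpa. A linear scan is used here for simplicity.
 */
static uint64_t
gpa_to_first_hpa(const struct guest_page *pages, size_t n,
		 uint64_t gpa, uint64_t want, uint64_t *got)
{
	for (size_t i = 0; i < n; i++) {
		const struct guest_page *p = &pages[i];

		if (gpa >= p->gpa && gpa < p->gpa + p->size) {
			uint64_t avail = p->gpa + p->size - gpa;

			*got = want < avail ? want : avail;
			return p->hpa + (gpa - p->gpa);
		}
	}
	*got = 0;
	return 0;
}

/* Wrapper: succeeds only if the whole request fits in one range. */
static uint64_t
gpa_to_hpa(const struct guest_page *pages, size_t n,
	   uint64_t gpa, uint64_t size)
{
	uint64_t got;
	uint64_t hpa = gpa_to_first_hpa(pages, n, gpa, size, &got);

	return got == size ? hpa : 0;
}
```

In this sketch a request that crosses the end of a range returns 0, so
the caller has to split it and translate each part separately.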

Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
Fixes: faa9867c4da2 ("vhost: use binary search in address conversion")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio-user: set status on socket reconnect

Newer vhost-user backends will rely on SET_STATUS to start the device,
so this is required to support them.

Fixes: 57912824615f ("net/virtio-user: support vhost status setting")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio-user: do not assume vhost status feature

There are some status reads and updates that need to happen before the
protocol features are negotiated. Therefore, assuming the backend does
support this feature can lead to failures.

On server mode, do not assume the backend supports
VHOST_USER_PROTOCOL_F_STATUS. Activate it back on reconnection and
clear it on disconnection.

Fixes: 57912824615f ("net/virtio-user: support vhost status setting")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio-user: lock-protect status updates

Protect status updates with a lock in order to safely set and get the
device status from different threads (e.g. interrupt handlers).

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio-user: ignore result if status is unsupported

GET/SET STATUS is an optional feature, so it may not be negotiated. In
that case, the VIRTIO_GET_STATUS call will not update the status (given
as a pointer argument). Failing to identify this case would lead to
undefined behavior as the device status will be updated with the value
of a stack-allocated variable.

To fix this, return ENOTSUP if the feature is not supported and, in that
case, don't update device status.
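
A minimal sketch of this pattern, assuming a simplified backend model.
The names `struct backend`, `backend_get_status()` and
`update_device_status()` are hypothetical and are not the virtio-user
functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backend model for the sketch. */
struct backend {
	bool status_supported; /* assumed: set once the feature is negotiated */
	uint8_t hw_status;     /* status the backend would report */
};

/* Returns 0 and fills *status, or -ENOTSUP without touching *status. */
static int
backend_get_status(const struct backend *be, uint8_t *status)
{
	if (!be->status_supported)
		return -ENOTSUP;
	*status = be->hw_status;
	return 0;
}

/* Update the cached device status only if the backend really answered. */
static int
update_device_status(const struct backend *be, uint8_t *dev_status)
{
	uint8_t tmp = 0; /* local value, must never leak into dev_status */
	int ret = backend_get_status(be, &tmp);

	if (ret == -ENOTSUP)
		return 0; /* optional feature missing: keep the cached status */
	if (ret < 0)
		return ret;
	*dev_status = tmp;
	return 0;
}
```

The point of the sketch is that the caller tells "not supported" apart
from success and leaves the cached status untouched in that case.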

Fixes: 44102e6298e7 ("net/virtio: check protocol feature in user backend")
Cc: stable@dpdk.org

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio-user: do not assume features are negotiated

According to the virtio spec, ACK and DRIVER status bits should be set
before feature negotiation.

However, until the protocol features are negotiated, the driver does not
know if the device actually supports those vhost-user messages.
Therefore, until FEATURES_OK is set, the GET/SET_STATUS messages should
not be sent.

Fixes: 57912824615f ("net/virtio-user: support vhost status setting")
Cc: stable@dpdk.org
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>

net/virtio-user: fix backend selection if stat fails

If stat fails because the file does not exist, it means that
the backend must be vhost-user in server mode.

Also, log the detected backend type.

Bugzilla ID: 559
Fixes: f908b22ea47a ("net/virtio: move backend type selection to ethdev")
Cc: stable@dpdk.org
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>

net/mlx5: use C11 atomics in packet scheduling

The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering and explicit
memory barrier for Clock Queue and timestamps synchronization.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: remove shared context lock

To support multi-thread flow insertion, this patch removes the shared
data lock, since all resources should support concurrent access
protection.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make shared action list thread safe

This commit uses a spinlock to protect the shared action list when it
is accessed from multiple threads.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make tunnel hub list thread safe

This commit uses a spinlock to protect the tunnel hub list when it is
accessed from multiple threads.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: remove unused hash list operations

In previous commits the hash list objects have been converted
to new thread safe hash list. The legacy hash list code can be
removed now.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make tunnel offloading table thread safe

To support multi-thread flow insertion, this patch updates tunnel
offloading hash table to use thread safe hash list.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make sample and mirror action thread safe

This commit uses cache list to make sample and mirror action thread
safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix sample register error flow

Currently, a sample flow needs to prepare and register the sub-actions
before the sample action is created.

If the same sample action already exists, the sub-actions registered by
the second flow should be released, or these sub-actions will be
leaked, since the existing sample action only releases these
sub-actions when the sample action itself is released.

When the same sample action exists, call the sub-action release
function for the later flow to release the redundant prepared
sub-actions.

Fixes: 0756228b2704 ("net/mlx5: update translate function for sample action")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: simplify sample attributes

Currently, the sample action resource already has ft_type to indicate
the action domain attribute, so the extra flow attributes parameter can
be removed.

This commit uses the action resource ft_type as the domain attribute
instead of the flow attribute.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make push VLAN action cache thread safe

To support multi-thread flow insertion, this patch converts push VLAN
action cache list to thread safe cache list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make port ID action cache thread safe

To support multi-thread flow insertion, this patch converts the port ID
action cache list to a thread safe cache list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make matcher list thread safe

To support multi-thread flow insertion, this patch converts the matcher
list to use the thread safe cache list API.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make Rx queue thread safe

This commit applies the cache linked list to Rx queue to make it thread
safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: optimize shared RSS list operation

When a shared RSS hrxq is created, the hrxq is created directly and no
hrxq is reused.

In this case, adding the shared RSS hrxq to the queue list is
redundant, and it also hurts the generic queue lookup.

This commit avoids adding the shared RSS hrxq to the queue list.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: introduce thread safe linked list cache

New linked list API for caches:
- Optimized for lists with a small number of cache entries.
- Optimized for read-mostly lists.
- Thread safe.
- Since the number of entries is limited, entries are allocated by the
  API.
- For a dynamic entry size, pass 0 as the entry size; the creation
  callback then allocates the entry.
- Since the number of entries is limited, there is no need to use an
  indexed pool to allocate memory. The API removes the entry and frees
  it with mlx5_free.
- The search API is not supposed to be used from multiple threads.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make header reformat action thread safe

To support multi-thread flow insertion, this patch updates flow header
reformat action list to use thread safe hash list with write-most mode.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make metadata copy flow list thread safe

To support multi-thread flow insertion, this patch updates metadata copy
flow list to use thread safe hash list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: remove unused mreg copy

After the non-cache mode feature was implemented, flows can only be
created when the port is started. There is no need to check whether the
mreg flows are created while the port is stopped, and applying the mreg
flows after port start will never happen either.

This commit removes the related unused mreg copy code.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make flow modify action list thread safe

To support multi-thread flow insertion, this patch updates flow modify
action list to use thread safe hash list with write-most mode.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make flow tag list thread safe

To support multi-thread flow insertion, this patch updates flow tag list
to use thread safe hash list with write-most mode.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix redundant Direct Verbs resources allocate

Table, tag, header modify and header reformat are all supported only in
DV mode. For OFED versions that do not support them, creating the
related DV resources wastes memory.

Put the code section under the HAVE_IBV_FLOW_DV_SUPPORT macro to avoid
the redundant resource allocation.

Fixes: 2eb4d0107acc ("net/mlx5: refactor PCI probing on Linux")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make flow table cache thread safe

To support multi-thread flow insertion/removal, this patch uses thread
safe hash list API for flow table cache hash list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: add flow table tunnel offload attribute

As the flow table is shared between the ports in the same shared IB
device, a flow table may be created by one port and released by another
port.

Currently, the tunnel offloading active check in flow table release is
based on the port which releases the flow table. Since the port that
creates the flow table and the port that releases it may have different
tunnel offloading configurations, this can cause an invalid tunnel
offloading release or tunnel offloading resource leaks.

Add the flow table tunnel offloading attribute to indicate whether the
flow table has a tunnel offloading resource, to avoid wrong tunnel
offloading operations.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: support concurrent access for hash list

In order to support hash list concurrent access, add the following:
1. List level read/write lock.
2. Entry reference counter.
3. Entry create/match/remove callbacks.
4. Remove the insert/lookup/remove functions, which are not thread safe.
5. Add register/unregister functions to support entry reuse.

For better performance, the lookup function uses a read lock to
allow concurrent lookups from different threads; all other hash list
modification functions use a write lock, which blocks concurrent
modifications and lookups from other threads.

The changes to the individual objects will be applied in the next
patches.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: create global drop action

This commit creates the global drop action for flows instead of
maintaining it at flow insertion time. The unique global drop action
makes it thread safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: create global default miss action

This commit creates the global default miss action instead of
maintaining it at flow insertion time. This makes the action thread
safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: create global jump action

This commit changes the jump action in a table to be created in
advance, together with the table. In this case, the jump action is safe
to use from multiple threads. The jump action will be destroyed when
the table is no longer used and is released.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make VLAN network interface thread safe

This commit protects the VLAN VM workaround area using a spinlock
in multiple-thread flow insertion to make it thread safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make meter action thread safe

This commit adds a spinlock for the meter action to make it thread
safe. An atomic reference counter alone is not enough, as the meter
action must be created in sync with the reference counter increment.
With only an atomic reference counter, the action may still not be
created even though the counter has already been increased.
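
The reasoning above can be illustrated with a small sketch in which
creation and the reference counter increment happen under the same
lock. All names here (`struct shared_action`, `shared_action_acquire()`,
the `demo_*` helpers) are hypothetical, and the test-and-set spinlock
built on compiler atomics only stands in for whatever lock the driver
uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal test-and-set spinlock built on GCC/Clang atomic builtins. */
struct spinlock {
	bool locked;
};

static void
spin_lock(struct spinlock *l)
{
	while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
		; /* busy-wait until the holder clears the flag */
}

static void
spin_unlock(struct spinlock *l)
{
	__atomic_clear(&l->locked, __ATOMIC_RELEASE);
}

/* Hypothetical shared action: created on first use, freed on last release. */
struct shared_action {
	struct spinlock lock;
	uint32_t refcnt;
	void *action; /* stands in for the hardware action handle */
};

/* Create-if-needed and take a reference as one step under the lock. */
static void *
shared_action_acquire(struct shared_action *sa, void *(*create)(void))
{
	void *act;

	spin_lock(&sa->lock);
	if (sa->refcnt == 0)
		sa->action = create();
	if (sa->action != NULL)
		sa->refcnt++;
	act = sa->action;
	spin_unlock(&sa->lock);
	return act;
}

/* Drop a reference and destroy the action when the last user leaves. */
static void
shared_action_release(struct shared_action *sa, void (*destroy)(void *))
{
	spin_lock(&sa->lock);
	if (sa->refcnt > 0 && --sa->refcnt == 0) {
		destroy(sa->action);
		sa->action = NULL;
	}
	spin_unlock(&sa->lock);
}

/* Demo callbacks that only count how often they are called. */
static int demo_storage, demo_creates, demo_destroys;

static void *
demo_create(void)
{
	demo_creates++;
	return &demo_storage;
}

static void
demo_destroy(void *action)
{
	(void)action;
	demo_destroys++;
}
```

With a bare atomic counter, a second thread could observe a non-zero
count while the first thread is still inside `create()`. Holding the
lock across both steps closes that window in this sketch.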

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: make flow list thread safe

To support multi-thread flow operations, this patch introduces a list
lock for the rte_flow list that manages all the rte_flow handlers.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: use indexed pool as id generator

The ID generation API used an integer pool to save released IDs. To
support multi-thread flow insertion, it has to be enhanced to be thread
safe.

An indexed pool can be used to generate unique IDs by setting the pool
entry size to zero. Since a bitmap is used, an extra benefit is that
memory use drops to about one bit per entry. Furthermore, an indexed
pool can be made thread safe by enabling its lock.

This patch leverages the indexed pool to generate IDs and removes the
unused ID generation API.
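
As a rough illustration of the one-bit-per-entry idea, the sketch below
allocates IDs from a plain bitmap. It is not the mlx5 indexed pool: the
names, the fixed capacity and the lack of locking are all assumptions
made to keep the example short.

```c
#include <stdint.h>
#include <string.h>

#define ID_POOL_MAX 128 /* hypothetical capacity for the sketch */

/* One bit per ID: a set bit means the ID is in use. */
struct id_pool {
	uint64_t bitmap[ID_POOL_MAX / 64];
};

static void
id_pool_init(struct id_pool *p)
{
	memset(p, 0, sizeof(*p));
}

/* Returns the lowest free ID, counting from 1, or 0 if the pool is full. */
static uint32_t
id_pool_alloc(struct id_pool *p)
{
	for (uint32_t w = 0; w < ID_POOL_MAX / 64; w++) {
		if (p->bitmap[w] == UINT64_MAX)
			continue; /* every ID in this word is taken */
		for (uint32_t b = 0; b < 64; b++) {
			if (!(p->bitmap[w] & (UINT64_C(1) << b))) {
				p->bitmap[w] |= UINT64_C(1) << b;
				return w * 64 + b + 1;
			}
		}
	}
	return 0;
}

/* Releases an ID so that a later allocation can reuse it. */
static void
id_pool_free(struct id_pool *p, uint32_t id)
{
	if (id == 0 || id > ID_POOL_MAX)
		return;
	id--;
	p->bitmap[id / 64] &= ~(UINT64_C(1) << (id % 64));
}
```

A thread safe variant would take a lock around `id_pool_alloc()` and
`id_pool_free()`, which corresponds to enabling the lock on the pool.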

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: support zero size entry for indexed pool

To allow an indexed pool to be used as an ID generator, this patch
allows the entry size to be zero.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: reuse flow id as hairpin id

Hairpin flow matching requires a unique flow ID.
This patch reuses the flow ID as the hairpin flow ID, which saves the
code needed to generate a separate hairpin ID and also saves flow
memory by removing the hairpin ID.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: use thread specific flow workspace

As part of multi-thread flow support, this patch makes the flow
intermediate data thread specific, turning it into a flow workspace.
The workspace is allocated per thread and destroyed along with the
thread.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: use thread safe index pool for flow objects

As the mlx5 PMD is changed to be thread safe, all the flow-related
sub-objects inside the PMD should be thread safe. This commit enables
the lock configuration of the index memory pools, which makes the
index pools thread safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx: do not enforce RSS hash offload

Rx RSS hash offload should be controlled by the user
and should not be enforced by RSS multi-queue Rx mode.

Fixes: 8b945a7f7dcb ("drivers/net: update Rx RSS hash offload capabilities")
Cc: stable@dpdk.org
Author: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: add flow sync API

When creating a flow, the rule itself might not take effect
immediately once the function call returns with success. It takes
some time for the steering to synchronize with the hardware.

If the application wants the packets sent after flow creation to hit
the flow, this flow sync API can be used to clear the steering HW
cache so that the next packet hits the latest rules.

For TX, usually the NIC TX domain and/or the FDB domain should be
synchronized, depending on the domain in which the flow is created.

The application could also try to synchronize the NIC RX and/or the
FDB domain for the ingress packets.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: add glue function for domain sync

In rdma-core, the "mlx5dv_dr_domain_sync" function is already
provided. It is used to flush the rule submission queue. A wrapper
function is added in the glue layer in order to use it.
For now it only supports DR flows, the same as the domain creation
and destruction functions.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: use C11 atomics for flow tables

The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering for RTE flow tables.
Enforce Acquire/Release model for managing DevX pools.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: use C11 atomics for RxQ/TxQ refcounts

The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering for RxQ/TxQ refcounts.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: use C11 atomics for netlink sequence

The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use __atomic_add_fetch instead of
rte_atomic32_add_return to generate a Netlink sequence number.
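
A minimal sketch of such a sequence generator using the compiler
builtin. The variable and function names are hypothetical, and the
relaxed memory order is an assumption made for the sketch.

```c
#include <stdint.h>

/* Hypothetical counter shared by all callers. */
static uint32_t nl_seq;

/*
 * Atomically increment the counter and return the new value, so two
 * concurrent callers never get the same sequence number. Relaxed
 * ordering is assumed to be enough because only the counter itself
 * needs to be consistent.
 */
static uint32_t
next_seq(void)
{
	return __atomic_add_fetch(&nl_seq, 1, __ATOMIC_RELAXED);
}
```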

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: use C11 atomics for memory allocation

The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering for mlx5 mallocs.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: fix Tx queue start

The Tx queue stop/start operations update the HW state of the Tx queue
object. The stop API should update the state from ready to reset in
order to stop any queue traffic and the start API should update the
state from reset to ready in order to open the traffic path.

The start API wrongly tried to change the state from ready to ready,
which caused a failure in FW on the current state validation.

Replace the ready to ready command with the reset to ready command in
the Tx start API.
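
The state handling described above can be modelled with a small
sketch. The enum, the struct and the function names are hypothetical;
`sq_modify()` only imitates the firmware check that the "from" state of
a command matches the current queue state.

```c
#include <errno.h>

/* Hypothetical queue states for the sketch. */
enum sq_state {
	SQ_STATE_RST,
	SQ_STATE_RDY,
};

struct txq {
	enum sq_state state;
};

/* Reject the command when "from" does not match the current state. */
static int
sq_modify(struct txq *q, enum sq_state from, enum sq_state to)
{
	if (q->state != from)
		return -EINVAL;
	q->state = to;
	return 0;
}

/* Stop: ready -> reset, which halts the queue traffic. */
static int
txq_stop(struct txq *q)
{
	return sq_modify(q, SQ_STATE_RDY, SQ_STATE_RST);
}

/* Start: reset -> ready, which reopens the traffic path. */
static int
txq_start(struct txq *q)
{
	return sq_modify(q, SQ_STATE_RST, SQ_STATE_RDY);
}
```

In this model a ready to ready command issued on a stopped queue fails
the state check, which mirrors the failure the fix addresses.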

Fixes: 161d103b231c ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Asaf Penso <asafp@nvidia.com>

net/mlx5: support item type error message in flow Verbs

Update the flow Verbs error message to "item type X not supported"
when an item is not supported, instead of the generic error message
"item not supported".

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: add ConnectX-7 and Bluefield-3 device IDs

This adds the ConnectX-7 and Bluefield-3 device IDs to the list of
supported Mellanox devices that run the MLX5 PMDs.
The devices are still in the development stage.

Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support VLAN matching fields

The fields ``has_vlan`` and ``has_more_vlan`` were added in rte_flow by
patch [1].

Using these fields, the application can match all the VLAN options with
a single flow: any, VLAN only and non-VLAN only.

Add support for these fields.
In addition, add support for QinQ packet matching.

VLAN/QinQ limitations are listed in the driver document.

[1] https://patches.dpdk.org/patch/80965/

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>

doc: update hairpin support in mlx5 guide

Hairpin between two ports will be supported by mlx5 PMD.

The supported scenarios and limitations are listed in "mlx5.rst".

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: do not split hairpin flow in explicit mode

In the current implementation, the hairpin flow will be split into
two flows implicitly if there is some action that only belongs to the
Tx part. A Tx device flow will be inserted by the mlx5 PMD itself.

In hairpin between two ports, the explicit Tx flow mode will be the
only one to be supported. It is not the appropriate behavior to
insert a Tx flow into another device implicitly. The application
could create any flow as it likes and has full control of the user
flows. Hairpin flows will have no difference from standard flows and
the application can decide how to chain Rx and Tx flows together.

Even in the single port hairpin, this explicit Tx flow mode could
also be supported.

When checking if the hairpin needs to be split, it will just return
if the hairpin queue has the "tx_explicit" attribute. Then in the
following steps for validation and translation, the code path will
be the same as that for standard flows.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: change hairpin ingress flow validation

In the current implementation of the single port hairpin, there is
an implicit splitting process for actions. When inserting a hairpin
flow, all the actions will be included with the ingress attribute.
The flow engine will check and decide which actions should be moved
into the TX flow part, e.g., encapsulation, VLAN push.

In some NICs, some actions can only be done in one direction. Since
the hairpin flow will be split into two parts, such validation will
be skipped.

With the hairpin explicit TX flow mode, no splitting is needed any
more. The hairpin flow may have no big difference from a standard
flow (except the queue). The application should take full charge of
the actions and the flow engine should validate the hairpin flow in
the same way as other flows.

Meanwhile, a new internal API is added to get the hairpin
configuration. This bypasses the unneeded atomic operation and saves
CPU cycles.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: add conditional hairpin auto bind

In single port hairpin mode, after the queues are configured during
start up, the binding process will be enabled automatically in the
port start phase and the default control flow for egress will be
created.

When switching to two ports hairpin mode, the auto binding process
should be skipped if there is no TX queue with the peer RX queue on
the same device, and it should be skipped also if the queues are
configured with manual bind attribute.

If the explicit TX flow rule mode is configured or hairpin is
between two ports, the default control flows for TX queues should
not be created.
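
The two conditions above can be read as a pair of predicates. The
sketch below is an interpretation, not the PMD code: the structure and
function names are hypothetical, and it assumes that a single queue
with the manual bind attribute is enough to skip auto binding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue hairpin attributes, not the mlx5 PMD types. */
struct hairpin_txq {
    uint16_t own_port;  /* port that owns this Tx queue */
    uint16_t peer_port; /* port that owns the peer Rx queue */
    bool manual_bind;   /* application binds the queues itself */
    bool tx_explicit;   /* application inserts the Tx flows itself */
};

/* Auto bind only when some Tx queue has its peer Rx queue on the same
 * device and no queue asks for manual binding (assumption, see above). */
static bool
hairpin_should_auto_bind(const struct hairpin_txq *txqs, size_t n)
{
    bool has_local_peer = false;
    size_t i;

    assert(txqs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (txqs[i].manual_bind)
            return false;
        if (txqs[i].own_port == txqs[i].peer_port)
            has_local_peer = true;
    }
    return has_local_peer;
}

/* Default Tx control flow: only for implicit Tx mode on a single port. */
static bool
hairpin_needs_default_tx_flow(const struct hairpin_txq *txq)
{
    assert(txq != NULL);
    return !txq->tx_explicit && txq->own_port == txq->peer_port;
}
```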

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support getting hairpin peer ports

In real-life business, one device could be attached and detached
dynamically. The hairpin configuration of this port to/from all the
other ports should be enabled and disabled accordingly.

The RTE ethdev lib and PMD should provide this ability to get the
peer ports list in case that the application doesn't save it. It is
recommended that the size of the array to save the port IDs is as
large as the "RTE_MAX_ETHPORTS" to have the maximal capacity.

The order of the peer port IDs may be different from that during
hairpin queues set in the initialization stage. The peer port ID
could be the same as the current device port ID when the hairpin
peer ports contain itself - the single port hairpin.

The application should check the ports' status and decide if the
peer port should be bound / unbound when starting / stopping the
current device.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support two ports hairpin mode

In order to support hairpin between two ports, mlx5 PMD needs to
implement the functions and provide them as the function pointers.

The bind and unbind functions are executed per port pairs. All the
hairpin queues between the two ports should have the same attributes
during queues setup. Different configurations among queue pairs from
the same ports are not supported. It is allowed that two ports only
have one direction hairpin.

In order to set up the connection between two queues, peer Rx queue
HW information must be fetched via the internal RTE API and the queue
information could be used to modify the SQ object. Then the RQ object
will be modified with the Tx queue HW information. The reverse
operation is not supported right now.

When disconnecting a queue pair, the SQ and RQ objects should be reset
without any peer HW information. The unbinding operation will try to
disconnect all Tx queues of the port from the Rx queues of the peer
port.

Tx explicit mode attribute will be saved and used when creating a
hairpin flow.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: change hairpin queue peer checking

In the current implementation of single port mode hairpin, the peer
queue should belong to the same port as the current queue. When the
two ports hairpin mode is introduced, this check should be removed so
that the hairpin queue setup succeeds, since it is a valid condition
for the Tx port and Rx port to differ.

Meanwhile, different devices could have different queue
configurations. The number of queues of the peer port is unknown to
the current device, so this check should be removed as well.

If the Tx and Rx port IDs of a hairpin peer are different, only the
manual binding and explicit Tx flows are supported. Otherwise, all
four combinations of modes could be supported. The mode attributes
consistency checking will be done when connecting the queue with its
peer queue.
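
The mode rule above can be sketched as a small validation helper. The
structure and function below are hypothetical and only model the rule
stated in this message. They are not the checks performed by the PMD.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one hairpin peer relation. */
struct hairpin_modes {
    uint16_t tx_port;  /* port owning the Tx queue */
    uint16_t rx_port;  /* port owning the Rx queue */
    bool manual_bind;  /* manual binding requested */
    bool tx_explicit;  /* explicit Tx flow mode requested */
};

/* Return 0 if the mode combination is acceptable, -EINVAL otherwise. */
static int
hairpin_check_modes(const struct hairpin_modes *m)
{
    assert(m != NULL);
    /* Same port: all four combinations of the two modes are accepted. */
    if (m->tx_port == m->rx_port)
        return 0;
    /* Different ports: manual binding and explicit Tx flows only. */
    if (m->manual_bind && m->tx_explicit)
        return 0;
    return -EINVAL;
}
```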

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: fix PCI driver name

In the refactor of mlx5 common layer, the PCI driver name to the RTE
device was changed from "net_mlx5" to "mlx5_pci". The string of name
"mlx5_pci" is used directly in the structure rte_pci_driver.

In the past, the macro "MLX5_DRIVER_NAME" was used instead of a direct
string, and now it is missing. The functions that use
"MLX5_DRIVER_NAME" will get a mismatch, e.g. mlx5_eth_find_next.

The macro needs to be used again in all code to keep everything
aligned.

Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/bnxt: fix Rx performance by removing spinlock

The spinlock was trying to protect scenarios where rx_queue stop/start
could be initiated dynamically. Assigning bnxt_dummy_recv_pkts and
bnxt_dummy_xmit_pkts immediately to avoid concurrent access of mbuf in Rx
and cleanup path should help achieve the same result.

Fixes: 14255b351537 ("net/bnxt: fix queue start/stop operations")
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Rahul Gupta <rahul.gupta@broadcom.com>

net/bnxt: set thread safe flow ops flag

PMD supports thread-safe flow operations. Set the
RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE dev_flag to indicate this info
to the application. rte_flow API functions can avoid using its
own mutex for safe multi-thread flow handling.
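
How a generic layer might use such a flag can be sketched as below.
This is a model under assumptions, not the rte_flow implementation: the
flag macro is a stand-in for RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE with an
illustrative value, and a counter replaces the real mutex so that the
example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE, illustrative value. */
#define DEV_FLOW_OPS_THREAD_SAFE 0x0001u

/* Hypothetical device model: the counter stands in for a held mutex. */
struct eth_dev_model {
    uint32_t dev_flags;
    unsigned int lock_depth;
};

/* Take the generic lock only when the driver is not thread safe.
 * Returns true if the lock was taken and must be released later. */
static bool
flow_ops_lock(struct eth_dev_model *dev)
{
    assert(dev != NULL);
    if (dev->dev_flags & DEV_FLOW_OPS_THREAD_SAFE)
        return false;
    dev->lock_depth++;
    return true;
}

/* Release the generic lock if flow_ops_lock() reported taking it. */
static void
flow_ops_unlock(struct eth_dev_model *dev, bool locked)
{
    assert(dev != NULL);
    if (locked) {
        assert(dev->lock_depth > 0);
        dev->lock_depth--;
    }
}
```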

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix resetting mbuf data offset

Reset mbuf->data_off before handing the Rx packet to the application.
We were not doing this in the TPA path. It can cause applications
using this field for post processing to work incorrectly.

Fixes: 0958d8b6435d ("net/bnxt: support LRO")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: increase size of Rx CQ

LRO aka TPA and jumbo frame support uses aggregation ring for placing
Rx buffers. These features can generate multiple Rx completions for a
single Rx packet. Increase size of Rx Completion Queue to handle TPA
and aggregation ring events.

Fixes: daef48efe5e5 ("net/bnxt: support set MTU")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Qingmin Liu <qingmin.liu@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>

net/bnxt: support VXLAN decap offload

VXLAN decap offload can happen in stages. The offload request may
not come as a single flow request but rather as two flow offload
requests, F1 & F2. This patch adds support for this two-stage
offload design. The match criteria for F1 are O_DMAC, O_SMAC,
O_DST_IP, O_UDP_DPORT and actions are COUNT, MARK, JUMP. The match
criteria for F2 are O_SRC_IP, O_DST_IP, VNI and inner header fields.
F1 and F2 flow offload requests can come in any order. If F2 flow
offload request comes first then F2 can’t be offloaded as there is
no O_DMAC information in F2. In this case, F2 will be deferred until
F1 flow offload request arrives. When F1 flow offload request is
received it will have O_DMAC information. Using F1’s O_DMAC, driver
creates an L2 context entry in the hardware as part of offloading F1.
F2 will now use F1’s O_DMAC to get the L2 context id associated with
this O_DMAC and other flow fields that are cached already at the time
of deferring F2 for offloading. F2s that arrive after F1 is offloaded
will be directly programmed and not cached.
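
The deferral logic described above can be sketched as follows. All
names, the fixed-size cache, and the way an L2 context id is recorded
are assumptions made for illustration. The driver's real data
structures are not shown in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEFERRED_F2 8 /* arbitrary cache size for this sketch */

/* Hypothetical inner (F2) flow. */
struct f2_flow {
    uint32_t vni;
    bool offloaded;     /* true once programmed into "hardware" */
    uint32_t l2_ctx_id; /* L2 context inherited from F1 */
};

/* Hypothetical per-tunnel state keyed by the outer DMAC. */
struct tunnel_entry {
    bool f1_offloaded;
    uint32_t l2_ctx_id;
    struct f2_flow *deferred[MAX_DEFERRED_F2];
    size_t n_deferred;
};

/* Model of programming F2 using the L2 context created for F1. */
static void
program_f2(const struct tunnel_entry *t, struct f2_flow *f2)
{
    f2->l2_ctx_id = t->l2_ctx_id;
    f2->offloaded = true;
}

/* F2 request: program directly if F1 is known, otherwise cache it. */
static int
tunnel_add_f2(struct tunnel_entry *t, struct f2_flow *f2)
{
    assert(t != NULL && f2 != NULL);
    if (t->f1_offloaded) {
        program_f2(t, f2);
        return 0;
    }
    if (t->n_deferred == MAX_DEFERRED_F2)
        return -ENOSPC;
    t->deferred[t->n_deferred++] = f2;
    return 0;
}

/* F1 request: record the L2 context, then flush the deferred F2s. */
static void
tunnel_add_f1(struct tunnel_entry *t, uint32_t l2_ctx_id)
{
    size_t i;

    assert(t != NULL);
    t->f1_offloaded = true;
    t->l2_ctx_id = l2_ctx_id;
    for (i = 0; i < t->n_deferred; i++)
        program_f2(t, t->deferred[i]);
    t->n_deferred = 0;
}
```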

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add VXLAN decap templates

Templates for outer tunnel & inner tunnel flow are added in this patch.
This will be used by subsequent patches to implement support for
VXLAN decap rte_flow offload.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: refactor flow id allocation

Currently, the flow id is allocated inside ulp_mapper_flow_create.
However, with the VXLAN decap feature, if the F2 flow comes before the
F1 flow then F2 is cached and not really installed in the hardware,
which means the code will return without calling
ulp_mapper_flow_create. But ULP still has to return a valid flow id
to the stack. Hence, move the flow id allocation outside
ulp_mapper_flow_create.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add mapper support for wildcard TCAM

Add support for the key and mask fields encoding for the
wildcard TCAM entry. Also add an internal function to post-process
the key/mask blobs for the wildcard TCAM table. The size of the
wildcard TCAM slice is 80 bytes.

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: modify HWRM command to create reps

Use cfa pair alloc for configuring reps.
Instead of cfa_vfr_alloc for Wh+ and cfa_pair_alloc for Stingray,
converge to cfa_pair_alloc/free for both devices. Set the command
request structure bits accordingly.
As part of this, remove the old cfa_vfr_alloc cmd definitions as FW
has deprecated support for those commands.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Shahaji Bhosle <sbhosle@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add hierarchical flow counters

Add support for hierarchical flow counter accumulation.
In case of hierarchical flows, involving parent and child flows,
the child flow counters are aggregated to get the parent flow counter
information. This should help in cases where one or more flows
are related to a previously offloaded flow.
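
The aggregation itself amounts to summing the child counters. The
sketch below uses a hypothetical counter structure and function name.
How the driver links child flows to a parent is not described here and
is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow counter pair. */
struct flow_counter {
    uint64_t packets;
    uint64_t bytes;
};

/* Derive the parent counter by accumulating all child counters. */
static struct flow_counter
parent_counter_from_children(const struct flow_counter *children, size_t n)
{
    struct flow_counter parent = { 0, 0 };
    size_t i;

    assert(children != NULL || n == 0);
    for (i = 0; i < n; i++) {
        parent.packets += children[i].packets;
        parent.bytes += children[i].bytes;
    }
    return parent;
}
```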

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Reviewed-by: Shahaji Bhosle <sbhosle@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix flow query count

Fix infinite loop in flow query count.
`nxt_resource_idx` could be zero in some cases, which is invalid, so
this check should be part of the while loop condition. Also synchronize
access to the flow db using the fdb_lock.
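
A loop of this kind can be guarded as sketched below. The table layout
and function are hypothetical, only the field name nxt_resource_idx
comes from this message, and the extra bound on the number of steps is
an added safeguard that the patch does not mention. Locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource node: index 0 is treated as invalid. */
struct resource_node {
    uint32_t nxt_resource_idx; /* next node in the chain, 0 ends it */
    uint64_t count;
};

/* Sum the counts along a chain, stopping on index 0, on an index out
 * of range, or after n steps so that a cyclic chain cannot spin. */
static uint64_t
sum_resource_chain(const struct resource_node *tbl, size_t n, uint32_t start)
{
    uint64_t total = 0;
    uint32_t idx = start;
    size_t steps;

    assert(tbl != NULL || n == 0);
    for (steps = 0; idx != 0 && idx < n && steps < n; steps++) {
        total += tbl[idx].count;
        idx = tbl[idx].nxt_resource_idx;
    }
    return total;
}
```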

Fixes: 306c2d28e247 ("net/bnxt: support count action in flow query")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: update ULP resource counts

Update ULP resource counts for Stingray device.
- FW needs some resources for normal operation. Account those
in the resource manager.
- Update the SR ULP requested resource counts to reflect
those available after AFM resources are accounted for.
- Add build option to select either 2 or 4 slot EM entries.
The default is 4 slot entries.

Signed-off-by: Peter Spreadborough <peter.spreadborough@broadcom.com>
Signed-off-by: Farah Smith <farah.smith@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add table scope to PF mapping

Add table scope to PF Mapping for SR and Wh+ devices.
Legacy devices require PF set of base addresses for EEM operation.
A table scope id is a logical construct and is mapped to the PF
associated with the communications channel used.
In the case of a VF, the parent PF is used.

Signed-off-by: Farah Smith <farah.smith@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: support two table scopes

Add support for two table scopes: one for Exact Match tables
and the other for External Exact Match tables.
Add a new API to map a PARIF to an EEM table scope (set of Rx and Tx
EEM base addresses). It uses the HWRM_TF_GLOBAL_CFG_SET HWRM command
to configure the mapping.
PARIF is a handler for a partition of the physical port.
Adjust tf_global_cfg_set() to reduce overhead and make nominal
name clarifications.

Signed-off-by: Jay Ding <jay.ding@broadcom.com>
Signed-off-by: Farah Smith <farah.smith@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add Stingray support to core layer

- Move P4 chip specific code under the P4 directory
- Add P45 skeleton code for SR to build on
- Add SR support in TRUFLOW core layer.
The TRUFLOW core or the tf-core is a shim layer which communicates with
the CFA block in the hardware.

Signed-off-by: Peter Spreadborough <peter.spreadborough@broadcom.com>
Signed-off-by: Jay Ding <jay.ding@broadcom.com>
Reviewed-by: Farah Smith <farah.smith@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>