dpdk.git
Viacheslav Ovsiienko [Thu, 24 Feb 2022 10:54:59 +0000 (12:54 +0200)]
common/mlx5: check send on time capability

The patch provides a check for the send scheduling on time hardware
capability. With this capability enabled, the hardware is able to
handle Wait WQEs with directly specified timestamp values. No Clock
Queue is needed anymore to handle send scheduling.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:40 +0000 (05:02 +0200)]
app/testpmd: add async indirect actions operations

Add testpmd support for the rte_flow_async_action_handle API.
Provide the command line interface for enqueueing indirect action operations.
Usage example:
  flow queue 0 indirect_action 0 create action_id 9
    ingress postpone yes action rss / end
  flow queue 0 indirect_action 0 update action_id 9
    action queue index 0 / end
  flow queue 0 indirect_action 0 destroy action_id 9

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:39 +0000 (05:02 +0200)]
app/testpmd: add flow queue pull operation

Add testpmd support for the rte_flow_pull API.
Provide the command line interface for pulling the results of enqueued operations.
Usage example: flow pull 0 queue 0

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:38 +0000 (05:02 +0200)]
app/testpmd: add flow queue push operation

Add testpmd support for the rte_flow_push API.
Provide the command line interface for pushing enqueued operations to the hardware.
Usage example: flow queue 0 push 0

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:37 +0000 (05:02 +0200)]
app/testpmd: add async flow create/destroy operations

Add testpmd support for the rte_flow_async_create/rte_flow_async_destroy API.
Provide the command line interface for enqueueing flow
creation/destruction operations. Usage example:
  testpmd> flow queue 0 create 0 postpone no
           template_table 6 pattern_template 0 actions_template 0
           pattern eth dst is 00:16:3e:31:15:c3 / end actions drop / end
  testpmd> flow queue 0 destroy 0 postpone yes rule 0

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:36 +0000 (05:02 +0200)]
app/testpmd: add flow table management

Add testpmd support for the rte_flow_template_table API.
Provide the command line interface for flow
table creation/destruction. Usage example:
  testpmd> flow template_table 0 create table_id 6
    group 9 priority 4 ingress mode 1
    rules_number 64 pattern_template 2 actions_template 4
  testpmd> flow template_table 0 destroy table 6

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:35 +0000 (05:02 +0200)]
app/testpmd: add flow template management

Add testpmd support for the rte_flow_pattern_template and
rte_flow_actions_template APIs. Provide the command line interface
for the template creation/destruction. Usage example:
  testpmd> flow pattern_template 0 create pattern_template_id 2
           template eth dst is 00:16:3e:31:15:c3 / end
  testpmd> flow actions_template 0 create actions_template_id 4
           template drop / end mask drop / end
  testpmd> flow actions_template 0 destroy actions_template 4
  testpmd> flow pattern_template 0 destroy pattern_template 2

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:34 +0000 (05:02 +0200)]
app/testpmd: add flow engine configuration

Add testpmd support for the rte_flow_configure API.
Provide the command line interface for flow engine configuration.
Usage example: flow configure 0 queues_number 8 queues_size 256

Also implement the rte_flow_info_get API to query available resources.
Usage example: flow info 0

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:33 +0000 (05:02 +0200)]
ethdev: bring in async indirect actions operations

The queue-based flow rules management mechanism is suitable
not only for flow rules creation/destruction, but also
for speeding up other types of Flow API management.
Indirect action object operations may be executed
asynchronously as well. Provide async versions for all
indirect action operations, namely:
rte_flow_async_action_handle_create,
rte_flow_async_action_handle_destroy and
rte_flow_async_action_handle_update.
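
As a rough illustration (not part of the patch), a minimal C sketch of
enqueueing creation of an indirect RSS action on flow queue 0 could
look as below; the completion still has to be retrieved later via
rte_flow_pull(), and all attribute values are example choices:

  #include <stddef.h>
  #include <rte_flow.h>

  /* Illustrative sketch, error handling elided. */
  static struct rte_flow_action_handle *
  async_indirect_rss_create(uint16_t port_id)
  {
      const struct rte_flow_op_attr op_attr = { .postpone = 0 };
      const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
      const struct rte_flow_action_rss rss = { .queue_num = 0 };
      const struct rte_flow_action action = {
          .type = RTE_FLOW_ACTION_TYPE_RSS,
          .conf = &rss,
      };
      struct rte_flow_error error;

      return rte_flow_async_action_handle_create(port_id, 0 /* queue */,
              &op_attr, &conf, &action, NULL /* user_data */, &error);
  }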

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:32 +0000 (05:02 +0200)]
ethdev: bring in async queue-based flow rules operations

A new, faster, queue-based flow rules management mechanism is needed for
applications offloading rules inside the datapath. This asynchronous
and lockless mechanism frees the CPU for further packet processing and
reduces the performance impact of flow rules creation/destruction
on the datapath. Note that queues are not thread-safe, and a queue
should be accessed from the same thread for all queue operations.
It is the application's responsibility to synchronize access in case
the same queue is used from multiple threads.

The rte_flow_async_create() function enqueues a flow creation to the
requested queue. It benefits from already configured resources and sets
unique values on top of item and action templates. A flow rule is enqueued
on the specified flow queue and offloaded asynchronously to the hardware.
The function returns immediately to spare CPU for further packet
processing. The application must invoke the rte_flow_pull() function
to complete the flow rule operation offloading, to clear the queue, and to
receive the operation status. The rte_flow_async_destroy() function
enqueues a flow destruction to the requested queue.
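
As a rough illustration (not part of the patch), a minimal C sketch of
the enqueue/push/pull cycle could look as below; `table` is assumed to
come from an earlier rte_flow_template_table_create() call, and both
template indexes are example values:

  #include <stddef.h>
  #include <rte_flow.h>

  /* Illustrative sketch, error handling elided. */
  static struct rte_flow *
  enqueue_rule(uint16_t port_id, uint32_t queue_id,
               struct rte_flow_template_table *table,
               const struct rte_flow_item *pattern,
               const struct rte_flow_action *actions)
  {
      const struct rte_flow_op_attr op_attr = { .postpone = 0 };
      struct rte_flow_op_result res[16];
      struct rte_flow_error error;
      struct rte_flow *flow;
      int n;

      /* Returns immediately; HW offload completes asynchronously. */
      flow = rte_flow_async_create(port_id, queue_id, &op_attr, table,
                                   pattern, 0, actions, 0, NULL, &error);
      if (flow == NULL)
          return NULL;
      /* Flush postponed operations, then poll for the completion. */
      rte_flow_push(port_id, queue_id, &error);
      do {
          n = rte_flow_pull(port_id, queue_id, res, 16, &error);
      } while (n == 0);
      return (n > 0 && res[0].status == RTE_FLOW_OP_SUCCESS) ? flow : NULL;
  }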

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:31 +0000 (05:02 +0200)]
ethdev: add flow item/action templates

Treating every single flow rule as a completely independent and separate
entity negatively impacts the flow rules insertion rate. Oftentimes in an
application, many flow rules share a common structure (the same item mask
and/or action list) so they can be grouped and classified together.
This knowledge may be used as a source of optimization by a PMD/HW.

The pattern template defines common matching fields (the item mask) without
values. The actions template holds a list of action types that will be used
together in the same rule. The specific values for items and actions will
be given only during the rule creation.

A table combines pattern and actions templates along with shared flow rule
attributes (group ID, priority and traffic direction). This way a PMD/HW
can prepare all the resources needed for efficient flow rules creation in
the datapath. To avoid any hiccups due to memory reallocation, the maximum
number of flow rules is defined at the table creation time.

The flow rule creation is done by selecting a table, a pattern template
and an actions template (which are bound to the table), and setting unique
values for the items and actions.
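
As a rough illustration (not part of the patch), a minimal C sketch of
building one pattern template, one actions template and a table from
them could look as below; the full-Ethernet-header mask, the drop
action and the table attributes are all example choices:

  #include <rte_flow.h>

  /* Illustrative sketch, error handling elided. */
  static struct rte_flow_template_table *
  table_setup(uint16_t port_id, struct rte_flow_error *err)
  {
      const struct rte_flow_item pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH,
            .mask = &rte_flow_item_eth_mask },
          { .type = RTE_FLOW_ITEM_TYPE_END },
      };
      const struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
      struct rte_flow_pattern_template *pt =
          rte_flow_pattern_template_create(port_id, &pt_attr, pattern, err);

      const struct rte_flow_action actions[] = {
          { .type = RTE_FLOW_ACTION_TYPE_DROP },
          { .type = RTE_FLOW_ACTION_TYPE_END },
      };
      const struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
      struct rte_flow_actions_template *at =
          rte_flow_actions_template_create(port_id, &at_attr,
                                           actions, actions, err);

      const struct rte_flow_template_table_attr tbl_attr = {
          .flow_attr = { .group = 9, .priority = 4, .ingress = 1 },
          .nb_flows = 64,
      };
      return rte_flow_template_table_create(port_id, &tbl_attr,
                                            &pt, 1, &at, 1, err);
  }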

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Alexander Kozyrev [Wed, 23 Feb 2022 03:02:30 +0000 (05:02 +0200)]
ethdev: introduce flow engine configuration

The flow rules creation/destruction at a large scale incurs a performance
penalty and may negatively impact the packet processing when used
as part of the datapath logic. This is mainly because software/hardware
resources are allocated and prepared during the flow rule creation.

In order to optimize the insertion rate, a PMD may use hints provided
by the application at the initialization phase. The rte_flow_configure()
function allows pre-allocating all the needed resources beforehand.
These resources can be used at a later stage without costly allocations.
Each PMD may use only a subset of the hints and ignore unused ones, or
fail in case the requested configuration is not supported.

The rte_flow_info_get() function is available to retrieve information
about the supported pre-configurable resources. Both functions must be
called before any other usage of the flow API engine.
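
As a rough illustration (not part of the patch), a minimal C sketch of
this init sequence could look as below; the counter count, queue count
and queue size are example values:

  #include <rte_flow.h>

  /* Illustrative sketch, error handling elided: query the
   * pre-configurable resources, then pre-allocate 8 flow queues of
   * 256 entries each. */
  static int
  flow_engine_setup(uint16_t port_id)
  {
      struct rte_flow_port_info port_info;
      struct rte_flow_queue_info queue_info;
      struct rte_flow_error error;
      const struct rte_flow_port_attr port_attr = { .nb_counters = 128 };
      const struct rte_flow_queue_attr queue_attr = { .size = 256 };
      const struct rte_flow_queue_attr *attr_list[8];
      int i;

      if (rte_flow_info_get(port_id, &port_info, &queue_info, &error) < 0)
          return -1;
      for (i = 0; i < 8; i++)
          attr_list[i] = &queue_attr;
      return rte_flow_configure(port_id, &port_attr, 8, attr_list, &error);
  }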

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Rakesh Kudurumalla [Wed, 23 Feb 2022 09:55:40 +0000 (15:25 +0530)]
net/cnxk: fix build with GCC 12

Resolve the following compilation error with GCC 12:
error: storing the address of local variable 'message' in '*error.message'

Fixes: 26b034f78ca7 ("net/cnxk: support to validate meter policy")
Cc: stable@dpdk.org
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:11 +0000 (01:05 +0530)]
net/cnxk: add option to override outbound inline SA IV

Add an option to override the outbound inline SA IV for debug
purposes via an environment variable. The user can set it as:
export CN10K_ETH_SEC_IV_OVR="0x0, 0x0,..."

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:10 +0000 (01:05 +0530)]
net/cnxk: add devargs for min-max SPI

Add support for specifying the inline inbound SPI range via devargs,
instead of only a maximum SPI value with an implied range of 0..max.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:09 +0000 (01:05 +0530)]
net/cnxk: enable flow control by default on device configure

Enable flow control by default on device configuration,
instead of relying on the kernel's behaviour.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:08 +0000 (01:05 +0530)]
net/cnxk: enable packet pool tail drop

Enable packet pool tail drop on the RQ when inbound security is not
enabled. This is only part of the configuration and is a no-op if
tail drop is not enabled on NPA_AURA_CTX_S. Tail drop on the packet
pool AURA is enabled only when that packet pool AURA is used by the
inline device RQ.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:07 +0000 (01:05 +0530)]
net/cnxk: use NPA batch burst free for meta buffers

Currently, meta buffers are freed in bursts of one LMT line,
i.e. 15 pointers. Instead, free them in bursts of 16 LMT lines,
which is 240 pointers, for better performance.

Also mark mempool objects as get and put in the missing places.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:06 +0000 (01:05 +0530)]
net/cnxk: fix inline IPsec security error handling

Use a raw mbuf free on inline security errors to simulate
a HW NPA free instead of calling rte_pktmbuf_free(). This
is needed as the callback will not be called from
a DPDK lcore.

Fixes: 69daa9e5022b ("net/cnxk: support inline security setup for cn10k")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:05 +0000 (01:05 +0530)]
net/cnxk: realloc inline dev XAQ for security

Realloc the inline device XAQ when Rx/Tx security is enabled with a
new packet pool, as the XAQ should be large enough to hold all
mbufs in case inline outbound reports errors on all of them.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:04 +0000 (01:05 +0530)]
net/cnxk: register callback early to handle initial packets

Register callback early to handle initial error packets from
inline device.

Fixes: 69daa9e5022b ("net/cnxk: support inline security setup for cn10k")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:03 +0000 (01:05 +0530)]
net/cnxk: fix inline device RQ tag mask

Fix the inline device RQ tag mask so that packets with receive errors
are handed to the callback handler as ETHDEV-type packets and their
buffers can be freed. Currently, only IPsec denied packets get the
right tag mask.

Fixes: ee48f711f3b0 ("common/cnxk: support NIX inline inbound and outbound setup")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Satha Rao [Tue, 22 Feb 2022 19:35:02 +0000 (01:05 +0530)]
common/cnxk: remove tracking of mark actions

Remove the ROC NPC APIs which track the addition and deletion of
mark actions. They were earlier needed to track the number of mark
actions added as part of flow rules. If the mark action count
was > 0, then the function pointer for Rx would get updated
to also read the mark value from the CQE/WQE and populate it in the
mbuf. Now the same switch is done based on the new Rx metadata
negotiate ethdev API.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Satha Rao [Tue, 22 Feb 2022 19:35:01 +0000 (01:05 +0530)]
net/cnxk: add Rx metadata negotiate operation

Add the rx_metadata_negotiate API to enable the mark update Rx
offload. Remove the software logic that enabled/disabled mark update
inside the flow create/destroy APIs.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:35:00 +0000 (01:05 +0530)]
common/cnxk: allow force use of SSO device for outb inline

Allow forcing use of the SSO device even when an inline device is
available, for cases where the driver needs events delivered
directly to the event device.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:34:59 +0000 (01:04 +0530)]
common/cnxk: use SSO time counter threshold for IRQ

Enable a time counter based threshold for raising the SSO
EXE_INT instead of the IAQ threshold. A time counter based
threshold helps to get periodic interrupts and to process
packets in bursts, instead of having the HW raise an interrupt
for every new work item.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:34:58 +0000 (01:04 +0530)]
common/cnxk: support enabling AURA tail drop for RQ

Add support to enable AURA tail drop via the RQ, specifically
for the inline device RQ's packet pool. This is better than RQ
RED drop, as it can be applied to all RQs that do not have
security enabled but use the same packet pool.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Vidya Sagar Velumuri [Tue, 22 Feb 2022 19:34:57 +0000 (01:04 +0530)]
common/cnxk: enable L3 header write back in SA

Enable the field in SA to write back L2, L3 headers in case
of errors during inline processing.

Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Vidya Sagar Velumuri [Tue, 22 Feb 2022 19:34:56 +0000 (01:04 +0530)]
common/cnxk: use common SA init API for default options

Use the common SA init API before doing initialization based on
params. This is better, as all HW-specific default values are then
in a single place for both lookaside and inline.

Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Vidya Sagar Velumuri [Tue, 22 Feb 2022 19:34:55 +0000 (01:04 +0530)]
common/cnxk: support inline device API without ROC NIX

Update the inline device functions to work when roc_nix is NULL.
This is required, as the IPsec driver has to use these APIs to work
with the inline IPsec device, but it might not have the roc_nix
information.

Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Satha Rao [Tue, 22 Feb 2022 19:34:54 +0000 (01:04 +0530)]
common/cnxk: adjust shaper rates to lower boundaries

Provide a method to get floor values for a requested shaper rate,
which ensures packets are never transmitted at a rate higher than
configured.

Keep the old API to get HW-suggested values,
and introduce a new parameter to select the appropriate API.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Nithin Dabilpuram [Tue, 22 Feb 2022 19:34:53 +0000 (01:04 +0530)]
common/cnxk: realloc inline device XAQ AURA

Add support to realloc the inline device XAQ AURA with more
buffers from the new packet pool AURA.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Satha Rao [Tue, 22 Feb 2022 19:34:52 +0000 (01:04 +0530)]
common/cnxk: increase SMQ resource count

CN10K supports up to 832 resources at SMQ level, so increase
bitmap count to 1024.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 22 Feb 2022 10:37:50 +0000 (16:07 +0530)]
net/cnxk: support priority flow control

Add priority flow control support for CNXK platforms.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 22 Feb 2022 10:37:49 +0000 (16:07 +0530)]
common/cnxk: support priority flow control

CNXK platforms support priority flow control (802.1Qbb) to pause
respective traffic per class on the link.

Add a ROC interface to configure priority flow control on the MAC
block, i.e. CGX on cn9k and RPM on cn10k.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Ashwin Sekhar T K [Fri, 18 Feb 2022 07:27:12 +0000 (12:57 +0530)]
mempool/cnxk: fix batch allocation failure path

Fix a bug in the batch alloc failure path where invalid pointers
were enqueued back to the pool. The code should rightly fall back
to the default dequeue path in such cases.

Fixes: 91531e63f43b ("mempool/cnxk: add cn10k batch dequeue")
Cc: stable@dpdk.org
Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Michael Baum [Wed, 23 Feb 2022 13:48:34 +0000 (15:48 +0200)]
common/mlx5: update doorbell mapping parameter name

The "tx_db_nc" devarg forces doorbell register mapping to non-cached
region eliminating the extra write memory barrier. This argument was
used in creating the UAR for Tx and thus affected its performance.

Recently [1] its use has been extended to all UAR creation in all mlx5
drivers, and now its name is no longer so accurate.

This patch changes its name to "sq_db_nc" to suit any send queue that
uses it. The old name will still work for backward compatibility.

[1] commit 5dfa003db53f ("common/mlx5: fix post doorbell barrier")
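
An illustrative invocation (not part of the patch; binary name and PCI
address are placeholders):

  dpdk-testpmd -a <pci_bdf>,sq_db_nc=1 -- -i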

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Michael Baum [Wed, 23 Feb 2022 13:48:33 +0000 (15:48 +0200)]
doc: add shared guide for mlx5 drivers

Add new documentation for the mlx5 common driver that contains:
 - its features list (empty for now)
 - its devargs description
 - device configuration information and tutorial
 - a quick start guide for Mellanox OFED/EN

Move all shared information from the other mlx5 PMD docs into this
doc and add references from them to the new common doc.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Michael Baum [Wed, 23 Feb 2022 13:48:32 +0000 (15:48 +0200)]
doc: correct name of BlueField-2 in mlx5 guide

Update "BlueField 2" -> "BlueField-2" in mlx5 docs.

Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Michael Baum [Wed, 23 Feb 2022 13:48:31 +0000 (15:48 +0200)]
doc: replace broken links in mlx guides

Update links in both mlx4 and mlx5 doc.

Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Michael Baum [Wed, 23 Feb 2022 13:48:30 +0000 (15:48 +0200)]
doc: remove obsolete vector Tx explanations from mlx5 guide

Vectorized Tx routines were removed as a result of the Tx datapath
refactoring, and the devarg keys documentation was updated.

However, more updating should have been done: the environment
variables doc still contained an explanation related to vectorized
Tx which is no longer relevant.

This patch removes this obsolete explanation.

Fixes: a6bd4911ad93 ("net/mlx5: remove Tx implementation")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Shun Hao [Tue, 22 Feb 2022 15:07:16 +0000 (17:07 +0200)]
net/mlx5: fix E-Switch manager vport ID

One of the E-Switch vports plays a special role - it is assigned as
"E-Switch manager" and has some exclusive rights and duties - it
maintains all the representors, manages FDB domain flows, etc. By
default, the E-Switch vport index was supposed to be zero on standalone
NICs (regular ConnectX) and 0xFFFE on SmartNICs (BlueField), but that
was not always correct - this index can be assigned any value by the
kernel/hypervisor.

Currently, the E-Switch manager vport ID is assumed to be the
default - 0 for standalone NICs and 0xFFFE for SmartNICs - and is
deduced from the device PCI ID.

To handle this without assuming any default values, the DevX API can
be used to query the E-Switch manager vport ID directly from the
firmware during initialization, and that value is used by default. If
the new method is not provided (legacy firmware), fall back to the PCI
ID approach.

Fixes: a564038699f9 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:00:10 +0000 (11:00 +0200)]
net/mlx5: optimize Rx queue creation

Recently, shared RxQ has been introduced. All shared Rx queues with
the same group and queue ID share the same rxq_ctrl, but each one has
its own mlx5_rxq_priv structure.
The mlx5_rx_queue_setup function generates a new rxq_priv structure
and looks for a rxq_ctrl structure to refer to. If there is already a
compatible rxq_ctrl structure, it refers to it; otherwise it calls the
mlx5_rxq_new function to generate a new one.

This patch makes the mlx5_rxq_new function standalone: it generates a
rxq_ctrl structure regardless of any specific rxq_priv structure. All
operations on the rxq_ctrl structure that depend on the new rxq_priv
structure are performed in the mlx5_rx_queue_setup function, at the
same place for either a new or an existing rxq_ctrl structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:00:09 +0000 (11:00 +0200)]
net/mlx5: fix entry in shared Rx queues list

The mlx5_rxq_new function creates the control structure and, if it
belongs to a shared group, inserts it into the shared RxQ list.

After that, there are some validations which, if they fail, cause the
RxQ control object to be released.
In these cases, an invalid pointer to the object remains in the list,
and accessing it may cause a crash.

Move the list insertion to the end of the function, where the RxQ
control object is surely valid.

Fixes: 09c2555303be ("net/mlx5: support shared Rx queue")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Haifei Luo [Mon, 21 Feb 2022 08:27:21 +0000 (10:27 +0200)]
net/mlx5: refactor getting counter action pointer

Previously, the API flow_dv_query_count_ptr was defined to get a
counter's action pointer. This DV function was called directly, while
the better way is via a callback.

Add one argument to the mlx5_counter_query API and the related
callback counter_query. The added argument is for the counter's
action pointer.

Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Shun Hao [Fri, 18 Feb 2022 07:43:08 +0000 (09:43 +0200)]
net/mlx5: fix meter sub-policy creation

If the meter policy action was RSS, the correct items were not
provided for sub-policy creation.

This patch fixes the issue by providing the original items in the
meter split, so that sub-policy creation gets the correct items.

Fixes: 3c481324baf3 ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Suanming Mou [Tue, 15 Feb 2022 09:46:23 +0000 (11:46 +0200)]
net/mlx5: remove unused function

The mlx5_l3t_prepare_entry() function is not used anymore.
This commit removes the unused mlx5_l3t_prepare_entry() function.

Fixes: 92ef4b8f1688 ("ethdev: remove deprecated shared counter attribute")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Suanming Mou [Tue, 15 Feb 2022 10:10:52 +0000 (12:10 +0200)]
net/mlx5: set flow error for hash list create

When mlx5_hlist_create() failed, the rte_flow_error was not filled
with the corresponding error information.

This commit adds the missing rte_flow_error_set() for the failure case.

Fixes: f3020a331dca ("net/mlx5: optimize hash list table allocate on demand")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Yajun Wu [Mon, 14 Feb 2022 06:03:19 +0000 (08:03 +0200)]
common/mlx5: fix queue pair ack timeout configuration

The vDPA driver creates two QPs (each queue pair includes one send
queue and one receive queue) per virtio queue to get traffic events
from the NIC to SW.
The two QPs (called FW QP and SW QP) are created as loopback QPs,
and the FW QP's SQ is connected to the SW QP's RQ internally.

When a packet is received or sent out, HW sends a WQE via the FW
QP's SQ, then SW gets a CQE from the CQ of the SW QP.

At large scale and under heavy traffic, the SQ's request may fail
to get an ACK from the RQ HW, because the HW is busy.
The SQ retries the request qpc.retry_count times, each time waiting
4.096 us * 2^(ack_timeout) for the response. If it still cannot get
the RQ HW's response, the SQ goes into an error state.

16 is an empirically chosen value. It should not be too high or too
low. Too high makes the QP wait too long in case of packet drop.
Too low causes the QP to go into an error state (retry exceeded)
easily.
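
For example, with the chosen value of 16, each retry waits
4.096 us * 2^16, i.e. about 268 ms, for the response.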

Fixes: 15c3807e86ab ("common/mlx5: support DevX QP operations")
Cc: stable@dpdk.org
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:44 +0000 (13:19 +0100)]
net/ena: update version to 2.6.0

This release contains multiple bug fixes and improvements, including
  - Removal of the linearization function from the xmit Tx path. DPDK
    assumes the mbuf segment count is checked in the Tx prepare
    function.
  - Extra logs, statistics, checks...
  - Cleanup of the unused variables and definitions.
  - Configurable Link Status event.
  - Improvements for the timer service and the reset.
  - Usage of the optimized memcpy on ARM.
  - MP awareness improvements - extra API support for the secondary
    processes (like reading basic statistics).
  - Support of the xstats API to get xstat names by ID.
  - Configurable Tx completions timeout.
  - Proper setting of the meta-descriptor's DF flag.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:43 +0000 (13:19 +0100)]
net/ena: fix checksum flag for L4

Some HW may incorrectly set the checksum error bit for a valid L4
checksum. To avoid dropping packets in that situation, do not indicate
a bad checksum for L4 Rx csum offloads. Instead, set it as unknown, so
the application will re-verify this value.

The statistics counters will still work as previously.

Fixes: 05817057faba ("net/ena: fix indication of bad L4 Rx checksums")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Dawid Gorecki [Wed, 23 Feb 2022 12:19:42 +0000 (13:19 +0100)]
net/ena: check memory BAR before initializing LLQ

The ena_com_config_dev_mode() function performs many calculations
related to LLQ and then performs an admin queue call to configure LLQ
in the device.

All of these operations are unnecessary if the memory BAR hasn't been
found. Move the dev_mem_base check before the
ena_com_config_dev_mode() call. This prevents the unnecessary
operations from being performed.

Fixes: 2fca2a98c0d1 ("net/ena: support LLQv2")
Cc: stable@dpdk.org
Signed-off-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Dawid Gorecki [Wed, 23 Feb 2022 12:19:41 +0000 (13:19 +0100)]
net/ena: extend logs for invalid request ID resets

Add information about the port ID, queue ID and req_id to the error
logs in validate_tx_req_id.

Signed-off-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:40 +0000 (13:19 +0100)]
net/ena: fix meta descriptor DF flag setup

Whenever the Tx checksum offload is being used, the meta descriptor
content is taken into consideration. Setting the DF field properly in
the meta descriptor may have a huge impact on the performance for both
IPv4 and IPv6 packets.

The requirements for the DF field are as below:
* No offload used - value doesn't matter
* IPv4 - 0 or 1, depending on the DF flag in the IPv4 header
* IPv6 - 1

Setting DF to 0 causes the packet to enter the slow path in the HW
and as a result can noticeably impact the performance.

Moreover, as 'true' may not always be mapped to 1 depending on its
definition for the given platform/compiler, for safety the DF field is
set explicitly to 1.

Fixes: 1173fca25af9 ("ena: add polling-mode driver")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:39 +0000 (13:19 +0100)]
net/ena: make Tx completion timeout configurable

The default missing Tx completion timeout was set to 5 seconds.
In order to provide users with an interface to control this timeout
and adjust it to the application's watchdog, a device argument for
controlling this value was added.

The parameter is called 'miss_txc_to' and can be modified using the
devargs interface:

  ./app -a <bdf>,miss_txc_to=UINT_NUMBER

This parameter accepts values from 0 to 60 and indicates the number
of seconds after which a Tx packet will be considered missing.

HW hints for the Tx completion timeout were removed so as not to
overwrite the parameter provided by the user. Also, setting the
default Tx completion timeout value was moved from the configuration
phase to the init phase in order to simplify the default value
assignment.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Dawid Gorecki [Wed, 23 Feb 2022 12:19:38 +0000 (13:19 +0100)]
net/ena: fix reset reason being overwritten

When triggering the reset, no check was performed to see if the reset
was already triggered. This could result in the original reset reason
being overwritten. Add an ena_trigger_reset helper function, which
checks if the reset was triggered and only sets the reset reason if it
wasn't triggered yet. Replace all occurrences of manually setting the
reset with an ena_trigger_reset call.

Fixes: 2081d5e2e92d ("net/ena: add reset routine")
Cc: stable@dpdk.org
Signed-off-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:37 +0000 (13:19 +0100)]
net/ena: support xstat names by ID

ENA only supported retrieval of all the xstats names and wasn't
implementing the eth_xstats_get_names_by_id API.

As this API may be more efficient than retrieving all the names,
implement it in a way that avoids excessive string copying.
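
As a rough application-side illustration (not part of the patch), the
generic ethdev API the driver now implements could be used as below;
the xstat IDs are example values:

  #include <stdio.h>
  #include <rte_ethdev.h>

  /* Illustrative sketch: fetch two specific xstat names by ID. */
  static void
  print_two_xstat_names(uint16_t port_id)
  {
      uint64_t ids[2] = { 0, 1 };  /* example xstat IDs */
      struct rte_eth_xstat_name names[2];

      if (rte_eth_xstats_get_names_by_id(port_id, names, 2, ids) == 2)
          printf("%s, %s\n", names[0].name, names[1].name);
  }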

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Dawid Gorecki [Wed, 23 Feb 2022 12:19:36 +0000 (13:19 +0100)]
net/ena: support Tx mbuf free on demand

The ENA driver did not allow applications to call tx_cleanup. Freeing
Tx mbufs was always done by the driver, and it was not possible to
manually request the driver to free mbufs.

Modify the ena_tx_cleanup function to accept a maximum number of
packets to free and to return the number of packets freed.
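
As a rough application-side illustration (not part of the patch), this
is exposed through the generic ethdev call; the free count is an
example value:

  #include <stdio.h>
  #include <rte_ethdev.h>

  /* Illustrative sketch: ask the driver to free up to 32
   * already-transmitted mbufs from Tx queue 0 on demand. */
  static void
  reclaim_tx_mbufs(uint16_t port_id)
  {
      int freed = rte_eth_tx_done_cleanup(port_id, 0 /* queue */, 32);

      if (freed < 0)
          printf("tx_done_cleanup failed: %d\n", freed);
  }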

Signed-off-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:35 +0000 (13:19 +0100)]
net/ena/base: make IO memzone unique per port

Originally, the ena_com memzone counter was shared by ports, which
made the memzones harder to identify and could potentially lead to
races; because of that, the counter had to be atomic.

This atomic counter was a global variable and couldn't work in the
multi-process implementation.

The memzone is now identified by a per-port memzone counter and the
port ID. Both pieces of information can be found in the shared data,
so it can be probed easily.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Stanislaw Kardach [Wed, 23 Feb 2022 12:19:34 +0000 (13:19 +0100)]
net/ena: enable stats for multi-process mode

Since statistics gathering is now proxied safely to the primary
process, it can be enabled in secondary processes.

Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Stanislaw Kardach [Wed, 23 Feb 2022 12:19:33 +0000 (13:19 +0100)]
net/ena: proxy AQ calls to primary process

Due to how the ena_com compatibility layer is written, all functions
triggering AQ commands use the stack to save AQ results and then copy
them to the user-provided destination.
Therefore, to keep the compatibility layer common, introduce the
ENA_PROXY macro. It either calls the wrapped function directly (in the
primary process) or proxies it to the primary via the DPDK IPC
mechanism. Since all proxied calls are taken under a lock, share the
result data through shared memory (in struct ena_adapter) to work
around the 256B IPC parameter size limit (a rough sketch of the IPC
leg follows the list below).

New proxy calls can be added by:
1. Adding a new message type at the end of enum ena_mp_req.
2. Adding new message arguments to struct ena_mp_body, if needed.
3. Defining a proxy request descriptor with ENA_PROXY_DESC. Its
   arguments include handlers for request preparation and response
   processing. Any of those may be empty (aside from marking arguments
   as used).
4. Adding request handling logic to ena_mp_primary_handle().
5. Replacing proxied function calls with ENA_PROXY(adapter, <func>, ...).
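
The sketch below illustrates only the IPC leg using the public DPDK
rte_mp_* API; the action name and the request body layout are
hypothetical, not the driver's actual identifiers:

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>
  #include <rte_eal.h>
  #include <rte_string_fns.h>

  /* Hypothetical IPC action name and body layout, illustration only. */
  #define MP_NAME_SKETCH "ena_mp_sketch"

  struct mp_body_sketch {
      int req_type;
      uint16_t port_id;
  };

  static int
  proxy_to_primary(int req_type, uint16_t port_id)
  {
      struct rte_mp_msg req;
      struct rte_mp_reply reply;
      const struct timespec ts = { .tv_sec = 5, .tv_nsec = 0 };
      struct mp_body_sketch *body = (struct mp_body_sketch *)req.param;

      memset(&req, 0, sizeof(req));
      rte_strlcpy(req.name, MP_NAME_SKETCH, sizeof(req.name));
      req.len_param = sizeof(*body);  /* must fit the 256B param area */
      body->req_type = req_type;
      body->port_id = port_id;

      /* Blocks until the handler registered in the primary replies. */
      if (rte_mp_request_sync(&req, &reply, &ts) != 0)
          return -1;
      free(reply.msgs);  /* the caller owns the reply array */
      return 0;
  }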

Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:32 +0000 (13:19 +0100)]
net/ena/base: use optimized memcpy version also on Arm

As the default behavior for arm64 is to alias rte_memcpy as memcpy,
ENA cannot redefine memcpy as rte_memcpy, as that would cause a nested
declaration.

To make it possible to use optimized memcpy in the ena_com layer on Arm,
the driver now redefines memcpy when it is beneficial:
  * For arm64 only when the flag RTE_ARCH_ARM64_MEMCPY was defined
  * For arm only when the flag RTE_ARCH_ARM_NEON_MEMCPY was defined
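
An illustrative sketch (not the driver's exact code) of such a
conditional redefinition, using the two build flags named above:

  #include <rte_memcpy.h>

  /* Map memcpy to rte_memcpy only when the optimized Arm
   * implementation is enabled, avoiding a nested declaration when
   * rte_memcpy is itself aliased to memcpy. */
  #if defined(RTE_ARCH_ARM64_MEMCPY) || defined(RTE_ARCH_ARM_NEON_MEMCPY)
  #undef memcpy
  #define memcpy rte_memcpy
  #endif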

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:31 +0000 (13:19 +0100)]
net/ena: perform Tx cleanup before sending packets

To increase the likelihood that the current burst will fit in the HW
rings, perform Tx cleanup before pushing packets to the HW. It may
increase latency a bit for sparse bursts, but the Tx flow should now
be smoother.

This is also the common order in the Tx burst function of other PMDs.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:30 +0000 (13:19 +0100)]
net/ena: skip timer if reset is triggered

Some user applications may not support PMD reset handling. If they will
support timer service it could cause a situation, when information
about the reset trigger is being showed every time the timer service is
being called.

Timer service is now being skipped if the reset was already triggered.

Fixes: d9b8b106bf9d ("net/ena: add watchdog and keep alive AENQ handler")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:29 +0000 (13:19 +0100)]
net/ena: make link status change interrupt configurable

ENA uses AENQ for notifications about various events, like LSC, keep
alive, etc. By default, it was enabling all AENQ notifications
supported by both the driver and the device. As a result, LSC was
always processed even if the application turned it off explicitly.

As DPDK provides the application with the possibility to configure
LSC, ENA should respect that. AENQ groups are now updated upon the
configure step, thus LSC can be activated or disabled between ENA PMD
reconfigurations. Moreover, the LSC capability of the device is
determined dynamically.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:28 +0000 (13:19 +0100)]
net/ena: add extra Rx checksum related xstats

* Split 'bad_csum' Rx statistic into 'l3_csum_bad' and 'l4_csum_bad' to
  be able to check which checksum was not calculated properly.
* Add l4_csum_good statistic, which shows how many times L4 Rx checksum
  was properly offloaded.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:27 +0000 (13:19 +0100)]
net/ena: remove unused offload variables

Those variables are being set but never read. As they seem to be
leftovers from the old offloads API and don't serve any purpose right
now, they are simply removed.

Fixes: a4996bd89c42 ("ethdev: new Rx/Tx offloads API")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Artur Rojek <ar@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:26 +0000 (13:19 +0100)]
net/ena: remove unused enumeration

The enumeration seems to be a leftover from porting the Linux driver
to DPDK. It was used nowhere and refers to ethtool, which is not
present in DPDK.

Fixes: 372c1af5ed8f ("net/ena: add dedicated memory area for extra device info")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Artur Rojek <ar@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:25 +0000 (13:19 +0100)]
net/ena: assert on outstanding mbuf in Tx

To make sure there is no outstanding mbuf in the reused Tx queue (due
to improper cleanup or some invalid logic on the Tx path), an
assertion was added on the Tx path.

As it is compiled out in the release version, it won't affect
the IO path performance.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Michal Krawczyk [Wed, 23 Feb 2022 12:19:24 +0000 (13:19 +0100)]
net/ena: remove Tx mbuf linearization

Linearization of the mbuf isn't common practice for a PMD, as it
can expose its capabilities to the upper layer using
rte_eth_dev_info_get().

Moreover, the rte_eth_tx_prepare() function should also verify that
the number of segments inside the mbuf isn't too high.

Because of those two circumstances, it is safer to avoid modifying the
mbuf on the PMD's Tx side and to remove the linearization entirely.
Instead, add verification of the number of segments to
eth_ena_prep_pkts().
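
As a rough application-side illustration (not part of the patch), the
burden then falls on the standard prepare-then-burst sequence:

  #include <rte_ethdev.h>
  #include <rte_mbuf.h>

  /* Illustrative sketch: with linearization removed, an oversized
   * segment count is rejected in rte_eth_tx_prepare(); on failure,
   * rte_errno is set for the first bad packet. */
  static uint16_t
  send_burst(uint16_t port_id, uint16_t queue_id,
             struct rte_mbuf **pkts, uint16_t nb_pkts)
  {
      uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id,
                                            pkts, nb_pkts);

      /* Only the packets that passed preparation are transmitted. */
      return rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
  }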

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Artur Rojek <ar@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Jiawen Wu [Wed, 23 Feb 2022 10:28:57 +0000 (18:28 +0800)]
net/txgbe: fix debug logs

Remove 'DEBUGFUNC' as it caused too many unhelpful debug log prints,
and unify the DEBUG level macros.

Fixes: 7dc117068a7c ("net/txgbe: support probe and remove")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Jiawen Wu [Wed, 23 Feb 2022 10:28:56 +0000 (18:28 +0800)]
net/ngbe: fix debug logs

Remove 'DEBUGFUNC' as it caused too many unhelpful debug log prints,
and unify the DEBUG level macros.

Fixes: cc934df178ab ("net/ngbe: add log and error types")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Ciara Loftus [Tue, 22 Feb 2022 14:31:48 +0000 (14:31 +0000)]
doc: add AF_XDP queue setup information

When an AF_XDP PMD is created without specifying the 'start_queue',
the default Rx queue associated with the socket will be Rx queue 0. A
common scenario encountered by users new to AF_XDP is that they create
the socket on queue 0 while their interface is configured with many
more queues. In this case, traffic might land on, for example, queue
18, which means it will never reach the socket.

This commit updates the AF_XDP documentation with instructions on how to
configure the interface to ensure the traffic will land on queue 0 and
thus reach the socket successfully.
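
For instance (illustrative commands, not taken from the documentation
patch itself), the interface can either be reduced to a single queue,
or matching traffic can be steered to queue 0 with an ntuple rule:

  ethtool -L <iface> combined 1
  ethtool -N <iface> flow-type udp4 dst-port 4242 action 0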

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Raja Zidane [Mon, 21 Feb 2022 13:24:23 +0000 (15:24 +0200)]
app/testpmd: fix GENEVE parsing in checksum mode

The csum FWD mode parses any received packet to set mbuf offloads for
the transmitted burst, mainly in the checksum/TSO areas.
In the case of a tunnel header, the csum FWD tries to detect known
tunnels by the standard definition using the header's data, and falls
back to checking the packet type in the mbuf to see if the Rx port
driver already marked the packet as a tunnel.
In the fallback case, the csum engine assumed the tunnel was VXLAN and
parsed it as VXLAN.
When the GENEVE tunnel was added to the known tunnels in csum, its
parsing attempt was wrongly placed after the packet type detection,
causing the csum engine to parse the GENEVE header as VXLAN when the
Rx port set the tunnel packet type.

Remove the fallback to VXLAN.
Log an error about an unrecognized tunnel if no tunnel was parsed
successfully.

Fixes: c10a026c3b03 ("app/testpmd: introduce vxlan parsing function in csum fwd engine")
Cc: stable@dpdk.org
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Aman Singh <aman.deep.singh@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Michael Baum [Mon, 14 Feb 2022 09:35:11 +0000 (11:35 +0200)]
common/mlx5: refactor devargs management

Improve the devargs handling in two aspects:
 - Parse the devargs string only once.
 - Return an error and report unknown keys.

The common driver parses the devargs string once into a dictionary,
then provides it to all the drivers' probe functions. Each driver
marks within it which keys it has used, then the common driver
receives the updated dictionary and reports unknown devargs.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:10 +0000 (11:35 +0200)]
common/mlx5: check common devargs in probing again

The MLX5 common driver supports probing again in two scenarios:
 - Adding a new driver under an existing device. The common probe
   function gets it in devargs, then calls the requested driver's
   probe function (regardless of the driver's own support for probing
   again) with the existing device as a parameter.
 - Transferring the probing-again support of the drivers themselves
   (currently only net). In this scenario, the existing device is sent
   as a parameter to the existing driver's probe too.

In both cases, it gets a new set of arguments that do not necessarily
match the arguments configured in the existing device.
Some of the arguments belong to the configuration of the existing
device, so they can't be updated in the probing again. On the other
hand, there are arguments that belong to a specific driver or specific
port and might get a new value in the probing again.
The user might pass any argument in probing again, but arguments
belonging to the common device configuration have no effect.

This patch adds an explicit check for the devargs belonging to the
common device configuration. If there is no match to the existing
configuration, it returns an error.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:09 +0000 (11:35 +0200)]
net/mlx5: separate per port configuration

Add a configuration structure for the port (ethdev). This structure
contains all configurations coming from devargs which are oriented to
the port. It is a field of the mlx5_priv structure, and is updated in
the spawn function for each port.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:08 +0000 (11:35 +0200)]
net/mlx5: refactor to detect operation by DevX

Add an inline function indicating whether HW object operations can be
created by DevX. It makes the code more readable.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:07 +0000 (11:35 +0200)]
net/mlx5: add shared device context config structure

Add a configuration structure for the shared device context. This
structure contains all configurations coming from devargs which are
oriented to the device. It is a field of the shared device context
(SH) structure, and is updated once in the mlx5_alloc_shared_dev_ctx()
function.
This structure cannot be changed when probing again, so add a function
to prevent that. The mlx5_probe_again_args_validate() function creates
a temporary IB context configuration structure according to the new
devargs attached in probing again, then checks for a match between the
temporary structure and the existing IB context configuration
structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:06 +0000 (11:35 +0200)]
net/mlx5: concentrate all device configurations

Move all device configuration to be performed by the
mlx5_os_cap_config() function instead of the spawn function.
In addition, move all relevant fields from the mlx5_dev_config
structure to mlx5_dev_cap.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:05 +0000 (11:35 +0200)]
net/mlx5: rearrange device attribute structure

Rearrange the mlx5_os_get_dev_attr() function in such a way that it
first executes the queries and only then updates the fields.
In addition, change its name in preparation for expanding its
operations to configure the capabilities inside it.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:04 +0000 (11:35 +0200)]
net/mlx5: add E-Switch mode flag

This patch adds to the SH structure a flag which indicates whether
E-Switch mode is enabled.
When "dv_esw_en" is configured from devargs, it is enabled only in
E-Switch mode. So, once "dv_esw_en" has been configured, it is enough
to check whether "dv_esw_en" is valid.
This patch also removes the redundant E-Switch mode check wherever
"dv_esw_en" is already checked.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:03 +0000 (11:35 +0200)]
net/mlx5: share counter config function

The mlx5_flow_counter_mode_config function exists for both Linux and
Windows with the same name and content.
This patch moves its implementation to the folder shared between the
operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:02 +0000 (11:35 +0200)]
net/mlx5: share realtime timestamp configure

The realtime timestamp configuration works the same on Linux and
Windows. This patch moves it to a function implemented in the folder
shared between the operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:01 +0000 (11:35 +0200)]
common/mlx5: share VF check function

The check whether a device is a VF works the same on Linux and
Windows. This patch moves it to a function implemented in the folder
shared between the operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:35:00 +0000 (11:35 +0200)]
net/mlx5: remove Verbs query device duplication

The sharing device context structure has a field named "device_attr"
which is filled by the mlx5_os_get_dev_attr() function.
The spawn function calls mlx5_os_get_dev_attr() again and saves the
result to a local variable identical to the "device_attr" field.

There is no need for this duplication, because the spawn function
holds a reference to the sharing device context structure.

This patch removes the local "device_attr" from the spawn function,
and uses the context's field instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:34:59 +0000 (11:34 +0200)]
net/mlx5: remove DevX flag duplication

The sharing device context structure has a field named "devx" which
indicates whether DevX is supported.
The common configuration structure also has a field named "devx" with
the same meaning.

There is no need for this duplication, because there is a reference to
the common structure from within the sharing device context structure.

This patch removes it from the sharing device context structure and
uses the common config structure instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Mon, 14 Feb 2022 09:34:58 +0000 (11:34 +0200)]
net/mlx5: remove HCA attribute structure duplication

The HCA attribute structure is a field of the net configuration
structure. It is also a field of the common configuration structure.

There is no need for this duplication, because there is a reference to
the common structure from within the net structures.

This patch removes it from the net configuration structure and uses
the common config structure instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/mlx5: remove redundant check of devargs
Michael Baum [Mon, 14 Feb 2022 09:34:57 +0000 (11:34 +0200)]
net/mlx5: remove redundant check of devargs

The device arguments are parsed and updated twice during spawning:
first before creating the shared device context, and again later,
after updating the default value of one of the arguments.

This patch consolidates them into a single parsing pass and updates
the default values before it.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/mlx5: remove declaration duplications
Michael Baum [Mon, 14 Feb 2022 09:34:56 +0000 (11:34 +0200)]
net/mlx5: remove declaration duplications

The following four functions are implemented in the mlx5_ethdev.c file:
 - mlx5_dev_infos_get
 - mlx5_fw_version_get
 - mlx5_dev_set_mtu
 - mlx5_hairpin_cap_get

In the mlx5.h file they are declared twice: first under the mlx5.c
section and again under the mlx5_ethdev.c section.

This patch removes the redundant declarations.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/mlx5: fix errno update in shared context creation
Michael Baum [Mon, 14 Feb 2022 09:34:55 +0000 (11:34 +0200)]
net/mlx5: fix errno update in shared context creation

The mlx5_alloc_shared_dev_ctx() function has a local variable named
"err" which holds the errno value in case of failure.

When functions called by this function fail, this variable is updated
with their return value (which should be a positive errno value).
However, some of those functions do not update the errno value
themselves, or return a negative errno value. If one of them fails,
the "err" variable holds a negative value, which causes an assertion
failure.

This patch updates all functions used by mlx5_alloc_shared_dev_ctx()
to set rte_errno, and takes that value instead of the "err" value.
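
A minimal sketch of the convention this establishes; allocate_object()
is a hypothetical callee standing in for any of the real ones:

  #include <errno.h>
  #include <stddef.h>
  #include <rte_errno.h>

  void *allocate_object(void);   /* hypothetical helper */

  /* Callee: on failure, set rte_errno to a positive errno value and
   * return a negative value.
   */
  static int
  create_something(void)
  {
          void *obj = allocate_object();

          if (obj == NULL) {
                  rte_errno = ENOMEM;
                  return -rte_errno;
          }
          return 0;
  }

  /* Caller: take the error code from rte_errno (always positive)
   * rather than from the possibly negative return value.
   */
  static int
  caller(void)
  {
          int err = 0;

          if (create_something() != 0)
                  err = rte_errno;
          return err;
  }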

Fixes: 5dfa003db53f ("common/mlx5: fix post doorbell barrier")
Fixes: 5d55a494f4e6 ("net/mlx5: split multi-thread flow handling per OS")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/mlx5: fix ASO CT object release
Michael Baum [Mon, 14 Feb 2022 09:34:54 +0000 (11:34 +0200)]
net/mlx5: fix ASO CT object release

The ASO connection tracking structure is initialized once per shared
device context.

However, its release takes place in the close function, which is
called for each ethdev individually. That is, when there is more than
one ethdev under the same shared device context, the structure is
destroyed when one of them is closed; if another ethdev uses it later,
this may cause a crash.

In addition, the creation of this structure is performed in the spawn
function. If the creation of one of the objects following it fails,
the structure is supposed to be destroyed, but this does not happen.

This patch moves its release to the shared device context free
function and thus solves both problems.
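
The shape of the fix, with simplified stand-in types: tying the
release to the shared context teardown covers both the multi-port case
and spawn-time error unwinding:

  #include <stdlib.h>

  struct ct_mng { int placeholder; };
  struct shared_ctx { unsigned int refcnt; struct ct_mng *ct_mng; };

  /* Called from each ethdev close *and* from spawn error unwinding;
   * the CT object goes away only with the last reference.
   */
  static void
  shared_ctx_release(struct shared_ctx *sh)
  {
          if (--sh->refcnt > 0)
                  return;            /* other ethdevs still use it */
          free(sh->ct_mng);          /* released exactly once, here */
          sh->ct_mng = NULL;
          free(sh);
  }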

Fixes: 0af8a2298a42 ("net/mlx5: release connection tracking management")
Fixes: ee9e5fad03eb ("net/mlx5: initialize connection tracking management")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/mlx5: fix ineffective metadata argument adjustment
Michael Baum [Mon, 14 Feb 2022 09:34:53 +0000 (11:34 +0200)]
net/mlx5: fix ineffective metadata argument adjustment

In "dv_xmeta_en" devarg there is an option of dv_xmeta_en=3 which
engages tunnel offload mode. In E-Switch configuration, that mode
implicitly activates dv_xmeta_en=1.

The update according to E-switch support is done immediately after the
first parsing of the devargs, but there is another adjustment later.

This patch moves the adjustment after the second parsing.
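
The ordering issue in a simplified form; the names below are
illustrative stand-ins for the driver's config and metadata modes:

  #include <stdbool.h>

  enum xmeta_mode { XMETA_LEGACY = 0, XMETA_META16 = 1,
                    XMETA_MISS_INFO = 3 };
  struct cfg { enum xmeta_mode dv_xmeta_en; };

  /* Tunnel offload mode implicitly selects extended metadata on
   * E-Switch. Call this only after the *last* devargs parse;
   * otherwise the parse overwrites the adjustment.
   */
  static void
  adjust_xmeta(struct cfg *cfg, bool esw_mode)
  {
          if (esw_mode && cfg->dv_xmeta_en == XMETA_MISS_INFO)
                  cfg->dv_xmeta_en = XMETA_META16;
  }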

Fixes: 4ec6360de37d ("net/mlx5: implement tunnel offload")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/mlx5: fix sibling device config check
Michael Baum [Mon, 14 Feb 2022 09:34:52 +0000 (11:34 +0200)]
net/mlx5: fix sibling device config check

The MLX5 net driver supports "probe again", which creates a new ethdev
under an existing InfiniBand device context.

Sibling devices sharing an InfiniBand device context should have
compatible configurations. Therefore, the devargs given in the probe
again that are mainly relevant to the shared device context are sent
to the mlx5_dev_check_sibling_config function, which makes sure they
are compatible with those of the siblings.
However, the arguments are adjusted according to the capabilities of
the device, and the function compares the arguments of the probe again
before the adjustment with the arguments of the siblings after the
adjustment. A user who sends the same values to all siblings may fail
this comparison when requesting something that the device does not
support, since the requested value gets adjusted.

This patch moves the call to the mlx5_dev_check_sibling_config
function after the relevant adjustments.

Fixes: 92d5dd483450 ("net/mlx5: check sibling device configurations mismatch")
Fixes: 2d241515ebaf ("net/mlx5: add devarg for extensive metadata support")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/i40e: enable maximum frame size at port level
Dapeng Yu [Tue, 7 Dec 2021 08:59:46 +0000 (16:59 +0800)]
net/i40e: enable maximum frame size at port level

Currently, the max frame size is set at the queue level, which makes
the values of the following counters wrong when a jumbo frame is
received.

The expected value:
rx_good_bytes: 0
rx_errors: 1
rx_oversize_errors: 1

The actual value:
rx_good_bytes: 1626
rx_errors: 0
rx_oversize_errors: 0

This patch enables setting the max frame size at the port level, which
makes the values above correct.
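
For reference, one way to observe these counters from an application
is the ethdev extended statistics API; a minimal sketch, assuming a
valid port_id:

  #include <inttypes.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <rte_ethdev.h>

  /* Dump all xstats for a port, e.g. to watch rx_oversize_errors
   * after sending a jumbo frame.
   */
  static void
  print_xstats(uint16_t port_id)
  {
          int i, n = rte_eth_xstats_get(port_id, NULL, 0);
          struct rte_eth_xstat *xs;
          struct rte_eth_xstat_name *names;

          if (n <= 0)
                  return;
          xs = calloc(n, sizeof(*xs));
          names = calloc(n, sizeof(*names));
          if (xs != NULL && names != NULL &&
              rte_eth_xstats_get(port_id, xs, n) == n &&
              rte_eth_xstats_get_names(port_id, names, n) == n)
                  for (i = 0; i < n; i++)
                          printf("%s: %" PRIu64 "\n",
                                 names[i].name, xs[i].value);
          free(xs);
          free(names);
  }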

Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Tested-by: Peng Zhang <peng1x.zhang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agonet/iavf: fix segmentation offload buffer size
Radu Nicolau [Tue, 15 Feb 2022 15:50:23 +0000 (15:50 +0000)]
net/iavf: fix segmentation offload buffer size

This fixes commit ff8b8bcd2ebe, which resulted in an incorrect buffer
size being computed for non-IPsec TSO packets.

Fixes: ff8b8bcd2ebe ("net/iavf: fix segmentation offload condition")
Cc: stable@dpdk.org
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agonet/ice: fix overwriting of LSE bit by DCF
Michal Wilczynski [Fri, 18 Feb 2022 11:57:19 +0000 (12:57 +0100)]
net/ice: fix overwriting of LSE bit by DCF

After enabling DCF on a VF, the ice driver stops receiving link
updates on its Admin Receive Queue. During DCF initialization, the
ice_aqc_opc_get_link_status command is sent to the firmware without
the LSE (Link Status Event) bit set. This prevents the ice driver from
receiving up/down events and, correspondingly, from updating the
netdev.
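
A minimal sketch of the idea, assuming the base-code helper
ice_aq_get_link_info() and its ena_lse flag (the exact call site in
the DCF path may differ):

  #include <stdbool.h>
  #include "ice_common.h"   /* base-code header; availability assumed */

  /* Query the link status while keeping Link Status Events enabled,
   * so firmware keeps posting up/down events to the ARQ.
   */
  static int
  query_link_keep_lse(struct ice_port_info *pi)
  {
          /* ena_lse = true sets the LSE bit in the AQ command. */
          return ice_aq_get_link_info(pi, true, NULL, NULL);
  }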

Fixes: 0b02c9519432 ("net/ice: handle PF initialization by DCF")
Cc: stable@dpdk.org
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
2 years agonet/af_xdp: reserve fill queue before socket create
Ciara Loftus [Fri, 18 Feb 2022 11:20:37 +0000 (11:20 +0000)]
net/af_xdp: reserve fill queue before socket create

Some zero-copy AF_XDP drivers, e.g. ice, require that addresses are
already present in the fill queue before the socket is created.
Otherwise you may see log messages such as:

XSK buffer pool does not provide enough addresses to fill 2047 buffers on
Rx ring 0

This commit ensures that the addresses are available before creating the
socket, instead of after.
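
A minimal sketch of the required ordering using the libbpf AF_XDP
helpers (buffer accounting simplified; the PMD's own logic is more
involved):

  #include <stdint.h>
  #include <bpf/xsk.h>

  /* Populate the fill ring with umem addresses *before* calling
   * xsk_socket__create(), so zero-copy drivers can post Rx buffers
   * immediately.
   */
  static int
  prefill_fq(struct xsk_ring_prod *fq, const uint64_t *addrs, uint32_t n)
  {
          uint32_t idx = 0, i;

          if (xsk_ring_prod__reserve(fq, n, &idx) != n)
                  return -1;
          for (i = 0; i < n; i++)
                  *xsk_ring_prod__fill_addr(fq, idx + i) = addrs[i];
          xsk_ring_prod__submit(fq, n);
          return 0;   /* ...only now create the socket */
  }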

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years agonet/af_xdp: ensure socket is deleted on Rx queue setup error
Ciara Loftus [Fri, 18 Feb 2022 11:20:36 +0000 (11:20 +0000)]
net/af_xdp: ensure socket is deleted on Rx queue setup error

The Rx queue setup can fail for many reasons, e.g. failure to set up
the custom program, failure to allocate or reserve fill queue buffers,
or failure to configure busy polling. When such a failure occurs and
the xsk is already set up, it should be deleted before returning. This
commit ensures this happens.
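
A minimal sketch of the error-path pattern; configure_busy_poll() is a
hypothetical stand-in for any of the later setup steps, while
xsk_socket__delete() is the real libbpf cleanup call:

  #include <bpf/xsk.h>

  /* Stand-in for a setup step that runs after socket creation. */
  static int
  configure_busy_poll(struct xsk_socket *xsk)
  {
          (void)xsk;
          return -1;   /* pretend this step failed */
  }

  static int
  finish_rxq_setup(struct xsk_socket *xsk)
  {
          if (configure_busy_poll(xsk) != 0)
                  goto err;
          return 0;
  err:
          xsk_socket__delete(xsk);   /* don't leak the created socket */
          return -1;
  }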

Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
Fixes: 288a85aef192 ("net/af_xdp: enable custom XDP program loading")
Fixes: 055a393626ed ("net/af_xdp: prefer busy polling")
Fixes: 01fa83c94d7e ("net/af_xdp: workaround custom program loading")
Cc: stable@dpdk.org
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
2 years agonet/sfc: fix memory allocation size for cache
Weiguo Li [Tue, 21 Dec 2021 06:41:22 +0000 (14:41 +0800)]
net/sfc: fix memory allocation size for cache

The size of the unit cache should be sizeof(**cache) instead of
sizeof(*cache). Reallocating with sizeof(*cache) yields an undersized
buffer on platforms whose pointer size is 32 bits. Found by a
coccinelle script (see https://coccinelle.gitlabpages.inria.fr/website).
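
A minimal sketch of the bug pattern, assuming a uint64_t element type
for illustration:

  #include <stdint.h>
  #include <stdlib.h>

  /* "cache" is passed by reference, so the element type is **cache
   * (uint64_t, 8 bytes), not *cache (a pointer, 4 bytes on 32-bit
   * platforms).
   */
  static int
  resize_cache(uint64_t **cache, unsigned int nb_items)
  {
          uint64_t *p;

          p = realloc(*cache, nb_items * sizeof(**cache));
          if (p == NULL)          /* sizeof(*cache) here undersizes */
                  return -1;
          *cache = p;
          return 0;
  }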

Fixes: 63abf8d29225 ("net/sfc: support SW stats groups")
Cc: stable@dpdk.org
Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>