dpdk.git
3 years agoexamples/l3fwd-power: fix updating lcore parameters
Anatoly Burakov [Tue, 14 Jul 2020 10:30:02 +0000 (11:30 +0100)]
examples/l3fwd-power: fix updating lcore parameters

When perf-config option is specified, we are calling into the power
library even though it may not necessarily be enabled. It is
questionable whether perf-config option is even applicable to non-power
library modes, but for now, fix it just by avoiding calling into the
power library if it wasn't initialized, and assume that every lcore is
high performance core.

Fixes: e0194feb322c ("examples/l3fwd-power: add interrupt-only mode")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
3 years agopower: fix environment detection
Anatoly Burakov [Tue, 14 Jul 2020 10:30:01 +0000 (11:30 +0100)]
power: fix environment detection

Anything coming from sysfs has a newline at the end. Cut it off before
comparing the strings.
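
The newline handling described above can be sketched roughly as follows.
This is a minimal illustration, not the library's code: the helper names
and the example driver string are assumptions chosen for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cut the string at the first newline, in place. */
static void
strip_newline(char *buf)
{
	assert(buf != NULL);
	buf[strcspn(buf, "\n")] = '\0';
}

/* Hypothetical helper: return 1 if a value read from sysfs equals
 * the expected string once the trailing newline is removed. */
static int
sysfs_value_equals(const char *raw, const char *expected)
{
	char buf[64];

	/* Work on a copy so the caller's buffer is left untouched. */
	snprintf(buf, sizeof(buf), "%s", raw);
	strip_newline(buf);
	return strcmp(buf, expected) == 0;
}
```

Without the strip step, comparing "acpi-cpufreq\n" against "acpi-cpufreq"
with strcmp() would report a mismatch.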

Fixes: 20ab67608a39 ("power: add environment capability probing")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Tested-by: Lihong Ma <lihongx.ma@intel.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agomempool: fix allocation in memzone during retry
Zhike Wang [Tue, 14 Jul 2020 07:26:05 +0000 (15:26 +0800)]
mempool: fix allocation in memzone during retry

If allocation is successful on the first attempt, typically
there is no problem since we allocated everything required and
we'll terminate the loop (if memory chunk is really sufficient
to populate required number of mempool elements).

If the first attempt fails and we then successfully allocate half of
mem_size, we'll have one more iteration of the for-loop to allocate
memory for the remaining elements, and we should not retry with a
quarter of mem_size.

It is wrong to also divide max_alloc_size by 2 after a successful
allocation: invalid memory can be allocated, which leads to population
failure, and an errno other than ENOMEM may then be returned.
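
The intended retry behaviour can be sketched as below. This is a
simplified model under stated assumptions, not the mempool code:
try_reserve() and populate() are hypothetical names, and the free
block size is kept constant for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memzone reservation: it succeeds only if
 * the request fits into the largest free contiguous block. */
static int
try_reserve(size_t len, size_t largest_free)
{
	return len <= largest_free;
}

/* Return the number of chunks used to cover `total` bytes, or -1 if
 * even the smallest allowed chunk cannot be reserved. */
static int
populate(size_t total, size_t largest_free, size_t min_chunk)
{
	size_t max_alloc_size = SIZE_MAX;
	size_t remaining = total;
	int chunks = 0;

	assert(min_chunk > 0);
	while (remaining > 0) {
		size_t want = remaining < max_alloc_size ?
			      remaining : max_alloc_size;

		if (!try_reserve(want, largest_free)) {
			if (want / 2 < min_chunk)
				return -1;
			/* Halve the limit only when the attempt failed. */
			max_alloc_size = want / 2;
			continue;
		}
		/* Success: keep max_alloc_size unchanged for the next pass. */
		remaining -= want;
		chunks++;
	}
	return chunks;
}
```

With 1000 bytes requested and a 500-byte free block, the sketch fails
once, halves the limit, and then covers the request in two chunks of the
same size instead of shrinking the second one.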

Fixes: 3a3d0c75b43e ("mempool: fix slow allocation of large pools")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Zhike Wang <wangzhike@jd.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
3 years agonode: add packet classifier
Nithin Dabilpuram [Sun, 7 Jun 2020 16:40:42 +0000 (22:10 +0530)]
node: add packet classifier

This node classifies packets based on packet type and sends them to
the appropriate next node. It helps distribute packets from the
ethdev_rx node to different next nodes with a constant overhead for
all packet types.

Currently all packets except non-fragmented IPv4 packets are marked
to be sent to the "pkt_drop" node.
Performance difference on ARM64 Octeontx2 is -4.9% due to the
addition of a new node in the path.
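
One way to get a constant per-packet overhead is a lookup table indexed
by packet type, sketched below. The packet type codes and next-node
indices are hypothetical and are not the node's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet type codes; real code would use mbuf packet types. */
enum pkt_type { PT_UNKNOWN, PT_IPV4, PT_IPV4_FRAG, PT_IPV6, PT_MAX };

/* Hypothetical next-node indices. */
enum next_node { NEXT_PKT_DROP, NEXT_IP4_LOOKUP };

/* One entry per packet type. Entries not listed default to 0,
 * which is NEXT_PKT_DROP. */
static const uint8_t next_by_type[PT_MAX] = {
	[PT_IPV4] = NEXT_IP4_LOOKUP,
};

/* Constant-time classification: a single table lookup per packet. */
static enum next_node
classify(enum pkt_type type)
{
	assert(type < PT_MAX);
	return (enum next_node)next_by_type[type];
}
```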

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agoraw/ifpga/base: fix NIOS SPI init
Tianfei Zhang [Tue, 14 Jul 2020 21:35:09 +0000 (05:35 +0800)]
raw/ifpga/base: fix NIOS SPI init

Add the fecmode setting to the NIOS SPI primary initialization.
This SPI is shared with the NIOS core inside the FPGA: the NIOS uses
this SPI primary for some one-time initialization after power up,
and then releases control to DPDK.

Fix the timeout initialization for polling the
NIOS_INIT_DONE.

Fixes: bc44402f ("raw/ifpga/base: configure FEC mode")
Cc: stable@dpdk.org
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
3 years agoraw/ifpga/base: fix SPI transaction
Tianfei Zhang [Tue, 14 Jul 2020 21:35:08 +0000 (05:35 +0800)]
raw/ifpga/base: fix SPI transaction

0x4a means idle status on the physical layer. When 0x4a is encountered
in the raw data, an ESCAPE character must be inserted to indicate that
it is data.
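
The escaping idea can be sketched as a small byte-stuffing encoder. Only
the 0x4a idle value comes from the text above; the ESCAPE value 0x4d,
the XOR with 0x20 and the function name are assumptions made for this
illustration and may differ from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHY_IDLE   0x4a /* idle marker on the physical layer */
#define PHY_ESCAPE 0x4d /* assumed escape marker */
#define ESCAPE_XOR 0x20 /* assumed transform applied to an escaped byte */

/* Encode raw data so that no payload byte looks like a marker.
 * Returns the number of bytes written, or 0 if `out` is too small
 * (or the input is empty). */
static size_t
phy_encode(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
	size_t i, n = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < in_len; i++) {
		int special = (in[i] == PHY_IDLE || in[i] == PHY_ESCAPE);

		if (n + (special ? 2 : 1) > out_cap)
			return 0;
		if (special) {
			/* Insert ESCAPE, then the transformed data byte. */
			out[n++] = PHY_ESCAPE;
			out[n++] = in[i] ^ ESCAPE_XOR;
		} else {
			out[n++] = in[i];
		}
	}
	return n;
}
```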

Fixes: 96ebfcf8 ("raw/ifpga/base: add SPI and MAX10 device driver")
Cc: stable@dpdk.org
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
3 years agonet/vhost: support queue update
Matan Azrad [Tue, 21 Jul 2020 16:38:16 +0000 (16:38 +0000)]
net/vhost: support queue update

The commit below changed the readiness condition of the vhost device to
fix multi-queue issues that showed up with some QEMU versions.

Now, the vhost device is ready when the first queue pair is ready.
As more queues become ready, the queue state callback is triggered to
notify the vhost manager.

When Rx interrupts are configured, the vhost driver sets the kickfd
queue file descriptor in order to be notified of Rx traffic.

So, when a queue becomes ready, the kickfd may have changed and should
be updated in the Rx interrupt structure.

Update kickfd when the queue state callback is invoked.
Also update event notification when it is enabled by the user.

Fixes: d0fcc38f5fa4 ("vhost: improve device readiness notifications")

Suggested-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agonet/sfc: do not enforce hash offload in RSS multi-queue
Andrew Rybchenko [Tue, 21 Jul 2020 14:54:49 +0000 (15:54 +0100)]
net/sfc: do not enforce hash offload in RSS multi-queue

Rx RSS hash offload should be controlled by the user and should
not be enforced by RSS multi-queue Rx mode.

Fixes: 8b945a7f7dcb ("drivers/net: update Rx RSS hash offload capabilities")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
3 years agonet/sfc: avoid unnecessary actions on dummy default MAC set
Andrew Rybchenko [Tue, 21 Jul 2020 08:58:45 +0000 (09:58 +0100)]
net/sfc: avoid unnecessary actions on dummy default MAC set

Just an optimization to avoid extra reconfiguration when it
is not actually required.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
3 years agonet/sfc: remove inclusion of unused headers
Andrew Rybchenko [Tue, 21 Jul 2020 08:57:55 +0000 (09:57 +0100)]
net/sfc: remove inclusion of unused headers

Defines and functions from rte_mbuf_ptype.h are not used.

Only libefx types and EF10 register definitions are used.
Native datapaths should be independent from main libefx interface.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
3 years agonet/sfc/base: improve headers independence
Andrew Rybchenko [Tue, 21 Jul 2020 08:57:54 +0000 (09:57 +0100)]
net/sfc/base: improve headers independence

efx_types.h uses defines from efx_annote.h, but does not include the
header. As the result if efx_types.h is included by a driver first,
build fails.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
3 years agonet: fix pedantic build
Raslan Darawsheh [Tue, 21 Jul 2020 08:31:55 +0000 (11:31 +0300)]
net: fix pedantic build

When compiling rte_mpls with pedantic enabled, old compilers
such as GCC 4.8 complain about the bit-field definitions:

error: type of bit-field 'bs' is a GCC extension [-Werror=pedantic]
error: type of bit-field 'tc' is a GCC extension [-Werror=pedantic]
error: type of bit-field 'tag_lsb' is a GCC extension [-Werror=pedantic]

This fixes the compilation error by marking the header definition
as an extension.

Fixes: e480cf487a0d ("net: add MPLS header structure")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agovhost: fix double-free with zero-copy
Patrick Fu [Tue, 21 Jul 2020 12:10:57 +0000 (12:10 +0000)]
vhost: fix double-free with zero-copy

zmbufs should be set to NULL when they are freed, to avoid a double
free of the same buffer pointer.
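
The pattern is the usual "free, then clear the pointer" idiom. A generic
sketch follows; the struct and function names are hypothetical, and
plain malloc()/free() stand in for the mbuf calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical slot holding one buffer pointer. */
struct zcopy_slot {
	void *buf;
};

/* Release the buffer held by the slot. Returns 1 if a buffer was
 * freed and 0 if the slot was already empty. */
static int
slot_release(struct zcopy_slot *slot)
{
	assert(slot != NULL);
	if (slot->buf == NULL)
		return 0;
	free(slot->buf);
	/* Clearing the pointer makes a second release a harmless no-op. */
	slot->buf = NULL;
	return 1;
}
```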

Fixes: b0a985d1f340 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: fix async completion of multi-seg packets
Patrick Fu [Tue, 21 Jul 2020 05:47:20 +0000 (13:47 +0800)]
vhost: fix async completion of multi-seg packets

In async enqueue copy, a packet could be split into multiple copy
segments. When polling the copy completion status, the current async
data path assumes the async device callbacks are aware of the packet
boundary and return completed segments only if all segments belonging
to the same packet are done. Such an assumption is not generic to
common async devices and may degrade copy performance if async
callbacks have to implement it in software.

This patch adds tracking of the completed copy segments on the vhost
side. If the async copy device reports partial completion of a packet,
only the vhost internal record is updated and the vring status remains
unchanged until the remaining segments of the packet are also finished.
The async copy device no longer needs to care about the packet
boundary.

Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: fix missing virtqueue status check in async path
Patrick Fu [Tue, 21 Jul 2020 03:35:57 +0000 (11:35 +0800)]
vhost: fix missing virtqueue status check in async path

Vring should not be touched if the vq is disabled. This patch adds the
vq status check in async enqueue polling to avoid accessing a disabled
queue.

Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: fix missing device pointer validity check
Patrick Fu [Tue, 21 Jul 2020 03:23:04 +0000 (11:23 +0800)]
vhost: fix missing device pointer validity check

This patch adds a check of the dev pointer in the vhost async enqueue
completion poll. If a NULL dev pointer is detected, the poll function
returns immediately.

Coverity issue: 360839
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agonet/octeontx2: free CQ ring memzone on queue release
Pavan Nikhilesh [Sun, 28 Jun 2020 23:31:35 +0000 (05:01 +0530)]
net/octeontx2: free CQ ring memzone on queue release

Free CQ ring memzone on Rx queue release. This prevents the CQ from
using an incorrect memory size when the ring size is reconfigured.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
3 years agonet/mlx5: fix tunnel flow priority
Gregory Etelson [Thu, 16 Jul 2020 07:39:58 +0000 (10:39 +0300)]
net/mlx5: fix tunnel flow priority

PMD flow priority is different from application flow priority. Flow
rules with higher match granularity are assigned higher PMD priority.
The PMD also internally splits RSS flows according to the flow RSS
layer.

The final PMD flow rule priority is derived from the network level of
the last match item, after the PMD adjusts the flow rule, where an L4
match gets the highest priority and L2 the lowest.

The patch adjusts the tunnel flow rule priority calculation for PMDs
running the Verbs API.

Introduce MLX5_TUNNEL_PRIO_GET macro.

Fixes: 4a78c88e3bae ("net/mlx5: fix Verbs flow tunnel")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
3 years agonet/mlx5: fix VLAN push action on hairpin queue
Dekel Peled [Wed, 15 Jul 2020 07:31:01 +0000 (10:31 +0300)]
net/mlx5: fix VLAN push action on hairpin queue

Push VLAN action is allowed on Tx only, same as encap action.
Flow rules for hairpin queue are created on Rx, and split
by PMD to Rx and Tx rules, according to the above limitation.
In current implementation the encap action is split to Tx rule.
This patch adds the same handling for push-vlan action, as well as
its complementing actions set-vlan-vid and set-vlan-pcp.

Fixes: d85c7b5ea59f ("net/mlx5: split hairpin flows")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: fix VLAN pop with decap action validation
Dekel Peled [Wed, 15 Jul 2020 07:30:33 +0000 (10:30 +0300)]
net/mlx5: fix VLAN pop with decap action validation

The combination of decap action followed by pop VLAN action is not
fully validated in existing code.

This patch updates the validation function of pop vlan action.
Pop VLAN with preceding Decap requires inner header with VLAN.
Pop VLAN without preceding Decap requires outer header with VLAN.

Fixes: b41e47da2592 ("net/mlx5: support pop flow action on VLAN header")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: fix HW counters path in switchdev mode
Shy Shyman [Wed, 15 Jul 2020 10:50:55 +0000 (13:50 +0300)]
net/mlx5: fix HW counters path in switchdev mode

When debugging the performance of a DPDK application, the user may
need to view various DPDK statistics (for example out_of_buffer).
These can be displayed with the testpmd command 'show port xstats
<port_id>', for example.

The current implementation assumes legacy mode in which the counters
are at <ibdev_path>/<port_id>/hw_counters/<file_name>.
In switchdev mode the counters file is located right after the device
name, hence resides at <ibdev_path>/hw_counters.

The fix tries to open the path in the second location after a failure
to open the file from the first location.
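
The fallback can be sketched as below. The two path layouts follow the
description above. The function names and buffer size are assumptions
for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the counter path for one of the two
 * layouts. Returns what snprintf() returns. */
static int
hw_counter_path(char *buf, size_t len, const char *ibdev_path,
		unsigned int port_id, const char *name, int switchdev)
{
	if (switchdev)
		return snprintf(buf, len, "%s/hw_counters/%s",
				ibdev_path, name);
	return snprintf(buf, len, "%s/%u/hw_counters/%s",
			ibdev_path, port_id, name);
}

/* Try the legacy location first, then the switchdev one.
 * Returns NULL if neither file can be opened. */
static FILE *
hw_counter_open(const char *ibdev_path, unsigned int port_id,
		const char *name)
{
	char path[512];
	FILE *f = NULL;
	int layout;

	assert(ibdev_path != NULL && name != NULL);
	for (layout = 0; layout < 2 && f == NULL; layout++) {
		int n = hw_counter_path(path, sizeof(path), ibdev_path,
					port_id, name, layout);

		if (n < 0 || (size_t)n >= sizeof(path))
			continue; /* path did not fit, skip this layout */
		f = fopen(path, "r");
	}
	return f;
}
```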

Fixes: 9c0a9eed37f1 ("net/mlx5: switch to the names in the shared IB context")
Cc: stable@dpdk.org
Signed-off-by: Shy Shyman <shys@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: add queue start and stop
Viacheslav Ovsiienko [Sun, 19 Jul 2020 15:35:37 +0000 (15:35 +0000)]
net/mlx5: add queue start and stop

The mlx5 PMD did not support the queue_start and queue_stop eth_dev API
routines, so a queue could not be suspended and resumed during device
operation.

There is a use case where this feature is crucial for applications:

- there is the secondary process handling the queue
- secondary process crashed/aborted
- some mbufs were allocated or used by secondary application
- some mbufs were allocated by Rx queues to receive packets
- some mbufs were placed to send queue
- queue goes to undefined state

In this case there was no reliable way for the restarted secondary
process to recover queue handling other than to reset the queue to its
initial state, freeing all involved resources, including buffers
involved in queue operations, reset the mbuf pools, and then
reinitialize the queue to a working state:

- reset mbuf pool, allocate all mbufs to initialize the pool into a
  safe state after the crash and allow safe mbuf free calls
- stop queue, free all potentially involved mbufs
- reset mbuf pool again
- start queue, reallocate mbufs needed

This patch introduces the queue start/stop feature with some
limitations:

- hairpin queues are not supported
- it is the application's responsibility to synchronize start/stop
  with datapath routines, rx/tx_burst must be suspended during
  the queue_start/queue_stop calls
- it is the application's responsibility to track queue usage and
  provide coordinated queue_start/queue_stop calls from
  secondary and primary processes
- Rx queues with vectorized Rx routine and engaged CQE
  compression are not supported by this patch currently

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: implement CQ for Rx using DevX API
Dekel Peled [Sun, 19 Jul 2020 11:13:06 +0000 (14:13 +0300)]
net/mlx5: implement CQ for Rx using DevX API

This patch continues the work to use DevX API for different objects
creation and management.
On Rx control path, the RQ, RQT, and TIR objects can already be
created using DevX API.
This patch adds the support to create CQ for RxQ using DevX API.
The corresponding event channel is also created and utilized using
DevX API.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agocommon/mlx5: support more fields in DevX CQ create
Dekel Peled [Sun, 19 Jul 2020 11:10:46 +0000 (14:10 +0300)]
common/mlx5: support more fields in DevX CQ create

Update CQ create operation using DevX API, support additional fields.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agocommon/mlx5: remove inclusion of Verbs header files
Ophir Munk [Sun, 19 Jul 2020 10:18:16 +0000 (10:18 +0000)]
common/mlx5: remove inclusion of Verbs header files

Several source files include Verbs header files as in (1). These source
files will not compile under non-Linux operating systems. This commit
removes this inclusion in two cases:

Case 1: There is no usage of ibv_* or mlx5dv_* symbols in the source
file so the inclusion in (1) can be safely removed.

Case 2: Verbs symbols are used. Please note the inclusion in (1) already
appears in file linux/mlx5_glue.h (which represents the interface
to the rdma-core library). Therefore, replace (1) in the source file
with (2).  Under non-Linux operating systems - file mlx5_glue.h will not
include (1).

(1)
 #include <infiniband/verbs.h>
 #include <infiniband/mlx5dv.h>

(2)
 #include <mlx5_glue.h>

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: refactor multi-process communication
Ophir Munk [Sun, 19 Jul 2020 10:18:15 +0000 (10:18 +0000)]
net/mlx5: refactor multi-process communication

1. The shared data communication between the primary and the secondary
processes is implemented using Linux API. Move the Linux API code under
linux directory (file linux/mlx5_os.c).

2. File net/mlx5/mlx5_mp.c handles requests to the primary and secondary
processes (e.g. start_rxtx, stop_rxtx). It is Linux based so it is moved
under linux (new file linux/mlx5_mp_os.c).

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: cleanup header file
Ophir Munk [Sun, 19 Jul 2020 10:18:14 +0000 (10:18 +0000)]
net/mlx5: cleanup header file

The cleanup refers to header file mlx5.h.
1. Remove unused prototypes.
2. Move prototypes under their correct title.
3. Change functions to static and remove their prototypes from the
header file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: eliminate dependency on Linux in shared header
Ophir Munk [Sun, 19 Jul 2020 10:18:13 +0000 (10:18 +0000)]
net/mlx5: eliminate dependency on Linux in shared header

This commit eliminates Linux dependencies in shared file mlx5.h.

1. All functions using 'struct ifreq' are moved to file
linux/mlx5_ethdev_os.c such that this struct can be removed from mlx5.h.
2. Function mlx5_set_flags() that uses Linux flags (e.g. IFF_UP) is
changed to static and its prototype is removed from mlx5.h.
3. Remove redundant member verbs_action from 'struct mlx5_priv'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: wrap Linux promiscuous and multicast functions
Ophir Munk [Sun, 19 Jul 2020 10:18:12 +0000 (10:18 +0000)]
net/mlx5: wrap Linux promiscuous and multicast functions

This commit adds the Linux implementation of routines
mlx5_os_set_promisc() and mlx5_os_set_allmulti(). The routines call
netlink APIs.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: refactor Linux MAC operations
Ophir Munk [Sun, 19 Jul 2020 10:18:11 +0000 (10:18 +0000)]
net/mlx5: refactor Linux MAC operations

Move OS specific MAC operations add, remove, modify VF into file
linux/mlx5_os.c.
Remove unused function mlx5_get_mac().

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: replace Linux specific calls
Ophir Munk [Sun, 19 Jul 2020 10:18:10 +0000 (10:18 +0000)]
net/mlx5: replace Linux specific calls

The following Linux calls are replaced by their matching rte APIs.

mmap ==> rte_mem_map()
munmap ==> rte_mem_unmap()
sysconf(_SC_PAGESIZE) ==> rte_mem_page_size()

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: move flow priority discovery to Verbs file
Ophir Munk [Sun, 19 Jul 2020 10:18:09 +0000 (10:18 +0000)]
net/mlx5: move flow priority discovery to Verbs file

Function calls mlx5_flow_adjust_priority() and
mlx5_flow_discover_priorities() are Verbs based. Move them from file
mlx5_flow.c to file mlx5_flow_verbs.c

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: add option to configure FCS or decapsulation
Suanming Mou [Wed, 15 Jul 2020 13:10:21 +0000 (21:10 +0800)]
net/mlx5: add option to configure FCS or decapsulation

Some NICs (at least ConnectX-6 Dx and BlueField 2) have limitations
in supporting FCS (frame check sequence) scattering for tunnel
decapsulated packets.

In that case only one of the two features can be supported at a time,
and the new devarg "decap_en" is introduced to let the user choose.

If the FCS scattering feature is supposed to be engaged by the
application, this new devarg should be specified as "decap_en=0",
forcing the FCS feature to be enabled and rejecting tunnel decap
actions in the rte_flow engine. If FCS scatter is not needed and the
application intends to use tunnel decapsulation in rte_flow, the devarg
can be omitted or set to a non-zero value (this is the default
setting).

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agocommon/mlx5: query scatter FCS with decap capability
Suanming Mou [Wed, 15 Jul 2020 13:10:20 +0000 (21:10 +0800)]
common/mlx5: query scatter FCS with decap capability

As scatter FCS might not be supported for decapsulated tunnel
packets in some NIC HW, a new capability bit is added which indicates
whether scatter FCS works with decap.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: convert queue objects to unified malloc
Suanming Mou [Sun, 28 Jun 2020 09:21:47 +0000 (17:21 +0800)]
net/mlx5: convert queue objects to unified malloc

This commit allocates the Rx/Tx queue objects from unified malloc
function.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: convert configuration objects to unified malloc
Suanming Mou [Sun, 28 Jun 2020 09:02:44 +0000 (17:02 +0800)]
net/mlx5: convert configuration objects to unified malloc

This commit allocates the miscellaneous configuration objects from the
unified malloc function.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agocommon/mlx5: convert data path objects to unified malloc
Suanming Mou [Sun, 28 Jun 2020 08:36:15 +0000 (16:36 +0800)]
common/mlx5: convert data path objects to unified malloc

This commit allocates the data path object page and B-tree table memory
from unified malloc function with explicit flag MLX5_MEM_RTE.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agocommon/mlx5: convert control path memory to unified malloc
Suanming Mou [Sun, 28 Jun 2020 08:18:15 +0000 (16:18 +0800)]
common/mlx5: convert control path memory to unified malloc

This commit allocates the control path objects memory from the unified
malloc function.

These objects are all used during instance initialization, so this
will not affect the data path.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: convert control path memory to unified malloc
Suanming Mou [Sun, 28 Jun 2020 07:35:26 +0000 (15:35 +0800)]
net/mlx5: convert control path memory to unified malloc

This commit allocates the control path memory from unified malloc
function.

The following objects are changed:

1. hlist;
2. rss key;
3. vlan vmwa;
4. indexed pool;
5. fdir objects;
6. meter profile;
7. flow counter pool;
8. hrxq and indirect table;
9. flow object cache resources;
10. temporary resources in flow create;

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: add option to allocate memory from system
Suanming Mou [Sun, 28 Jun 2020 03:41:57 +0000 (11:41 +0800)]
net/mlx5: add option to allocate memory from system

Currently, for the MLX5 PMD, once millions of flows are created, the
memory consumption of the flows is also very large. On a system with
limited memory, this means the system needs to reserve most of its
memory as hugepage memory in advance to serve the flows, and other
normal applications have no chance to use this reserved memory any
more. Since most of the time the system will not have lots of flows,
the reserved hugepage memory is largely wasted.

When the new sys_mem_en devarg is set to true, the PMD allocates memory
from the system by default through the newly added mlx5 memory
management functions. Memory is allocated from rte only when the
MLX5_MEM_RTE flag is set; otherwise it is allocated from the system.
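
The selection rule can be sketched as a small decision function. This
models only what is described in this message: the MEM_RTE value, the
enum and the function name are hypothetical and do not reflect the
driver's definitions.

```c
#include <assert.h>

#define MEM_RTE (1u << 0) /* hypothetical stand-in for MLX5_MEM_RTE */

enum mem_source { FROM_SYSTEM, FROM_RTE };

/* Decide where an allocation comes from, given the request flags and
 * the sys_mem_en setting. */
static enum mem_source
mem_select(unsigned int flags, int sys_mem_en)
{
	/* Only one flag is modeled in this sketch. */
	assert((flags & ~MEM_RTE) == 0);
	if (flags & MEM_RTE)
		return FROM_RTE; /* explicit request for rte memory */
	/* No explicit flag: sys_mem_en picks the default source. */
	return sys_mem_en ? FROM_SYSTEM : FROM_RTE;
}
```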

So in this case, a system with limited memory does not need to reserve
most of its memory for hugepages. Only the memory needed for datapath
objects is allocated from rte with the explicit flag; other memory is
allocated from the system. A system with enough memory does not need to
care about the devarg, and memory will always come from rte hugepages.

One restriction is that for a DPDK application with multiple PCI
devices, if the sys_mem_en devargs differ between the devices,
sys_mem_en takes its value from the first device's devargs only, and a
warning message is printed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agocommon/mlx5: add memory management functions
Suanming Mou [Sun, 28 Jun 2020 02:21:58 +0000 (10:21 +0800)]
common/mlx5: add memory management functions

Add the internal mlx5 memory management functions:

mlx5_malloc_mem_select();
mlx5_memory_stat_dump();
mlx5_realloc();
mlx5_malloc();
mlx5_free();

The user will be able to manage memory from the system or from rte
memory with the unified functions.

In this case, a system with limited memory, which cannot reserve lots
of rte hugepage memory in advance, will allocate memory from the system
for some of the less important control path objects, based on the
sys_mem_en configuration.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agodoc: update release notes and mlx5 guide for eCPRI
Bing Zhao [Fri, 17 Jul 2020 07:11:51 +0000 (15:11 +0800)]
doc: update release notes and mlx5 guide for eCPRI

Update the release notes of mlx5 PMD part by adding the
support of eCPRI.
Update the firmware configuration in the mlx5 NIC guide to support
the usage of eCPRI.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: add eCPRI flex parser capacity check
Bing Zhao [Fri, 17 Jul 2020 07:11:50 +0000 (15:11 +0800)]
net/mlx5: add eCPRI flex parser capacity check

If the NIC or the FW does not support the dynamic flex parser,
an error is returned when trying to create the parser for eCPRI,
and it is then hard to know the detailed reason for the failure.
Before creating the parser node and subsequently using the
parser, the capacity bit saved in the HCA_CAP can be used to
confirm whether the dynamic flex parser is supported.
If not, an error is returned directly with ENOTSUP to prevent
the following steps from being executed.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: create and destroy eCPRI flex parser
Bing Zhao [Fri, 17 Jul 2020 07:11:49 +0000 (15:11 +0800)]
net/mlx5: create and destroy eCPRI flex parser

eCPRI protocol has unified format layout for the variants, over
ETH layer (including .1Q) and UDP layer.

The common header of the message has 4 bytes fixed length, and the
message payload layers are different based on the type field. Now
only type #0, #2 and #5 will be supported, and 2 bytes are needed.

When creating the flex parser, the header will be extended to 8
bytes and 2 DW samples are needed. The 1st DW starts from offset 0
and will be used for the type field of the common header. The 2nd
DW starts from offset 4 and will be used for the physical channel
ID, real-time control ID or measurement ID fields.

The parser will be created once a flow with eCPRI item is observed
for the first time. After creating, it will remain in the system
and HW until the device is stopped. Right now, there is no need to
destroy the eCPRI flex parser after the last flow with eCPRI item
is destroyed. This is to get rid of the alternate states of creating
and destroying eCPRI flex parser with a single eCPRI flow.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agocommon/mlx5: add DevX command for flex parsers
Bing Zhao [Fri, 17 Jul 2020 07:11:48 +0000 (15:11 +0800)]
common/mlx5: add DevX command for flex parsers

In order to use the dynamic flex parser to parse protocols that are not
supported natively, two steps are needed.

Firstly, create the parse graph node. There are three parts of the
flex parser: node, arc and sample. The node is the whole structure of a
flex parser; when creating it, the length of the protocol should be
specified. The input arc(s) are mandatory: they tell the HW when to
use this parser to parse the packet. For a single parser
node, up to 8 input arcs can be supported, which gives SW the ability
to support this protocol over multiple layers. The output arc is
optional and up to 8 arcs can also be supported. If the protocol
is the last header of the stack, the output arc should be NULL;
otherwise it should be specified. The protocol type in the arc is used
to indicate the parser pointing to or from this flex parser node. For
an output arc, the next header type field offset and size should be set
in the node structure, so that the HW can get the proper type of the
next header and decide which parser to point to.
Note: there are two types of parsers now, native parser and flex
parser. An arc between two flex parsers is not supported at this stage.

Secondly, query the sample IDs. If the protocol header parsed
with the flex parser needs to be used in flow rule offloading, the DW
samples are needed when creating the parse graph node. The offset
in bytes from the start of the header needs to be set. After creating
the node successfully, a general object handle will be returned.
This object can be queried with a DevX command to get the sample
IDs.
When creating a flow, sample IDs can be used to sample a DW from
the parsed header - 4 contiguous bytes starting from the offset. The
flow entry can specify a mask to use part of this DW for
matching. Up to 8 samples can be supported for a single parse
graph node. The offset should not exceed the header length.

The HW resources are limited, so the low layer driver error should
be checked whenever creating a parse graph node fails.

Signed-off-by: Netanel Gonen <netanelg@mellanox.com>
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agocommon/mlx5: add flex parser DevX structures
Bing Zhao [Fri, 17 Jul 2020 07:11:47 +0000 (15:11 +0800)]
common/mlx5: add flex parser DevX structures

The structures and other definitions will be used for the dynamic
flex parser creation via the DevX command interface. These structures
will be used as intermediate variables and input parameters for the
parser creation API.
It is better to keep all members consistent with the PRM definition
even though some of them will not be used.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: add flow translation of eCPRI header
Bing Zhao [Fri, 17 Jul 2020 07:11:46 +0000 (15:11 +0800)]
net/mlx5: add flow translation of eCPRI header

In the translation stage, the eCPRI item should be translated into
the format that the lower layer driver can use. All the fields that
need to match, as well as the mask, must be in network byte order
after translation. Since the header in the item belongs to the
network layer stack, the input parameter of the header is considered
to be in big-endian format already.

Based on the definition in the PRM, the DW samples will be used for
matching in the FTE/STE. For now, only the type field and the PC ID,
RTC ID, and DLY MSR ID of the payload are supported. The masks should
be 00 ff 00 00 ff ff(00) 00 00 in the network order. Two DWs are
needed to support such matching. The mask fields could be zeros to
support some wildcard rules. But it makes no sense to support a
rule that matches only on the payload without matching the type field.

The DW samples should be stored after the flex parser creation for
eCPRI. There is then no need to query the sample IDs each time a
flow rule with an eCPRI item is created, so the insertion rate is
not degraded significantly.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: add flow validation of eCPRI header
Bing Zhao [Fri, 17 Jul 2020 07:11:45 +0000 (15:11 +0800)]
net/mlx5: add flow validation of eCPRI header

When creating a flow with an eCPRI header item, validating the item
is mandatory. The detailed limitations are listed below:
  1. Over Ether / VLAN, ethertype must be 0xAEFE.
  2. No tunnel support is described in the specification now.
  3. L3 layer is only supported when L4 is UDP, see #4.
  4. Over TCP is not supported by the specification, and over UDP
     is not supported right now.
  5. Concatenation indicator matching is not supported now.
  6. No need to check the revision.
  7. Only type field in the common header is mandatory, and one byte
     should be matched integrally.
  8. Fields in the message payload header are optional.
  9. Only messages with type #0, #2 and #5 are supported now.

Some limitations are only from software right now, because there is
no need to support all the message types and variants of protocol
stack listed in the specification.
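
The limitations listed above can be sketched as a single check. This
is only an illustration: struct ecpri_check and ecpri_item_is_valid()
are hypothetical names, not the PMD's validation routine. The sketch
takes rule 7 literally and requires a full-byte mask on the type field.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECPRI_ETHER_TYPE 0xAEFE /* rule 1 */

/* Hypothetical, simplified view of what a validator would look at. */
struct ecpri_check {
	uint16_t ether_type; /* ethertype of the preceding Ether/VLAN item */
	bool over_l3_l4;     /* eCPRI item follows an L3 or L4 item */
	bool in_tunnel;      /* eCPRI item is inside a tunnel */
	bool c_bit_masked;   /* concatenation indicator has a non-zero mask */
	uint8_t type_spec;   /* message type value to match */
	uint8_t type_mask;   /* mask of the message type byte */
};

static bool
ecpri_item_is_valid(const struct ecpri_check *c)
{
	if (c->in_tunnel || c->over_l3_l4)     /* rules 2, 3 and 4 */
		return false;
	if (c->ether_type != ECPRI_ETHER_TYPE) /* rule 1 */
		return false;
	if (c->c_bit_masked)                   /* rule 5 */
		return false;
	if (c->type_mask != 0xff)              /* rule 7: match the whole byte */
		return false;
	/* Rule 9: only message types #0, #2 and #5 are supported. */
	return c->type_spec == 0 || c->type_spec == 2 || c->type_spec == 5;
}
```

Rule 6 (revision) and rule 8 (optional payload fields) need no check,
so they do not appear in the sketch.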

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
3 years agonet/mlx5: convert Rx timestamps in real-time format
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:20 +0000 (08:23 +0000)]
net/mlx5: convert Rx timestamps in real-time format

The ConnectX-6DX supports timestamps in various formats. A new
real-time format is introduced: the upper 32-bit word of the
timestamp contains the UTC seconds and the lower 32-bit word
contains the nanoseconds. This patch detects which format is
configured in the NIC and performs the conversion accordingly.
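
As an illustration of the real-time format, the conversion to a flat
nanosecond count can be sketched as below. The helper name
rt_timestamp_to_ns() is an assumption made for this example.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Hypothetical helper: real-time format timestamp to nanoseconds. */
static inline uint64_t
rt_timestamp_to_ns(uint64_t ts)
{
	uint64_t sec = ts >> 32;          /* upper word: UTC seconds */
	uint64_t nsec = ts & 0xffffffffu; /* lower word: nanoseconds */

	return sec * NS_PER_S + nsec;
}
```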

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agocommon/mlx5: add register access DevX routine
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:19 +0000 (08:23 +0000)]
common/mlx5: add register access DevX routine

The DevX routine to read/write NIC registers via DevX API is added.
This is the preparation step to check timestamp modes and units
and gather the extended statistics.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: provide send scheduling error statistics
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:18 +0000 (08:23 +0000)]
net/mlx5: provide send scheduling error statistics

The mlx5 PMD exposes the following newly introduced
extended statistics counters to report the errors
of packet send scheduling on timestamps:

  - txpp_err_miss_int - rearm queue interrupt was not handled
    in time and service routine might miss
    the completions

  - txpp_err_rearm_queue - reports errors in rearm queue
  - txpp_err_clock_queue - reports errors in clock queue

  - txpp_err_ts_past - timestamps in the packet being sent
    were found in the past, timestamps were ignored

  - txpp_err_ts_future - timestamps in the packet being sent
    were found in the too distant future (beyond HW/clock queue
    capabilities to schedule, typically it is about 16M of
    tx_pp devarg periods)

  - txpp_jitter - estimated jitter in device clocks between
    8K completions of Clock Queue.

  - txpp_wander - estimated wander in device clocks between
    16M completions of Clock Queue.

  - txpp_sync_lost - error flag, the Clock Queue completions
    synchronization is lost, accurate packet scheduling can
    not be handled, timestamps are being ignored, the restart
    of all ports using scheduling must be performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: support reading device clock
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:17 +0000 (08:23 +0000)]
net/mlx5: support reading device clock

If the send scheduling feature is engaged, a Clock Queue is
created that reliably reports the current device clock counter
value. The device clock counter can be read directly from the
Clock Queue CQE.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: support scheduling on send routine template
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:16 +0000 (08:23 +0000)]
net/mlx5: support scheduling on send routine template

This patch adds send scheduling on timestamps into tx_burst
routine template. The feature is controlled by static configuration
flag, the actual routines supporting the new feature are generated
over this updated template.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: prepare Tx to support scheduling
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:15 +0000 (08:23 +0000)]
net/mlx5: prepare Tx to support scheduling

A new static control flag is introduced to control
the generation of routines from the template, enabling
the scheduling on timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: convert timestamp to completion index
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:14 +0000 (08:23 +0000)]
net/mlx5: convert timestamp to completion index

The application provides timestamps in Tx mbuf as clocks,
the hardware performs scheduling on Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index
inline routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: prepare Tx queue structures to support timestamp
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:13 +0000 (08:23 +0000)]
net/mlx5: prepare Tx queue structures to support timestamp

The fields to support send scheduling on dynamic timestamp
field are introduced and initialized on device start.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: introduce clock queue service routine
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:12 +0000 (08:23 +0000)]
net/mlx5: introduce clock queue service routine

The service routine is invoked periodically on Rearm Queue
completion interrupts, typically once every few milliseconds
(1-16), to track clock jitter and wander in a robust fashion.
It performs the following:

- fetches the completed CQEs for Rearm Queue
- restarts Rearm Queue on errors
- pushes new requests to Rearm Queue to make it
  continuously running and pushing cross-channel requests
  to Clock Queue
- reads and caches the Clock Queue CQE to be used in datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue errors statistics

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: allocate packet pacing context
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:11 +0000 (08:23 +0000)]
net/mlx5: allocate packet pacing context

This patch allocates the Packet Pacing context from the kernel,
configures it according to the requested send scheduling
granularity and assigns it to the Clock Queue.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: create Tx queues with DevX
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:10 +0000 (08:23 +0000)]
net/mlx5: create Tx queues with DevX

To provide packet send scheduling on mbuf timestamps, the Tx
queue must be attached to the same UAR as the Clock Queue.
A UAR is a special hardware-related resource that is mapped to
host memory and provides doorbell registers. Assigning a UAR
to the queue being created is possible via the DevX API only.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: create rearm queue for packet pacing
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:09 +0000 (08:23 +0000)]
net/mlx5: create rearm queue for packet pacing

The dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in realtime. The Clock Queue should never stop,
otherwise the clock synchronization might be broken and packet
send scheduling would fail. The Rearm Queue uses cross channel
SEND_EN/WAIT operations to provide the requests to the
Clock Queue in a robust way.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: create clock queue for packet pacing
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:08 +0000 (08:23 +0000)]
net/mlx5: create clock queue for packet pacing

This patch creates the special completion queue providing
reference completions to schedule packet send from
other transmitting queues.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: introduce shared UAR resource
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:07 +0000 (08:23 +0000)]
net/mlx5: introduce shared UAR resource

This is preparation step before moving the Tx queue creation
to the DevX approach. Some features require the shared UAR
for Tx queues and scheduling completion queues, the patch
manages the shared UAR.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: fix UAR lock sharing for multiport devices
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:06 +0000 (08:23 +0000)]
net/mlx5: fix UAR lock sharing for multiport devices

The master and representors might be created over multiport
Infiniband devices, and the UAR resources allocated for sibling
ports might belong to the same underlying Infiniband device.
Hardware requires that write access to the UAR be performed
as an atomic 64-bit write; on 32-bit systems this is two sequential
writes, protected by a lock. Because the same UAR can be shared
between sibling devices, the locks must be moved to the shared
context.
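
A minimal sketch of the 32-bit case follows, assuming one lock per
UAR kept in a context shared by the sibling ports. The names struct
shared_uar and uar_write64_split() are hypothetical, the two-word
array only stands in for the doorbell register, and the order of the
two halves is illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared context entry: the lock lives next to the UAR. */
struct shared_uar {
	atomic_flag lock;         /* taken by every port using this UAR */
	volatile uint32_t reg[2]; /* stand-in for the 64-bit doorbell */
};

/* Emulate one 64-bit write with two sequential 32-bit writes. */
static void
uar_write64_split(struct shared_uar *uar, uint64_t val)
{
	while (atomic_flag_test_and_set_explicit(&uar->lock,
						 memory_order_acquire))
		; /* spin until the sibling port releases the lock */
	uar->reg[0] = (uint32_t)val;         /* first half */
	uar->reg[1] = (uint32_t)(val >> 32); /* second half */
	atomic_flag_clear_explicit(&uar->lock, memory_order_release);
}
```

With a per-port lock, two sibling ports could interleave their halves
on the same register; a lock stored with the shared UAR prevents that.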

Fixes: f048f3d479a6 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/mlx5: introduce send scheduling devargs
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:05 +0000 (08:23 +0000)]
net/mlx5: introduce send scheduling devargs

This patch introduces the new devargs:

tx_pp - enables accurate packet send scheduling on mbuf timestamps
  in the PMD. On device start, if the "rte_dynflag_timestamp"
  dynamic flag is registered and a non-zero value is specified
  for this devarg, the driver initializes all necessary internal
  infrastructure to provide packet scheduling. The parameter
  value specifies scheduling granularity in nanoseconds.

tx_skew - the parameter adjusts the send packet scheduling on
  timestamps and represents the average delay between beginning
  of the transmitting descriptor processing by the hardware and
  appearance of actual packet data on the wire. The value should
  be provided in nanoseconds and is valid only if tx_pp parameter
  is specified. The default value is zero.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agocommon/mlx5: prepare support of packet pacing
Viacheslav Ovsiienko [Thu, 16 Jul 2020 08:23:04 +0000 (08:23 +0000)]
common/mlx5: prepare support of packet pacing

This patch prepares the common part of the mlx5 PMDs to
support packet send scheduling on mbuf timestamps:

  - the DevX routine to query the packet pacing HCA capabilities
  - packet pacing Send Queue attributes support
  - the hardware related definitions

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agocommon/mlx5: fix link with ibverbs glue dlopen option
Thomas Monjalon [Mon, 13 Jul 2020 15:37:10 +0000 (17:37 +0200)]
common/mlx5: fix link with ibverbs glue dlopen option

In case the ibverbs glue is a separate library to dlopen,
the PMD library must allocate a glue structure to be filled by dlopen.

The glue management was in mlx5_common.c and moved to mlx5_common_os.c,
but the variable allocation was not removed from the original file.
The consequence was a link failure, if ibverbs dlopen option is enabled,
because of the redefinition of the variable (with GCC 10):
multiple definition of 'mlx5_glue'

The original definition is removed to keep only the one moved
in the Linux sub-directory.

Fixes: 79aa430721b1 ("common/mlx5: split common file under Linux directory")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
3 years agonet/e1000: fix crash on Tx done clean up
Jeff Guo [Thu, 16 Jul 2020 09:24:38 +0000 (17:24 +0800)]
net/e1000: fix crash on Tx done clean up

The Tx mbuf is not set for some advanced descriptors. If the mbuf
is not checked before rte_pktmbuf_free_seg() is called during
Tx done clean up, a segfault occurs. So add a NULL pointer check
to fix it.
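
The shape of such a check can be sketched with stand-in types. struct
mbuf, struct sw_ring_entry, free_seg() and tx_done_cleanup() are
hypothetical simplifications made for this example and are not the
e1000 driver code; free_seg() stands in for rte_pktmbuf_free_seg().

```c
#include <stddef.h>

/* Stand-ins for struct rte_mbuf and the driver's software ring entry. */
struct mbuf { int freed; };
struct sw_ring_entry { struct mbuf *mbuf; };

/* Stand-in for rte_pktmbuf_free_seg(); it dereferences its argument. */
static void
free_seg(struct mbuf *m)
{
	m->freed = 1;
}

/* Release the mbufs of 'n' completed descriptors; return how many. */
static unsigned int
tx_done_cleanup(struct sw_ring_entry *ring, unsigned int n)
{
	unsigned int i, freed = 0;

	for (i = 0; i < n; i++) {
		if (ring[i].mbuf == NULL)
			continue; /* descriptor without an mbuf: skip it */
		free_seg(ring[i].mbuf);
		ring[i].mbuf = NULL;
		freed++;
	}
	return freed;
}
```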

Bugzilla ID: 501
Fixes: 8d907d2b79f7 ("net/igb: free consumed Tx buffers on demand")
Cc: stable@dpdk.org
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
3 years agonet/iavf: fix GTPU L4 hash
Jeff Guo [Mon, 20 Jul 2020 04:00:20 +0000 (12:00 +0800)]
net/iavf: fix GTPU L4 hash

When the configured pattern involves GTPU inner L3 and L4, even if
the configured input set covers only L3 and not L4, the L4 protocol
header should still be configured for each different L4 protocol.

Fixes: 215a247b5f33 ("net/iavf: refactor hash flow")
Fixes: 642f20195015 ("net/iavf: support RSS for IPv4 IPv6 mix of GTP")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: fix GTPU L4 hash
Jeff Guo [Tue, 21 Jul 2020 02:32:46 +0000 (10:32 +0800)]
net/ice: fix GTPU L4 hash

When the configured pattern involves GTPU inner L3 and L4, even if
the configured input set covers only L3 and not L4, the L4 protocol
header should still be configured for each different L4 protocol.

Fixes: 0b952714e9c1 ("net/ice: refactor PF hash flow")
Fixes: de32fa2ba27b ("net/ice: support RSS for IPv6 prefix")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: fix symmetric hash configuration
Jeff Guo [Mon, 20 Jul 2020 09:13:56 +0000 (17:13 +0800)]
net/ice: fix symmetric hash configuration

Some protocols don't support symmetric hash, so these cases need to
be handled. When an invalid symmetric hash rule is set, just return
failure.

Fixes: 4eafe71ee952 ("net/ice: fix RSS type")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/e1000: report VLAN extend capability
Zhihong Peng [Tue, 21 Jul 2020 03:05:14 +0000 (23:05 -0400)]
net/e1000: report VLAN extend capability

The rte_eth_dev_set_vlan_offload function checks the VLAN Rx offload
capability. The i350/i210/i211 NICs have the VLAN extend feature, but
DEV_RX_OFFLOAD_VLAN_EXTEND is not set in the capability, which
causes the setting to fail. So add this capability in the
igb_get_rx_port_offloads_capa function.

Fixes: ef990fb56e55 ("net/e1000: convert to new Rx offloads API")
Cc: stable@dpdk.org
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
3 years agonet/i40e: report VLAN filter capability
Zhihong Peng [Tue, 21 Jul 2020 02:45:28 +0000 (22:45 -0400)]
net/i40e: report VLAN filter capability

The rte_eth_dev_set_vlan_offload function checks the VLAN Rx offload
capability. The i40e VF has the VLAN filter feature, but
DEV_RX_OFFLOAD_VLAN_FILTER is not set in the capability, which
causes the setting to fail. So add this capability in the
i40e_vf_representor_dev_infos_get function.

Fixes: e0cb96204b71 ("net/i40e: add support for representor ports")
Cc: stable@dpdk.org
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
3 years agonet/dpaa: support queue info routines
Hemant Agrawal [Fri, 10 Jul 2020 16:21:37 +0000 (21:51 +0530)]
net/dpaa: support queue info routines

This patch adds support for rxq_info_get and txq_info_get.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet/dpaa2: support queue info routines
Hemant Agrawal [Fri, 10 Jul 2020 16:21:36 +0000 (21:51 +0530)]
net/dpaa2: support queue info routines

This patch adds support for rxq_info_get and txq_info_get.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet/dpaa2: support using Tx queue descriptor size
Hemant Agrawal [Fri, 10 Jul 2020 16:21:35 +0000 (21:51 +0530)]
net/dpaa2: support using Tx queue descriptor size

Add support to use Tx queue desc size to configure
congestion notification on TX queue

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet/dpaa2: report error on queue deferred start
Hemant Agrawal [Fri, 10 Jul 2020 16:21:34 +0000 (21:51 +0530)]
net/dpaa2: report error on queue deferred start

This patch adds support for reporting errors on configuring
deferred start for Rx or Tx queues.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet/dpaa: report error on using deferred start
Hemant Agrawal [Fri, 10 Jul 2020 16:21:33 +0000 (21:51 +0530)]
net/dpaa: report error on using deferred start

This patch adds support for reporting an error
on Rx and Tx deferred start configuration.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet/dpaa2: add Tx/Rx burst mode info
Apeksha Gupta [Fri, 10 Jul 2020 16:21:32 +0000 (21:51 +0530)]
net/dpaa2: add Tx/Rx burst mode info

Retrieve burst mode information according to the selected Tx/Rx mode and
offloads.

Signed-off-by: Apeksha Gupta <apeksha.gupta@nxp.com>
3 years agonet/dpaa: add Tx/Rx burst mode info
Apeksha Gupta [Fri, 10 Jul 2020 16:21:31 +0000 (21:51 +0530)]
net/dpaa: add Tx/Rx burst mode info

Retrieve burst mode information according to the selected Rx/Tx mode
and offloads.

Signed-off-by: Apeksha Gupta <apeksha.gupta@nxp.com>
3 years agonet/dpaa2: support per-port Rx mbuf timestamp
Hemant Agrawal [Fri, 10 Jul 2020 16:21:30 +0000 (21:51 +0530)]
net/dpaa2: support per-port Rx mbuf timestamp

DEV_RX_OFFLOAD_TIMESTAMP is per port, so the internal implementation
shall enable it on per port basis only.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet/dpaa2: remove Rx timestamp enable PMD API
Hemant Agrawal [Fri, 10 Jul 2020 16:21:29 +0000 (21:51 +0530)]
net/dpaa2: remove Rx timestamp enable PMD API

This experimental API is no longer required as the same
purpose can be served with the standard DEV_RX_OFFLOAD_TIMESTAMP.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet: check fragmented headers in non-debug as well
Andrew Rybchenko [Mon, 13 Jul 2020 14:22:34 +0000 (15:22 +0100)]
net: check fragmented headers in non-debug as well

Pseudo-header checksum calculation requires contiguous headers.
There are no formal requirements on the data location and mbuf
structure that the application may use.

Since

commit dfc6b2fd8da3 ("mbuf: remove Intel offload checks from generic API")

fragmented headers checks are done inside
rte_net_intel_cksum_flags_prepare() in RTE_LIBRTE_ETHDEV_DEBUG build
because it is moved from rte_validate_tx_offload() which is called
under debug only.

Make corresponding check to be done in non-debug build as well
to avoid bad accesses, incorrect checksum calculation and to
return appropriate error from Tx prepare.

Make no-offloads check more precise and do it in non-debug build
as well to avoid contiguous headers check and Tx prepare failure
if it is not actually required.
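
The two checks described above can be sketched as follows. struct pkt
and tx_prepare_check() are hypothetical reductions of the mbuf and of
the Tx prepare helper, and treating "contiguous" as "all headers fit
in the first segment" is an assumption of this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced view of the mbuf fields involved. */
struct pkt {
	uint16_t data_len;   /* bytes available in the first segment */
	uint16_t l2_len;
	uint16_t l3_len;
	uint16_t l4_len;
	int offload_requested; /* non-zero if a checksum offload is set */
};

static int
tx_prepare_check(const struct pkt *p)
{
	/* No offload requested: the headers are not touched, so skip. */
	if (!p->offload_requested)
		return 0;
	/* Headers must be contiguous, i.e. fit in the first segment. */
	if (p->data_len < (uint32_t)p->l2_len + p->l3_len + p->l4_len)
		return -ENOTSUP;
	return 0;
}
```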

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
3 years agonet/bonding: change state machine to defaulted
Weifeng Li [Sat, 18 Jul 2020 04:35:38 +0000 (12:35 +0800)]
net/bonding: change state machine to defaulted

A dpdk bonding 802.3ad network as follows:
+----------+                        +-----------+
|dpdk lacp |bond1.1 <------> bond2.1|switch lacp|
|          |bond1.2 <------> bond2.2|           |
+----------+                        +-----------+
If one direction of a fiber-optic link fails during normal operation,
like this:
bond1.2 -----> bond2.2 ok
bond1.2 <--x-- bond2.2 error: bond1.2 receives no LACPDU. Some packets
      from switch to dpdk will choose bond2.2
      and be lost.

The DPDK LACP state machine transitions to the expired state if no
LACPDU is received before the current_while_timer expires. But if no
LACPDU is received before the current_while_timer expires again, the
DPDK LACP state machine does not change, and bond2.2 cannot change to
inactive based on the received LACPDU.
According to IEEE 802.3ad, if no LACPDU is received before the
current_while_timer expires again, the state machine should transition
from expired to defaulted. Bond2.2 will then change to inactive based
on the LACPDU with defaulted state.

This patch adds a state machine change from expired to defaulted when no
lacpdu is received before the current_while_timer expires again
according to IEEE 802.3ad:
If no LACPDU is received before the current_while timer expires again,
the state machine transits to the DEFAULTED state. The record Default
function overwrites the current operational parameters for the Partner
with administratively configured values. This allows configuration of
aggregations and individual links when no protocol partner is present,
while still permitting an active partner to override default settings.
The update_Default_Selected function sets the Selected variable FALSE
if the Link Aggregation Group has changed. Since all operational
parameters are now set to locally administered values there can be no
disagreement as to the Link Aggregation Group, so the Matched variable
is set TRUE.
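
The transition described above can be sketched as a small state
machine. The enum, struct and function names are hypothetical and much
simpler than the bonding driver's receive machine; only the timer
expiry path and the Matched variable mentioned in the text are modelled.

```c
/* Hypothetical, reduced receive machine states. */
enum rx_state { RX_CURRENT, RX_EXPIRED, RX_DEFAULTED };

struct rx_machine {
	enum rx_state state;
	int matched; /* the Matched variable from the text above */
};

/* Called when current_while_timer expires with no LACPDU received. */
static void
on_current_while_expired(struct rx_machine *m)
{
	switch (m->state) {
	case RX_CURRENT:
		m->state = RX_EXPIRED;   /* first expiry */
		break;
	case RX_EXPIRED:
		m->state = RX_DEFAULTED; /* second expiry: added by the patch */
		m->matched = 1;          /* no disagreement is possible now */
		break;
	case RX_DEFAULTED:
		break;                   /* stays until an LACPDU arrives */
	}
}
```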

The relevant description is in the chapter 43.4.12 of the link below:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=850426

Signed-off-by: Weifeng Li <liweifeng96@126.com>
Acked-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
3 years agonet/bonding: delete redundant code
Dongyang Pan [Sat, 4 Jul 2020 01:15:26 +0000 (09:15 +0800)]
net/bonding: delete redundant code

The function valid_bonded_port_id() already calls
rte_eth_dev_is_valid_port(), so delete the redundant check.

Fixes: 588ae95e7983 ("net/bonding: fix port ID check")
Cc: stable@dpdk.org
Signed-off-by: Dongyang Pan <197020236@qq.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
3 years agonet/bnxt: support exact match templates
Kishore Padmanabha [Fri, 17 Jul 2020 03:25:26 +0000 (23:25 -0400)]
net/bnxt: support exact match templates

Added support for exact match templates

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
3 years agonet/bnxt: update TruFlow resource allocation numbers
Kishore Padmanabha [Fri, 17 Jul 2020 14:14:49 +0000 (19:44 +0530)]
net/bnxt: update TruFlow resource allocation numbers

The TruFlow session open allocation parameters are updated to
support NAT records, L2 context regions, egress encap features.

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
3 years agonet/bnxt: modify default egress rule for VF representor
Kishore Padmanabha [Fri, 17 Jul 2020 14:14:48 +0000 (19:44 +0530)]
net/bnxt: modify default egress rule for VF representor

The default egress rule should include buffer descriptor action
record only if the VF representor is enabled.

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: fix null pointer dereference
Kishore Padmanabha [Fri, 17 Jul 2020 14:14:47 +0000 (19:44 +0530)]
net/bnxt: fix null pointer dereference

Avoid dereferencing a null pointer.

Fixes: 313ac35ac701 ("net/bnxt: support ULP session manager init")
Cc: stable@dpdk.org
Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
3 years agonet/bnxt: use SPDX license tag
Randy Schacher [Fri, 17 Jul 2020 14:14:46 +0000 (19:44 +0530)]
net/bnxt: use SPDX license tag

Update cfa_resource_types.h to use SPDX license header.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Randy Schacher <stuart.schacher@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: remove unused macro
Randy Schacher [Fri, 17 Jul 2020 14:14:45 +0000 (19:44 +0530)]
net/bnxt: remove unused macro

Remove unused define TF_MSG_TCAM_SET_DEV_DATA_SIZE.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Shahaji Bhosle <sbhosle@broadcom.com>
3 years agonet/bnxt: use NAT IPv4 action
Jay Ding [Fri, 17 Jul 2020 14:14:44 +0000 (19:44 +0530)]
net/bnxt: use NAT IPv4 action

Use NAT IPv4 instead of NAT IPv4 SRC and DST.

Signed-off-by: Jay Ding <jay.ding@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
3 years agonet/bnxt: fix exact match message size
Farah Smith [Fri, 17 Jul 2020 14:14:43 +0000 (19:44 +0530)]
net/bnxt: fix exact match message size

Fix incorrect EM message size when calling insert_em_internal.

Fixes: 98487d729b4a ("net/bnxt: cleanup and refactor session management")

Signed-off-by: Farah Smith <farah.smith@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: add matching protocol header info
Kishore Padmanabha [Fri, 17 Jul 2020 14:14:42 +0000 (19:44 +0530)]
net/bnxt: add matching protocol header info

Protocol headers are implicitly matched based on the proto
field data. For instance, if the ether type is set to 0x800 in the
ether header, then the ipv4 protocol header is assumed to be present
for template matching even if the ipv4 header is not present in the
given flow pattern.
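
The implicit matching in the example above can be sketched with a
header bitmap. The HDR_* bits and implied_headers() are hypothetical
names for this illustration, not the bnxt ULP definitions, and only
the 0x800 case named in the text is covered.

```c
#include <stdint.h>

/* Hypothetical header bits of a flow pattern. */
#define HDR_ETH  (1u << 0)
#define HDR_IPV4 (1u << 1)

/* Add the header implied by the ether type to the pattern's bitmap. */
static uint32_t
implied_headers(uint32_t hdr_bitmap, uint16_t ether_type)
{
	if ((hdr_bitmap & HDR_ETH) && ether_type == 0x0800)
		hdr_bitmap |= HDR_IPV4; /* assume ipv4 follows the ether header */
	return hdr_bitmap;
}
```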

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
3 years agonet/bnxt: fix accumulation of flow counters
Somnath Kotur [Fri, 17 Jul 2020 14:14:41 +0000 (19:44 +0530)]
net/bnxt: fix accumulation of flow counters

OVS-DPDK is accumulating the flow counters that are returned as part of
the flow_query API and it is being issued at least 3 times every second.
So there is no need to accumulate the counts internally in the driver.

Fixes: 306c2d28e247 ("net/bnxt: support count action in flow query")

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: enable default flows in TruFlow mode
Kishore Padmanabha [Fri, 17 Jul 2020 14:14:40 +0000 (19:44 +0530)]
net/bnxt: enable default flows in TruFlow mode

Removed the check to enable default flows only when VF representors
are enabled. They should be enabled all the time in TruFlow mode.

Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Mike Baucom <michael.baucom@broadcom.com>
3 years agonet/bnxt: initialize table scope parameter
Farah Smith [Fri, 17 Jul 2020 14:14:39 +0000 (19:44 +0530)]
net/bnxt: initialize table scope parameter

Initialize table scope resource manager parameter.
Clear out rm_is_allocated parms before calling as base_index was added
and used incorrectly in this instance.

Signed-off-by: Farah Smith <farah.smith@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: modify resource management scheme
Peter Spreadborough [Fri, 17 Jul 2020 14:14:38 +0000 (19:44 +0530)]
net/bnxt: modify resource management scheme

Add support for new resource manager to manage CFA resources.
TCAM is split into high and low regions now and CFA resource types
are being updated accordingly.

Signed-off-by: Peter Spreadborough <peter.spreadborough@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Farah Smith <farah.smith@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/ixgbe: fix flow control status
Guinan Sun [Sat, 23 May 2020 05:22:39 +0000 (05:22 +0000)]
net/ixgbe: fix flow control status

mac_ctrl_frame_fwd assignment is missing, so
setting mac_ctrl_frame_fwd should be added in
ixgbe_flow_ctrl_get().
The patch fixes the issue.

Fixes: 56ea46a997b7 ("ethdev: retrieve flow control configuration")
Cc: stable@dpdk.org
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Bo Chen <box.c.chen@intel.com>
3 years agonet/ixgbe: fix MAC control frame forward
Guinan Sun [Sat, 23 May 2020 05:22:38 +0000 (05:22 +0000)]
net/ixgbe: fix MAC control frame forward

mac_ctrl_frame_fwd shouldn't be cleared when the port is stopped,
otherwise it will be inconsistent with the actual status.
This patch fixes the issue.

Fixes: a524f550da6e ("net/ixgbe: fix flow control mode setting")
Cc: stable@dpdk.org
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/i40e: fix filter pctype
Shougang Wang [Wed, 15 Jul 2020 08:08:10 +0000 (08:08 +0000)]
net/i40e: fix filter pctype

The i40e_filter_pctype TCP_SYN_NO_ACK, UNICAST_IPV4_UDP and
MULTICAST_IPV4_UDP for x722 were missing when translating RSS type to
i40e_filter_pctype. This patch fixes it.

Fixes: da7018ec29d4 ("net/i40e: fix queue region in RSS flow")
Cc: stable@dpdk.org
Signed-off-by: Shougang Wang <shougangx.wang@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>