dpdk.git
2 years ago net/mlx5: do not close stdin on error
David Marchand [Thu, 14 Oct 2021 11:37:18 +0000 (13:37 +0200)]
net/mlx5: do not close stdin on error

If for any reason, a socket could not be opened, mlx5_pmd_socket_init()
could close the 0 fd (which is valid, and has a fair chance to be stdin),
since server_socket == 0 from the variable being in .bss.

Fixes: e6cdc54cc0ef ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
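
The hazard described above can be illustrated with a minimal sketch. It is not the mlx5 code: `sketch_socket_fd` and `sketch_socket_close()` are hypothetical names, and the sketch assumes one common remedy, namely starting the descriptor at -1 so that a zero-initialized variable is never mistaken for a valid fd.

```c
#include <unistd.h>

/* Hypothetical stand-in for the driver's socket descriptor.
 * A zero-initialized (.bss) int would alias fd 0, so start at -1. */
static int sketch_socket_fd = -1;

/* Close the descriptor only if it was actually opened.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to close. */
static int
sketch_socket_close(void)
{
	if (sketch_socket_fd < 0)
		return 0;
	close(sketch_socket_fd);
	sketch_socket_fd = -1;
	return 1;
}
```

With this pattern, an error path that runs before any socket was opened leaves fd 0 untouched.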
2 years ago doc: add metering limitation in mlx5 guide
Li Zhang [Wed, 27 Oct 2021 09:13:58 +0000 (12:13 +0300)]
doc: add metering limitation in mlx5 guide

A meter policy with an RSS/queue action is not supported
when dv_xmeta_en is enabled.

When dv_xmeta_en is enabled, a flow created in legacy mode
is split into two flows
(one set_tag with jump flow and one RSS/queue action flow).
A meter policy acts as a termination table,
so its flow cannot be split and
it cannot be supported when dv_xmeta_en is enabled.

Fixes: 51ec04dc7bcf ("net/mlx5: connect meter policy to created flows")
Cc: stable@dpdk.org
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years ago net/mlx5: allow meta modifications in legacy mode
Alexander Kozyrev [Wed, 27 Oct 2021 03:52:34 +0000 (06:52 +0300)]
net/mlx5: allow meta modifications in legacy mode

The MODIFY_FIELD RTE action rejects copy to/from metadata
in case of the legacy mode extensive flow metadata support.
It is not consistent with SET_META action that has no such
restriction imposed. Registers A or B are used for META in
legacy mode. Allow meta modifications in legacy mode as well.

On the other hand, SET_META rejects actions in case register C
is not available even though it is not needed in legacy mode.
Skip this check for legacy mode and allow setting META.

Fixes: edf325d421e8 ("net/mlx5: check extended metadata for meta modification")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years ago net/mlx5: fix Tx meta width for modify field flow rule
Alexander Kozyrev [Tue, 26 Oct 2021 15:13:57 +0000 (18:13 +0300)]
net/mlx5: fix Tx meta width for modify field flow rule

Register C is used for the metadata within the NIC Rx domain,
and its width can vary from 0 to 32 bits depending on
its kernel usage. This is not the case within the NIC Tx domain,
where register A is always 32 bits. Fix metadata width detection
for the modify_field flow API within the NIC Tx domain.

Fixes: 6d5735c1cba2 ("net/mlx5: fix meta register conversion for extensive mode")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years ago net/ngbe: support Tx done cleanup
Jiawen Wu [Thu, 21 Oct 2021 09:50:23 +0000 (17:50 +0800)]
net/ngbe: support Tx done cleanup

Add support for API rte_eth_tx_done_cleanup().

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support Rx and Tx descriptor status
Jiawen Wu [Thu, 21 Oct 2021 09:50:22 +0000 (17:50 +0800)]
net/ngbe: support Rx and Tx descriptor status

Support getting the number of used Rx descriptors
and checking the status of Rx and Tx descriptors.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support Rx and Tx queue info
Jiawen Wu [Thu, 21 Oct 2021 09:50:21 +0000 (17:50 +0800)]
net/ngbe: support Rx and Tx queue info

Add Rx and Tx queue information get operation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support timesync
Jiawen Wu [Thu, 21 Oct 2021 09:50:20 +0000 (17:50 +0800)]
net/ngbe: support timesync

Add support for IEEE1588/802.1AS timestamping and IEEE1588 timestamp
offload on Tx.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support register dump
Jiawen Wu [Thu, 21 Oct 2021 09:50:19 +0000 (17:50 +0800)]
net/ngbe: support register dump

Support dumping registers.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support EEPROM dump
Jiawen Wu [Thu, 21 Oct 2021 09:50:18 +0000 (17:50 +0800)]
net/ngbe: support EEPROM dump

Support getting and setting device EEPROM data.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support device LED on/off
Jiawen Wu [Thu, 21 Oct 2021 09:50:17 +0000 (17:50 +0800)]
net/ngbe: support device LED on/off

Support device LED on and off.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support flow control
Jiawen Wu [Thu, 21 Oct 2021 09:50:16 +0000 (17:50 +0800)]
net/ngbe: support flow control

Support getting and setting flow control.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: add mailbox process operations
Jiawen Wu [Thu, 21 Oct 2021 09:50:15 +0000 (17:50 +0800)]
net/ngbe: add mailbox process operations

Add check operations for VF function level reset,
mailbox messages and acks from the VF,
and wait to process the messages.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support SR-IOV
Jiawen Wu [Thu, 21 Oct 2021 09:50:14 +0000 (17:50 +0800)]
net/ngbe: support SR-IOV

Initialize and configure the PF module to support SR-IOV.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support RSS hash
Jiawen Wu [Thu, 21 Oct 2021 09:50:13 +0000 (17:50 +0800)]
net/ngbe: support RSS hash

Support RSS hashing on Rx, and configuration of RSS hash computation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support MAC filters
Jiawen Wu [Thu, 21 Oct 2021 09:50:12 +0000 (17:50 +0800)]
net/ngbe: support MAC filters

Add MAC addresses to filter incoming packets, and support setting
multicast addresses to filter and setting the unicast table array.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support loopback mode
Jiawen Wu [Thu, 21 Oct 2021 09:50:11 +0000 (17:50 +0800)]
net/ngbe: support loopback mode

Support loopback operation mode.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support FW version query
Jiawen Wu [Thu, 21 Oct 2021 09:50:10 +0000 (17:50 +0800)]
net/ngbe: support FW version query

Add firmware version get operation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support device promiscuous and allmulticast mode
Jiawen Wu [Thu, 21 Oct 2021 09:50:09 +0000 (17:50 +0800)]
net/ngbe: support device promiscuous and allmulticast mode

Support enabling/disabling promiscuous and allmulticast mode for a port.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support MTU set
Jiawen Wu [Thu, 21 Oct 2021 09:50:08 +0000 (17:50 +0800)]
net/ngbe: support MTU set

Support updating port MTU.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support device xstats
Jiawen Wu [Thu, 21 Oct 2021 09:50:07 +0000 (17:50 +0800)]
net/ngbe: support device xstats

Add retrieval of device extended stats by reading hardware registers.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support basic statistics
Jiawen Wu [Thu, 21 Oct 2021 09:50:06 +0000 (17:50 +0800)]
net/ngbe: support basic statistics

Support reading and clearing basic statistics.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support VLAN offload and VLAN filter
Jiawen Wu [Thu, 21 Oct 2021 09:50:05 +0000 (17:50 +0800)]
net/ngbe: support VLAN offload and VLAN filter

Support setting VLAN and QinQ offload, and filtering on a VLAN tag
identifier.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support jumbo frame
Jiawen Wu [Thu, 21 Oct 2021 09:50:04 +0000 (17:50 +0800)]
net/ngbe: support jumbo frame

Add support for Rx jumbo frames.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support CRC offload
Jiawen Wu [Thu, 21 Oct 2021 09:50:03 +0000 (17:50 +0800)]
net/ngbe: support CRC offload

Support stripping or keeping the CRC in the Rx path.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support Rx/Tx burst mode info
Jiawen Wu [Thu, 21 Oct 2021 09:50:02 +0000 (17:50 +0800)]
net/ngbe: support Rx/Tx burst mode info

Support getting Rx/Tx burst mode info.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support TSO
Jiawen Wu [Thu, 21 Oct 2021 09:50:01 +0000 (17:50 +0800)]
net/ngbe: support TSO

Add transmit datapath with offloads, and support TCP segmentation
offload.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support Rx checksum offload
Jiawen Wu [Thu, 21 Oct 2021 09:50:00 +0000 (17:50 +0800)]
net/ngbe: support Rx checksum offload

Support IP/L4 checksum on Rx, and convert it to mbuf flags.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support scattered Rx
Jiawen Wu [Thu, 21 Oct 2021 09:49:59 +0000 (17:49 +0800)]
net/ngbe: support scattered Rx

Add scattered Rx function to support receiving segmented mbufs.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/ngbe: support packet type query
Jiawen Wu [Thu, 21 Oct 2021 09:49:58 +0000 (17:49 +0800)]
net/ngbe: support packet type query

Add packet type macro definition and convert ptype to ptid.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2 years ago net/hns3: fix mailbox communication with HW
Min Hu (Connor) [Thu, 28 Oct 2021 11:52:30 +0000 (19:52 +0800)]
net/hns3: fix mailbox communication with HW

Mailbox is the communication mechanism between SW and HW. There are
two approaches for SW to recognize a mailbox message from HW. One is
to use match_id, the other is to compare the message code. The two
approaches are independent and used in different scenarios.

For the second approach, however, "next_to_use" should be updated and written
to the HW register. If this is not done, HW does not know the position SW has
reached, and the communication between SW and HW fails.

Fixes: dbbbad23e380 ("net/hns3: fix VF handling LSC event in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2 years ago mempool/cnxk: postpone devargs parsing
Volodymyr Fialko [Thu, 28 Oct 2021 22:14:46 +0000 (00:14 +0200)]
mempool/cnxk: postpone devargs parsing

Use roc_npa_lf_init_cb_register() scheme to register
callback for max_pools argument parsing.
This will remove the dependency on the order of PCI
devices probed.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2 years ago common/cnxk: support ROC NPA init callback
Volodymyr Fialko [Thu, 28 Oct 2021 22:14:45 +0000 (00:14 +0200)]
common/cnxk: support ROC NPA init callback

Add support for registering callback for ROC NPA init.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2 years ago mempool/cnxk: fix max pools argument parsing
Volodymyr Fialko [Thu, 28 Oct 2021 22:14:44 +0000 (00:14 +0200)]
mempool/cnxk: fix max pools argument parsing

roc_idev_npa_maxpools_set() expects max_pools original value,
not the AURA.

Fixes: 0a50a5aad299 ("mempool/cnxk: add device probe/remove")
Cc: stable@dpdk.org
Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2 years ago net/octeontx_ep: remove octeontx2 dependency
Nalla Pradeep [Thu, 28 Oct 2021 04:11:13 +0000 (21:11 -0700)]
net/octeontx_ep: remove octeontx2 dependency

octeontx_ep driver's dependency on octeontx2 common code is
removed as going forward ep driver will include files from
its own path.

Signed-off-by: Nalla Pradeep <pnalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2 years ago common/cnxk: update mailbox version to 0xb
Kiran Kumar K [Mon, 25 Oct 2021 03:55:26 +0000 (09:25 +0530)]
common/cnxk: update mailbox version to 0xb

Sync mailbox definition with AF kernel driver.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2 years ago common/octeontx2: update mailbox version to 0xb
Kiran Kumar K [Mon, 25 Oct 2021 03:55:25 +0000 (09:25 +0530)]
common/octeontx2: update mailbox version to 0xb

Sync mailbox definition with AF kernel driver.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2 years ago vhost: increase number of async IO vectors
Maxime Coquelin [Tue, 26 Oct 2021 16:29:04 +0000 (18:29 +0200)]
vhost: increase number of async IO vectors

This patch increases the number of IO vectors for the
asynchronous data path from 512 to 2048. Starvation of IO vectors
was reported during testing with an iperf benchmark
using a 64KB packet size.

As there is no direct relationship between
VHOST_MAX_ASYNC_VEC and BUF_VECTOR_MAX, this patch also
assigns the VHOST_MAX_ASYNC_VEC value directly instead of making it
a multiple of BUF_VECTOR_MAX.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: merge sync and async mbuf to descriptor filling
Maxime Coquelin [Tue, 26 Oct 2021 16:29:03 +0000 (18:29 +0200)]
vhost: merge sync and async mbuf to descriptor filling

This patch merges copy_mbuf_to_desc() used by the sync
path with async_mbuf_to_desc() used by the async path.

Most of these complex functions are identical, so merging
them will make the maintenance easier.

In order not to degrade performance, the patch introduces
a boolean function parameter to specify whether it is called
in async context. This boolean is statically passed to this
always-inlined function, so the compiler will optimize this
out.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
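
The constant-boolean technique described above can be sketched as follows. This is an illustration under stated assumptions, not the vhost code: all `sketch_*` names are hypothetical, and `__attribute__((always_inline))` assumes a GCC- or Clang-compatible compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of a deferred copy, standing in for an IO vector. */
struct sketch_iovec {
	const void *src;
	void *dst;
	size_t len;
};

/* Shared helper. is_async is a compile-time constant at every call site,
 * so the untaken branch can be dropped once the helper is inlined. */
static inline __attribute__((always_inline)) int
sketch_fill(void *dst, const void *src, size_t len,
	    struct sketch_iovec *vec, bool is_async)
{
	if (is_async) {
		/* Async path: only record the copy for a DMA engine. */
		vec->src = src;
		vec->dst = dst;
		vec->len = len;
		return 1; /* one vector queued */
	}
	/* Sync path: copy immediately with the CPU. */
	memcpy(dst, src, len);
	return 0; /* nothing queued */
}

/* Sync entry point: vec is never dereferenced when is_async is false. */
static int
sketch_fill_sync(void *dst, const void *src, size_t len)
{
	return sketch_fill(dst, src, len, NULL, false);
}

/* Async entry point. */
static int
sketch_fill_async(void *dst, const void *src, size_t len,
		  struct sketch_iovec *vec)
{
	return sketch_fill(dst, src, len, vec, true);
}
```

Each wrapper keeps a single shared body to maintain, while the generated code for each path should contain only its own branch.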
2 years ago vhost: prepare sync for mbuf to descriptor refactoring
Maxime Coquelin [Tue, 26 Oct 2021 16:29:02 +0000 (18:29 +0200)]
vhost: prepare sync for mbuf to descriptor refactoring

This patch extracts the descriptor buffers filling
from copy_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mbuf_to_desc() and
async_mbuf_to_desc().

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: prepare async for mbuf to descriptor refactoring
Maxime Coquelin [Tue, 26 Oct 2021 16:29:01 +0000 (18:29 +0200)]
vhost: prepare async for mbuf to descriptor refactoring

This patch extracts the IO vectors filling from
async_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mbuf_to_desc() and
async_mbuf_to_desc().

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: simplify getting first in-flight index
Maxime Coquelin [Tue, 26 Oct 2021 16:29:00 +0000 (18:29 +0200)]
vhost: simplify getting first in-flight index

This patch reworks the function getting the index
for the first packet in-flight.

When this index turns out to be zero, let's use the simple
path. Doing that avoids having to do a modulo with the
virtqueue size.

The patch also renames the function for clarity,
and only passes the virtqueue metadata pointer, as all the
needed information is stored there.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: simplify async enqueue completion
Maxime Coquelin [Tue, 26 Oct 2021 16:28:59 +0000 (18:28 +0200)]
vhost: simplify async enqueue completion

vhost_poll_enqueue_completed() assumes some inflight
packets could have been completed in a previous call but
not returned to the application. But this is not the case,
since check_completed_copies callback is never called with
more than the current count as argument.

In other words, async->last_pkts_n is always 0. Removing it
greatly simplifies the function.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: remove notion of async descriptor
Maxime Coquelin [Tue, 26 Oct 2021 16:28:58 +0000 (18:28 +0200)]
vhost: remove notion of async descriptor

Now that the IO vector iterators have been simplified, the
rte_vhost_async_desc struct only contains a pointer to
the iterator array stored in the async metadata.

This patch removes it and passes the iterators
array pointer directly to the transfer_data callback. Doing that,
we avoid declaring the descriptor array on the stack, and
also avoid the cost of filling it.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: improve IO vector logic
Maxime Coquelin [Tue, 26 Oct 2021 16:28:57 +0000 (18:28 +0200)]
vhost: improve IO vector logic

IO vectors and their iterator arrays were part of the
async metadata, but their indexes were not.

To make this more consistent, the patch adds the
indexes to the async metadata. Doing that, we can avoid
triggering a DMA transfer within the loop, as IO vector
index overflow is now prevented in the async_mbuf_to_desc()
function.

Note that the previous detection mechanism was broken,
since the overflow had already happened when it was detected, so
OOB memory access would already have occurred.

With these changes done, virtio_dev_rx_async_submit_split()
and virtio_dev_rx_async_submit_packed() can be further
simplified.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
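
The difference between detecting an overflow after the fact and preventing it can be sketched as below. This is a hedged illustration, not the vhost implementation: the `sketch_*` names and the tiny capacity are hypothetical, and the only point shown is that the bound is tested before the slot is written and that the index lives next to the array it guards.

```c
#include <stddef.h>

#define SKETCH_MAX_VEC 4 /* hypothetical capacity, tiny for illustration */

struct sketch_vec {
	void *src;
	void *dst;
	size_t len;
};

struct sketch_async {
	struct sketch_vec vecs[SKETCH_MAX_VEC];
	unsigned int vec_idx; /* next free slot, kept with the array */
};

/* Returns 0 on success, -1 if the array is full.
 * The bound is checked before the write, so no slot past the end
 * is ever touched. */
static int
sketch_vec_add(struct sketch_async *a, void *src, void *dst, size_t len)
{
	if (a->vec_idx >= SKETCH_MAX_VEC)
		return -1;
	a->vecs[a->vec_idx].src = src;
	a->vecs[a->vec_idx].dst = dst;
	a->vecs[a->vec_idx].len = len;
	a->vec_idx++;
	return 0;
}
```

A caller that receives -1 can stop enqueuing and submit what it has, instead of discovering the overflow once memory has already been overwritten.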
2 years ago vhost: remove useless fields in async iterator struct
Maxime Coquelin [Tue, 26 Oct 2021 16:28:56 +0000 (18:28 +0200)]
vhost: remove useless fields in async iterator struct

The offset and count fields are unused and so can be removed.
The offset field was actually used in the Vhost example, but
in a way that does not make sense.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: introduce specific iovec structure
Maxime Coquelin [Tue, 26 Oct 2021 16:28:55 +0000 (18:28 +0200)]
vhost: introduce specific iovec structure

This patch introduces the rte_vhost_iovec struct, which contains
both source and destination addresses since we always have
a 1:1 mapping between source and destination. While using
the standard iovec struct might have seemed better, having
to duplicate IO vectors and their iterators is memory
inefficient and makes the implementation more complex.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: remove async batch threshold
Maxime Coquelin [Tue, 26 Oct 2021 16:28:54 +0000 (18:28 +0200)]
vhost: remove async batch threshold

Reaching the async batch threshold was one of the conditions
to trigger the DMA transfer. However, this condition was
never met since the threshold value is 32, the same as the
MAX_PKT_BURST value.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: simplify async IO vectors iterators
Maxime Coquelin [Tue, 26 Oct 2021 16:28:53 +0000 (18:28 +0200)]
vhost: simplify async IO vectors iterators

This patch splits the iterator arrays in two, one for
source and one for destination. The goal is to make the code
easier to understand.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: simplify async IO vectors
Maxime Coquelin [Tue, 26 Oct 2021 16:28:52 +0000 (18:28 +0200)]
vhost: simplify async IO vectors

The IO vectors implementation is unnecessarily complex, mixing
source and destination vectors in the same array.

This patch declares two arrays, one for the source and one
for the destination. It also gets rid of the seg_awaits variable
in both the packed and split implementations, which is the same
as iovec_idx.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: hide in-flight async structure
Maxime Coquelin [Tue, 26 Oct 2021 16:28:51 +0000 (18:28 +0200)]
vhost: hide in-flight async structure

This patch moves async_inflight_info struct to internal
header since it should not be part of the API.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago vhost: move async data in dedicated structure
Maxime Coquelin [Tue, 26 Oct 2021 16:28:50 +0000 (18:28 +0200)]
vhost: move async data in dedicated structure

This patch moves async-related metadata from vhost_virtqueue
to a dedicated struct. It makes it clear which fields are
async related, and also saves some memory when async feature
is not in use.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
2 years ago net/virtio: fix link update in speed feature
Ivan Ilchenko [Fri, 22 Oct 2021 13:17:54 +0000 (16:17 +0300)]
net/virtio: fix link update in speed feature

The link update callback reports speed/duplex based on data
filled in at device initialization. This is wrong when
VIRTIO_NET_F_SPEED_DUPLEX is negotiated, since the link could
be down at that time. Fix this function to actually
update the HW data in this case, keeping in mind
that a speed specified via devarg has the highest priority.

Fixes: 1357b4b36246 ("net/virtio: support Virtio link speed feature")
Cc: stable@dpdk.org
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2 years ago examples/l3fwd-power: support virtio/vhost
Miao Li [Mon, 25 Oct 2021 14:47:25 +0000 (14:47 +0000)]
examples/l3fwd-power: support virtio/vhost

In l3fwd-power, the default port configuration requires
RSS and IPv4/UDP/TCP checksum. If a device does not support these,
l3fwd-power exits and reports an error.
This patch updates the port configuration based on device capabilities
after getting the device information, in order to support devices like virtio
and vhost.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2 years ago power: support missing Rx queue info
Miao Li [Mon, 25 Oct 2021 14:47:24 +0000 (14:47 +0000)]
power: support missing Rx queue info

Some vdevs, such as virtio and vhost, do not support rxq_info_get and
queue state inquiry, so the -ENOTSUP error return value needs to be ignored
when queue_stopped cannot get Rx queue information and Rx queue state.
This patch changes the return value of queue_stopped when
rte_eth_rx_queue_info_get returns -ENOTSUP, so that vdevs which cannot
provide Rx queue information and Rx queue state can enable power management.

Fixes: 209fd585456c ("power: make ethdev power management thread unsafe")
Cc: stable@dpdk.org
Signed-off-by: Miao Li <miao.li@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years ago net/vhost: support power monitor
Miao Li [Mon, 25 Oct 2021 14:47:23 +0000 (14:47 +0000)]
net/vhost: support power monitor

Following the current semantics of power monitor, this commit adds a
callback function that decides whether to abort the sleep by checking the
current value against the expected value, and vhost_get_monitor_addr
to provide the address to monitor. When no packets come in, the value at
the address does not change and the running core sleeps. Once
packets arrive, the value at the address changes and the running
core wakes up.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2 years ago vhost: add power monitor API
Miao Li [Mon, 25 Oct 2021 14:47:22 +0000 (14:47 +0000)]
vhost: add power monitor API

This commit defines rte_vhost_power_monitor_cond, which is used to pass
information to the vhost driver. The information includes the
address to monitor, the expected value, the mask to extract the value read
from 'addr', the size of the value at the monitored address, and the match
flag that selects whether the value is checked for a match or for a mismatch.

The vhost driver can use this information to fill rte_power_monitor_cond.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
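
How such a condition might be evaluated can be sketched as follows. The struct below is a hypothetical mirror of the fields listed in the message, not the library definition, and the return convention (-1 to abort the sleep, 0 to allow it) as well as the meaning of the match flag are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields listed above; not the library struct. */
struct sketch_monitor_cond {
	volatile void *addr; /* address to monitor */
	uint64_t val;        /* expected value */
	uint64_t mask;       /* mask applied to the value read from addr */
	uint8_t size;        /* width of the monitored value, in bytes */
	uint8_t match;       /* 1: abort sleep on match, 0: abort on mismatch */
};

/* Decide whether the sleep should be aborted for the value just read.
 * Returns -1 to abort the sleep, 0 to allow it (assumed convention). */
static int
sketch_monitor_check(uint64_t current, const struct sketch_monitor_cond *c)
{
	int equal = (current & c->mask) == c->val;

	return (c->match ? equal : !equal) ? -1 : 0;
}
```

The mask lets the check look at only the relevant bits of the monitored word, and the match flag lets the same callback serve both "wake when equal" and "wake when different" conditions.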
2 years ago net/virtio: support power monitor
Miao Li [Mon, 25 Oct 2021 14:47:21 +0000 (14:47 +0000)]
net/virtio: support power monitor

Following the current semantics of power monitor, this commit adds a
callback function that decides whether to abort the sleep by checking the
current value against the expected value, and virtio_get_monitor_addr
to provide the address to monitor. When no packets come in, the value at
the address does not change and the running core sleeps. Once
packets arrive, the value at the address changes and the running
core wakes up.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2 years ago vhost: remove async DMA map status
Xuan Ding [Wed, 27 Oct 2021 10:00:26 +0000 (10:00 +0000)]
vhost: remove async DMA map status

The async DMA map status flag was added to prevent unnecessary unmaps
when DMA devices are bound to a kernel driver. This brings a maintenance cost
across a lot of code. This patch removes the DMA map status and uses
rte_errno instead.

This patch relies on the following patch to fix a partial
unmap check in vfio unmapping API.
[1] https://www.mail-archive.com/dev@dpdk.org/msg226464.html

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2 years ago app/testpmd: add missing flow types in port info
Maxime Coquelin [Wed, 27 Oct 2021 14:22:13 +0000 (16:22 +0200)]
app/testpmd: add missing flow types in port info

This patch adds the missing IPv6-Ex and GTPU flow types to the port
info command. It also adds the same definitions to
str2flowtype(), used to configure flow director.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
2 years ago net/mlx5: fix RSS RETA update
Maxime Coquelin [Wed, 27 Oct 2021 14:22:12 +0000 (16:22 +0200)]
net/mlx5: fix RSS RETA update

This patch fixes RETA updating for entries above 64.
Without that, these entries are never updated as
calculated mask value will always be 0.

Fixes: 634efbc2c8c0 ("mlx5: support RETA query and update")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
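
The indexing involved can be sketched as below. This is not the mlx5 fix itself: the struct is a hypothetical layout modelled on the ethdev convention of splitting the redirection table into groups of 64 entries, each with its own 64-bit mask, and all `sketch_*` and `SKETCH_*` names are made up for the example. The point is that the bit position must be taken within the group, not from the global entry index.

```c
#include <stdint.h>

#define SKETCH_RETA_GROUP_SIZE 64 /* entries covered by one 64-bit mask */

/* Hypothetical group layout: one mask plus 64 queue ids. */
struct sketch_reta_entry64 {
	uint64_t mask;
	uint16_t reta[SKETCH_RETA_GROUP_SIZE];
};

/* Returns 1 if entry i is selected for update, 0 otherwise. */
static int
sketch_reta_selected(const struct sketch_reta_entry64 *conf, unsigned int i)
{
	unsigned int idx = i / SKETCH_RETA_GROUP_SIZE; /* which group */
	unsigned int pos = i % SKETCH_RETA_GROUP_SIZE; /* bit within the group */

	return (int)((conf[idx].mask >> pos) & 1);
}
```

For entry 70, for instance, this looks at bit 6 of the second group's mask, so entries beyond the first group are selected correctly.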
2 years ago app/testpmd: fix RSS type display
Maxime Coquelin [Wed, 27 Oct 2021 14:22:11 +0000 (16:22 +0200)]
app/testpmd: fix RSS type display

This patch fixes the display of the RSS hash types
configured in the port, which displayed "all" even
if only a single type was configured.

Fixes: 3c90743dd3b9 ("app/testpmd: support more types for flow RSS")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years ago app/testpmd: fix RSS key length
Maxime Coquelin [Wed, 27 Oct 2021 14:22:10 +0000 (16:22 +0200)]
app/testpmd: fix RSS key length

port_rss_hash_key_update() initializes rss_conf with the
RSS key configuration provided by the user, but it calls
rte_eth_dev_rss_hash_conf_get() before calling
rte_eth_dev_rss_hash_update(), which overrides the parsed
RSS config.

While the RSS key value is set again after, this is not
the case of the key length. It could cause out of bounds
access if the key length parsed is smaller than the one
read from rte_eth_dev_rss_hash_conf_get().

This patch restores the key length before the
rte_eth_dev_rss_hash_update() call to ensure the RSS key
value/length pair is consistent.

Fixes: 8205e241b2b0 ("app/testpmd: add missing type to RSS hash commands")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years ago net/virtio: support RSS
Maxime Coquelin [Wed, 27 Oct 2021 14:22:09 +0000 (16:22 +0200)]
net/virtio: support RSS

Provide the capability to update the hash key, hash types
and RETA table on the fly (without needing to stop/start
the device). However, the key length and the number of RETA
entries are fixed to 40B and 128 entries respectively. This
is done in order to simplify the design, but may be
revisited later as the Virtio spec provides this
flexibility.

Note that only VIRTIO_NET_F_RSS support is implemented,
VIRTIO_NET_F_HASH_REPORT, which would enable reporting the
packet RSS hash calculated by the device into mbuf.rss, is
not yet supported.

Regarding the default RSS configuration, it has been
chosen to use the default Intel ixgbe key as default key,
and default RETA is a simple modulo between the hash and
the number of Rx queues.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
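
One reading of the default redirection table described above can be sketched as follows. It is an assumption-laden illustration rather than the driver code: the `sketch_*` names are hypothetical, the table size of 128 is taken from the message, and the sketch assumes that entry i maps to queue `i % nb_queues` and that a hash selects an entry by `hash % 128`.

```c
#include <stdint.h>

#define SKETCH_RETA_SIZE 128 /* fixed number of RETA entries per the message */

/* Fill the table so that entry i maps to queue i % nb_queues.
 * nb_queues must be non-zero. */
static void
sketch_default_reta(uint16_t reta[SKETCH_RETA_SIZE], uint16_t nb_queues)
{
	for (unsigned int i = 0; i < SKETCH_RETA_SIZE; i++)
		reta[i] = (uint16_t)(i % nb_queues);
}

/* Look up the Rx queue for a given RSS hash value. */
static uint16_t
sketch_queue_for_hash(const uint16_t reta[SKETCH_RETA_SIZE], uint32_t hash)
{
	return reta[hash % SKETCH_RETA_SIZE];
}
```

With three Rx queues, for example, the table cycles through queues 0, 1 and 2, which spreads hash values roughly evenly across the queues.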
2 years ago doc: remove obsolete option from bnxt guide
Ajit Khaparde [Wed, 27 Oct 2021 17:26:20 +0000 (10:26 -0700)]
doc: remove obsolete option from bnxt guide

host-based-truflow devarg is not used anymore to enable host based
flow table management functionality TruFlow. Instead this feature is
now driven by a capability indicated by the firmware.

TruFlow is not in tech preview anymore. Update the doc accordingly.

Fixes: da3731e2ea00 ("net/bnxt: check FW capability to support TRUFLOW")

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2 years ago doc: update NIC feature matrix for bnxt
Ajit Khaparde [Wed, 27 Oct 2021 17:26:19 +0000 (10:26 -0700)]
doc: update NIC feature matrix for bnxt

Support for runtime Rx/Tx queue setup and inner RSS is not updated.
Update feature matrix for bnxt PMD.

Fixes: 7ed45b1a7c0f ("net/bnxt: support RSS hash selection")
Fixes: 0105ea1296c9 ("net/bnxt: support runtime queue setup")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2 years ago net/bnxt: remove stale compilation option
Ajit Khaparde [Wed, 27 Oct 2021 17:26:18 +0000 (10:26 -0700)]
net/bnxt: remove stale compilation option

Remove a stale compile option from meson build file.
RTE_LIBRTE_BNXT_TF sneaked in incorrectly.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2 years ago net/ice: remove VSI update on DCF reset by PF
Dapeng Yu [Thu, 28 Oct 2021 09:34:43 +0000 (17:34 +0800)]
net/ice: remove VSI update on DCF reset by PF

After DCF is reset by PF, the VSI update service is unable to be
completed since the DCF resource is invalid.

This patch removes the call to the service that updates the VSI, since it is
useless and outputs too many error messages.

Fixes: c7e1a1a3bfeb ("net/ice: refactor DCF VLAN handling")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years ago net/iavf: add watchdog for VF FLR
Radu Nicolau [Thu, 28 Oct 2021 16:04:59 +0000 (17:04 +0100)]
net/iavf: add watchdog for VF FLR

Add a watchdog to the iAVF PMD which supports monitoring the VFLR register. If
the device is not already in reset and a VF reset in progress is
detected, notify the user through a callback and set the device into reset state.
If the device is already in reset, poll for completion of the reset.

The watchdog is disabled by default; to enable it, set
IAVF_DEV_WATCHDOG_PERIOD to a non-zero value (microseconds).

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
2 years ago net/iavf: support xstats for inline IPsec crypto
Radu Nicolau [Thu, 28 Oct 2021 16:04:58 +0000 (17:04 +0100)]
net/iavf: support xstats for inline IPsec crypto

Add per queue counters for maintaining statistics for inline IPsec
crypto offload, which can be retrieved through the
rte_security_session_stats_get() with more detailed errors through the
rte_ethdev xstats.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
2 years agonet/iavf: support IPsec inline crypto
Radu Nicolau [Thu, 28 Oct 2021 16:04:57 +0000 (17:04 +0100)]
net/iavf: support IPsec inline crypto

Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as supporting the offload for
ESP over UDP, and in conjunction with TSO for UDP and TCP flows.
Implement support for rte_security packet metadata

Add definition for IPsec descriptors, extend support for offload
in data and context descriptor to support

Add support to virtual channel mailbox for IPsec Crypto request
operations. IPsec Crypto requests receive an initial acknowledgment
from physical function driver of receipt of request and then an
asynchronous response with success/failure of request including any
response data.

Add enhanced descriptor debugging

Refactor of scalar tx burst function to support integration of offload

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
2 years agonet/iavf: support asynchronous virtual channel message
Radu Nicolau [Thu, 28 Oct 2021 16:04:56 +0000 (17:04 +0100)]
net/iavf: support asynchronous virtual channel message

Add support for asynchronous virtual channel messages, specifically for
inline IPsec messages.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
2 years agonet/iavf: rework Tx path
Radu Nicolau [Thu, 28 Oct 2021 16:04:55 +0000 (17:04 +0100)]
net/iavf: rework Tx path

Rework the Tx path and Tx descriptor usage in order to
allow for better use of offload flags and to facilitate enabling of
inline crypto offload feature.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
2 years agocommon/iavf: support IPsec inline crypto
Radu Nicolau [Thu, 28 Oct 2021 16:04:54 +0000 (17:04 +0100)]
common/iavf: support IPsec inline crypto

Add support for inline crypto for IPsec.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agonet: fix build with sparse on L2TPv2 bitfields
David Marchand [Thu, 28 Oct 2021 10:14:28 +0000 (12:14 +0200)]
net: fix build with sparse on L2TPv2 bitfields

An external project that wants to do additional checks on fields
endianness can remap rte_beXX types to instrumented types and use
sparse.

The current code breaks OVS build with sparse:
../../lib/ofp-packet.c: note: in included file (through
  .../ovs/dpdk-dir/build/include/rte_flow.h, ../../lib/netdev-dpdk.h,
  ../../lib/dp-packet.h):
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:92:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:93:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:94:40:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:95:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:96:40:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:97:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:98:37:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:99:40:
  error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:100:39:
  error: invalid bitfield specifier for type restricted ovs_be16.
make[3]: *** [lib/ofp-packet.lo] Error 1

Use simple uint16_t types for bitfields in L2TPv2 struct.

Fixes: 3a929df1f286 ("ethdev: support L2TPv2 and PPP procotol")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years agoapp/testpmd: fix MTU configuration before device start
Andrew Rybchenko [Sun, 24 Oct 2021 16:42:37 +0000 (19:42 +0300)]
app/testpmd: fix MTU configuration before device start

There is no point in calling rte_eth_dev_mtu_set() before configure,
since the MTU value set is overwritten on configure anyway. So setting
the MTU before configure is now rejected at the ethdev level.

If testpmd is going to configure the device (e.g. just after testpmd
starts with device start-up disabled, or after any configuration change
in the stopped state that requires a reconfigure), just save the
requested MTU in the device config so it is applied on reconfigure.

Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length")
Fixes: b26bee10ee37 ("ethdev: forbid MTU set before device configure")

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years agoapp/testpmd: fix L2TPv2 message type
Jie Wang [Wed, 27 Oct 2021 02:01:52 +0000 (10:01 +0800)]
app/testpmd: fix L2TPv2 message type

In "msg_type |= 0xc800", wider "51200" has high-order bits (0xc800)
that don't affect the narrower left-hand side.

This patch fixes coverity issue by changing the definition type of
"msg_type" from uint8_t to uint16_t.
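
The truncation described above can be reproduced in standalone C. This
is a minimal sketch, not the testpmd code: the ex_ helper names are
hypothetical, and only the 0xc800 mask and the uint8_t/uint16_t types
come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow variant: the mask has no bits in the low 8 bits, so it is lost. */
static uint8_t ex_flag_u8(uint8_t msg_type)
{
	msg_type |= 0xc800;	/* result is converted back to 8 bits */
	return msg_type;
}

/* Wide variant: all 16 bits of the mask are kept. */
static uint16_t ex_flag_u16(uint16_t msg_type)
{
	msg_type |= 0xc800;
	return msg_type;
}

/* The two results differ only in the high-order byte. */
void ex_check_truncation(void)
{
	assert(ex_flag_u8(0x02) == 0x02);
	assert(ex_flag_u16(0x02) == 0xc802);
}
```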

Coverity issue: 373651
Fixes: 748530f0354e ("app/testpmd: support L2TPv2 and PPP protocol pattern")

Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years agocommon/cnxk: fix build with -O1
Tejasree Kondoj [Wed, 27 Oct 2021 13:12:59 +0000 (18:42 +0530)]
common/cnxk: fix build with -O1

Fix build failure with EXTRA_CFLAGS='-O1'.

Fixes: d85f9749f915 ("common/cnxk: add hash generation API")

Reported-by: Longfeng Liang <longfengx.liang@intel.com>
Signed-off-by: Tejasree Kondoj <ktejasree@marvell.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
2 years agonet/bnxt: fix flow RSS failure handling
Kalesh AP [Thu, 28 Oct 2021 02:29:44 +0000 (07:59 +0530)]
net/bnxt: fix flow RSS failure handling

With commit 239695f754cb ("net/bnxt: enhance RSS action support"),
when the bnxt_hwrm_vnic_rss_cfg() call failed, the driver was not
setting the flow error using "rte_flow_error_set".

Fixes: 239695f754cb ("net/bnxt: enhance RSS action support")

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years agonet/bnxt: refactor Rx ring cleanup for representors
Ajit Khaparde [Tue, 26 Oct 2021 05:14:55 +0000 (22:14 -0700)]
net/bnxt: refactor Rx ring cleanup for representors

Rx rings for representors do not use aggregation rings.
Instead, they use simple software buffers for handling Rx packets.
So there is no need to use the same cleanup routine as the
non-representor code path.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years agonet/bnxt: fix RSS action parser
Ajit Khaparde [Tue, 26 Oct 2021 05:14:32 +0000 (22:14 -0700)]
net/bnxt: fix RSS action parser

Minor fixes are needed in the RTE_FLOW RSS action parser.
1. Update the comment in the parser to indicate RSS level 1 implies RSS
   on outer header.
2. RSS action will not be supported if level is > 1.
3. RSS action will not be supported if user or application specifies
   MARK or COUNT action.
4. If RSS types are not specified, i.e. the value is 0, the best effort
   RSS should use IPv4 and IPv6 headers. Currently we are considering
   only IPv4.

Fixes: 239695f754cb ("net/bnxt: enhance RSS action support")

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years agonet/bnxt: fix RSS behavior on Thor
Kalesh AP [Tue, 26 Oct 2021 05:14:30 +0000 (10:44 +0530)]
net/bnxt: fix RSS behavior on Thor

Move the Rx queue state update before bnxt_setup_one_vnic()
is called. For Thor, rxq->rx_started and eth_dev->data->rx_queue_state[]
need to be set for all queues before bnxt_hwrm_vnic_cfg() or
bnxt_vnic_rss_configure() is called.

Fixes: 0105ea1296c9 ("net/bnxt: support runtime queue setup")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years agonet/mlx5: fix integrity item validation and translation
Gregory Etelson [Tue, 26 Oct 2021 09:25:43 +0000 (12:25 +0300)]
net/mlx5: fix integrity item validation and translation

Integrity item validation and translation must verify that integrity
item bits match L3 and L4 items in flow rule pattern.
For cases when integrity item was positioned before L3 header, such
verification must be split into two stages.
The first stage detects integrity flow item and makes initializations
for the second stage.
The second stage is activated after PMD completes processing of all
flow items in rule pattern. PMD accumulates information about flow
items in flow pattern. When all pattern flow items were processed,
PMD can apply that data to complete integrity item validation
and translation.

Fixes: 79f8952783d0 ("net/mlx5: support integrity flow item")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years agonet/mlx5: fix integrity match on inner and outer headers
Gregory Etelson [Tue, 26 Oct 2021 09:25:42 +0000 (12:25 +0300)]
net/mlx5: fix integrity match on inner and outer headers

MLX5 PMD can match on integrity bits for inner and outer headers in
a single flow.
That means a single flow rule can reference both inner and outer
integrity bits. That is implemented by adding 2 flow integrity items
to a rule: one item for outer integrity bits and the other for
inner integrity bits.
Integrity item `level` parameter specifies what part is being
targeted.

Current PMD treated integrity items for outer and inner headers as
the same.
The patch separates PMD verifications for inner and outer integrity
items.

Fixes: 79f8952783d0 ("net/mlx5: support integrity flow item")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years agonet/mlx5: enhance flow dump
Haifei Luo [Tue, 26 Oct 2021 03:56:32 +0000 (06:56 +0300)]
net/mlx5: enhance flow dump

Multiple rules could use the same encap_decap/modify_hdr/counter action.
The flow dump data could be duplicated.

To avoid redundancy, flow dump value is based on the actions' pointer
instead of previous rules' pointer.

For counter, the data is stored in cmng of priv->sh.
For encap_decap/modify_hdr, the data is stored in
encaps_decaps/modify_cmds.
Traverse the fields and get each action's pointer and information.

The record formats in the dump are unchanged, except that "id" now
stands for the action's pointer:
    Counter:     rec_type,id,hits,bytes
    Modify_hdr:  rec_type,id,actions_number,actions
    Encap_decap: rec_type,id,buf

Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years agonet/mlx5: optimize device spawn time with representors
Jiawei Wang [Wed, 27 Oct 2021 10:35:10 +0000 (13:35 +0300)]
net/mlx5: optimize device spawn time with representors

During the device spawn process, mlx5 PMD queried the available flow
priorities by calling mlx5_flow_discover_priorities, queried
if the DR drop action was supported on the root table by calling
the mlx5_flow_discover_dr_action_support routine, and queried the
availability of metadata register C by calling mlx5_flow_discover_mreg_c.

These functions created the test flows to get the supported fields, and
at the end destroyed the test flows. The test flows in the first two
functions were created on the root table.
If the device was spawned with multiple representors, these test flows
were created and destroyed on each representor as well. The above
operations took a significant amount of init time during the device
spawn.

This patch optimizes the device discovery functions: when a device
with multiple representors (VF/SF) is being spawned, the priority,
drop action and metadata register support checks are done only once
and the results are shared by all representors.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years agocommon/mlx5: optimize debug log
Sean Zhang [Mon, 11 Oct 2021 08:46:03 +0000 (11:46 +0300)]
common/mlx5: optimize debug log

Remove debug log inside of mlx5_list_init to avoid flooding debug
messages when creating hash list with large actual size.

Fixes: 9c373c524bae ("common/mlx5: move list utility from net driver")
Cc: stable@dpdk.org
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2 years agonet/mlx5: support socket direct mode bonding
Rongwei Liu [Tue, 26 Oct 2021 08:48:30 +0000 (11:48 +0300)]
net/mlx5: support socket direct mode bonding

In socket direct mode, it's possible to bind any two (maybe four
in future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces no longer need to have
the same PCIe domain/bus/device ID.

Kernel driver uses "system_image_guid" to identify if devices can
be bound together or not. Sysfs "phys_switch_id" is used to get
"system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years agocommon/mlx5: support PCIe device GUID query
Rongwei Liu [Tue, 26 Oct 2021 08:48:29 +0000 (11:48 +0300)]
common/mlx5: support PCIe device GUID query

The sysfs entry "phys_switch_id" holds each PCIe device's
GUID.

The devices which reside in the same physical NIC should
have the same GUID.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2 years agonet/iavf: fix shared data in multi-process
Dapeng Yu [Wed, 27 Oct 2021 08:28:39 +0000 (16:28 +0800)]
net/iavf: fix shared data in multi-process

The shared pointer is initialized to a static local array defined in the
primary process and it shall not be accessed in the secondary process.

This patch copies the local data to shared data, to avoid data access
violation.

Fixes: 040b44551f77 ("net/iavf: unify Rx packet type table")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agonet/ice: fix function pointer in multi-process
Dapeng Yu [Tue, 26 Oct 2021 01:55:42 +0000 (09:55 +0800)]
net/ice: fix function pointer in multi-process

This patch saves the selected Receive Flex Descriptor profile ID as an
index and uses that index to call the function, instead of saving a
function pointer.

Otherwise the secondary process would run with a function address that
is only valid in the primary process.
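
The pattern can be sketched as below. This is an illustration only, not
the ice driver code: the handler type, the two handlers and the
ex_shared_rxq structure are hypothetical. The point is that shared
memory holds an index, while each process resolves it through its own
copy of the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handler type; names are illustrative only. */
typedef uint32_t (*ex_rxd_handler_t)(uint32_t raw);

static uint32_t ex_handle_default(uint32_t raw) { return raw; }
static uint32_t ex_handle_flex(uint32_t raw) { return raw >> 8; }

/* Each process has its own copy of this table, with its own addresses. */
static const ex_rxd_handler_t ex_handlers[] = {
	ex_handle_default,
	ex_handle_flex,
};

/* Data that would live in memory shared by primary and secondary. */
struct ex_shared_rxq {
	uint8_t handler_idx;	/* an index is valid in every process */
};

/* Resolve the index locally, then call the handler. */
static uint32_t ex_process(const struct ex_shared_rxq *rxq, uint32_t raw)
{
	assert(rxq->handler_idx < sizeof(ex_handlers) / sizeof(ex_handlers[0]));
	return ex_handlers[rxq->handler_idx](raw);
}
```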

Fixes: 7a340b0b4e03 ("net/ice: refactor Rx FlexiMD handling")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
2 years agonet/ice: workaround DCF reset failure
Dapeng Yu [Tue, 26 Oct 2021 09:53:07 +0000 (17:53 +0800)]
net/ice: workaround DCF reset failure

After the DCF is reset by the PF, the DCF device un-initialization
cannot function normally. Ignoring the failure does not help, since the
kernel does not clean up the resource.

The patch works around the issue by triggering an additional DCF
enable/disable cycle when a passive reset is detected.

Fixes: 1a86f4dbdf42 ("net/ice: support DCF device reset")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agoethdev: warn once when using port not ready
David Marchand [Wed, 27 Oct 2021 12:01:44 +0000 (14:01 +0200)]
ethdev: warn once when using port not ready

Warning continuously is a pain when developing or if a unit test
is/gets broken.

It could also be a problem if an application behaves badly only in some
corner cases and a DoS results from those logs being continuously
displayed.

Let's warn once per port and per rx/tx.
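
A warn-once filter of this kind can be sketched with one flag per port
and per direction. This is not the ethdev implementation: EX_MAX_PORTS,
the ex_dir enum and ex_should_warn() are hypothetical, and the sketch
ignores concurrent callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_PORTS 32	/* hypothetical limit for this sketch */

enum ex_dir { EX_RX = 0, EX_TX = 1 };

/* One flag per port and per direction; zero-initialized (not warned). */
static bool ex_warned[EX_MAX_PORTS][2];

/* Return true only the first time a (port, direction) pair is reported. */
static bool ex_should_warn(uint16_t port_id, enum ex_dir dir)
{
	assert(port_id < EX_MAX_PORTS);
	if (ex_warned[port_id][dir])
		return false;
	ex_warned[port_id][dir] = true;
	return true;
}
```

A caller would log (and dump the backtrace) only when ex_should_warn()
returns true.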

Getting such a log is scary, but let's make it more eye catching by
dumping a backtrace with it.

Tested by introducing a bug in testpmd:
 static int
 eth_dev_start_mp(uint16_t port_id)
 {
-       if (is_proc_primary())
+       if (!is_proc_primary())
                return rte_eth_dev_start(port_id);

        return 0;

Then, running a basic null test:
$ ./devtools/test-null.sh
...
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support
  enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

lcore 0 called rx_pkt_burst for not ready port 0
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
lcore 0 called rx_pkt_burst for not ready port 1
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0

Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2 years agonet/memif: fix driver init with default MTU
Ferruh Yigit [Wed, 27 Oct 2021 09:14:29 +0000 (10:14 +0100)]
net/memif: fix driver init with default MTU

The driver uses the Linux-defined 'ETH_FRAME_LEN' value as max frame
length, which doesn't include the FCS (4 bytes CRC). But ethdev by
default uses a frame size with FCS when the application doesn't define
any explicit value.

As a result, device configuration fails because ethdev tries to
configure the device with a frame size bigger than what the device
reported as supported: the device reports a max supported frame size of
1514, but the configured value is 1518.

Instead, use the DPDK macro 'RTE_ETHER_MAX_LEN', which includes the FCS,
in the driver to report the max supported frame size. This matches the
initial intention.
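
The 1514 versus 1518 mismatch is plain arithmetic on the standard
Ethernet sizes. The sketch below uses local constants (hypothetical EX_
names) so that it needs neither DPDK nor Linux headers; it is not the
driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of well-known Ethernet sizes; the EX_ names are made up. */
#define EX_ETHER_HDR_LEN 14u	/* dst MAC + src MAC + EtherType */
#define EX_ETHER_CRC_LEN 4u	/* FCS */
#define EX_ETHER_MTU     1500u	/* default MTU */

/* Frame length for a given MTU, with or without the FCS. */
static uint32_t ex_frame_len(uint32_t mtu, int with_fcs)
{
	return mtu + EX_ETHER_HDR_LEN + (with_fcs ? EX_ETHER_CRC_LEN : 0u);
}

/* A configuration is accepted only if it fits the reported maximum. */
static int ex_config_ok(uint32_t requested, uint32_t max_supported)
{
	return requested <= max_supported;
}

/* Reporting the maximum without FCS rejects the default configuration. */
void ex_check_frame_len(void)
{
	assert(ex_frame_len(EX_ETHER_MTU, 0) == 1514);
	assert(ex_frame_len(EX_ETHER_MTU, 1) == 1518);
	assert(!ex_config_ok(1518, 1514));
	assert(ex_config_ok(1518, 1518));
}
```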

Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length")

Reported-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2 years agonet/af_packet: fix driver init with default MTU
Ferruh Yigit [Wed, 27 Oct 2021 09:14:28 +0000 (10:14 +0100)]
net/af_packet: fix driver init with default MTU

The driver uses the Linux-defined 'ETH_FRAME_LEN' value as max frame
length, which doesn't include the FCS (4 bytes CRC). But ethdev by
default uses a frame size with FCS when the application doesn't define
any explicit value.

As a result, device configuration fails because ethdev tries to
configure the device with a frame size bigger than what the device
reported as supported: the device reports a max supported frame size of
1514, but the configured value is 1518.

Instead, use the DPDK macro 'RTE_ETHER_MAX_LEN', which includes the FCS,
in the driver to report the max supported frame size. This matches the
initial intention.

Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length")

Reported-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2 years agomem: fix dynamic hugepage mapping in container
Olivier Matz [Fri, 29 Oct 2021 09:53:10 +0000 (11:53 +0200)]
mem: fix dynamic hugepage mapping in container

Since its introduction in 2018, the SIGBUS handler was never registered,
and all related functions were unused.

A SIGBUS can be received by the application when accessing hugepages
even if mmap() was successful. This happens especially when running
inside containers when there are not enough hugepages. In this case, we
need to recover. A similar scheme can be found in eal_memory.c.
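
One common way to recover from SIGBUS is a handler that jumps back to a
checkpoint taken with sigsetjmp(). The sketch below assumes that
approach and is not the EAL code: all ex_ names are hypothetical, and
the fault is simulated with raise(SIGBUS) rather than a real hugepage
access.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf ex_jmpenv;

static void ex_sigbus_handler(int signo)
{
	(void)signo;
	siglongjmp(ex_jmpenv, 1);	/* resume at sigsetjmp() with value 1 */
}

/* Stand-ins for a page access: one succeeds, one simulates the fault. */
static void ex_touch_ok(volatile char *addr) { *addr = 0; }
static void ex_touch_fault(volatile char *addr) { (void)addr; raise(SIGBUS); }

/* Return true if touch(addr) completed, false if it raised SIGBUS. */
static bool ex_probe_page(void (*touch)(volatile char *), volatile char *addr)
{
	struct sigaction sa, old;
	volatile bool ok = false;

	assert(touch != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = ex_sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return false;

	/* Save the signal mask too, so SIGBUS is unblocked after the jump. */
	if (sigsetjmp(ex_jmpenv, 1) == 0) {
		touch(addr);
		ok = true;
	}
	sigaction(SIGBUS, &old, NULL);	/* always restore the old handler */
	return ok;
}
```

A caller that gets false back would unmap the page and report an
allocation failure instead of crashing.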

Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2 years agomalloc: fix allocation with unknown socket ID
Ilyes Ben Hamouda [Fri, 29 Oct 2021 09:49:29 +0000 (11:49 +0200)]
malloc: fix allocation with unknown socket ID

When using rte_malloc() from a thread which is not bound to a numa
socket (the typical case is a control thread, but it can also happen
on a dataplane thread if its cpu affinity is on cores attached to
several sockets), the used heap is the one from numa socket 0, which
may not have available memory.

Fix this by selecting the first socket which has available memory.

Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.
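
The selection rule can be sketched as follows. This is an illustration
under stated assumptions, not the body of malloc_get_numa_socket(): the
ex_heap structure, EX_SOCKET_ID_ANY and the final fallback to socket 0
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SOCKET_ID_ANY (-1)	/* hypothetical: thread not bound to a socket */

/* Hypothetical per-socket heap summary. */
struct ex_heap {
	size_t free_bytes;
};

/*
 * Pick the socket to allocate from. A bound thread keeps its own socket;
 * an unbound one gets the first socket with memory, or 0 as a last resort.
 */
static int ex_pick_socket(int thread_socket, const struct ex_heap *heaps,
			  int nb_sockets)
{
	int i;

	assert(heaps != NULL && nb_sockets > 0);
	if (thread_socket != EX_SOCKET_ID_ANY)
		return thread_socket;
	for (i = 0; i < nb_sockets; i++)
		if (heaps[i].free_bytes > 0)
			return i;
	return 0;
}
```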

Fixes: b94580d6887e ("malloc: avoid unknown socket id")
Cc: stable@dpdk.org
Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2 years agoeal: suggest using --lcores option
David Hunt [Wed, 3 Nov 2021 14:32:28 +0000 (14:32 +0000)]
eal: suggest using --lcores option

If the user requests an lcore id of 128 or above using -l,
the EAL exits with "EAL: invalid core list syntax" and
very little other useful information.

This patch adds extra information suggesting the --lcores option,
which lets physical cores at or above RTE_MAX_LCORE (default 128) be
used by mapping the logical cores in the application to physical
cores.

For example, if "-l 12-16,130,132" is used, we see the following
additional output on the command line:

EAL: lcore 132 >= RTE_MAX_LCORE (128)
EAL: lcore 133 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133

The same is added to -c option parsing.

For example, if "-c 0x300000000000000000000000000000000" is
used, we see the following additional output on the command line:

EAL: lcore 128 >= RTE_MAX_LCORE (128)
EAL: lcore 129 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@128,1@129
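
The suggested mapping string follows a simple rule: lcore ids count up
from 0 and each one is pinned to the next requested physical core. A
minimal sketch of that rule, with a hypothetical helper name (this is
not the EAL parsing code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a "--lcores" value that maps lcore ids 0..n-1 onto the given
 * physical core ids. Returns the string length, or -1 if buf is too small.
 */
static int ex_build_lcores_map(const unsigned int *cpus, size_t n,
			       char *buf, size_t len)
{
	size_t off = 0;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		int ret = snprintf(buf + off, len - off, "%s%zu@%u",
				   i == 0 ? "" : ",", i, cpus[i]);
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;	/* output would be truncated */
		off += (size_t)ret;
	}
	return (int)off;
}
```

For the physical cores 128 and 129 this produces "0@128,1@129", the
same form as the hint printed by the EAL.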

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2 years agoapp/flow-perf: add destination ports parameter
Sean Zhang [Fri, 29 Oct 2021 05:52:51 +0000 (08:52 +0300)]
app/flow-perf: add destination ports parameter

Add an optional destination ports parameter for the port-id action.
The parameter is not mandatory; if it is not provided, the value is 1
by default, as before.

For example:

$ dpdk-test-flow-perf -w 08:00.0,representor=[0,1] -- --transfer \
  --ingress --transfer --ether --portmask=0x2 --vxlan-encap \
  --port-id=0

This command means the rule is created on representor 0 with port 0
as destination, since the portmask is 0x2 and dst-ports is 0.

$ dpdk-test-flow-perf -w 08:00.0,representor=[0,1] \
  -w 08:00.1,representor=[0,1] -- --transfer --ingress --transfer \
  --ether --portmask=0x12 --vxlan-encap --port-id=0,3

This command means the rules are created on representor 0 of both PF 0
and PF 1; the destination port for the first representor is PF 0,
and the destination port for the other one is PF 1.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Reviewed-by: Wisam Jaddo <wisamm@nvidia.com>
2 years agoeal: promote non-EAL lcore API as stable
David Marchand [Fri, 22 Oct 2021 06:55:28 +0000 (08:55 +0200)]
eal: promote non-EAL lcore API as stable

This API has been around for more than a year (and is in LTS 20.11).
It did not receive negative feedback and will be used in a next OVS
release.
Mark it stable.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>