git.droids-corp.org - dpdk.git/log

ethdev: add capability to keep flow rules on restart

Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and its attributes transfer bit (a combination of an item/action type
and a value of the transfer bit is called a rule feature).

Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can to create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.

Rule persistence across a reconfigurations is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
Application must then flush the rules before attempting it.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

bonding: show Tx policy for 802.3AD mode

As balance xmit policy is supported in bonding mode 4(802.3AD). This
patch adds balance xmit policy show in testpmd commands for mode 4. Like:
testpmd> show bonding config 2
        Bonding mode: 4
        Balance Xmit Policy: BALANCE_XMIT_POLICY_LAYER34
        IEEE802.3AD Aggregator Mode: stable
        Slaves (2): [0 1]
        Active Slaves (2): [1 0]
        Primary: [1]

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/af_xdp: use BPF link for XDP programs

Since v0.4.0, if the underlying kernel supports it, libbpf uses 'bpf
link' to manage the programs on the interfaces of the xsks. This has two
repercussions for the PMD.

1. In the case where the PMD asks libbpf to load the default XDP
   program, the PMD no longer needs to remove it on teardown. This is
   because bpf link handles the unloading under the hood.
2. In the case where the PMD loads a custom program, libbpf expects this
   program to be linked via bpf link prior to creating the socket.

This patch introduces probes for the libbpf version and kernel support
for bpf link and orchestrates the loading and unloading of
programs according to the capabilities of the kernel and libbpf. The
libbpf version is checked with meson and pkg-config. The probe for
kernel support mirrors how it is implemented in libbpf. A bpf_link is
created and looked up on loopback device. If successful, bpf_link will
be used for the AF_XDP netdev.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

net/mlx5: fix RSS expansion with EtherType

The RSS expansion algorithm is using a graph to find the possible
expansion paths. A graph node with the 'explicit' flag will be skipped,
if it is not found in the flow pattern.
The current implementation misses a check for the explicit flag when
expanding the pattern according to ETH item with EtherType.
For example:
testpmd> flow create 0 ingress pattern eth / ipv6 / udp / vxlan / eth
type is 2048 / end actions rss level 2 types udp end / end
The "eth type is 2048" item in the pattern may be expanded to "ETH IPv4".
The ETH node in the expansion graph is followed by VLAN node marked as
explicit. The fix is to skip the VLAN node and continue the expansion
with its next nodes, IPv4 and IPv6.
The expansion paths for the above example will be:
ETH IPV6 UDP VXLAN ETH END
ETH IPV6 UDP VXLAN ETH IPV4 UDP END

Fixes: 69d268b4fff3 ("net/mlx5: fix RSS expansion for explicit graph node")
Cc: stable@dpdk.org
Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix meter action pool protection

The ASO meter action with flows creation could be supported on
multiple threads. The meter pools were created to manage the meter
object resources, if there is no room in the current meter pool then
resize the meter pool to the new pool size and free the old one.

There's a race condition while one thread resizes the meter pool and
the old pool resource be freed, and another thread query the meter
object by index on the old pool, the return value is invalid.

This patch adds a read-write lock to protect the pool resource while
resizing and query.

Fixes: a5835d530f00 ("net/mlx5: optimize Rx queue match")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix age action pool protection

The age action with flows creation could be supported on the multiple
threads. The age pools were created to manage the age resources, if
there is no room in the current pool then resize the age pool to the new
pool size and free the old one.

There's a race condition while one thread resizes the age pool and the
old pool resource be freed, and another thread query the age action
value of the old pool so the queried value is invalid.

This patch uses the read-write lock to protect the pool resource while
resizing and query.

Fixes: a5835d530f00 ("net/mlx5: optimize Rx queue match")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

ethdev: remove deprecation notice for shared Rx queue

Shared Rx queue feature has been supported, remove deprecation notice.

Fixes: dd22740cc291 ("ethdev: introduce shared Rx queue")
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

raw/cnxk_bphy: support telemetry

Added /cnxk/bphy/info telemetry endpoint.

Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/txgbe: fix link negotiation value

Macro is changed unintentionally while adding RTE_ prefix, fixing the
original value.

Fixes: 295968d17407 ("ethdev: add namespace")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/hns3: unify multicast MAC address set list

This patch removes hns3vf_set_mc_mac_addr_list() and uses
hns3_set_mc_mac_addr_list() to do this.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: refactor multicast MAC address set for PF

Currently, when configuring a group of multicast MAC addresses, the PF
driver reorder mc_addr array in hw struct to remove multicast MAC
addresses that are not in mc_addr_set array from user and then adds new
multicast MAC addresses. Actually, it can be simplified by removing all
previous MAC addresses and then adding new MAC addresses.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: unify multicast address check

This patch uniforms a common function to check multicast address
validity for PF and VF.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: unify MAC address add and remove

The code logic of adding and removing MAC address in PF and VF is the
same.
This patch extracts two common interfaces to add and remove them
separately.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: unify MAC and multicast address configuration

Currently, the interface logic for adding and deleting all MAC address
and multicast address in PF and VF driver is the same. This patch
extracts two common interfaces to configure them separately.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: use HW ops to config MAC features

This patch uses APIs in hns3_hw_ops to configure MAC related features.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: add HW ops structure to operate hardware

This patch adds hns3_hw_ops structure to operate hardware in PF and VF
driver.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: remove redundant multicast removal interface

This patch removes redundant hns3_remove_mc_addr_common(), which can be
replaced by hns3_remove_mc_mac_addr().

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: rename unicast address removal function

This patch renames hns3_remove_uc_addr_common() to
hns3_remove_uc_mac_addr() in PF.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: remove redundant multicast MAC interface

This patch removes hns3_add_mc_addr_common() in PF and
hns3vf_add_mc_addr_common() in VF.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: extract common interface to check duplicates

Extract a common interface for PF and VF to check whether the configured
multicast MAC address from rte_eth_dev_mac_addr_add() is the same as the
multicast MAC address from rte_eth_dev_set_mc_addr_list().

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: rename multicast address removal function

This patch renames hns3_remove_mc_addr() to hns3_remove_mc_mac_addr().

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: rename unicast address function

This patch renames hns3_add_uc_addr() to hns3_add_uc_mac_addr().

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: rename multicast address function

This patch renames hns3_add_mc_addr() to hns3_add_mc_mac_addr().

Signed-off-by: Huisong Li <lihuisong@huawei.com>

net/bnxt: fix stat context allocation

stat_ctx_alloc is called within the context of each rx/tx ring.
i.e from bnxt_alloc_hwrm_rx_ring and bnxt_alloc_hwrm_tx_ring().
So, there is no need to invoke bnxt_alloc_all_hwrm_stat_ctxs()
from bnxt_start_nic().

Fixes: 657c2a7f1dd4 ("net/bnxt: create aggregation rings when needed")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/bnxt: fix freeing aggregation rings

During port stop, we clear "eth_dev->data->scattered_rx" at the
beginning. As a result, in bnxt_free_hwrm_rx_ring() the check
bnxt_need_agg_ring() returns false and we end up not freeing
the Rx aggregation rings which results in resource leak in the FW.

Fixes: 657c2a7f1dd4 ("net/bnxt: create aggregation rings when needed")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

net/ice: fix performance for Rx timestamp

In Rx data path, it reads hardware registers per packet, resulting in
big performance drop. This patch improves performance from two aspects:
(1) replace per packet hardware register read by per burst.
(2) reduce hardware register read time from 3 to 2 when the low value of
time is not close to overflow.

Meanwhile, this patch refines "ice_timesync_read_rx_timestamp" and
"ice_timesync_read_tx_timestamp" API in which
"ice_tstamp_convert_32b_64b" is also used.

Fixes: 953e74e6b73a ("net/ice: enable Rx timestamp on flex descriptor")
Fixes: 646dcbe6c701 ("net/ice: support IEEE 1588 PTP")
Suggested-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Simei Su <simei.su@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/i40e: fix 32-bit build

Got error with: gcc 11.2.1 "cc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)"

Build error:
In function ‘i40e_flow_parse_fdir_pattern’,
    inlined from ‘i40e_flow_parse_fdir_filter’
    at ../drivers/net/i40e/i40e_flow.c:3274:8:
../drivers/net/i40e/i40e_flow.c:3052:69:
    error: writing 1 byte into a region of size 0
    [-Werror=stringop-overflow=]
3052 |                         filter->input.flow_ext.flexbytes[j] =
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
3053 |                                 raw_spec->pattern[i];
      |                                 ~~~~~~~~~~~~~~~~~~~~
In file included from ../drivers/net/i40e/i40e_flow.c:25:
  ../drivers/net/i40e/i40e_flow.c:
  In function ‘i40e_flow_parse_fdir_filter’:
  ../drivers/net/i40e/i40e_ethdev.h:638:17:
  note: at offset 16 into destination object ‘flexbytes’ of size 16
  638 |         uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
      |                 ^~~~~~~~~

Fixing by adding range checks.

Fixes: 6ced3dd72f5f ("net/i40e: support flexible payload parsing for FDIR")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/mlx5: do not close stdin on error

If for any reason, a socket could not be opened, mlx5_pmd_socket_init()
could close the 0 fd (which is valid, and has a fair chance to be stdin),
since server_socket == 0 from the variable being in .bss.

Fixes: e6cdc54cc0ef ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>

doc: add metering limitation in mlx5 guide

A meter policy with RSS/Queue action is not supported
when dv_xmeta_en enabled.

When dv_xmeta_en enabled in legacy creating flow,
it will split into two flows
(one set_tag with jump flow and one RSS/queue action flow).
For meter policy as termination table,
it cannot split flow and
cannot support when dv_xmeta_en enabled.

Fixes: 51ec04dc7bcf ("net/mlx5: connect meter policy to created flows")
Cc: stable@dpdk.org
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: allow meta modifications in legacy mode

The MODIFY_FIELD RTE action rejects copy to/from metadata
in case of the legacy mode extensive flow metadata support.
It is not consistent with SET_META action that has no such
restriction imposed. Registers A or B are used for META in
legacy mode. Allow meta modifications in legacy mode as well.

On other hand, SET_META rejects actions in case register C
is not available even though it is not needed in legacy mode.
Skip this check for legacy mode and allow setting META.

Fixes: edf325d421e8 ("net/mlx5: check extended metadata for meta modification")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: fix Tx meta width for modify field flow rule

Register C is used for the metadata within NIC Rx domain.
And its width can vary from 0 to 32 bits depending on
its kernel usage. But it is not the case within NIC Tx domain,
register A is always 32 bits there. Fix metadata width detection
for the modify_field flow API within NIC Tx domain.

Fixes: 6d5735c1cba2 ("net/mlx5: fix meta register conversion for extensive mode")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/ngbe: support Tx done cleanup

Add support for API rte_eth_tx_done_cleanup().

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support Rx and Tx descriptor status

Supports to get the number of used Rx descriptors,
and check the status of Rx and Tx descriptors.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support Rx and Tx queue info

Add Rx and Tx queue information get operation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support timesync

Add to support IEEE1588/802.1AS timestamping, and IEEE1588 timestamp
offload on Tx.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support register dump

Support to dump registers.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support EEPROM dump

Support to get and set device EEPROM data.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support device LED on/off

Support device LED on and off.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support flow control

Support to get and set flow control.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: add mailbox process operations

Add check operation for vf function level reset,
mailbox messages and ack from vf.
Waiting to process the messages.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support SR-IOV

Initialize and configure PF module to support SRIOV.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support RSS hash

Support RSS hashing on Rx, and configuration of RSS hash computation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support MAC filters

Add MAC addresses to filter incoming packets, support to set
multicast addresses to filter. And support to set unicast table array.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support loopback mode

Support loopback operation mode.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support FW version query

Add firmware version get operation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support device promiscuous and allmulticast mode

Support to enable/disable promiscuous and allmulticast mode for a port.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support MTU set

Support updating port MTU.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support device xstats

Add device extended stats get from reading hardware registers.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support basic statistics

Support to read and clear basic statistics.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support VLAN offload and VLAN filter

Support to set VLAN and QinQ offload, and filter of a VLAN tag
identifier.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support jumbo frame

Add to support Rx jumbo frames.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support CRC offload

Support to strip or keep CRC in Rx path.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support Rx/Tx burst mode info

Support to get Rx/Tx burst mode info.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support TSO

Add transmit datapath with offloads, and support TCP segmentation
offload.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support Rx checksum offload

Support IP/L4 checksum on Rx, and convert it to mbuf flags.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support scattered Rx

Add scattered Rx function to support receiving segmented mbufs.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support packet type query

Add packet type macro definition and convert ptype to ptid.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/hns3: fix mailbox communication with HW

Mailbox is the communication mechanism between SW and HW. There exist
two approaches for SW to recognize mailbox message from HW. One way is
using match_id, the other is to compare the message code. The two
approaches are independent and used in different scenarios.

But for the second approach, "next_to_use" should be updated and written
to HW register. If it not done, HW do not know the position SW steps,
then, the communication between SW and HW will turn to be failed.

Fixes: dbbbad23e380 ("net/hns3: fix VF handling LSC event in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

mempool/cnxk: postpone devargs parsing

Use roc_npa_lf_init_cb_register() scheme to register
callback for max_pools argument parsing.
This will remove the dependency on the order of PCI
devices probed.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: support ROC NPA init callback

Add support for registering callback for ROC NPA init.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>

mempool/cnxk: fix max pools argument parsing

roc_idev_npa_maxpools_set() expects max_pools original value,
not the AURA.

Fixes: 0a50a5aad299 ("mempool/cnxk: add device probe/remove")
Cc: stable@dpdk.org
Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

net/octeontx_ep: remove octeontx2 dependency

octeontx_ep driver's dependency on octeontx2 common code is
removed as going forward ep driver will include files from
its own path.

Signed-off-by: Nalla Pradeep <pnalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: update mailbox version to 0xb

Sync mailbox definition with AF kernel driver.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

common/octeontx2: update mailbox version to 0xb

Sync mailbox definition with AF kernel driver.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

vhost: increase number of async IO vectors

This patch increases the number of IO vectors for the
asynchronous data path from 512 to 2048. It has been
reported during testing the starvation of IO vectors
during iperf benchmark with 64KB packet size.

As there are no direct relationship between
VHOST_MAX_ASYNC_VEC and BUF_VECTOR_MAX, this patch also
assign VHOST_MAX_ASYNC_VEC value directly instead of being
a multiple of BUF_VECTOR_MAX.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: merge sync and async mbuf to descriptor filling

This patches merges copy_mbuf_to_desc() used by the sync
path with async_mbuf_to_desc() used by the async path.

Most of these complex functions are identical, so merging
them will make the maintenance easier.

In order not to degrade performance, the patch introduces
a boolean function parameter to specify whether it is called
in async context. This boolean is statically passed to this
always-inlined function, so the compiler will optimize this
out.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: prepare sync for mbuf to descriptor refactoring

This patch extracts the descriptors buffers filling
from copy_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mubf_to_desc() and
async_mbuf_to_desc().

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: prepare async for mbuf to descriptor refactoring

This patch extracts the IO vectors filling from
async_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mubf_to_desc() and
async_mbuf_to_desc().

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: simplify getting first in-flight index

This patch reworks the function getting the index
for the first packet in-flight.

When this index turns out to be zero, let's use the simple
path. Doing that avoid having to do a modulo with the
virtqueue size.

The patch also rename the function for better clarification,
and only pass the virtqueue metadata pointer, as all the
needed information are stored there.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: simplify async enqueue completion

vhost_poll_enqueue_completed() assumes some inflight
packets could have been completed in a previous call but
not returned to the application. But this is not the case,
since check_completed_copies callback is never called with
more than the current count as argument.

In other words, async->last_pkts_n is always 0. Removing it
greatly simplifies the function.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: remove notion of async descriptor

Now that IO vectors iterator have been simplified, the
rte_vhost_async_desc struct only contains a pointer on
the iterator array stored in the async metadata.

This patch removes it, and pass directly the iterators
array pointer to the transfer_data callback. Doing that,
we avoid declaring the descriptor array in the stack, and
also avoid the cost of filling it.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: improve IO vector logic

IO vectors and their iterators arrays were part of the
async metadata but not their indexes.

In order to makes this more consistent, the patch adds the
indexes to the async metadata. Doing that, we can avoid
triggering DMA transfer within the loop as it IO vector
index overflow is now prevented in the async_mbuf_to_desc()
function.

Note that previous detection mechanism was broken
since the overflow already happened when detected, so OOB
memory access would already have happened.

With this changes done, virtio_dev_rx_async_submit_split()
and virtio_dev_rx_async_submit_packed() can be further
simplified.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: remove useless fields in async iterator struct

Offset and count fields are unused and so can be removed.
The offset field was actually in the Vhost example, but
in a way that does not make sense.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: introduce specific iovec structure

This patch introduces rte_vhost_iovec struct that contains
both source and destination addresses since we always have
a 1:1 mapping between source and destination. While using
the standard iovec struct might have seemed better, having
to duplicate IO vectors and its iterators is memory
inefficient and make the implementation more complex.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: remove async batch threshold

Reaching the async batch threshold was one of the condition
to trigger the DMA transfer. However, this condition was
never met since the threshold value is 32, same as the
MAX_PKT_BURST value.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: simplify async IO vectors iterators

This patch splits the iterator arrays in two, one for
source and one for destination. The goal is make the code
easier to understand.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: simplify async IO vectors

IO vectors implementation is unnecessarily complex, mixing
source and destinations vectors in the same array.

This patch declares two arrays, one for the source and one
for the destination. It also gets rid of seg_awaits variable
in both packed and split implementation, which is the same
as iovec_idx.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: hide in-flight async structure

This patch moves async_inflight_info struct to internal
header since it should not be part of the API.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: move async data in dedicated structure

This patch moves async-related metadata from vhost_virtqueue
to a dedicated struct. It makes it clear which fields are
async related, and also saves some memory when async feature
is not in use.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

net/virtio: fix link update in speed feature

Link update callback reports speed/duplex based on data
filled on device initialization. This is wrong in case of
VIRTIO_NET_F_SPEED_DUPLEX is negotiated since link could
be down at this time. Fix this function to actually
update the HW data in this case with respect to the fact
that specifying speed via devarg is a highest priority.

Fixes: 1357b4b36246 ("net/virtio: support Virtio link speed feature")
Cc: stable@dpdk.org
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

examples/l3fwd-power: support virtio/vhost

In l3fwd-power, there is default port configuration which requires
RSS and IPv4/UDP/TCP checksum. Once device does not support these,
the l3fwd-power will exit and report an error.
This patch updates the port configuration based on device capabilities
after getting the device information to support devices like virtio
and vhost.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>

power: support missing Rx queue info

Since some vdevs like virtio and vhost do not support rxq_info_get and
queue state inquiry, the error return value -ENOTSUP need to be ignored
when queue_stopped cannot get rx queue information and rx queue state.
This patch changes the return value of queue_stopped when
rte_eth_rx_queue_info_get return -ENOTSUP to support vdevs which cannot
provide rx queue information and rx queue state enable power management.

Fixes: 209fd585456c ("power: make ethdev power management thread unsafe")
Cc: stable@dpdk.org
Signed-off-by: Miao Li <miao.li@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

net/vhost: support power monitor

According to current semantics of power monitor, this commit adds a
callback function to decide whether aborts the sleep by checking
current value against the expected value and vhost_get_monitor_addr
to provide address to monitor. When no packet come in, the value of
address will not be changed and the running core will sleep. Once
packets arrive, the value of address will be changed and the running
core will wakeup.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>

vhost: add power monitor API

This commit defines rte_vhost_power_monitor_cond which is used to pass
some information to vhost driver. The information is including the
address to monitor, the expected value, the mask to extract value read
from 'addr', the value size of monitor address, the match flag used to
distinguish the value used to match something or not match something.

Vhost driver can use these information to fill rte_power_monitor_cond.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>

net/virtio: support power monitor

According to current semantics of power monitor, this commit adds a
callback function to decide whether aborts the sleep by checking
current value against the expected value and virtio_get_monitor_addr
to provide address to monitor. When no packet come in, the value of
address will not be changed and the running core will sleep. Once
packets arrive, the value of address will be changed and the running
core will wakeup.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>

vhost: remove async DMA map status

Async DMA map status flag was added to prevent the unnecessary unmap
when DMA devices bound to kernel driver. This brings maintenance cost
for a lot of code. This patch removes the DMA map status by using
rte_errno instead.

This patch relies on the following patch to fix a partial
unmap check in vfio unmapping API.
[1] https://www.mail-archive.com/dev@dpdk.org/msg226464.html

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

app/testpmd: add missing flow types in port info

This patch adds missing IPv6-Ex and GTPU flow types to port
info command. It also add the same definitions to
str2flowtype(), used to configure flow director.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>

net/mlx5: fix RSS RETA update

This patch fixes RETA updating for entries above 64.
Without that, these entries are never updated as
calculated mask value will always be 0.

Fixes: 634efbc2c8c0 ("mlx5: support RETA query and update")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

app/testpmd: fix RSS type display

This patch fixes the display of the RSS hash types
configured in the port, which displayed "all" even
if only a single type was configured

Fixes: 3c90743dd3b9 ("app/testpmd: support more types for flow RSS")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

app/testpmd: fix RSS key length

port_rss_hash_key_update() initializes rss_conf with the
RSS key configuration provided by the user, but it calls
rte_eth_dev_rss_hash_conf_get() before calling
rte_eth_dev_rss_hash_update(), which overrides the parsed
RSS config.

While the RSS key value is set again after, this is not
the case of the key length. It could cause out of bounds
access if the key length parsed is smaller than the one
read from rte_eth_dev_rss_hash_conf_get().

This patch restores the key length before the
rte_eth_dev_rss_hash_update() call to ensure the RSS key
value/length pair is consistent.

Fixes: 8205e241b2b0 ("app/testpmd: add missing type to RSS hash commands")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

net/virtio: support RSS

Provide the capability to update the hash key, hash types
and RETA table on the fly (without needing to stop/start
the device). However, the key length and the number of RETA
entries are fixed to 40B and 128 entries respectively. This
is done in order to simplify the design, but may be
revisited later as the Virtio spec provides this
flexibility.

Note that only VIRTIO_NET_F_RSS support is implemented,
VIRTIO_NET_F_HASH_REPORT, which would enable reporting the
packet RSS hash calculated by the device into mbuf.rss, is
not yet supported.

Regarding the default RSS configuration, it has been
chosen to use the default Intel ixgbe key as default key,
and default RETA is a simple modulo between the hash and
the number of Rx queues.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

doc: remove obsolete option from bnxt guide

host-based-truflow devarg is not used anymore to enable host based
flow table management functionality TruFlow. Instead this feature is
now driven by a capability indicated by the firmware.

TruFlow is not in tech preview anymore. Update the doc accordingly.

Fixes: da3731e2ea00 ("net/bnxt: check FW capability to support TRUFLOW")
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

doc: update NIC feature matrix for bnxt

Support for runtime Rx/Tx queue setup and inner RSS is not updated.
Update feature matrix for bnxt PMD.

Fixes: 7ed45b1a7c0f ("net/bnxt: support RSS hash selection")
Fixes: 0105ea1296c9 ("net/bnxt: support runtime queue setup")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: remove stale compilation option

Remove a stale compile option from meson build file.
RTE_LIBRTE_BNXT_TF sneaked in incorrectly.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/ice: remove VSI update on DCF reset by PF

After DCF is reset by PF, the VSI update service is unable to be
completed since the DCF resource is invalid.

This patch removes the call to service that updates VSI since it is
useless and output too many error messages.

Fixes: c7e1a1a3bfeb ("net/ice: refactor DCF VLAN handling")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: add watchdog for VF FLR

Add watchdog to iAVF PMD which support monitoring the VFLR register. If
the device is not already in reset then if a VF reset in progress is
detected then notify user through callback and set into reset state.
If the device is already in reset then poll for completion of reset.

The watchdog is disabled by default, to enable it set
IAVF_DEV_WATCHDOG_PERIOD to a non zero value (microseconds)

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>

net/iavf: support xstats for inline IPsec crypto

Add per queue counters for maintaining statistics for inline IPsec
crypto offload, which can be retrieved through the
rte_security_session_stats_get() with more detailed errors through the
rte_ethdev xstats.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>

net/iavf: support IPsec inline crypto

Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as supporting the offload for
ESP over UDP, and in conjunction with TSO for UDP and TCP flows.
Implement support for rte_security packet metadata

Add definition for IPsec descriptors, extend support for offload
in data and context descriptor to support

Add support to virtual channel mailbox for IPsec Crypto request
operations. IPsec Crypto requests receive an initial acknowledgment
from physical function driver of receipt of request and then an
asynchronous response with success/failure of request including any
response data.

Add enhanced descriptor debugging

Refactor of scalar tx burst function to support integration of offload

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>

net/iavf: support asynchronous virtual channel message

Add support for asynchronous virtual channel messages, specifically for
inline IPsec messages.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>

net/iavf: rework Tx path

Rework the Tx path and Tx descriptor usage in order to
allow for better use of offload flags and to facilitate enabling of
inline crypto offload feature.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>