git.droids-corp.org - dpdk.git/log

net/failsafe: fix sub-device ownership race

There is time between the sub-device port probing by the sub-device PMD
to the sub-device port ownership taking by a fail-safe port.

In this time, the port is available for the application usage. For
example, the port will be exposed to the applications which use
RTE_ETH_FOREACH_DEV iterator.

Thus, ownership unaware applications may manage the port in this time
what may cause a lot of problematic behaviors in the fail-safe
sub-device initialization.

Register to the ethdev NEW event to take the sub-device port ownership
before it becomes exposed to the application.

Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

ethdev: fix port probing notification

The new device was notified as soon as it was allocated.
It leads to use a device which is not yet initialized.

The notification must be published after the initialization is done
by the PMD, but before the state is changed, in order to let
notified entities taking ownership before general availability.

Fixes: 29aa41e36de7 ("ethdev: add notifications for probing and removal")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

ethdev: fix port visibility before initialization

The port was set to the state ATTACHED during allocation.
The consequence was to iterate over ports which are not initialized.

The state ATTACHED is now set as the last step of probing.

The uniqueness of port name is now checked before the availability
of a port id for allocation (order reversed).

As the state is not set on allocation anymore, it is also not checked
in the function telling whether a port is allocated or not.
The name of the port is set on allocation, so it is enough as a check.

Fixes: 5588909af21b ("ethdev: add device iterator")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

ethdev: add lock to port allocation check

When comparing the port name, there can be a race condition with
a thread allocating a new port and writing the name at the same time.
It can lead to match with a partial name by error.

The check of the port is now considered as a critical section
protected with locks.

This fix will be even more required for multi-process when the
port availability will rely only on the name, in a following patch.

Fixes: 84934303a17c ("ethdev: synchronize port allocation")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

ethdev: allow ownership operations on unused port

When the state will be updated later than in allocation,
we may need to update the ownership of a port which is
still in state unused.

It will be used to take ownership of a port before it is
declared as available for other entities.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

ethdev: add probing finish function

A new hook function is added and called inside the PMDs at the end
of the device probing:
- in primary process, after allocating, init and config
- in secondary process, after attaching and local init

This new function is almost empty for now.
It will be used later to add some post-initialization processing.

For the PMDs calling the helpers rte_eth_dev_create() or
rte_eth_dev_pci_generic_probe(), the hook rte_eth_dev_probing_finish()
is called from here, and not in the PMD itself.

Note that the helper rte_eth_dev_create() could be used more,
especially for vdevs, avoiding some code duplication in PMDs.

Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

drivers/net: use higher level of probing helper for PCI

The drivers avp, bnx2x and liquidio were using the helper function
rte_eth_dev_pci_allocate() and can be replaced by
rte_eth_dev_pci_generic_probe() which calls the former.

Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

ethdev: add doxygen comments for each state

The enum rte_eth_dev_state was not properly documented.
Its values did not appear in the doxygen output,
and may be misunderstood.

The state RTE_ETH_DEV_DEFERRED has no interest anymore
since the ownership mechanism brings a more flexible categorization.
This state could be removed later.

Fixes: d52268a8b24b ("ethdev: expose device states")
Fixes: cb894d99eceb ("ethdev: add deferred intermediate device state")
Fixes: 5b7ba31148a8 ("ethdev: add port ownership")
Fixes: 7106edc12380 ("ethdev: add devop to check removal status")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

net/failsafe: fix sub-device visibility

The iterator function rte_eth_find_next_owned_by(), used by the
iterator macro RTE_ETH_FOREACH_DEV_OWNED_BY, are ignoring the devices
which are neither ATTACHED nor REMOVED. Thus sub-devices, having
the state DEFERRED, cannot be seen with the ethdev iterator.
The state RTE_ETH_DEV_DEFERRED can be replaced by
RTE_ETH_DEV_ATTACHED + owner.

Fixes: dcd0c9c32b8d ("net/failsafe: use ownership mechanism for slaves")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

ethdev: fix debug log of owner id

The owner id is 64-bit.
On 32-bit environment, it must be printed with PRIX64.

Fixes: 5b7ba31148a8 ("ethdev: add port ownership")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

app/testpmd: add commands to test new offload API

Add following testpmd run-time commands to support test of
new Rx offload API:
show port <port_id> rx_offload capabilities
show port <port_id> rx_offload configuration
port config <port_id> rx_offload <offload> on|off
port <port_id> rxq <queue_id> rx_offload <offload> on|off
Above last 2 commands should be run when the port is stopped.
And <offload> can be one of "vlan_strip", "ipv4_cksum", ...

Add following testpmd run-time commands to support test of
new Tx offload API:
show port <port_id> tx_offload capabilities
show port <port_id> tx_offload configuration
port config <port_id> tx_offload <offload> on|off
port <port_id> txq <queue_id> tx_offload <offload> on|off
Above last 2 commands should be run when the port is stopped.
And <offload> can be one of "vlan_insert", "udp_cksum", ...

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>

net/i40e: print global register change info

Global register change info during enabling
flexible payload is not printed.
This patch changes macro to print the global
register change info.

Fixes: d2f9fe8ae309 ("net/i40e: turn off flexible payload on driver init")
Cc: stable@dpdk.org
Signed-off-by: Beilei Xing <beilei.xing@intel.com>

net/sfc: fix inner TCP/UDP checksum offload control

If application uses Tx offload API and sets ETH_TXQ_FLAGS_IGNORE flag,
it still should have inner TCP/UDP checksum offload enabled if it is
supported and TCP/UDP checksum offload is requested.

Fixes: c78d280e88ef ("net/sfc: convert to new Tx offload API")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>

doc: add XXV710 support in i40e guide

Signed-off-by: Qiming Yang <qiming.yang@intel.com>

net/enic: fix flow drop action

Drop is a fate-deciding action, so mark it as FATE. It was missing in
a previous commit.

Fixes: cc17feb90413 ("ethdev: alter behavior of flow API actions")
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>

net/e1000: report Tx multi segment offload

This feature has been confirmed with testpmd:
testpmd> set fwd txonly
testpmd> port stop all
testpmd> port config all txd 1024
testpmd> set txsplit on
testpmd> set txpkts 70,80,90,100
testpmd> start
It can be observed at peer port that UDP packets
with UDP data length 298 bytes.

Signed-off-by: Wei Dai <wei.dai@intel.com>

net/i40e: fix link status update

Link status is not updated correctly, link speed is 0
when link is up and link speed is not 0 when link is
down. This patch fixes the issue.

Fixes: eef2daf2e199 ("net/i40e: fix link update no wait")
Cc: stable@dpdk.org
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Signed-off-by: Beilei Xing <beilei.xing@intel.com>

app/testpmd: fix device configure with zero queue

Setup number of Rx & Tx queues to 0 at rte_eth_dev_configure means
take driver's default queue number, so if during a re-configuration
previous queue number will be overwrite, this is not expected when
we configure dcb. The patch fix it by re-configure device with the
original queue number.

Fixes: 3be82f5cc5e ("ethdev: support PMD-tuned Tx/Rx parameters")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>

net/mlx5: add Multi-Packet Rx support

Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead of
posting a buffer per a packet, one large buffer is posted in order to
receive multiple packets on the buffer. A MPRQ buffer consists of multiple
fixed-size strides and each stride receives one packet.

Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is
comparatively small, or PMD attaches the Rx packet to the mbuf by external
buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external
buffers will be allocated and managed by PMD.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: add a function to rdma-core glue

mlx5dv_create_wq() is added for the Multi-Packet RQ (a.k.a Striding RQ).

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: separate filling Rx flags

Filling in fields of mbuf becomes a separate inline function so that this
can be reused.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx4: add new memory region support

This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx4_mr_lookup_cache().

If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx4_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.

If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx4_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.

If L3 misses, a new MR for the address should be created -
mlx4_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.

To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even though there're many MRs for a device or
MRs are highly fragmented, the cache of Mempool will be much helpful to
reduce misses on per-queue caches anyway.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx4: remove memory region support

This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add new memory region support

This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx5_mr_lookup_cache().

If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.

If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.

If L3 misses, a new MR for the address should be created -
mlx5_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.

To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even though there're many MRs for a device or
MRs are highly fragmented, the cache of Mempool will be much helpful to
reduce misses on per-queue caches anyway.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: remove memory region support

This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

ethdev: new Rx/Tx offloads API

This patch check if a input requested offloading is valid or not.
Any reuqested offloading must be supported in the device capabilities.
Any offloading is disabled by default if it is not set in the parameter
dev_conf->[rt]xmode.offloads to rte_eth_dev_configure() and
[rt]x_conf->offloads to rte_eth_[rt]x_queue_setup().
If any offloading is enabled in rte_eth_dev_configure() by application,
it is enabled on all queues no matter whether it is per-queue or
per-port type and no matter whether it is set or cleared in
[rt]x_conf->offloads to rte_eth_[rt]x_queue_setup().
If a per-queue offloading hasn't be enabled in rte_eth_dev_configure(),
it can be enabled or disabled for individual queue in
ret_eth_[rt]x_queue_setup().
A new added offloading is the one which hasn't been enabled in
rte_eth_dev_configure() and is reuqested to be enabled in
rte_eth_[rt]x_queue_setup(), it must be per-queue type,
otherwise trigger an error log.
The underlying PMD must be aware that the requested offloadings
to PMD specific queue_setup() function only carries those
new added offloadings of per-queue type.

This patch can make above such checking in a common way in rte_ethdev
layer to avoid same checking in underlying PMD.

This patch assumes that all PMDs in 18.05-rc2 have already
converted to offload API defined in 17.11 . It also assumes
that all PMDs can return correct offloading capabilities
in rte_eth_dev_infos_get().

In the beginning of [rt]x_queue_setup() of underlying PMD,
add offloads = [rt]xconf->offloads |
dev->data->dev_conf.[rt]xmode.offloads; to keep same as offload API
defined in 17.11 to avoid upper application broken due to offload
API change.
PMD can use the info that input [rt]xconf->offloads only carry
the new added per-queue offloads to do some optimization or some
code change on base of this patch.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>

net/mlx5: change device reference for secondary process

rte_eth_devices[] is not shared between primary and secondary process, but
a static array to each process. The reverse pointer of device (priv->dev)
is invalid. Instead, priv has the pointer to shared data of the device,
  struct rte_eth_dev_data *dev_data;

Two macros are added,
  #define PORT_ID(priv) ((priv)->dev_data->port_id)
  #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx4: fix CRC stripping capability report

There are two capabilities related to CRC stripping:
1. mlx4 HW capability to perform CRC stripping on a received packet.
This capability is built in mlx4 HW. It should be returned by the API
call mlx4_get_rx_queue_offloads().
2. mlx4 driver capability to enable/disable HW CRC stripping. This
capability is dependent on the driver version.

Before this commit the second capability was falsely returned by
the mentioned API. This commit fixes it by returning the first
capability.
mlx4 HW performs CRC stripping by default and this capability is
always reported as "true".

The ability to enable/disable CRC stripping is supported since this
commit and requires OFED version 4.3-1.5.0.0 or rdma-core version v18.
CRC stripping will be done by default regardless of its configuration
when working with OFED or rdma-core versions earlier than those
previously specified or before this commit.

Fixes: de1df14e6e6ec ("net/mlx4: support CRC strip toggling")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>

ethdev: fix corrupted device info in configure

Calling dev_infos_get() devops directly in rte_eth_dev_configure cause
random values in uninitialized fields because devops doesn't reset the
dev_info structure.

Call rte_eth_dev_info_get() API instead which memset the struct.

Also remove duplicated dev_infos_get existence check.

Fixes: 3be82f5cc5e3 ("ethdev: support PMD-tuned Tx/Rx parameters")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

net/failsafe: fix probe cleanup

The hot-plug alarm mechanism is responsible to practically execute both
plug in and out operations. It periodically tries to detect missed
sub-devices to be reconfigured and clean the resources of the removed
sub-devices.

The hot-plug alarm is started by the failsafe probe function, and it's
wrongly not stopped if failsafe instance got an error. for example
when starting failsafe with a MAC option, and giving it an invalid MAC
address this will lead to a NULL pointer for the dev private field. Then
when the hotplug alarm is called it will try to access this pointer,
which will lead to a segmentation fault.

Uninstall the hot-plug alarm in case of error in probe function.

Fixes: ebea83f8 ("net/failsafe: add plug-in support")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

net/failsafe: advertise supported RSS functions

Advertise failsafe supported RSS functions as part of dev_infos_get
callback. Set failsafe default RSS hash functions to be:
ETH_RSS_IP, ETH_RSS_UDP, and ETH_RSS_TCP.
The result of failsafe RSS hash functions is the logical AND of the
RSS hash functions among all failsafe sub_devices and failsafe own
defaults.

Previous to this commit RSS support was reported as none. Since the
introduction of [1] it is required that all RSS configurations be
verified.

[1] commit 8863a1fbfc66 ("ethdev: add supported hash function check")

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

net/dpaa: update optimal burst size in device info

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

net/dpaa: fix max push mode queue

Split default and max push mode queues to 4 and 8, respectively.

Fixes: 0c504f6950b6 ("net/dpaa: support push mode")
Cc: stable@dpdk.org
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

net/tap: report on supported RSS hash functions

Report on TAP supported RSS functions as part of dev_infos_get
callback: ETH_RSS_IP, ETH_RSS_UDP and ETH_RSS_TCP.
Known limitation: TAP supports all of the above hash functions together
and not in partial combinations.
Previous to this commit RSS support was reported as none. Since the
introduction of [1] it is required that all RSS configurations will be
verified.

[1] commit 8863a1fbfc66 ("ethdev: add supported hash function check")

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

app/testpmd: fix weak RSS hash key for flow

The default RSS hash key automatically provided by testpmd for RSS actions
specified without one is so weak that traffic can't spread properly on L4
with it (as seen with TCPv6).

It is only 30 bytes long, zero-padded to RSS_HASH_KEY_LENGTH (64 bytes),
later truncated to 40 bytes for most PMDs. The presence of padding is
really what kills balancing.

This patch provides a full 64-byte (non-zero-terminated) string to address
this issue.

Fixes: d0ad8648b1c5 ("app/testpmd: fix RSS flow action configuration")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

app/testpmd: fix empty list of RSS queues for flow

Since the commit referenced below, specifying a RSS action without any
queues (e.g. "actions rss queues end / end") does not override the default
set automatically generated by testpmd.

In short, one cannot instantiate a RSS action with 0 target queues anymore
in order to determine how PMDs react (hint: this is currently undocumented
so they may reject it, however ideally they should interpret it as a
default setting like for other fields where empty values stand for
"defaults".)

Fixes: d0ad8648b1c5 ("app/testpmd: fix RSS flow action configuration")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/bnxt: add NVM specific HWRM commands

This patch adds new NVM specific HWRM commands.
Code using these commands will be added in future patches.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add HWRM commands for more filtering support

Add HWRM structures to support more filtering features
like metering, tunnel filters and more.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add async event HWRM commands

add structures to add async_event support.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: update HWRM to version 1.9.2

Update HWRM structures to version 1.9.2

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

doc: remove mention of unreleased nics from enic guide

Company policy discourages the mention of unreleased hardware in
guides and release notes.

Fixes: 08df773 ("doc: update enic guide and features")
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>

net/enic: update UDP RSS controls

Current adapters which support UDP RSS piggyback on TCP RSS. Change
the controls to be forward compatible with future adapters, which will
have independent control of UDP and TCP.

Fixes: 9bd04182bb01 ("net/enic: support UDP RSS on 1400 series adapters")
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>

net/enic: fix RSS hash type advertisement

The NIC can hash these RSS packet types, but they are not advertised
via flow_type_rss_offloads. So add them.
- Part of the IPv4 hash:
  ETH_RSS_FRAG_IPV4
  ETH_RSS_NONFRAG_IPV4_OTHER
- Part of the IPv6 hash:
  ETH_RSS_FRAG_IPV6
  ETH_RSS_NONFRAG_IPV6_OTHER
- Part of the UDP hash:
  ETH_RSS_IPV6_UDP_EX

Also, do not use NIC_CFG_RSS_HASH_TYPE_IPV6_EX and
NIC_CFG_RSS_HASH_TYPE_TCP_IPV6_EX, as they are not needed to enable
RSS over IPv6 with extension headers.

Fixes: c2fec27b5cb0 ("net/enic: allow to change RSS settings")
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>

doc: update the enic guide and features

Add more descriptions regarding SR-IOV and RSS settings.

Remove 'Multicast MAC filter' and add 'Allmulticast mode' to the
features.

Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>

net/enic: set rte errno to positive value

Related to d9fff8a31, where rte_errno should always have positive
errno values.

Technically this is an ABI change since it fixes an error code
introduced in 18.02, but is minor and inconsequential.

Fixes: 1e81dbb5321b ("net/enic: add Tx prepare handler")
Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>

net/enic: fix the MTU handler to rely on max packet length

The RQ setup functions (enic_alloc_rq and enic_alloc_rx_queue_mbufs)
have changed to rely on max_rx_pkt_len to determine the use of scatter
and buffer size. But, the MTU handler only updates ethdev's MTU
value. So make it update max_rx_pkt_len as well. Other PMDs also
update both mtu and max_rx_pkt_len in their MTU handlers.

Also the condition for taking a short cut (scatter is disabled) in the
MTU handler is wrong. Even when scatter is disabled, a change in
max_rx_pkt_len may affect the buffer size posted to the NIC. So remove
that condition.

Finally, fix a comment and a warning message condition.

Fixes: 422ba91716a7 ("net/enic: heed the requested max Rx packet size")
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>

net/enic: enable RQ first and then post Rx buffers

Future VIC adapters may require that the driver enable RQ before
posting new buffers to the NIC. So split enic_alloc_rx_queue_mbufs()
into two functions, one that allocates buffers and fills RQ and the
other that posts them (i.e. PIO write to a doorbell). And, call the
post function only after enabling RQ.

Currently released models are not affected by this change, as they
work fine whether the driver posts buffers before or after enabling RQ.

Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>

net/virtio-user: support memory hotplug

When memory is hot-added or hot-removed, the virtio-user driver has to
notify the vhost-user backend with sending a VHOST_USER_SET_MEM_TABLE
request with the new memory map as payload.

This patch implements and registers a mem_event callback, it pauses the
datapath and updates memory regions to vhost in case of hot-add or
hot-remove event. This memory region update has only to be done when the
device is already started, so a new status flag is added to the device to
keep track of the status.

As the device can now be managed by different threads, a mutex is
introduced to protect against concurrent device configuration.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>

vhost: retranslate vring addr when memory table changes

When the vhost-user master sends memory updates using
VHOST_USER_SET_MEM request, the user backends unmap and then
mmap again the memory regions in its address space.

If the ring addresses have already been translated, it needs to
be translated again as they point to unmapped memory.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/mlx5: fix calculation of Tx TSO inline room size

rdma-core doesn't add up max_tso_header size to max_inline_data size. The
library takes bigger value between the two.

Fixes: 43e9d9794cde ("net/mlx5: support upstream rdma-core")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: fix resource leak in case of error

If something went wrong in mlx5_pci_prob the allocated eth dev
will cause a memory leak.

This commit release the eth dev that was previously allocated.

Fixes: 771fa900b73a ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix double free on error handling

When attr_ctx is NULL it will attempt to free the list of devices twice.
Avoid double freeing the list by directly going to error handling.

Fixes: 771fa900b73a ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix SW parser enabling

Fixes: 5f8ba81c4228 ("net/mlx5: support generic tunnel offloading")
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix SW parsing feature detection

Fixes: 5f8ba81c4228 ("net/mlx5: support generic tunnel offloading")
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: document update for Tx

Add document for hw header parsing and SWP.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/ixgbevf: save interrupt mask for performance

If dpdk APPs call the rte_eth_dev_rx_intr_enable or
rte_eth_dev_rx_intr_disable frequently, and ixgbe vf will read
the IXGBE_VTEIMS register everytime. The patch saves the IXGBE_VTEIMS
to mask to avoid read frequently.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Wei Dai <wei.dai@intel.com>

net/ixgbe: write disable to EITR counter

ixgbe doesn't write the EITR counter, disable it now.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/ixgbe: set the default value for EITR

The ixgbe PF and VF use the same default value
for EITR defined in header file.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/ixgbevf: set the interrupt interval for EITR

Set EITR interval as default. This patch can improve the
performance when we enable the rx-interrupt to process the
packets because we hope rx-interrupt reduce CPU. For example,
the 200us value of EITR makes the performance better with
the low CPU. The default value of ITR is 500us, compatible
with RSC of ixgbe PF, and next patch will use the default value.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/sfc/base: add Medford2 head-of-line blocking stats

These stats are availble on Medford2 DPDK firmware variant
which support equal stride super-buffer Rx mode. RXDP_HLB_IDLE
capability bit is set when the stats are available.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>

net/sfc/base: support RxDP scatter disabled truncate counter

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>

net/sfc/base: generate Medford2 RxDP stats

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>

net/sfc/base: fix Medford2 FEC stats range

Fixes: 400ba3daeeb1 ("net/sfc/base: decode Medford2 FEC stats if available")
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>

net/bnxt: clear HWRM sniffer list for PFs

Clear HWRM sniffer list for DPDK PFs so that VFs on
DPDK PFs initialize successfully. DPDK PF driver does not
handle HWRM commands from VFs.

Signed-off-by: Randy Schacher <stuart.schacher@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix usage of vnic id

VNIC ID returned by the FW is a 16-bit field.
We are incorrectly using it as a 32-bit value in few places.
This patch corrects that.

Fixes: daef48efe5e5 ("net/bnxt: support set MTU")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Michael Wildt <michael.wildt@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>

net/bnxt: fix Rx mbuf and agg ring leak in dev stop

In the start/stop_op operation, mbufs allocated for rings were not freed

1) add bnxt_free_tx_mbuf/bnxt_free_rx_mbuf in bnxt_dev_stop_op to free MBUF
   before freeing the rings.
2) MBUF allocation and free routines were not in sync. Allocation uses the
   ring->ring_size including any rounded up and multiple factors while the
   free routine uses the requested queue size.

Fixes: c09f57b49c13 ("net/bnxt: add start/stop/link update operations")
Cc: stable@dpdk.org
Signed-off-by: Jay Ding <jay.ding@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Signed-off-by: Xiaoxin Peng <xiaoxin.peng@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix to reset status of initialization

clear flag on stop at proper location to avoid race conditions.

Fixes: ed2ced6fe927 ("net/bnxt: check initialization before accessing stats")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: return error in stats if init is not complete

return error if init is not complete before accessing stats.

Fixes: ed2ced6fe927 ("net/bnxt: check initialization before accessing stats")
Cc: stable@dpdk.org
Signed-off-by: Jay Ding <jay.ding@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix MTU calculation

We were not considering the case of nested VLANs while
calculating MTU. This patch takes care of the same.

Fixes: daef48efe5e5 ("net/bnxt: support set MTU")
Cc: stable@dpdk.org
Signed-off-by: Qingmin Liu <qingmin.liu@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Jay Ding <jay.ding@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: set MTU in dev config for jumbo packets

MTU setting does not take effect after rte_eth_dev_configure
is called with jumbo enable unless it is configured using the
set_mtu dev_op.

Fixes: daef48efe5e5 ("net/bnxt: support set MTU")
Cc: stable@dpdk.org
Signed-off-by: Qingmin Liu <qingmin.liu@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Jay Ding <jay.ding@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: validate structs and pointers before use

Validate pointers aren't pointing to uninitialized areas
including txq and rxq before using them to avoid
bnxt driver from crashing.

Signed-off-by: Rahul Gupta <rahul.gupta@broadcom.com>
Signed-off-by: Jay Ding <jay.ding@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Tested-by: Randy Schacher <stuart.schacher@broadcom.com>

net/bnxt: update returned error on invalid max ring

Return EINVAL instead of ENOSPC when invalid queue_idx passed in
during rx and tx queue_setup_op routines.

Signed-off-by: Jay Ding <jay.ding@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: rename driver version from Cumulus to NetXtreme

Rename driver version from "Broadcom Cumulus driver" to
"Broadcom NetXtreme driver" to reflect this driver is applicable to
NetXtreme family beyond Cumulus.

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: rename function checking MAC address

rename check_zero_bytes to bnxt_check_zero_bytes to match proper prefix.

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: add support for LSC interrupt event

Add support to bnxt driver to register RTE_ETH_EVENT_INTR_LSC
event and monitor physical link status.

Signed-off-by: Qingmin Liu <qingmin.liu@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>

net/mlx4: fix UDP flow rule limitation enforcement

For some unknown reason, priorities do not have any effect on flow rules
that happen to match UDP destination ports. Those are always matched first
regardless.

This patch is a workaround that enforces this limitation at the PMD level;
such flow rules can only be created at the highest priority level for
correctness.

Fixes: a5171594fc3b ("net/mlx4: expose support for flow rule priorities")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

net/mlx5: fix flow validation

Item spec and last are wrongly compared to the NIC capability causing a
validation failure when the mask is null.
This validation function should only verify the user is not configuring
unsupported matching fields.

Fixes: 2097d0d1e2cc ("net/mlx5: support basic flow items and actions")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

net/mlx5: fix probe return value polarity

mlx5 prefixed function returns a negative errno value.
the error handler on mlx5_pci_probe is doing the same.

Fixes: a6d83b6a9209 ("net/mlx5: standardize on negative errno values")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

net/mlx5: fix socket connection return value

Upon success, mlx5_socket_connect should return the fd descriptor of the
primary process

Fixes: a6d83b6a9209 ("net/mlx5: standardize on negative errno values")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

net/mlx5: add Rx and Tx tuning parameters

A new ethdev API was exposed by
commit 3be82f5cc5e3 ("ethdev: support PMD-tuned Tx/Rx parameters")

Enabling the PMD to provide default parameters in case no strict request
from application in order to improve the out of the box experience.

While the current API lacks the means for the PMD to provide the best
possible value, providing the best default the PMD can guess.
The values are based on Mellanox performance report and depends on the
underlying NIC capabilities.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

net/mlx5: fix ethtool link setting call order

According to ethtool_link_setting API recommendation ETHTOOL_GLINKSETTINGS
should be called before ETHTOOL_GSET as the later one deprecated.

Fixes: f47ba80080ab ("net/mlx5: remove kernel version check")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

net/i40e: fix flow RSS key array error

This is a bug introduced into RSS key array span.

Fixes: ac8d22de2394 ("ethdev: flatten RSS configuration in flow API")
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/i40e: fix missing some offload capabilities

MULTI_SEGS and JUMBO_FRAME offload capability should be exposed
both VF and PF since i40e does support it.

Fixes: c3ac7c5b0b8a ("net/i40e: convert to new Rx offloads API")
Fixes: 7497d3e2f777 ("net/i40e: convert to new Tx offloads API")
Signed-off-by: Yanglong Wu <yanglong.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/e1000: add minimum support for Broadcom 54616 PHY

If we find a Broadcom 54616, handle as a e1000_phy_none assuming that
the NIC reset has initialized the PHY to a sane state.

Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

net/i40e: add workaround promiscuous disable

In scenario of Kernel Driver runs on PF and PMD runs on VF, PMD exit
doesn't disable promiscuous mode, this will cause vlan filter set by
Kernel Driver will not take effect.

This patch will fix it, add promiscuous disable at device disable.

Signed-off-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/ixgbe: add API to update SBP bit

Add ixgbe API to enable SBP bit in FCTRL register
to allow receiving packets that may otherwise be
considered length errors by ixgbe and dropped

Signed-off-by: Shweta Choudaha <shweta.choudaha@att.com>
Reviewed-by: Chas Williams <chas3@att.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

net/virtio-user: fix hugepage files enumeration

After the commit 2a04139f66b4 ("eal: add single file segments option"),
one hugepage file could contain multiple hugepages which are further
mapped to different memory regions.

Original enumeration implementation cannot handle this situation.

This patch filters out the duplicated files; and adjust the size after
the enumeration.

Fixes: 6a84c37e3975 ("net/virtio-user: add vhost-user adapter layer")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/vhost: initialise device as inactive

rte_eth_vhost_get_vid_from_port_id returns a value of 0 if
called before the first call to the new_device callback.
A vid value >=0 suggests the device is active which is not
the case in this instance. Initialise vid to a negative
value to prevent this.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")
Cc: stable@dpdk.org
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost/crypto: fix symmetric ciphering

A bracket was misplaced in a condition check, this patch
fixes it.

Coverity issue: 277232, 277237
Fixes: 3bb595ecd682 ("vhost/crypto: add request handler")
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix header copy to discontiguous desc buffer

In the loop to copy virtio-net header to the descriptor buffer,
destination pointer was incremented instead of the source
pointer.

Fixes: fb3815cc614d ("vhost: handle virtually non-contiguous buffers in Rx-mrg")
Fixes: 6727f5a739b6 ("vhost: handle virtually non-contiguous buffers in Rx")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

examples/vhost: fix header copy to discontiguous desc buffer

In the loop to copy virtio-net header to the descriptor buffer,
destination pointer was incremented instead of the source
pointer.

Coverity issue: 277240
Fixes: 82c93a567d3b ("examples/vhost: move to safe GPA translation API")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix typo in comment

Fixes: 3670686ab99f ("vhost: fix race for connection fd")
Cc: stable@dpdk.org
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix crash on closing in client mode

when rte_vhost_driver_unregister detstroy the vsocket, we
should set it to NULL after freeing it, because in client mode,
the conn may be added to reconnect thread while vsocket is
destroyed. In one case, if qemu create vhostuser port as a
server with the same unix path, the reconnect thread will
reconnect to it while vsocket is destroyed.

To fix this:
1. set vsocket to NULL after free it.
2. remove the reconnection from reconnection thread in suitable
position.

Cc: stable@dpdk.org
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix dead lock on closing in server mode

When qemu close the unix socket fd of the vhostuser as a
server, and then immediately delete the vhostuser port on
openvswitch. There will be a deadlock.

A thread (fdset event thread):       B thread:
1. fdset_event_dispatch              rte_vhost_driver_unregister
2. set the fd busy to 1.             lock vsocket->conn_mutex
3. vhost_user_read_cb                fdset_del waits busy changed to 0.
4. vhost peer closed, remove the
   conn from vsocket->conn_list:
   lock vsocket->conn_mutex

5. set the fd busy to 0

Fixes: 65388b43f592 ("vhost: fix fd leaks for vhost-user server mode")
Cc: stable@dpdk.org
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

doc/vhost: update zero copy performance tip

In VM2NIC case zero copy may need some tuning to get best performance.
This patch describes the zero copy starved case and provides a tuning
tip.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/thunderx: remove deprecated Txq flags

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>

net/octeontx: fix missing offload flags

Fix missing DEV_RX_OFFLOAD_CHECKSUM flag in RX offloads.
Remove depricated txq_flags field.

Fixes: a92870896b4a ("net/octeontx: use the new offload APIs")
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>

net/bnxt: remove unused Txq flags

We are still using the txq_flags which is no longer needed with the
new offload API. Cleaning it up.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/fm10k: remove dependence on Tx queue flags

Since we move to new offload APIs, txq_flags is no long needed.
This patch remove the dependence on that.

Fixes: 30f3ce999e6a ("net/fm10k: convert to new Tx offloads API")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

net/e1000: remove dependence on Tx queue flags

Since we move to new offload APIs, txq_flags is no long needed.
This patch remove the dependence on that.

Fixes: e5c05e6590ea ("net/e1000: convert to new Tx offloads API")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>