The mbuf offload flags do not match the DPDK namespace (they are
not prefixed by RTE_). Announce their rename in 21.11, and the
removal of the old names in 22.11.
A draft coccinelle script is provided to anticipate what the
renaming will be.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Ray Kinsella [Wed, 4 Aug 2021 09:34:31 +0000 (10:34 +0100)]
doc: add policy for promotion of experimental API
Clarifying the ABI policy on the promotion of experimental APIs to stable.
We have a fair number of APIs that have been experimental for more than
2 years. This policy amendment indicates that these APIs should be
promoted or removed, or should at least form a conversation between the
maintainer and original contributor.
Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Gregory Etelson [Mon, 2 Aug 2021 18:13:16 +0000 (21:13 +0300)]
app/testpmd: fix IPv4 checksum
UDP protocol reserves 0 checksum value for special purposes.
Other protocols, like IPv4, TCP and SCTP must calculate checksum value
in software or offload checksum calculation to hardware.
If IPv4 TX checksum offload was off and header checksum was set to 0,
testpmd csum engine did not calculate checksum value for IPv4, TCP and
SCTP.
The patch always calculates IPv4, TCP and SCTP TX checksums if it is
not offloaded.
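For illustration, the software path described above can be sketched as follows. This is a minimal standalone example of the standard one's-complement header checksum, not the testpmd code; the helper names ipv4_hdr_cksum() and sw_ipv4_cksum_if_needed() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over an IPv4 header given as raw bytes.
 * The checksum field (bytes 10-11) must be zero when this is called. */
static uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(hdr != NULL && len >= 20 && len % 4 == 0);
	for (i = 0; i < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: always compute in software unless offloaded,
 * regardless of the value currently stored in the checksum field. */
static void sw_ipv4_cksum_if_needed(uint8_t *hdr, size_t len, int offloaded)
{
	uint16_t ck;

	if (offloaded)
		return;
	hdr[10] = 0;
	hdr[11] = 0;
	ck = ipv4_hdr_cksum(hdr, len);
	hdr[10] = (uint8_t)(ck >> 8);
	hdr[11] = (uint8_t)(ck & 0xff);
}
```

The helper zeroes the field itself before summing, so a stale or zero value in the header does not change the result.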
Dmitry Kozlyuk [Wed, 4 Aug 2021 08:03:01 +0000 (11:03 +0300)]
bus: clarify log for non-NUMA-aware devices
PCI, vmbus, and auxiliary drivers printed a warning
when NUMA node had been reported as (-1) or not reported by OS:
EAL: Invalid NUMA socket, default to 0
This message and its level might confuse users because the configuration
is valid and nothing happens that requires attention or intervention.
It was also printed without the device identification and with an indent
(PCI only), which is confusing unless DEBUG logging is on to print
the header message with the device name.
Reduce level to INFO, reword the message, and suppress it when there is
only one NUMA node because NUMA awareness does not matter in this case.
Also, remove the indent for PCI.
Fixes: f0e0e86aa35d ("pci: move NUMA node check from scan to probe") Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support") Fixes: 1afce3086cf4 ("bus/auxiliary: introduce auxiliary bus") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Reviewed-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
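The decision described above might look roughly like this. It is a standalone sketch, not the bus driver code: resolve_numa_node() is a hypothetical helper, the message wording is an assumption, and defaulting to node 0 is carried over from the old message quoted in the commit.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the NUMA node to use for a device. When the OS reported
 * no node (negative value), default to 0 and emit an INFO message
 * only if the system has more than one NUMA node. */
static int resolve_numa_node(const char *dev_name, int reported_node,
			     unsigned int numa_node_count, FILE *log)
{
	assert(dev_name != NULL && log != NULL);
	if (reported_node >= 0)
		return reported_node;
	if (numa_node_count > 1)
		fprintf(log, "INFO: device %s is not NUMA-aware, using node 0\n",
			dev_name);
	return 0;
}
```

Including the device name in the message removes the dependency on a DEBUG-level header line.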
Gregory Etelson [Tue, 3 Aug 2021 15:06:58 +0000 (18:06 +0300)]
net/mlx5: fix find sibling devices
The routine mlx5_eth_find_next() and related iterating macro
MLX5_ETH_FOREACH_DEV is used to iterate through sibling devices (all
representors share the same configuration and switching domain) on top
of specified root device.
The root device parameter was specified as NULL, and it caused
missing siblings in iteration during representor device probing,
causing:
1. allocating new domain_id for the device being probed.
2. discrepancy in representor configurations and potential overall
driver malfunctions.
Shun Hao [Wed, 4 Aug 2021 07:26:47 +0000 (10:26 +0300)]
net/mlx5: fix domains detection in meter hierarchy
Meters in one hierarchy might support different domains. For
example, one meter may support ingress only, but the root meter
can support all the domains.
If a later meter in the hierarchy does not inherit the first meter's
domains, an invalid domain table access can result.
The fix is to try to inherit the first meter's domains when creating
the meter hierarchy.
Fixes: a3b7af90baba ("net/mlx5: validate meter action in policy") Cc: stable@dpdk.org Signed-off-by: Shun Hao <shunh@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Shun Hao [Wed, 4 Aug 2021 07:26:46 +0000 (10:26 +0300)]
net/mlx5: fix meter flow counter translation
When a flow rule uses a meter without any modify packet action,
there will be an internal drop flow with meter counter created,
matching the same 5-tuple as the original flow.
In this case, the meter flow count action is wrongly reused as the
original flow counter, leading to wrong flow statistics.
Add a check in the count action translation to detect the meter case
and use the meter drop dedicated counter in the meter 5-tuple flow
only.
Suanming Mou [Mon, 2 Aug 2021 14:30:24 +0000 (17:30 +0300)]
net/mlx5: workaround drop action with old kernel
Currently, there are two types of drop action implementation
in the PMD. One is the DR (Direct Rules) dummy placeholder drop
action and another is the dedicated dummy queue drop action.
When a flow is created on the root table with the DR drop action, the
action is converted to the MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP
Verbs attribute in rdma-core.
In some inbox systems, the MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP Verbs
attribute may not be supported by the kernel driver. Creating a flow
with the drop action on the root table then fails because the
attribute is not supported. In this case, the dummy queue drop action
should be used instead of the DR dummy placeholder drop action.
This commit adds detection of DR drop action support on the root
table. If the MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP Verbs attribute is
not supported in the system, a dummy queue is used as the drop
action.
Fixes: da845ae9d7c1 ("net/mlx5: fix drop action for Direct Rules/Verbs") Cc: stable@dpdk.org Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Rongwei Liu [Mon, 2 Aug 2021 12:20:48 +0000 (15:20 +0300)]
net/mlx5: fix VXLAN VNI matching on ConnectX-5
In the recent update, the misc5 matcher was introduced to
match VxLAN header extra fields. However, ConnectX-5
doesn't support misc5 for the UDP ports different from
VXLAN's standard one (4789).
Need to fall back to the previous approach and use legacy
misc matcher if non-standard UDP port is recognized
in VxLAN flow.
Fixes: 630a587bfb37 ("net/mlx5: support matching on VXLAN reserved field") Cc: stable@dpdk.org Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Gregory Etelson [Mon, 2 Aug 2021 14:55:24 +0000 (17:55 +0300)]
net/mlx5: fix port initialization of switch domain
All active ports that belong to the same E-switch share domain_id
value.
Port initialization procedure searches through a database for existing
port with matching properties. A new domain_id is allocated if no match
is found. Otherwise, the new port inherits the existing domain_id.
Port initialization did not pass enough info to the search procedure to
find existing matches. Therefore, each port was created with a private
domain_id value. As a result, the port_id flow action failed because it
could not match ports in a rule to the E-switch.
The patch adds dpdk_dev with port properties to device search.
Raja Zidane [Thu, 29 Jul 2021 14:11:08 +0000 (17:11 +0300)]
compress/mlx5: fix compression level translation
Compression Level is interpreted by each PMD differently.
However, lower numbers give faster compression
at the expense of compression ratio, while higher numbers
may give better compression ratios but are likely slower.
The level affects the block size, which affects performance:
the bigger the block, the faster the compression is.
The problem was that higher levels caused bigger blocks:
size = min_block_size - 1 + level.
The solution is to reverse the above:
size = max_block_size + 1 - level.
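The two formulas above can be compared with a small worked example. The limits used here are hypothetical placeholders chosen only to make the arithmetic visible; the real minimum and maximum come from the device.

```c
#include <assert.h>

/* Hypothetical limits, in abstract block-size units. */
#define MIN_BLOCK_SIZE 10
#define MAX_BLOCK_SIZE 15
#define MIN_LEVEL 1
#define MAX_LEVEL (MAX_BLOCK_SIZE - MIN_BLOCK_SIZE + 1)

/* Old mapping: a higher level gives a bigger (faster) block. */
static int block_size_old(int level)
{
	assert(level >= MIN_LEVEL && level <= MAX_LEVEL);
	return MIN_BLOCK_SIZE - 1 + level;
}

/* Fixed mapping: a higher level gives a smaller (slower) block. */
static int block_size_new(int level)
{
	assert(level >= MIN_LEVEL && level <= MAX_LEVEL);
	return MAX_BLOCK_SIZE + 1 - level;
}
```

With these limits, level 1 maps to 15 and level 6 maps to 10 under the fixed formula, which matches the rule that lower levels favor speed.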
For Thor, the number of action records is being wrongly configured
to 128 because of incorrect definition of divider. This results in
an incorrect number of action records being negotiated with the FW.
Remove the divider from the templates and delete the logic which
uses the field in the resource manager logic.
Fixes: 3fe124d2536c ("net/bnxt: support Thor platform") Cc: stable@dpdk.org Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Tested-by: Shuanglin Wang <shuanglin.wang@broadcom.com>
doc: announce API changes for Windows compatibility
Windows headers define `s_addr`, `min`, and `max` as macros.
If DPDK headers are included after Windows ones, DPDK structure
definitions containing fields with these names get broken (example 1),
as well as any usage of such fields (example 2). If DPDK headers
undefined these macros, it could break consumer code (example 3).
It is proposed to rename structure fields in DPDK, because Win32 headers
are used more widely than DPDK, as a general-purpose platform compared
to domain-specific kit, and are harder to fix because of that.
Exact new names are left for further discussion.
Example 1:
/* in DPDK public header included after windows.h */
struct rte_type {
int min; /* ERROR: `min` is a macro */
};
Example 2:
#include <rte_ether.h>
#include <winsock2.h>
struct rte_ether_hdr eh;
eh.s_addr.addr_bytes[0] = 0; /* ERROR: `s_addr` is a macro */
Example 3:
#include <winsock2.h>
#include <rte_ether.h>
struct in_addr addr;
addr.s_addr = 0; /* ERROR: there is no `s_addr` field,
and `s_addr` macro is undefined by DPDK. */
Commit 6c068dbd9fea ("net: work around s_addr macro on Windows")
modified definition of `struct rte_ether_hdr` to avoid the issue.
However, the workaround assumes `#define s_addr S_addr.S_un`
in Windows headers, which is not a part of official API.
It also complicates the definition of `struct rte_ether_hdr`.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Khoa To <khot@microsoft.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Akhil Goyal <gakhil@marvell.com>
Thomas Monjalon [Wed, 14 Apr 2021 20:15:43 +0000 (22:15 +0200)]
doc: announce extension of crypto data-unit length
The struct member dataunit_len is introduced in DPDK 21.05.
It is limited to 16 bits to fit a padding hole in 32-bit build.
This means the maximum data-unit length is 64 KB.
Some use cases may benefit from a bigger size, such as the proposed 32 bits.
Ferruh Yigit [Wed, 30 Jun 2021 09:21:16 +0000 (10:21 +0100)]
doc: announce common prefix for ethdev
Announce adding 'RTE_ETH_' prefix to all public ethdev macros/enums on
v21.11.
Backward compatibility macros will be added on v21.11 and they will be
removed on v22.11.
Henry Nadeau [Thu, 29 Jul 2021 16:48:05 +0000 (12:48 -0400)]
doc: fix spelling
Spell checked and corrected documentation.
If there are any errors, or I have changed something that wasn't an error
please reach out to me so I can update the dictionary.
Cc: stable@dpdk.org Signed-off-by: Henry Nadeau <hnadeau@iol.unh.edu>
Currently the sample app user guides use hard coded code snippets,
this patch changes these to use literalinclude which will dynamically
update the snippets as changes are made to the code.
This was introduced in commit 413c75c33c40 ("doc: show how to include
code in guides"). Comments within the sample apps were updated to
accommodate this as part of this patch. This will help to ensure that
the code within the sample app user guides is up to date and not out
of sync with the actual code.
doc: announce security API changes for inline IPsec
Announce changes to make rte_security_set_pkt_metadata() and
rte_security_get_userdata() inline instead of C functions and
also addition of another field in structure rte_security_ctx for
holding flags.
examples/l2fwd-crypto: support cipher multiple data-unit
The support for multiple data-units includes the following:
- Add a new command-line argument to provide the data-unit length.
- Set the length in the cipher xform.
- Validate device capabilities for this feature.
- Pad the AES-XTS operation length to be aligned to the defined data-unit.
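The padding step in the last item amounts to rounding the operation length up to a multiple of the data-unit length. A minimal sketch, with a hypothetical helper name that is not taken from the example application:

```c
#include <assert.h>
#include <stdint.h>

/* Round len up to the next multiple of data_unit_len. */
static uint32_t pad_to_data_unit(uint32_t len, uint32_t data_unit_len)
{
	uint32_t rem;

	assert(data_unit_len != 0);
	rem = len % data_unit_len;
	return rem == 0 ? len : len + (data_unit_len - rem);
}
```

The modulo form works for any data-unit length, not only powers of two.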
When the PMD is removed, rte_cryptodev_pmd_release_device
is called which frees cryptodev->data, and then tries to free
cryptodev->data->dev_private, which causes the heap use
after free issue.
A temporary pointer is set before the free of cryptodev->data,
which can then be used afterwards to free dev_private.
Ciara Power [Wed, 21 Jul 2021 12:51:22 +0000 (12:51 +0000)]
cryptodev: fix freeing after device release
The PMD destroy function was calling the release function, which frees
cryptodev->data, and then tries to free cryptodev->data->dev_private,
which causes the heap use after free issue.
A temporary pointer is set before the free of cryptodev->data,
which can then be used afterwards to free dev_private.
The free cannot be moved to before the release function is called,
as dev_private is used in the PMD close function while being released.
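The ordering constraint can be illustrated with a standalone sketch. The structures and functions below are simplified stand-ins, not the cryptodev definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct dev_data {
	void *dev_private;
};

struct device {
	struct dev_data *data;
};

/* Stand-in for the release function: it frees dev->data, so
 * dev->data->dev_private must not be read afterwards. */
static void release_device(struct device *dev)
{
	free(dev->data);
	dev->data = NULL;
}

static void destroy_device(struct device *dev)
{
	void *dev_priv;

	assert(dev != NULL && dev->data != NULL);
	dev_priv = dev->data->dev_private; /* save before data is freed */
	release_device(dev);               /* dev_private is still valid here */
	free(dev_priv);                    /* freed last, via the saved pointer */
}
```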
Fan Zhang [Tue, 27 Jul 2021 15:42:46 +0000 (16:42 +0100)]
crypto/qat: fix raw data path dequeue
This patch fixes the raw data path dequeue burst fail problem.
Previously in case the queue is full and not all packets
asked to be dequeued are processed, the dequeue burst will
never happen.
Fixes: c21574edc52a ("cryptodev: add dequeue count parameter in raw API") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
In a few cases with Thor device, PMD can segfault when VF
representors are specified. Temporarily fix it by preventing
VF reps for Thor device. This will be addressed in next release.
Joyce Kong [Tue, 20 Jul 2021 03:51:25 +0000 (22:51 -0500)]
test/rcu: use compiler atomics for data sync
Convert rte_atomic usages to compiler atomic built-ins in
rcu_perf testcases.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Joyce Kong [Tue, 20 Jul 2021 03:51:24 +0000 (22:51 -0500)]
test/service: use compiler atomics for lock sync
Convert rte_atomic usages to compiler atomic built-ins for lock
sync in service_cores testcases.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Joyce Kong [Tue, 20 Jul 2021 03:51:23 +0000 (22:51 -0500)]
test/mempool: use compiler atomics for lcores sync
Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in mempool_perf testcases. Meanwhile, remove unnecessary
synchro init as it would be set to 0 when launching cores.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Joyce Kong [Tue, 20 Jul 2021 03:51:22 +0000 (22:51 -0500)]
test/mempool: remove unused variable for lcores sync
Remove the unused synchro variable as there is no lcores
sync in mempool function test.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Joyce Kong [Tue, 20 Jul 2021 03:51:21 +0000 (22:51 -0500)]
test/mcslock: use compiler atomics for lcores sync
Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in mcslock testcases.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Joyce Kong [Tue, 20 Jul 2021 03:51:20 +0000 (22:51 -0500)]
test/rwlock: use compiler atomics for lcores sync
Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in rwlock testcases.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Joyce Kong [Tue, 20 Jul 2021 03:51:19 +0000 (22:51 -0500)]
test/spinlock: use compiler atomics for lcores sync
Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in spinlock testcases.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Chaoyong He [Mon, 10 May 2021 16:53:19 +0000 (18:53 +0200)]
examples/l3fwd: disable multi-queue for single queue
Set the Rx multi-queue mode to NONE when configuring a port that is
associated with hardware that only supports a single Rx queue.
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com> Signed-off-by: Heinrich Kuhn <heinrich.kuhn@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Allow comment (lines starting with '#') and empty lines in input
(rules, traces) files. These lines will be just skipped and shouldn't
affect the result anyhow.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
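A line filter of this kind could be sketched as below. The function name is hypothetical, and treating only zero-length lines and bare newlines as "empty" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true for lines that the parser should skip:
 * comments starting with '#' and empty lines. */
static bool line_is_skipped(const char *line)
{
	assert(line != NULL);
	return line[0] == '#' || line[0] == '\0' || line[0] == '\n';
}
```

A reader loop would call this on each line returned by fgets() and continue to the next line when it returns true.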
The number of flow counters is reduced from 8192 to 6912 for Whitney
for compatibility with different versions of FW.
The FW resource manager splits resources for flow offload
and other use cases. A higher value used for flow offload
by the PMD can cause overriding the resources set aside by
FW. This in turn can lead to FW rejecting filter creation
requests during initialization.
Jay Ding [Tue, 20 Jul 2021 14:40:27 +0000 (14:40 +0000)]
net/bnxt: fix initialization with old firmware
Fix the resource qcap list handling to use size based on
FW response.
The size of resource qcap list could be different when FW
and application are not matching. Application should be able
to handle this scenario when the FW is older and the size of
qcap is smaller. Failure to do this causes initialization failure.
This patch is needed for backward compatibility on different
firmware versions.
Fixes: 873661aa641a1 ("net/bnxt: support shared session") Cc: stable@dpdk.org Signed-off-by: Jay Ding <jay.ding@broadcom.com> Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
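One way to tolerate a shorter list from older firmware is to copy only as many entries as the response actually carries. This is a sketch under that assumption; the array size and the helper name are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APP_QCAP_ENTRIES 8 /* hypothetical size known to the application */

/* Copy the firmware-reported entries, bounded by both sizes.
 * Entries that the firmware did not report stay zero. */
static size_t copy_qcap(uint16_t dst[APP_QCAP_ENTRIES],
			const uint16_t *fw_list, size_t fw_entries)
{
	size_t n;

	assert(dst != NULL && fw_list != NULL);
	n = fw_entries < APP_QCAP_ENTRIES ? fw_entries : APP_QCAP_ENTRIES;
	memset(dst, 0, APP_QCAP_ENTRIES * sizeof(dst[0]));
	memcpy(dst, fw_list, n * sizeof(dst[0]));
	return n;
}
```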
The event port config set by the application in the
rte_event_eth_tx_adapter_create API is modified in the
default configuration callback function. This patch removes
the hardcoded value so that the application-provided event port
config is used.
Fixes: a3bbf2e09756 ("eventdev: add eth Tx adapter implementation") Cc: stable@dpdk.org Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
After removing rte_eth_devices from testpmd the vm_hotplug no longer
recovered after removal of a device, because the port was closed
before querying it.
Fixes: 0a0821bcf312 ("app/testpmd: remove most uses of internal ethdev array") Signed-off-by: Paulis Gributs <paulis.gributs@intel.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
app/testpmd: fix Tx checksum calculation for tunnel
csumonly engine calculates Tx checksum of a tunnelled packet
for outer headers only or separately for outer and inner headers.
The calculation method is determined by checksum configuration options.
If Tx checksum calculation is separated,
the inner headers are processed before outer headers.
Inner headers processing sets checksum values to 0 unconditionally.
If Tx configuration offloads inner checksums only, outer checksum
calculation in software will read 0 instead of real values
and produce wrong result.
The patch zeroes inner checksums only before software calculation.
Fixes: 6b520d54ebfe ("app/testpmd: use Tx preparation in checksum engine") Cc: stable@dpdk.org Signed-off-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
In function softnic_table_action_profile_free(), the memory referenced
by pointer "ap" in the instance of "struct softnic_table_action_profile"
is not freed.
net/softnic: fix null dereference in arguments parsing
When there is no "firmware" in arguments, the "firmware" pointer is
null, and will be dereferenced by rte_strscpy().
This patch moves the code block which copies character string from
"firmware" to "p->firmware" into the "if" statements where "firmware"
argument exists and it is duplicated successfully.
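The shape of the fix can be sketched as follows. The structure and function are simplified stand-ins for illustration, and snprintf() is used here in place of rte_strscpy() to keep the example standalone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct params {
	char firmware[64];
};

/* Copy the optional "firmware" argument only when it is present.
 * Returns 0 on success and -1 if the value does not fit. */
static int set_firmware(struct params *p, const char *firmware)
{
	int n;

	assert(p != NULL);
	if (firmware == NULL)
		return 0; /* argument absent: keep the current value */
	n = snprintf(p->firmware, sizeof(p->firmware), "%s", firmware);
	return (n < 0 || (size_t)n >= sizeof(p->firmware)) ? -1 : 0;
}
```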
This fixes using abstract sockets with memifs.
We were not passing the exact addr_len,
which requires zeroing the remaining sun_path
and doesn't appear well in other utilities (e.g. lsof -U)
Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com> Reviewed-by: Jakub Grajciar <jgrajcia@cisco.com>
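For a Linux abstract socket, the exact address length covers the family field, the leading NUL byte, and the name bytes only. A minimal sketch with a hypothetical helper name, not the memif code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill an abstract-namespace address and return the exact length
 * to pass to bind() or connect(). */
static socklen_t fill_abstract_addr(struct sockaddr_un *sun, const char *name)
{
	size_t name_len;

	assert(sun != NULL && name != NULL);
	name_len = strlen(name);
	assert(name_len + 1 <= sizeof(sun->sun_path));
	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	sun->sun_path[0] = '\0'; /* leading NUL marks an abstract socket */
	memcpy(&sun->sun_path[1], name, name_len);
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + name_len);
}
```

Passing sizeof(struct sockaddr_un) instead would make the trailing zero bytes part of the socket name.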
Ivan Malov [Thu, 29 Jul 2021 09:32:59 +0000 (12:32 +0300)]
common/sfc_efx/base: do not validate MAE action COUNT order
In DPDK + Open vSwitch use case, action COUNT is always the
first one to be added. In particular, it goes before action
DECAP in that use case. The current code enforces the right
order (DECAP goes before COUNT), and this provokes failures.
As an exception, do not validate the order for action COUNT.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
The DPDK ENA driver does not provide multi-segment tx offload capability.
Let's add DEV_TX_OFFLOAD_MULTI_SEGS to ports offload capability by
default, and always set it in dev->data->dev_conf.txmode.offload.
This flag is not listed in doc/guides/nics/features/default.ini, so
ena.ini does not need to be updated.
net/mlx5: fix meter hierarchy validation with yellow
In mlx5 PMD, the meter hierarchy only supports the green color. It
means that a meter action can only be in the green action list.
Meanwhile, the yellow action list must be empty for now. Any
action for the yellow color policy will be considered invalid if
the green color policy is a hierarchy.
Also, the error message printing of meter hierarchy validation is
fixed by removing an incorrect checking.
Fixes: 4b7bf3ffb473 ("net/mlx5: support yellow in meter policy validation") Fixes: a3b7af90baba ("net/mlx5: validate meter action in policy") Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Both the green policy and the yellow policy can support RSS actions
simultaneously. The Rx queue configuration may differ between
them, while the other fields should be the same.
When only the green color policy was supported, the
queues copied and saved in the temporary workspace were used. Since
the yellow support was added, the queues stored in the thread
workspace would be overwritten by the yellow color policy. The flow
rule created using a meter with such a policy would have the same
RSS distribution for both green and yellow packets.
By using the meter action container's RSS information instead of the
workspace RSS, this overwriting can be prevented.
Fixes: b38a12272b3a ("net/mlx5: split meter color policy handling") Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Before the yellow color policy was supported, the only supported
profile of metering is RFC2697 and EIR is not part of the profile.
When creating a meter with this profile, the EIR part was always
zero.
After the yellow color policy and RFC2698 & RFC4115 support were
introduced, EIR is relevant and should be calculated. Usually
the EIR is not zero, and the formula for calculating the CIR
mantissa & exponent can be reused.
The EIR can also be 0, in which case only the green and red colors
are supported according to the specification. Both the mantissa and
exponent parts should then be set to 0. Currently, the formula wrongly
sets non-zero values for the EIR=0 case.
Setting the mantissa and the exponent parts to zeros when EIR is 0
will solve the issue.
Fixes: 33a7493c8df8 ("net/mlx5: support meter for trTCM profiles") Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
After the support for yellow color and RFC2698 & RFC4115 were added,
the profile validation adjustment was missed. With this fix, the
validation is like below:
1. Legacy metering only supports RFC2697 without EBS.
2. ASO metering can support all three profiles.
3. For backward compatibility, the RFC2697 profile without EBS is
still supported, and the check is done in the meter
creation stage.
Meanwhile, some checks that were done in the parameter
calculation stage are moved into the validation in order to skip
useless checking.
Fixes: 33a7493c8df8 ("net/mlx5: support meter for trTCM profiles") Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
net/mlx5: add Tx scheduling check on queue creation
The send scheduling on timestamp offload requires that the Send
Queue (SQ) share its User Access Region (UAR) with the
pacing Clock Queue. The SQ can be created by mlx5 PMD either
with DevX or with Verbs. If the SQ is created with
DevX, a dedicated UAR can be specified and all the SQs
share the single UAR. When the SQ is created with Verbs,
the SQ's UAR is allocated internally by the rdma-core library
and there is no UAR sharing. This caused hardware
errors on WAIT WQEs and overall send scheduling malfunction.
If SQs are going to be created with Verbs and the send
scheduling offload is explicitly requested via the tx_pp devarg,
the device probing is rejected because the device configuration
can't satisfy the requirements.
net/mlx5: fix timestamp initialization on empty clock queue
Completions committed by the Clock Queue might be delayed
after queue initialization is done, and the only Clock Queue
completion entry (CQE) might keep the invalid status until
the first CQE update happens.
The mlx5_txpp_update_timestamp() wrongly recognized the invalid
status as an error and reported lost synchronization.
The patch recognizes the invalid status as "not updated yet",
and the accurate scheduling initialization routine waits until
the first CQE update happens.
Some collateral typos in comments are fixed as well.
net/mlx5: limit implicit MPLS RSS expansion over GRE
As [1] optimized the MPLS RSS expansion before, this commit limits
the implicit MPLS RSS expansion for MPLSoGRE as well. For an
RSS flow matching up to the GRE level only, the MPLS match item
is not expanded for the sub flows, for performance reasons.
The original RSS flow match item:
ETH VLAN IPV6 GRE GRE_KEY END
The previous RSS expansion:
ETH VLAN IPV6 GRE GRE_KEY END
ETH VLAN IPV6 GRE GRE_KEY IPV4 END
ETH VLAN IPV6 GRE GRE_KEY MPLS IPV4 END
ETH VLAN IPV6 GRE GRE_KEY MPLS ETH IPV4 END
New RSS expansion:
ETH VLAN IPV6 GRE GRE_KEY END
ETH VLAN IPV6 GRE GRE_KEY IPV4 END
[1]
commit a26cc30fa046 ("net/mlx5: limit inner RSS expansion for MPLS")
net/mlx5: fix default queue number in RSS flow rule
The selection flags for the RX hash define how the received packets will
be distributed between multiple queues.
When creating a new TIR, the queue_num is set to 1 if none of the selection
flags is set.
Applied the same to the RSS desc before checking if it matches a cached
TIR object to save creating a new object every time.
The RSS hash types defined in the API do not support setting the L4 proto
type (TCP or UDP) without setting the L3 proto. For example, ETH_RSS_TCP
is defined as
(ETH_RSS_NONFRAG_IPV4_TCP | \
ETH_RSS_NONFRAG_IPV6_TCP | \
ETH_RSS_IPV6_TCP_EX).
The L3 proto of the RSS hash type may be different than the one defined
in the pattern, for example:
testpmd> flow create .../ ipv4 / tcp / end actions rss types ipv6-tcp-ex
end / end
If the RSS hash type also includes L4 proto type as in the above example,
the selection flags for the RX hash are currently set with SPORT/DPORT
without setting SRC/DST IP. As this combination is not supported, it does
not match any of the pre-created TIRs of the indirect RSS action
and the flow creation fails.
The fix is to prevent setting the selection flags for the RX hash with
SPORT/DPORT without setting SRC/DST IP. It applies non-RSS processing of
the received packets. In case of indirect RSS action, it will match the
MLX5_RSS_HASH_NONE pre-created TIR.
Fixes: b1d63d829378 ("net/mlx5: support RSS on src or dst fields only") Fixes: 4a78c88e3bae ("net/mlx5: fix Verbs flow tunnel") Cc: stable@dpdk.org Signed-off-by: Lior Margalit <lmargalit@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
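The rule described above, no L4 port selection without L3 address selection, can be sketched with a bit mask. The flag values are hypothetical stand-ins and not the mlx5 or Verbs definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selection bits for the Rx hash fields. */
#define HASH_SRC_IP   (1u << 0)
#define HASH_DST_IP   (1u << 1)
#define HASH_SRC_PORT (1u << 2)
#define HASH_DST_PORT (1u << 3)
#define HASH_NONE     0u

/* Drop an unsupported "ports without addresses" selection,
 * which results in non-RSS processing of the packets. */
static uint32_t sanitize_hash_fields(uint32_t fields)
{
	const uint32_t l3 = HASH_SRC_IP | HASH_DST_IP;
	const uint32_t l4 = HASH_SRC_PORT | HASH_DST_PORT;

	assert((fields & ~(l3 | l4)) == 0);
	if ((fields & l4) != 0 && (fields & l3) == 0)
		return HASH_NONE;
	return fields;
}
```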
Jiawei Wang [Mon, 26 Jul 2021 06:22:33 +0000 (09:22 +0300)]
net/mlx5: fix mirror flow split with L3 encapsulation
Due to hardware limitations, the decap action (such as
VXLAN/NVGRE/RAW decap) can't follow the sample action in the
same flow. To keep the original order of the sample and decap
actions, the PMD internally split the flow into two subflows:
the sample action was moved into the prefix subflow in the original table,
and the decap action was moved into the suffix subflow in the new table.
There is a specific combination of raw decap and raw encap actions
to specify "L3 encapsulation" packet transformation - raw decap action
to remove L2 header and raw encap to add the tunnel header.
This specific L3 encapsulation is encoded as a single packet reformat
hardware transaction and is supported by hardware after sample
action (no hardware limitations for packet reformat).
The "L3 encapsulation" with mirror actions in the same flow was not handled
correctly in the previous commit.
The patch checks whether the decap action is part of "L3 encapsulation"
and does not move the decap action into suffix subflow for the case.
Fixes: cafd87f62a06 ("net/mlx5: fix VLAN push/pop and decap actions with mirror") Cc: stable@dpdk.org Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
net/mlx5: fix queue leaking in hairpin auto bind check
During the start up stage, the hairpin auto bind was executed for
each port. All the Tx and Rx queues configured for this port should
be checked to confirm if the auto bind of hairpin is needed.
1. The queue is hairpin queue.
2. The peer port is the same one and the peer queue should also be
with hairpin type.
3. The manual bind attribute is not set for this queue.
If the queue is not a hairpin queue or it doesn't need to be bound
automatically, the reference count should be decreased by 1 since
the count was increased when calling the mlx5_*xq_get().
When the peer port is not the same, it means that no auto bind is
supported and the mlx5_*xq_release() was missed in the current
implementation.
By calling the release function before continue, the count is
correct when calling the device close.
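The reference counting pattern can be illustrated with a standalone loop. The types and helpers are simplified stand-ins for the mlx5_*xq_get() and mlx5_*xq_release() pair, and releasing the reference after counting a queue is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct queue {
	int refcnt;
	bool hairpin;
	bool manual_bind;
	int peer_port;
};

static struct queue *queue_get(struct queue *q)
{
	q->refcnt++;
	return q;
}

static void queue_release(struct queue *q)
{
	assert(q->refcnt > 0);
	q->refcnt--;
}

/* Count the queues that need hairpin auto bind on port self_port.
 * Every queue_get() is paired with a queue_release(), including
 * on each early "continue". */
static int count_auto_bind(struct queue *queues, size_t n, int self_port)
{
	int need = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct queue *q = queue_get(&queues[i]);

		if (!q->hairpin || q->manual_bind) {
			queue_release(q);
			continue;
		}
		if (q->peer_port != self_port) {
			queue_release(q); /* the release that was missing */
			continue;
		}
		need++;
		queue_release(q);
	}
	return need;
}
```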
In mlx5 PMD the PCI device interrupt vector was used by Uplink
representor exclusively and other VF representors did not support
interrupt mode.
All the VFs and Uplink representors are separate ethernet devices
and must have dedicated interrupt vectors.
The fix provides each representor with a dedicated interrupt
vector.
Kernel PF may not respond to virtual channel commands
VIRTCHNL_OP_GET_RSS_HENA_CAPS and VIRTCHNL_OP_SET_RSS_HENA, which
will cause VF to fail to start.
RSS offload type configuration is not a necessary feature for VF,
so in order to improve VF compatibility, in this patch the PMD will
ignore the error result of above two commands and will print warnings
instead.
Xiaoyun Li [Thu, 22 Jul 2021 07:56:20 +0000 (15:56 +0800)]
net/iavf: fix Tx threshold check
Function check_tx_thresh is called with the wrong parameter. If the
check fails, tx_queue_setup should return an error instead of continuing.
This patch fixes both issues.
Fixes: 69dd4c3d0898 ("net/avf: enable queue and device") Cc: stable@dpdk.org Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> Acked-by: Beilei Xing <beilei.xing@intel.com>
When virtio front-end initializes, the duplex mode should be set
unknown before reading any duplex mode information from configuration
space. This patch fixes the issue that duplex mode is by default set
to zero, which equals ETH_LINK_HALF_DUPLEX. This will lead to duplex
mode being half duplex when front-end does not have the feature
named VIRTIO_NET_F_SPEED_DUPLEX.
Fixes: 1357b4b36246 ("net/virtio: support Virtio link speed feature") Cc: stable@dpdk.org Signed-off-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Gaoxiang Liu [Mon, 26 Jul 2021 14:42:05 +0000 (22:42 +0800)]
net/virtio: fix interrupt handle leak
Free memory of interrupt handle in virtio_user_dev_uninit() to
avoid memory leak.
When the virtio user device closes, the memory of the interrupt handle
allocated in virtio_user_fill_intr_handle() is not freed.
Fixes: 3d4fb6fd2505 ("net/virtio-user: support Rx interrupt") Cc: stable@dpdk.org Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Maxime Coquelin [Mon, 26 Jul 2021 07:58:14 +0000 (09:58 +0200)]
vhost: fix crash on reconnect
When the vhost-user frontend like Virtio-user tries to
reconnect to the restarted Vhost backend, the Vhost backend
segfaults when multiqueue is enabled.
This is caused by VHOST_USER_GET_VRING_BASE being called for
a virtqueue that has not been created before, causing a NULL
pointer dereference.
This patch adds the VHOST_USER_GET_VRING_BASE requests to
the list of requests that trigger queue pair allocations.
Fixes: 160cbc815b41 ("vhost: remove a hack on queue allocation") Cc: stable@dpdk.org Reported-by: Yinan Wang <yinan.wang@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Yinan Wang <yinan.wang@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Ivan Ilchenko [Wed, 21 Jul 2021 09:22:25 +0000 (12:22 +0300)]
net/virtio: report maximum MTU in device info
Fix the driver to report maximum MTU obtained from config if
VIRTIO_NET_F_MTU is supported or calculated based on maximum
Rx packet length.
Fixes: ad97ceece12c ("ethdev: add min/max MTU to device info") Cc: stable@dpdk.org Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Andrew Rybchenko [Thu, 17 Jun 2021 14:20:25 +0000 (17:20 +0300)]
app/testpmd: send failure logs to stderr
Running with stdout suppressed or redirected for further processing
is very confusing in the case of errors. Fix it by logging errors and
warnings to stderr.
Since lines with log messages are touched anyway, concatenate split
format strings to make them easier to find with grep.
Fix indent of format string arguments.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
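As a rough illustration of the convention described above (not code from testpmd), a failure message can be written to stderr with its format string kept on a single line. The helper name and message text are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: report a failure on the given stream (stderr in
 * practice), with the format string kept on one line so that the message
 * can be located with grep. Returns the fprintf() result. */
static int
report_port_error_sketch(FILE *err, int port_id, int ret)
{
	return fprintf(err, "Failed to start port %d: error %d\n", port_id, ret);
}
```

A caller would use report_port_error_sketch(stderr, port_id, ret), so the message stays visible even when stdout is suppressed or redirected.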
app/testpmd: remove most uses of internal ethdev array
This patch removes most uses of the global variable rte_eth_devices
from testpmd. This was done to avoid using the object directly which
applications should not do.
Most uses have been replaced with standard function calls, however
the use of it in the show_macs function could not be replaced as no
function call exists to get all mac addresses of a given port.
The MAC address of each port in the global variable ports was not
updated after a reset: after resetting the VF MAC address, it still
held the initial one.
This patch gets the correct port MAC address when starting the port.
Fixes: a5279d25616d ("app/testpmd: check status of getting MAC address") Cc: stable@dpdk.org Signed-off-by: Yuying Zhang <yuying.zhang@intel.com> Acked-by: Aman Deep Singh <aman.deep.singh@intel.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Huisong Li [Fri, 23 Apr 2021 11:01:11 +0000 (19:01 +0800)]
sched: fix profile allocation failure handling
This patch fixes the return value check when allocating memory to store
the subport profiles, and releases the memory of 'rte_sched_port' if
that allocation fails.
Fixes: 0ea4c6afcaf1 ("sched: add subport profile table") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
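The error-handling pattern this fix refers to can be sketched as follows. This is a simplified stand-in using calloc() and free(), and the struct and function names are hypothetical, not the sched library's own: the second allocation is checked, and the first one is released if it fails.

```c
#include <stdlib.h>

struct port_sketch {
	void *subport_profiles;
};

/* Hypothetical constructor: check the second allocation and release the
 * first one if it fails, so that no memory is leaked on the error path. */
static struct port_sketch *
port_create_sketch(size_t n_profiles, size_t profile_size)
{
	struct port_sketch *port = calloc(1, sizeof(*port));

	if (port == NULL)
		return NULL;

	port->subport_profiles = calloc(n_profiles, profile_size);
	if (port->subport_profiles == NULL) {
		free(port); /* do not leak the outer object */
		return NULL;
	}
	return port;
}

/* Release both allocations; accepts NULL like free() does. */
static void
port_free_sketch(struct port_sketch *port)
{
	if (port == NULL)
		return;
	free(port->subport_profiles);
	free(port);
}
```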
power: check frequencies count before filling array
The freqs array size is RTE_MAX_LCORE_FREQS. Before filling the
array with num_freqs elements, restrict num_freqs to
RTE_MAX_LCORE_FREQS. This fixes a Coverity scan issue such as:
Overrunning array "pi->freqs" of 256 bytes by passing it to a
function which accesses it at byte offset 464.
Coverity issue: 371913 Fixes: ef1cc88f1837 ("power: support cppc_cpufreq driver") Cc: stable@dpdk.org Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Acked-by: David Hunt <david.hunt@intel.com>
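The bounds check described above can be illustrated with the sketch below. It is not the power library code: the names are hypothetical, and the limit of 64 entries is an assumption chosen to match the 256-byte array from the Coverity report with 4-byte elements.

```c
#include <stdint.h>

#define MAX_FREQS_SKETCH 64 /* assumed stand-in for RTE_MAX_LCORE_FREQS */

struct power_info_sketch {
	uint32_t freqs[MAX_FREQS_SKETCH];
	uint32_t nb_freqs;
};

/* Clamp the count before writing, so the loop can never run past the
 * end of the array. Returns the number of entries actually stored. */
static uint32_t
fill_freqs_sketch(struct power_info_sketch *pi, uint32_t highest,
		  uint32_t step, uint32_t num_freqs)
{
	uint32_t i;

	if (num_freqs > MAX_FREQS_SKETCH)
		num_freqs = MAX_FREQS_SKETCH;

	for (i = 0; i < num_freqs; i++)
		pi->freqs[i] = highest - i * step;

	pi->nb_freqs = num_freqs;
	return num_freqs;
}
```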
The first argument to rte_bsf32_safe was incorrectly declared as
a 64-bit value. The code only works on 32-bit values, and the underlying
function rte_bsf32 only accepts 32-bit values. This was a mistake
introduced when the safe version was added, probably caused
by copy/paste from the 64-bit version.
The bug went unnoticed until some other code was built with -Wall
and -Wextra in C++, where the compiler complained about the
missing cast.
Yes, this is an API signature change, but the original code was wrong.
It is an inline function, so it is not an ABI change.
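For readers unfamiliar with the "safe" variant, the sketch below shows what such a function looks like with a 32-bit first argument. It is a standalone illustration under a hypothetical name and relies on the GCC/Clang builtin __builtin_ctz. It is not a copy of the EAL implementation.

```c
#include <stdint.h>

/* Sketch of a "safe" bit-scan-forward on a 32-bit value: returns 0 when
 * no bit is set (leaving *pos untouched), otherwise stores the index of
 * the least significant set bit in *pos and returns 1. */
static inline int
bsf32_safe_sketch(uint32_t v, uint32_t *pos)
{
	if (v == 0)
		return 0;

	*pos = (uint32_t)__builtin_ctz(v);
	return 1;
}
```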