David Marchand [Wed, 7 Apr 2021 09:06:56 +0000 (11:06 +0200)]
service: clean references to removed symbol
rte_service_get_id() was removed in v17.11 but the API description
still referenced it and a version node was still present in EAL map.
Fixes: 8edc9aaaf217 ("service: use id in get by name function") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Avoid race with unregister interrupt handler if interrupt
source has some active callbacks at the moment, use wrapper
around rte_intr_callback_unregister() to check for -EAGAIN
return value and to loop until rte_intr_callback_unregister()
succeeds.
Roy Shterman [Mon, 22 Feb 2021 10:41:31 +0000 (12:41 +0200)]
mem: fix freeing segments in --huge-unlink mode
When using huge_unlink we unlink the segment right
after allocation. Although we unlink the file we keep
the fd in fd_list so file still exist just the path deleted.
When freeing the hugepage we need to close the fd and assign
it with (-1) in fd_list for the page to be released.
The current flow fails rte_malloc in the following flow when working
with --huge-unlink option:
1. alloc_seg() for segment A -
We allocate a segment, unlink the path to the segment
and keep the file descriptor in fd_list.
2. free_seg() for segment A -
We clear the segment metadata and return - without closing fd
or assigning (-1) in fd list.
3. alloc_seg() for segment A again -
We find segment A as available, try to allocate it,
find the old fd in fd_list try to unlink it
as part of alloc_seg() but failed because path doesn't exist.
The impact of such error is falsely failing rte_malloc()
although we have hugepages available.
Fixes: d435aad37da7 ("mem: support --huge-unlink mode") Cc: stable@dpdk.org Signed-off-by: Roy Shterman <roy.shterman@vastdata.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Thomas Monjalon [Mon, 5 Apr 2021 09:15:05 +0000 (11:15 +0200)]
bus/pci: rename probe/remove operation types
The names of the prototypes pci_probe_t and pci_remove_t
are missing a prefix rte_.
These function types are simply renamed.
No compatibility break is expected for the applications
because it is considered as an internal name in the driver interface.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: David Marchand <david.marchand@redhat.com>
power: do not skip saving original P-state governor
Currently, when we set the pstate governor to "performance", we check if
it is already set to this value, and if it is, we skip setting it.
However, we never save this value anywhere, so that next time we come
back and request the governor to be set to its original value, the
original value is empty.
Fix it by saving the original pstate governor first. While we're at it,
replace `strlcpy` with `rte_strscpy`.
Previous fix for base frequency handling in pstate mode introduced a
couple of issues:
- When base_frequency file does not exist, it simply bails out because
of what appears to be accidental addition of FOPEN_OR_ERR_RET. This is
incorrect, as absence of this file is not fatal and is in fact
expected on kernel versions earlier than 5.3
- When base_frequency file does exist, it gets opened, but never gets
closed, resulting in a resource leak
Both issues also manifest themselves as Coverity defects (dead code, and
a resource leak), so this fix addresses both.
David Marchand [Thu, 1 Apr 2021 19:58:42 +0000 (21:58 +0200)]
doc: fix sphinx rtd theme import in GHA
If the rtd theme is available, passing it by name is enough to select
it. Sphinx itself recognises the "sphinx_rtd_theme" name as a special
case and tries to find its path automatically.
On the other hand, passing a html_theme_path makes sphinx parse all
themes availables in this path, which in some environment (like GHA) is
/usr/share and makes sphinx error on the first zipfile it finds (in GHA,
some Azure CLI thingy) that has no sphinx theme in it.
Fixes: 46562be65094 ("doc: import sphinx rtd theme when available") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>
Matan Azrad [Mon, 1 Mar 2021 10:41:31 +0000 (10:41 +0000)]
vdpa/mlx5: fix virtq cleaning
The HW virtq object can be destroyed either when the device is closed or
when the state of the virtq becomes disabled.
Some parameters of the virtq should continue to be managed when the
virtq state is changed but all of them must be initialized when the
device is closed.
Wrongly, the enable parameter stayed on when the device is closed what
might cause creation of invalid virtq in the next time a device is
assigned to the driver.
Clean all the virtqs memory when the device is closed.
Fixes: c47d6e83334e ("vdpa/mlx5: support queue update") Cc: stable@dpdk.org Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
David Marchand [Mon, 1 Feb 2021 17:46:01 +0000 (18:46 +0100)]
net/virtio: remove duplicated port ID from virtio-user
The private virtio_user_dev structure embeds a virtio_hw which itself
contains the ethdev port_id.
Make use of it and remove the duplicate port_id field.
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Similar as single dequeue, the multiple accesses of descriptor length
will lead to potential risk. One-time access of descriptor length can
eliminate this risk.
Fixes: 75ed51697820 ("vhost: add packed ring batch dequeue") Cc: stable@dpdk.org Signed-off-by: Marvin Liu <yong.liu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Marvin Liu [Wed, 31 Mar 2021 06:49:38 +0000 (14:49 +0800)]
vhost: fix packed ring potential buffer overflow
Similar as split ring, the multiple accesses of descriptor length will
lead to potential risk. One-time access of descriptor length can
eliminate this risk.
Fixes: 2f3225a7d69b ("vhost: add vector filling support for packed ring") Cc: stable@dpdk.org Signed-off-by: Marvin Liu <yong.liu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Marvin Liu [Wed, 31 Mar 2021 06:49:37 +0000 (14:49 +0800)]
vhost: fix split ring potential buffer overflow
In vhost datapath, descriptor's length are mostly used in two coherent
operations. First step is used for address translation, second step is
used for memory transaction from guest to host. But the interval between
two steps will give a window for malicious guest, in which can change
descriptor length after vhost calculated buffer size. Thus may lead to
buffer overflow in vhost side. This potential risk can be eliminated by
accessing the descriptor length once.
Fixes: 1be4ebb1c464 ("vhost: support indirect descriptor in mergeable Rx") Cc: stable@dpdk.org Signed-off-by: Marvin Liu <yong.liu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Maxime Coquelin [Tue, 23 Mar 2021 09:02:18 +0000 (10:02 +0100)]
vhost: move dirty logging cache out of virtqueue
This patch moves the per-virtqueue's dirty logging cache
out of the virtqueue struct, by allocating it dynamically
only when live-migration is enabled.
It saves 8 cachelines in vhost_virtqueue struct.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Balazs Nemeth <bnemeth@redhat.com>
Maxime Coquelin [Tue, 16 Mar 2021 09:38:24 +0000 (10:38 +0100)]
net/virtio: allocate fake mbuf in Rx queue
While it is worth clarifying whether the fake mbuf
in virtnet_rx struct is really necessary, it is sure
that it heavily impacts cache usage by being part of
the struct. Indeed, it uses two cachelines, and
requires alignment on a cacheline.
Before this series, it means it took 120 bytes in
virtnet_rx struct:
Balazs Nemeth [Fri, 26 Mar 2021 11:01:30 +0000 (12:01 +0100)]
net/qede: remove unnecessary field in Rx entry and simplify
The member page_offset is always zero. Having this in the qede_rx_entry
makes it larger than it needs to be and this has cache performance
implications so remove that field. In addition, since qede_rx_entry only
has an rte_mbuf*, remove the definition of qede_rx_entry.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Balazs Nemeth [Fri, 26 Mar 2021 11:01:29 +0000 (12:01 +0100)]
net/qede: prefetch next packet to free
While handling the current mbuf, pull the next mbuf into the cache. Note
that the last four mbufs pulled into the cache are not handled, but that
doesn't matter.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Balazs Nemeth [Fri, 26 Mar 2021 11:01:27 +0000 (12:01 +0100)]
net/qede: free packets in bulk
rte_pktmbuf_free_bulk calls rte_mempool_put_bulk with the number of
pending packets to return to the mempool. In contrast, rte_pktmbuf_free
calls rte_mempool_put that calls rte_mempool_put_bulk with one object.
An important performance related downside of adding one packet at a time
to the mempool is that on each call, the per-core cache pointer needs to
be read from tls while a single rte_mempool_put_bulk only reads from the
tls once.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Balazs Nemeth [Fri, 26 Mar 2021 11:01:26 +0000 (12:01 +0100)]
net/qede: assume mbuf to free is never null
The ring txq->sw_tx_ring is managed with txq->sw_tx_cons. As long as
txq->sw_tx_cons is correct, there is no need to check if
txq->sw_tx_ring[idx] is null explicitly.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Leaving the unused flags member here has a few performance implications.
First, each qede_tx_entry takes up more memory which has caching
implications as less entries fit in a cache line while multiple entries
are frequently handled in batches. Second, an array of qede_tx_entry
entries is incompatible with existing APIs that expect an array of
rte_mbuf pointers. Consequently, an extra array need to be allocated
before calling such APIs and each entry needs to be copied over.
This patch omits the flags field and replaces the qede_tx_entry entry
by a simple rte_mbuf pointer.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Modification of the 802.1Q Tag Identifier, VXLAN Network
Identifier or GENEVE Network Identifier is not supported.
Reject attempt to modify these fields via the MODIFY_FIELD
action and document this mlx5 driver limitation.
Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
There is a limitation about copying one header field to another for
the Flow group 0. Such copy action is not allowed there. But setting
a header field with an immediate value is perfectly fine.
Allow the MODIFY_FIELD action on group 0 in case the source field
is an immediate value or a pointer to it.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
net/mlx5: check extended metadata for mark modification
The MODIFY_FIELD action requires the extended metadata support
in order to manipulate on MARK register. Check if it is supported
and reject the MODIFY_FIELD action if it is not.
Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xueming Li [Sun, 28 Mar 2021 13:48:13 +0000 (13:48 +0000)]
net/mlx5: fix setting VF default MAC through representor
With kernel bonding, there was an error when setting VF MAC address
through representor. The Netlink API requires ifindex of owner PF, not
bonding device ifindex.
Uses owner PF ifindex to modify VF default MAC in case of bonding
device.
Fixes: c21e5facf7d2 ("net/mlx5: use bond index for netdev operations") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xueming Li [Sun, 28 Mar 2021 13:48:12 +0000 (13:48 +0000)]
net/mlx5: save bonding member ports information
Since kernel bonding netdev doesn't provide statistics counter that
reflects all member ports, PMD has to manually summarize counters from
each member ports.
As a preparation, this patch collects bonding member port information
and saves to shared context data.
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xueming Li [Sun, 28 Mar 2021 13:48:11 +0000 (13:48 +0000)]
net/mlx5: support list of representor PF
To probe representors from different kernel bonding PFs, had to specify
2 separate devargs like this:
-a 03:00.0,representor=pf0vf[0-3] -a 03:00.0,representor=pf1vf[0-3]
This patch supports range or list of PF section in devargs, so the
alternative short devargs of above is:
-a 03:00.0,representor=pf[0-1]vf[0-3]
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xueming Li [Sun, 28 Mar 2021 13:48:10 +0000 (13:48 +0000)]
net/mlx5: refactor bonding representor probing
To probe representor on 2nd PF of kernel bonding device, had to specify
PF1 BDF in devarg:
<PF1_BDF>,representor=0
When closing bonding device, all representors had to be closed together
and this implies all representors have to use primary PF of bonding
device. So after probing representor port on 2nd PF, when locating new
probed device using device argument, the filter used 2nd PF as PCI
address and failed to locate new device.
Conflict happened by using current representor devargs:
- Use PCI BDF to specify representor owner PF
- Use PCI BDF to locate probed representor device.
- PMD uses primary PCI BDF as PCI device.
To resolve such conflicts, new representor syntax is introduced here:
<primary BDF>,representor=pfXvfY
All representors must use primary PF as owner PCI device, PMD internally
locate owner PCI address by checking representor "pfX" part. To EAL, all
representors are registered to primary PCI device, the 2nd PF is hidden
to EAL, thus all search should be consistent.
Same to VF representor, HPF (host PF on BlueField) uses same syntax to
probe, example: representor=pf1vf[0-3,-1]
This patch also adds pf index into kernel bonding representor port name:
<BDF>_<ib_name>_representor_pf<X>vf<Y>
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xueming Li [Sun, 28 Mar 2021 13:48:09 +0000 (13:48 +0000)]
net/mlx5: revert setting bonding representor to first PF
With kernel bonding, representors on second PF are being probed by
devargs:
<primary_bdf>,representor=pf1vf<N>
No need to save primary PF port ID and lookup when probing sibling
ports, revert patch [1]
[1]:
commit e6818853c022 ("net/mlx5: set representor to first PF in bonding mode")
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xueming Li [Sun, 28 Mar 2021 13:48:07 +0000 (13:48 +0000)]
common/mlx5: support sub-function representor parsing
This patch supports representor name parsing for SF.
In sysfs, representor name stored under "phys_port_name" sysfs key,
similar to VF representor, switch port name of SF representor is
"pf<x>sf<y>".
For netlink message, net SF type is supported.
Examples:
pf0sf1
pf0sf[0-3]
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Qi Zhang [Thu, 25 Mar 2021 12:42:41 +0000 (20:42 +0800)]
net/ice: refine RSS configure
The ICE_RSS_ANY_HEADERS will try to enable outer RSS for
non-tunnel case and inner RSS for tunnel case. This confuse
user.
As we already have ICE_RSS_INNER_HEADER for tunnel case,
So, replace ICE_RSS_ANY_HEADERS with ICE_RSS_OUTER_HEADERS
for all exist flow which only specified the outer pattern.
To enable inner RSS for any tunnel cases, a separated rule
should be enabled.
The patch also remove some unnecessary condition check for GTPU
in base code, as we already can support outer RSS for GTPU.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Acked-by: Xuan Ding <xuan.ding@intel.com>
Murphy Yang [Mon, 29 Mar 2021 08:28:45 +0000 (08:28 +0000)]
net/ixgbe: fix RSS RETA being reset after port start
If one calls ‘rte_eth_dev_rss_reta_update’ with ixgbe before starting
the device (but after setting everything else), then RSS RETA
configuration will be zero after starting the device.
This patch gives a notification if the port not started.
Bugzilla ID: 664 Fixes: 249358424eab ("ixgbe: RSS RETA configuration") Cc: stable@dpdk.org Signed-off-by: Murphy Yang <murphyx.yang@intel.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Qi Zhang [Mon, 1 Mar 2021 07:57:14 +0000 (15:57 +0800)]
net/iavf: fix TSO max segment size
According to Intel® AVF spec
(https://www.intel.com/content/dam/
www/public/us/en/documents/product-specifications/
ethernet-adaptive-virtual-function-hardware-spec.pdf)
section 2.2.2.3:
The max segment size(MSS) of TSO should not be set lower than 88.
Robin Zhang [Fri, 12 Mar 2021 08:52:09 +0000 (08:52 +0000)]
net/i40e: announce request queue capability in PF
A new feature requesting additional queues from PF is added in iavf;
before sending VIRTCHNL_OP_REQUEST_QUEUES op code, the offload
capability flag VIRTCHNL_VF_OFFLOAD_REQ_QUEUES will be checked.
And due to DPDK PF is still used by some cases, add this offload
capability flag in i40e PF.
Fixes: cbdbd360f77f ("net/i40e: support AVF basic interface") Cc: stable@dpdk.org Signed-off-by: Robin Zhang <robinx.zhang@intel.com> Acked-by: Jeff Guo <jia.guo@intel.com>
David Hunt [Thu, 11 Mar 2021 11:55:10 +0000 (11:55 +0000)]
net/iavf: implement power management
Implement support for the power management API by implementing a
`get_monitor_addr` function that will return an address of an RX ring's
status bit.
This patch is basically a cut-and-paste of the changes already
committed in ixgbe, i40e and ice drivers in 21.02. This extends
the availability of the power-saving mechanism to the iavf driver,
which is needed for those use-cases using virtual functions.
Patchset where PMD Power Management added in 21.02:
http://patchwork.dpdk.org/project/dpdk/list/?series=14756
Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Feifei Wang [Wed, 10 Mar 2021 02:40:29 +0000 (10:40 +0800)]
net/i40e: fix parsing packet type for NEON
In i40e NEON vector Rx path, the packet descs processing is incorrect.
This caused wrong packet type been filled in mbuf.
To fix this, when shifting the pktlen field to be 16-bit aligned, it
only needs to process the high 16bit of the packet descs instead of
the high 32bit.
Test Results:
Architecture: arm64
NIC: XL710
Driver: i40e
Package: Ether()/IP()/
Without this patch:
desc_to_ptype_v: ptype = 7 (error)
With this patch:
desc_to_ptype_v: ptype = 23 (correct)
Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM") Cc: stable@dpdk.org Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Tested-by: Kathleen Capella <kathleen.capella@arm.com>
Ivan Malov [Fri, 26 Mar 2021 09:39:27 +0000 (12:39 +0300)]
net/sfc: fix error path inconsistency
At the fail label, there's a statement to set general errno and
error message. However, before the label is reached, a custom
error message can be set by the code which parses actions.
This custom (action-specific) message, when present,
must not be replaced by the general one.
Fixes: 662286ae61d2 ("net/sfc: add actions parsing stub to MAE backend") Cc: stable@dpdk.org Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Ciara Loftus [Thu, 25 Mar 2021 08:22:09 +0000 (08:22 +0000)]
net/af_xdp: mark recvfrom return as ignored
Coverity complains that the return value of recvfrom() in the AF_XDP
datapath is not checked. We don't care about the return value because in
the case of an error we still return 0 from the receive function to
indicate no packets were received. So to make Coverity happy we cast the
return to 'void'.
Coverity issue: 369671 Fixes: 63e8989fe5a4 ("net/af_xdp: use recvfrom instead of poll syscall") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Guoyang Zhou [Tue, 23 Mar 2021 13:17:51 +0000 (21:17 +0800)]
net/hinic: fix crash in secondary process
Some apps, such as fstack, will use secondary process to access the
memory of eth_dev_ops, and they want to get the info of dev, but hinic
driver does not initialized it when in secondary process.
Tyler Retzlaff [Fri, 12 Mar 2021 22:20:06 +0000 (14:20 -0800)]
ethdev: install driver headers
Introduce a meson option 'enable_driver_sdk', when true installs internal
driver headers for ethdev. This allows drivers that do not depend on
stable api/abi to be built external to the dpdk source tree.
Chengchang Tang [Tue, 23 Mar 2021 13:45:56 +0000 (21:45 +0800)]
net/hns3: fix long task queue pairs reset time
Currently, the queue reset process needs to be performed one by one,
which is inefficient. However, the queues reset in the same function is
almost at the same stage. To optimize the queue reset process, a new
function has been added to the firmware command HNS3_OPC_CFG_RST_TRIGGER
to reset all queues in the same function at a time. And the related
queue reset MBX message is adjusted in the same way too.
Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Huisong Li [Tue, 23 Mar 2021 13:45:55 +0000 (21:45 +0800)]
net/hns3: fix link update when failed to get link info
In the "hns3_dev_link_update" API, the link information of the port is
obtained first, and then 'dev_link' in dev->data is updated. When the
driver is resetting or fails to obtain link info, the current driver
still reports the previous link info to the user. This may cause that
the dev->data->dev_link may be inconsistent with the hw link status.
Therefore, the link status consistency between the hardware, driver,
and framework can be ensured in this interface regardless of whether
the driver is normal or abnormal.
Fixes: 109e4dd1bd7a ("net/hns3: get link state change through mailbox") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Chengchang Tang [Tue, 23 Mar 2021 13:45:54 +0000 (21:45 +0800)]
net/hns3: fix Tx checksum for UDP packets with special port
For Kunpeng920 network engine, UDP packets with destination port 6081,
4789 or 4790 will be identified as tunnel packets. If the UDP CKSUM
offload is set in the mbuf, and the TX tunnel mask is not set, the
CKSUM of these packets will be wrong. In this case, the upper layer
user may not identify the packet as a tunnel packet, and processes it
as non-tunnel packet, and expect to offload the outer UDP CKSUM, so
they may not fill the outer L2/L3 length to mbuf. However, the HW
identifies these packet as tunnel packets and therefore offload the
inner UDP CKSUM. As a result, the inner and outer UDP CKSUM are
incorrect. And for non-tunnel UDP packets with preceding special
destination port will also exist similar checksum error.
For the new generation Kunpeng930 network engine, the above errata
have been fixed. Therefore, the concept of udp_cksum_mode is
introduced. There are two udp_cksum_mode for hns3 PMD,
HNS3_SPECIAL_PORT_HW_CKSUM_MODE means HW could solve the above
problem. And in HNS3_SPECIAL_PORT_SW_CKSUM_MODE, hns3 PMD will check
packets in the Tx prepare and perform the UDP CKSUM for such packets
to avoid a checksum error.
Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Huisong Li [Tue, 23 Mar 2021 13:45:52 +0000 (21:45 +0800)]
net/hns3: fix build for SVE path
The 'queue_full_cnt' stats have been encapsulated in 'dfx_stats'.
However, the modification in the SVE algorithm is omitted.
As a result, the driver fails to be compiled when the SVE
algorithm is used.
Fixes: 9b77f1fe303f ("net/hns3: encapsulate DFX stats in datapath") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Huisong Li [Tue, 23 Mar 2021 13:45:51 +0000 (21:45 +0800)]
net/hns3: fix reporting undefined speed
There may be a case in future that the speed obtained from firmware
is undefined (such as, 400G or other rate), and link status of device is
up. At this case, PMD driver will reports 100Mbps to the user in the
"hns3_dev_link_update" API, which is unreasonable. Besides, if the
speed from firmware is zero, driver should report zero instead of
100Mbps.
Fixes: 59fad0f32135 ("net/hns3: support link update operation") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Jiawen Wu [Mon, 29 Mar 2021 03:17:21 +0000 (11:17 +0800)]
net/txgbe: update link setup process of backplane NICs
Add device arguments to support runtime options.
And use these configuration to control the link setup flow, to adapt to
different NIC's construction. Use firmware version to control the impact
of firmware update. And fix some left bugs.
Since rte_flow is the only API for filtering operations,
the legacy driver interface filter_ctrl was too much complicated
for the simple task of getting the struct rte_flow_ops.
The filter type RTE_ETH_FILTER_GENERIC and
the filter operarion RTE_ETH_FILTER_GET are removed.
The new driver callback flow_ops_get replaces filter_ctrl.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Xiaoyu Min [Thu, 18 Mar 2021 11:03:57 +0000 (11:03 +0000)]
net/mlx5: support RSS expansion for IPv6 GRE
Currently RSS expansion only support IPv4 as GRE payload or
delivery protocol (RFC2784). IPv6 as GRE payload or delivery protocol
(RFC7676) is not supported.
This patch add RSS expansion for RFC7676 so PMD can expand flow item
correctly.
Fixes: f4b901a46aec ("net/mlx5: add flow GRE item") Cc: stable@dpdk.org Signed-off-by: Xiaoyu Min <jackmin@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Li Zhang [Tue, 16 Mar 2021 12:05:17 +0000 (14:05 +0200)]
net/mlx5: fix flow actions index in cache
When using port id or push VLAN action index to find
the action in cache, it will fail to find actions.
The root cause is the index is not saved in cache when
creating the port id action or push vlan action.
To fix these issues, update the index in cache when creating.
Fixes: 0fd5f82aaa07 ("net/mlx5: make port ID action cache thread safe") Fixes: 3422af2af2e4 ("net/mlx5: make push VLAN action cache thread safe") Cc: stable@dpdk.org Signed-off-by: Li Zhang <lizh@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Qi Zhang [Wed, 17 Mar 2021 06:02:19 +0000 (14:02 +0800)]
net/ice/base: fix memory allocation for MAC addresses
Not enough memory be allocated for dev->data->mac_address which
cause out of bound memory access when iterate all mac addresses by
dev_info.max_mac_addrs.
Fixes: f9cf4f864150 ("net/ice: support device initialization") Cc: stable@dpdk.org Reported-by: Ferruh Yigit <ferruh.yigit@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Acked-by: Qiming Yang <qiming.yang@intel.com>
Alvin Zhang [Mon, 1 Mar 2021 07:06:07 +0000 (15:06 +0800)]
net/i40e: fix input set field mask
The absolute field offsets of IPv4 or IPv6 header are related to
hardware configuration. The X710 and X722 have different hardware
configurations, and users can even modify the hardware configuration.
Therefore, The default values cannot be used when calculating mask
offset.
The following flows can be created on X722 NIC, but the packet will
not enter the queue 3:
flow create 0 ingress pattern eth / ipv4 proto is 255 / end
actions queue index 3 / end
pkt = Ether()/IP(ttl=63, proto=255)/Raw('X'*40)
flow create 0 ingress pattern eth / ipv4 tos is 50 / udp / end
actions queue index 3 / end
pkt = Ether()/IP(tos=50)/UDP()/Raw('X'*40)
flow create 0 ingress pattern eth / ipv6 tc is 12 / udp / end
actions queue index 3 / end
pkt = Ether()/IPv6(tc=12,hlim=34,fl=0x98765)/UDP()/Raw('X'*40)
flow create 0 ingress pattern eth / ipv6 hop is 34 / end actions
queue index 3 / end
pkt = Ether()/IPv6(tc=12,hlim=34,fl=0x98765)/Raw('X'*40)
This patch read the field offsets from the NIC and return the mask
register value.
Fixes: 98f055707685 ("i40e: configure input fields for RSS or flow director") Fixes: 92cf7f8ec082 ("i40e: allow filtering on more IP header fields") Cc: stable@dpdk.org Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com> Tested-by: Lingli Chen <linglix.chen@intel.com>
Kalesh AP [Sat, 20 Mar 2021 06:49:17 +0000 (12:19 +0530)]
net/bnxt: fix memory allocation for command response
Driver re-allocates memory for the command response buffer
when the installed firmware version is newer (and has a larger
max response length) than the version of HWRM that was used to
build the PMD.
This change helps to avoid the re-allocation by allocating the
memory for the command response buffer with PAGE_SIZE.
Lance Richardson [Thu, 18 Mar 2021 19:52:13 +0000 (15:52 -0400)]
net/bnxt: fix Rx buffer posting
Remove early buffer posting logic from burst receive loop to address
several issues:
- Posting receive descriptors without first posting completion
entries risks overflowing the completion queue.
- Posting receive descriptors without updating rx_raw_prod
creates the possibility that the receive descriptor doorbell
can be written twice with the same value.
- Having this logic in the inner descriptor processing loop
can impact performance.
Kalesh AP [Thu, 18 Mar 2021 09:35:22 +0000 (15:05 +0530)]
net/bnxt: fix link state operations
VFs does not have the privilege to change link configuration.
But the driver silently returns success to these ethdev callbacks
without actually issuing the HWRM command to bring the link up/down.
Fixes: 5c206086feaa ("net/bnxt: add link state operations") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Ajit Khaparde [Tue, 16 Mar 2021 05:42:40 +0000 (22:42 -0700)]
net/bnxt: fix RSS context cleanup
The PMD is allocating an extra RSS context with each port start.
But it is freeing only one RSS context during port stop. So at some point
we run out of RSS contexts when we do multiple port stop/start sequences.
bnxt_hwrm_vnic_ctx_alloc() is called by bnxt_setup_one_vnic(), but
bnxt_hwrm_vnic_ctx_free() is not called in the corresponding
bnxt_free_one_vnic().
Fix this by calling bnxt_hwrm_vnic_ctx_free() in bnxt_free_one_vnic().
Fixes: 7fe5668d2ea3 ("net/bnxt: support VLAN filter and strip") Cc: stable@dpdk.org Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Kalesh AP [Tue, 16 Mar 2021 05:41:25 +0000 (11:11 +0530)]
net/bnxt: fix PCI write check
CID 363716 (#1 of 1): Unchecked return value (CHECKED_RETURN)
check_return: Calling rte_pci_write_config without checking
return value (as is done elsewhere 46 out of 49 times).
Coverity issue: 363716 Fixes: be14720def9c ("net/bnxt: support FW reset") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Somnath Kotur [Tue, 16 Mar 2021 05:40:48 +0000 (11:10 +0530)]
net/bnxt: fix Tx timestamp init
Fix to read the sequence ID register to get Tx timestamp.
Reading the sequence ID register is necessary for the HW FIFO to
advance and thereby get the correct value of the timestamp on Tx side.
This patch fixes that.
These items allow the user to avoid having to set the EtherType field in an
ETH item to match PPPoE traffic. Using a PPPoED (PPPoE discovery) or PPPoES
(PPPoE session) item will lead to EtherType filter being set up with
a corresponding value. If an ETH item provides its own EtherType value,
it will be checked for correctness.
Matching on PPPoE fields is not supported.
Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Andy Moreton <amoreton@xilinx.com>