common/cnxk: use aggregate level RR priority from mbox
Use the aggregate level Round Robin Priority from the mbox response instead of
hard-coding it to a single macro. This is useful when the kernel AF driver
changes the constant.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Akhil Goyal [Sun, 8 May 2022 07:48:19 +0000 (13:18 +0530)]
common/cnxk: convert warning to debug print
When an inbound SA SPI was not in the min-max range specified in devargs,
a warning was printed. This is now converted to a debug
print because, if the entry is found to be a duplicate in the mask,
another error print is emitted anyway. Hence, the warning print is not
needed.
Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Fix issues in mode where soft expiry is disabled in ROC.
When soft expiry support is not enabled in inline device,
memory is not allocated for the ring base array and should
not be accessed.
Fixes: bea5d990a93b ("net/cnxk: support outbound soft expiry notification") Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
common/cnxk: add new PKIND for CPT when ts is enabled
With timestamp enabled, time stamp will be added to second pass packets
from CPT. NPC needs different configuration to parse second pass packets
with and without timestamp.
New PKIND is defined for CPT when time stamp is enabled on NIX.
CPT should use this PKIND for second pass packets when TS is enabled for
corresponding ethdev port.
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
common/cnxk: support multi channel for SDP send queues
Currently only base channel number is configured as default
channel for all the SDP send queues. Due to this, packets
sent on different SQs are landing on the same output queue
on the host. Channel number in the send queue should be
configured according to the number of queues assigned to the
SDP PF or VF device.
Signed-off-by: Subrahmanyam Nilla <snilla@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Kiran Kumar K [Wed, 4 May 2022 05:11:18 +0000 (10:41 +0530)]
net/cnxk: support custom SA index
Adding cnxk device driver support to configure custom SA index.
Custom SA index can be configured as part of the session create
as SPI, and later original SPI can be updated using session update.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Jerin Jacob <jerinj@marvell.com>
Kiran Kumar K [Wed, 4 May 2022 05:11:16 +0000 (10:41 +0530)]
common/cnxk: support parsing custom SA action
Adding ROC Flow changes to parse custom SA action for cnxk device.
When the custom SA action is enabled, VTAG actions are not allowed.
The custom SA index is calculated based on the SA_HI and SA_LO
values. This allows an MCAM entry to match many SAs, rather than
only a single SA.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Rahul Bhansali [Wed, 30 Mar 2022 08:43:55 +0000 (14:13 +0530)]
common/cnxk: add ROC errata list
Created roc_errata.h to list the errata handled in userspace drivers.
Added errata such as no_drop_re, cq_min_size_4k, no_fc_stype_ststp,
no_drop_aging and no_vwqe_flush_op.
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 29 Mar 2022 10:28:57 +0000 (15:58 +0530)]
net/cnxk: fix crash during hotplug detach operation
hot_plug application does not perform any port setup
configuration via rte_eth_dev_configure() API. All the probed
Ethernet ports do not contain any Rx and Tx queues.
While detaching a device via rte_eal_hotplug_remove(), the CNXK
driver expects the Rx and Tx queue structures to be populated during
reset of PFC. The application therefore crashes, as data->rx_queues
and data->tx_queues are NULL.
Fixes: 9544713564f5 ("net/cnxk: support priority flow control") Cc: stable@dpdk.org Signed-off-by: Sunil Kumar Kori <skori@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Satheesh Paul [Thu, 17 Mar 2022 03:50:36 +0000 (09:20 +0530)]
common/cnxk: fix QinQ ROC item mismatch
ROC code assumes the presence of VLAN extension headers in
the case of QinQ, which makes the driver and ROC
incompatible. Fix this in ROC by treating
QinQ as multiple VLAN pattern items for DPDK (as opposed to
treating QinQ as a separate pattern item).
Fixes: b8ac8b089ce ("common/cnxk: support matching VLAN existence") Cc: stable@dpdk.org Signed-off-by: Satheesh Paul <psatheesh@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Satheesh Paul [Mon, 28 Feb 2022 04:53:21 +0000 (10:23 +0530)]
common/cnxk: support CPT second pass flow rules
Added support to create flow rules to match CPT second
pass packets. With this change, ingress
rules will be created with bits 10 and 11 of channel field
in the MCAM ignored by default. For rules specific to
second pass packets, the CPT channel bits will be set
in the MCAM.
Signed-off-by: Satheesh Paul <psatheesh@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
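The masking idea in the commit above can be sketched as follows. This is an illustration only: it assumes a 12-bit channel field, and the macro and function names are hypothetical, not ROC identifiers. It also assumes that "setting the CPT channel bits" means including bits 10 and 11 in the match mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the MCAM channel field is 12 bits wide (bits 0..11). */
#define DEMO_CHAN_FIELD_MASK 0xFFFu
/* Bits 10 and 11 of the channel field, as named in the commit message. */
#define DEMO_CHAN_CPT_BITS (0x3u << 10)

static_assert((DEMO_CHAN_CPT_BITS & ~DEMO_CHAN_FIELD_MASK) == 0,
	      "CPT bits must lie inside the channel field");

/* Hypothetical helper: mask to program for the channel field of a rule. */
static uint16_t
demo_chan_match_mask(bool second_pass_rule)
{
	if (second_pass_rule)
		return DEMO_CHAN_FIELD_MASK; /* match the CPT bits as well */

	/* Default ingress rule: ignore bits 10 and 11. */
	return DEMO_CHAN_FIELD_MASK & ~DEMO_CHAN_CPT_BITS;
}
```

With these assumptions a default ingress rule matches on the low ten bits only, while a second-pass rule matches on the full field.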
Min Hu (Connor) [Wed, 6 Apr 2022 08:45:36 +0000 (16:45 +0800)]
app/testpmd: check statistics query before printing
In function 'fwd_stats_display', if function 'rte_eth_stats_get' fails,
'stats' holds an indeterminate value and the displayed result will be abnormal.
This patch checks the return value of 'rte_eth_stats_get' to avoid
displaying abnormal stats.
Fixes: 53324971a14e ("app/testpmd: display/clear forwarding stats on demand") Cc: stable@dpdk.org Signed-off-by: Min Hu (Connor) <humin29@huawei.com> Acked-by: Aman Singh <aman.deep.singh@intel.com>
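The pattern behind the testpmd fix above, checking the query result before printing, can be sketched without DPDK headers. Every name below is a hypothetical stand-in: demo_stats_get_fn plays the role of a stats query that returns 0 on success and a negative value on failure.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stats structure for this sketch. */
struct demo_stats {
	uint64_t ipackets;
	uint64_t opackets;
};

/* Hypothetical stand-in for a stats query: 0 on success, negative on error. */
typedef int (*demo_stats_get_fn)(uint16_t port_id, struct demo_stats *stats);

static int
demo_stats_ok(uint16_t port_id, struct demo_stats *stats)
{
	(void)port_id;
	stats->ipackets = 10;
	stats->opackets = 7;
	return 0;
}

static int
demo_stats_fail(uint16_t port_id, struct demo_stats *stats)
{
	(void)port_id;
	(void)stats; /* left untouched, as a failing query may do */
	return -1;
}

/* Print the stats only when the query succeeded. */
static int
demo_stats_display(uint16_t port_id, demo_stats_get_fn get)
{
	struct demo_stats stats;
	int ret;

	assert(get != NULL);
	ret = get(port_id, &stats);
	if (ret != 0) {
		fprintf(stderr, "port %u: failed to get stats (%d)\n",
			(unsigned int)port_id, ret);
		return ret; /* never read the uninitialized structure */
	}

	printf("port %u: rx %" PRIu64 " tx %" PRIu64 "\n",
	       (unsigned int)port_id, stats.ipackets, stats.opackets);
	return 0;
}
```

Calling demo_stats_display() with demo_stats_fail returns the error without touching the uninitialized structure.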
Huisong Li [Wed, 6 Apr 2022 06:57:00 +0000 (14:57 +0800)]
ethdev: fix RSS update when RSS is disabled
The RTE_ETH_MQ_RX_RSS_FLAG flag is a switch to enable RSS. If the flag
is not set in dev_configure, RSS will not be configured or enabled.
However, RSS hash and reta can still be configured by ethdev ops to
enable RSS even if the flag isn't set. This behavior is inconsistent.
Fixes: 99a2dd955fba ("lib: remove librte_ prefix from directory names") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Huisong Li [Thu, 5 May 2022 12:27:07 +0000 (20:27 +0800)]
net/hns3: remove redundant RSS tuple field
The 'rss_tuple_fields' in struct hns3_rss_conf::rss_tuple_sets is
redundant, because the enabled RSS tuple in the PMD is already managed by
the 'types' in struct hns3_rss_conf::conf. This patch removes this
redundant variable.
Fixes: c37ca66f2b27 ("net/hns3: support RSS") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Huisong Li [Thu, 5 May 2022 12:27:06 +0000 (20:27 +0800)]
net/hns3: fix rollback on RSS hash update
The RSS tuple isn't restored when the RSS key length is invalid or setting
the algo key fails. This patch fixes it.
Fixes: c37ca66f2b27 ("net/hns3: support RSS") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Huisong Li [Thu, 5 May 2022 12:27:05 +0000 (20:27 +0800)]
net/hns3: fix RSS disable
Currently, the hns3 PMD disables RSS by resetting the redirection table when
the user sets rss_hf to 0, so that all packets go to queue 0. This
implementation may cause the following problems:
1) packets of the same type may go to different queues depending on whether
all tuples or only some tuples are disabled. This problem is determined by
hardware design.
2) it affects the configuration of the redirection table and the user experience.
For hns3 hardware, packets with RSS disabled always go to the
queue corresponding to the first entry of the redirection table. Generally,
disabling RSS should be implemented by disabling all tuples. This patch
fixes the implementation.
Fixes: c37ca66f2b27 ("net/hns3: support RSS") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Huisong Li [Thu, 5 May 2022 12:27:03 +0000 (20:27 +0800)]
net/hns3: fix pseudo-sharing between threads
Some fields at the end of 'struct hns3_rx_queue' and
'struct hns3_tx_queue' are not accessed in the I/O path.
But these fields may be accessed by other threads, which may lead to
cache pseudo-sharing (false sharing) with the I/O threads. This patch adds
cacheline alignment to avoid it.
Fixes: 9261fd3caf1f ("net/hns3: improve IO path data cache usage") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
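The alignment technique described in the commit above can be sketched in plain C11. The structure and its fields are hypothetical (not the real hns3 queue layout), and a 64-byte cache line is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define DEMO_CACHE_LINE_SIZE 64

/* Hypothetical queue structure, not the real hns3 layout. */
struct demo_rx_queue {
	/* Fields touched by the I/O thread on every burst. */
	uint16_t next_to_use;
	uint16_t rx_free_hold;

	/*
	 * Fields touched only by other threads. Aligning the first one
	 * pushes this group onto its own cache line, so writes here do
	 * not invalidate the line holding the I/O-path fields above.
	 */
	alignas(DEMO_CACHE_LINE_SIZE) uint64_t err_pkts;
	uint64_t reset_count;
};

static_assert(offsetof(struct demo_rx_queue, err_pkts) %
	      DEMO_CACHE_LINE_SIZE == 0,
	      "control-path fields must start on a cache line boundary");
```

The static assertion fails the build if a later field reordering puts the control-path group back on a shared cache line.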
Huisong Li [Thu, 5 May 2022 12:27:02 +0000 (20:27 +0800)]
net/hns3: fix MAC and queues HW statistics overflow
The MAC and queue statistics are kept in 32-bit registers in hardware. If
the hardware statistics are not read for a long time, these registers
will overflow.
So the PF and VF drivers have to periodically read and save these
statistics. Since the periodic task and the stats API run in different
threads, we introduce a statistics lock to protect the statistics.
Fixes: 8839c5e202f3 ("net/hns3: support device stats") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
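A minimal sketch of the accumulation idea from the commit above: fold a 32-bit hardware reading into a 64-bit software counter. It assumes a free-running register that wraps around; if the register were clear-on-read, the raw value would simply be added. All names are hypothetical, and the caller is assumed to hold the statistics lock mentioned in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software shadow of one 32-bit hardware counter. */
struct demo_soft_counter {
	uint64_t total;    /* accumulated 64-bit value */
	uint32_t last_raw; /* raw register value seen at the last update */
};

/*
 * Fold a new raw reading into the 64-bit total.
 * Assumption: the register is free-running and wraps at 2^32, and it
 * is read at least once per wrap period. Caller holds the stats lock.
 */
static void
demo_soft_counter_update(struct demo_soft_counter *c, uint32_t raw)
{
	assert(c != NULL);

	/* Unsigned subtraction is modulo 2^32, so a wrap is handled. */
	c->total += (uint32_t)(raw - c->last_raw);
	c->last_raw = raw;
}
```

The modulo subtraction only stays correct if the periodic task runs more often than the register can wrap, which is why the commit pairs it with a periodic read.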
Huisong Li [Thu, 5 May 2022 12:27:01 +0000 (20:27 +0800)]
net/hns3: fix order of clearing imissed register in PF
Clearing imissed registers in PF hardware depends on the
'drop_stats_mode' in struct hns3_hw. The variable is initialized after
the "hns3_get_configuration". But, in current code, the clearing
operation runs before the function.
So this patch fixes this order. In addition, this patch extracts a
public function to initialize and uninitialize statistics to improve the
maintainability of this code.
Fixes: 3e9f3042d7c8 ("net/hns3: add imissed packet stats") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
rte_pmd_tun/tap_probe() allocates pmd->intr_handle in eth_dev_tap_create()
and it should not be freed until rte_pmd_tap_remove() is called.
Inspection of tap_rx_intr_vec_set() shows that the call to
tap_tx_intr_vec_uninstall() was calling rte_intr_instance_free() but
tap_tx_intr_vec_install() can then be immediately called, and this then
uses pmd->intr_handle without it being reallocated.
Move rte_intr_instance_free() call from tap_tx_intr_vec_uninstall()
to rte_pmd_tap_remove().
Fixes: d61138d4f0e2 ("drivers: remove direct access to interrupt handle") Cc: stable@dpdk.org Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Reviewed-by: David Marchand <david.marchand@redhat.com>
Huisong Li [Tue, 3 May 2022 10:02:14 +0000 (18:02 +0800)]
net/bonding: fix slave stop and remove on port close
All slaves are stopped and removed when closing a bonded port.
But the while loop never ends, and runs infinitely, if both rte_eth_dev_stop
and rte_eth_bond_slave_remove fail.
This is because the skipped slave port is counted on both function failures,
when it should be counted only once.
Fix this by not continuing to process the slave in the loop after the first failure.
Fixes: fb0379bc5db3 ("net/bonding: check stop call status") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
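The loop shape behind the fix above can be sketched with stubs. Everything here is hypothetical: demo_stop() and demo_remove() stand in for the stop and remove calls, and the structure is not the real bonding private data. The point is that a failing slave bumps the skip count exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_SLAVES 8

/* Hypothetical bonded-port state with knobs to simulate failures. */
struct demo_bond {
	uint16_t slave_count;
	uint16_t slaves[DEMO_MAX_SLAVES];
	bool stop_fails;   /* simulate a failing stop call */
	bool remove_fails; /* simulate a failing remove call */
};

static int
demo_stop(const struct demo_bond *b, uint16_t port)
{
	(void)port;
	return b->stop_fails ? -1 : 0;
}

/* Remove the slave at index idx by shifting the tail down. */
static int
demo_remove(struct demo_bond *b, uint16_t idx)
{
	if (b->remove_fails)
		return -1;
	for (uint16_t i = idx; i + 1 < b->slave_count; i++)
		b->slaves[i] = b->slaves[i + 1];
	b->slave_count--;
	return 0;
}

/* Returns the number of slaves that could not be removed. */
static uint16_t
demo_close(struct demo_bond *b)
{
	uint16_t skipped = 0;

	assert(b->slave_count <= DEMO_MAX_SLAVES);
	while (b->slave_count != skipped) {
		uint16_t port = b->slaves[skipped];

		if (demo_stop(b, port) != 0) {
			skipped++;
			continue; /* count this slave once, then move on */
		}
		if (demo_remove(b, skipped) != 0)
			skipped++;
	}
	return skipped;
}
```

Without the `continue`, a slave failing both calls would increment `skipped` twice, which can step over `slave_count` so the `!=` condition never becomes false.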
Huisong Li [Tue, 3 May 2022 10:02:13 +0000 (18:02 +0800)]
net/bonding: fix stopping non-active slaves
When stopping a bonded port, all slaves should be stopped. But only
active slaves are stopped.
So fix by stopping all slave ports and later do "deactivate_slave()" for
active slaves.
Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 port") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
net/iavf: improve performance of Rx timestamp offload
In this patch, we use CPU ticks instead of a HW register
to determine whether the low 32 bits of the timestamp have turned
over. This avoids requesting the register value frequently
and improves receive performance.
Simei Su [Thu, 28 Apr 2022 08:13:45 +0000 (16:13 +0800)]
net/iavf: enable Rx timestamp on flex descriptor
Dump Rx timestamp value into dynamic mbuf field by flex descriptor.
This feature is turned on by dev config "enable-rx-timestamp".
Currently, it's only supported under scalar path.
Signed-off-by: Simei Su <simei.su@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Simei Su [Thu, 28 Apr 2022 08:13:44 +0000 (16:13 +0800)]
common/iavf: support Rx timestamp in virtual channel
Add new ops and structures so that the VF can support Rx timestamp
on flex descriptor.
"VIRTCHNL_OP_1588_PTP_GET_CAPS" ops is sent by the VF to request PTP
capabilities and responded by the PF with capabilities enabled for
that VF.
"VIRTCHNL_OP_1588_PTP_GET_TIME" ops is sent by the VF to request
the current time of the PHC. The PF will respond by reading the
device time and reporting it back to the VF.
Signed-off-by: Simei Su <simei.su@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
net/ice: optimize maximum queue number calculation
Remove the limitation that max queue pair number must be 2^n.
With this patch, even on an 8-port device, the max queue pair
number increases from 128 to 254.
Walter Heymans [Wed, 20 Apr 2022 13:46:39 +0000 (15:46 +0200)]
net/nfp: update how max MTU is read
The 'max_rx_pktlen' value was previously read from hardware, which was
set by the running firmware. This caused confusion due to different
meanings of 'MAX_MTU'. This patch updates the 'max_rx_pktlen' to the
maximum value that the NFP NIC can support. The 'max_mtu' value that is
read from hardware, is assigned to the 'dev_info->max_mtu' variable.
If more layer 2 metadata must be used, the firmware can be updated to
report a smaller 'max_mtu' value.
The constant defined for NFP_FRAME_SIZE_MAX is derived from the maximum
supported buffer size of 10240, minus 136 bytes that is reserved by the
hardware and another 56 bytes reserved for expansion in firmware. This
results in a usable maximum packet length of 10048 bytes.
Signed-off-by: Walter Heymans <walter.heymans@corigine.com> Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com> Reviewed-by: Richard Donkin <richard.donkin@corigine.com>
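The arithmetic behind NFP_FRAME_SIZE_MAX in the commit above can be written out as a compile-time check. Only NFP_FRAME_SIZE_MAX is named in the commit; the other macro names are hypothetical, and the driver may well define the constant as a plain literal.

```c
#include <assert.h>

/* Values from the commit message; the DEMO_ names are hypothetical. */
#define DEMO_NFP_BUF_SIZE_MAX 10240 /* maximum supported buffer size */
#define DEMO_NFP_HW_RESERVED  136   /* bytes reserved by the hardware */
#define DEMO_NFP_FW_RESERVED  56    /* bytes reserved for firmware expansion */

/* Sketch of the derivation, not necessarily the driver's definition. */
#define NFP_FRAME_SIZE_MAX \
	(DEMO_NFP_BUF_SIZE_MAX - DEMO_NFP_HW_RESERVED - DEMO_NFP_FW_RESERVED)

static_assert(NFP_FRAME_SIZE_MAX == 10048,
	      "usable maximum packet length should be 10048 bytes");
```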
Xueming Li [Sun, 8 May 2022 14:25:53 +0000 (17:25 +0300)]
vdpa/mlx5: support device cleanup callback
This patch supports device cleanup callback API which is called when
the device is disconnected from the VM. Cached resources like VM MR and
VQ memory are released.
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xueming Li [Sun, 8 May 2022 14:25:52 +0000 (17:25 +0300)]
vdpa/mlx5: cache and reuse hardware resources
During device suspend and resume, resources normally do not change.
When large resources are allocated to a VM, such as a large memory size or
many queues, the time spent releasing and recreating them becomes significant.
To speed this up, this patch reuses resources such as the VM MR and VirtQ
memory if they have not changed.
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xueming Li [Sun, 8 May 2022 14:25:51 +0000 (17:25 +0300)]
vdpa/mlx5: reuse resources in reconfiguration
To speed up device resume, create reusable resources during device
probe state and release them when the device is removed. Reused resources
include TIS, TD, VAR Doorbell mmap, error handling event channel and interrupt
handler, UAR, Rx event channel, NULL MR, steer domain and table.
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xueming Li [Sun, 8 May 2022 14:25:49 +0000 (17:25 +0300)]
vdpa/mlx5: fix dead loop when process interrupted
In Ctrl+C handling, the kick handling thread sometimes gets an endless EAGAIN
error and falls into a dead loop.
Kicks happen frequently in a real system due to busy traffic or the retry
mechanism. This patch simplifies the handling by kicking the firmware anyway
and skipping the hardware notifier setup when a device error is possible;
the notifier can be set on the next successful kick request.
Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
David Marchand [Mon, 25 Apr 2022 12:54:30 +0000 (14:54 +0200)]
vhost: validate FDs attached to messages
Some message handlers do not expect any file descriptor attached as
ancillary data.
Provide a common way to enforce this by adding an accepts_fd boolean in
the message handler structure. When a message handler sets accepts_fd to
true, it is responsible for calling validate_msg_fds with the correct
expected file descriptor count.
This avoids leaking file descriptors by mistake when adding
support for new vhost user message types.
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
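The dispatch rule described in the commit above can be sketched as follows. Only the accepts_fd field name comes from the commit; the structures, the dispatcher and the helper names are hypothetical and do not reproduce the vhost library code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define DEMO_MAX_FDS 8

/* Hypothetical message carrying ancillary file descriptors. */
struct demo_msg {
	int request;
	int fds[DEMO_MAX_FDS];
	int fd_num;
};

typedef int (*demo_handler_fn)(struct demo_msg *msg);

/* Handler table entry; accepts_fd mirrors the field named in the commit. */
struct demo_msg_handler {
	demo_handler_fn callback;
	bool accepts_fd;
};

/* Close and forget every descriptor attached to the message. */
static void
demo_close_msg_fds(struct demo_msg *msg)
{
	for (int i = 0; i < msg->fd_num; i++) {
		if (msg->fds[i] >= 0)
			close(msg->fds[i]);
		msg->fds[i] = -1;
	}
	msg->fd_num = 0;
}

/* Trivial handler used to exercise the dispatcher. */
static int
demo_noop_handler(struct demo_msg *msg)
{
	(void)msg;
	return 0;
}

/*
 * Common check: a handler that does not accept descriptors must not
 * receive any. Unexpected descriptors are closed so they cannot leak.
 */
static int
demo_dispatch(const struct demo_msg_handler *h, struct demo_msg *msg)
{
	assert(h != NULL && msg != NULL);

	if (!h->accepts_fd && msg->fd_num != 0) {
		demo_close_msg_fds(msg);
		return -1;
	}
	/* Handlers with accepts_fd validate the expected count themselves. */
	return h->callback(msg);
}
```

Putting the check in the dispatcher means a newly added message type is safe by default: it has to opt in with accepts_fd before it can ever see a descriptor.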
Xuan Ding [Fri, 8 Apr 2022 10:22:14 +0000 (10:22 +0000)]
examples/vhost: use API to check in-flight packets
In async data path, call rte_vhost_async_get_inflight_thread_unsafe()
API to directly return the number of in-flight packets instead of
maintaining a local variable.
Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xuan Ding [Fri, 8 Apr 2022 10:22:13 +0000 (10:22 +0000)]
vhost: add unsafe API to check in-flight packets
In async data path, when vring state changes or device is destroyed,
it is necessary to know the number of in-flight packets in DMA engine.
This patch provides a thread unsafe API to return the number of
in-flight packets for a vhost queue without using any lock.
Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Yuan Wang [Fri, 11 Mar 2022 16:35:12 +0000 (00:35 +0800)]
net/vhost: fix access to freed memory
This patch fixes heap-use-after-free reported by ASan.
It is possible for rte_vhost_dequeue_burst() to access a vq that
has been freed when numa_realloc() gets called in the device running state.
The control plane sets vq->access_lock to protect the vq
from the data plane. Unfortunately, the lock fails at the moment
the vq is freed, allowing rte_vhost_dequeue_burst() to access
the fields of the vq, which triggers a heap-use-after-free error.
In the case of multiple queues, the vhost PMD can access other queues
that are not ready when the first queue is ready, which makes no sense
and also allows numa_realloc() and rte_vhost_dequeue_burst() to access
the vq at the same time. By controlling vq->allow_queuing we can make
the PMD access only the queues that are ready.
Fixes: 1ce3c7fe149 ("net/vhost: emulate device start/stop behavior") Signed-off-by: Yuan Wang <yuanx.wang@intel.com> Tested-by: Wei Ling <weix.ling@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Harold Huang [Wed, 2 Mar 2022 09:41:05 +0000 (17:41 +0800)]
net/virtio: support NAPI when using vhost-net backend
In patch [1], NAPI has been supported in kernel tun driver to accelerate
packet processing received from vhost-net. This will greatly improve the
throughput of the tap device in the vhost-net backend.
Match the closest supported Rx payload buffer size with the mempool
data size and program it for the Rx queue. This removes unnecessary
need for handling additional padding, packing, and alignment, when
posting Rx buffers to hardware.
net/cxgbe: fix Tx queue stuck with mbuf chain coalescing
When trying to coalesce mbufs with chain on Tx side, it is possible
to get stuck during queue wrap around. When coalescing this mbuf
chain fails, the Tx path returns EBUSY, and when the same packet
is retried, it still cannot be coalesced, so the loop
repeats. Fix by pushing the packet through the normal Tx path.
Also use FW_ETH_TX_PKTS_WR to handle mbufs with chain for FW
to optimize.
Ke Zhang [Mon, 11 Apr 2022 05:40:03 +0000 (05:40 +0000)]
net/bonding: fix RSS key config with extended key length
When creating a bonding device, if the slave device's
RSS key length = standard_rss_key length + extended_hash_key length,
then the bonding device's key length will be the same as the slave's.
In function bond_ethdev_configure(), the default_rss_key length is 40,
which does not match, so a new key should be calculated for the bonding
device if the default key cannot be used.
Fixes: 6b1a001ec546 ("net/bonding: fix RSS key length") Cc: stable@dpdk.org Signed-off-by: Ke Zhang <ke1x.zhang@intel.com> Acked-by: Min Hu (Connor) <humin29@huawei.com>
Long Li [Thu, 24 Mar 2022 17:46:17 +0000 (10:46 -0700)]
net/netvsc: fix hot adding multiple VF PCI devices
This patch fixes two issues with hot removing/adding a VF PCI device:
1. The original device argument is lost when it's hot added
2. If there are multiple VFs hot adding at the same time, some of the
VFs may not get added successfully because only one single VF status
is stored in the netvsc.
Fix these by storing the original device arguments and maintaining a list
of hot add contexts to deal with multiple VF devices.
Fixes: a2a23a794b ("net/netvsc: support VF device hot add/remove") Cc: stable@dpdk.org Signed-off-by: Long Li <longli@microsoft.com>
David Marchand [Thu, 5 May 2022 09:29:52 +0000 (11:29 +0200)]
ci: build some job with ASan
Enable ASan, this can greatly help identify leaks and buffer overflows.
Running unit tests relying on multiprocess is unreliable with ASan
enabled, so skip them.
Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>
David Marchand [Thu, 5 May 2022 09:29:51 +0000 (11:29 +0200)]
test/mem: disable ASan when accessing unallocated memory
As described in bugzilla, ASan reports accesses to all memory segments as
invalid, since those parts have not been allocated with rte_malloc.
Move __rte_no_asan to rte_common.h and disable ASan on a part of the test.
Bugzilla ID: 880 Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
test/hash: report non HTM numbers for single thread
In hash_readwrite_perf_autotest a single read and write operation is
benchmarked for both HTM and non HTM cases. However the result summary
only shows the HTM value. Therefore add the non HTM value for
completeness.
Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency") Signed-off-by: Stanislaw Kardach <kda@semihalf.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>