Xueming Li [Sun, 8 May 2022 14:25:51 +0000 (17:25 +0300)]
vdpa/mlx5: reuse resources in reconfiguration
To speed up device resume, create reuseable resources during device
probe state, release when device is removed. Reused resources includes
TIS,
TD, VAR Doorbell mmap, error handling event channel and interrupt
handler, UAR, Rx event channel, NULL MR, steer domain and table.
Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xueming Li [Sun, 8 May 2022 14:25:49 +0000 (17:25 +0300)]
vdpa/mlx5: fix dead loop when process interrupted
In Ctrl+C handling, sometimes kick handling thread gets endless EGAIN
error and fall into dead lock.
Kick happens frequently in real system due to busy traffic or retry
mechanism. This patch simplifies kick firmware anyway and skip setting
hardware notifier due to potential device error, notifier could be set
in next successful kick request.
Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
David Marchand [Mon, 25 Apr 2022 12:54:30 +0000 (14:54 +0200)]
vhost: validate FDs attached to messages
Some message handlers do not expect any file descriptor attached as
ancillary data.
Provide a common way to enforce this by adding a accepts_fd boolean in
the message handler structure. When a message handler sets accepts_fd to
true, it is responsible for calling validate_msg_fds with a right
expected file descriptor count.
This will avoid leaking some file descriptor by mistake when adding
support for new vhost user message types.
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xuan Ding [Fri, 8 Apr 2022 10:22:14 +0000 (10:22 +0000)]
examples/vhost: use API to check in-flight packets
In async data path, call rte_vhost_async_get_inflight_thread_unsafe()
API to directly return the number of in-flight packets instead of
maintaining a local variable.
Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xuan Ding [Fri, 8 Apr 2022 10:22:13 +0000 (10:22 +0000)]
vhost: add unsafe API to check in-flight packets
In async data path, when vring state changes or device is destroyed,
it is necessary to know the number of in-flight packets in DMA engine.
This patch provides a thread unsafe API to return the number of
in-flight packets for a vhost queue without using any lock.
Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Yuan Wang [Fri, 11 Mar 2022 16:35:12 +0000 (00:35 +0800)]
net/vhost: fix access to freed memory
This patch fixes heap-use-after-free reported by ASan.
It is possible for the rte_vhost_dequeue_burst() to access the vq
is freed when numa_realloc() gets called in the device running state.
The control plane will set the vq->access_lock to protected the vq
from the data plane. Unfortunately the lock will fail at the moment
the vq is freed, allowing the rte_vhost_dequeue_burst() to access
the fields of the vq, which will trigger a heap-use-after-free error.
In the case of multiple queues, the vhost pmd can access other queues
that are not ready when the first queue is ready, which makes no sense
and also allows numa_realloc() and rte_vhost_dequeue_burst() access to
vq to happen at the same time. By controlling vq->allow_queuing we can make
the pmd access only the queues that are ready.
Fixes: 1ce3c7fe149 ("net/vhost: emulate device start/stop behavior") Signed-off-by: Yuan Wang <yuanx.wang@intel.com> Tested-by: Wei Ling <weix.ling@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Harold Huang [Wed, 2 Mar 2022 09:41:05 +0000 (17:41 +0800)]
net/virtio: support NAPI when using vhost-net backend
In patch [1], NAPI has been supported in kernel tun driver to accelerate
packet processing received from vhost-net. This will greatly improve the
throughput of the tap device in the vhost-net backend.
Match the closest supported Rx payload buffer size with the mempool
data size and program it for the Rx queue. This removes unnecessary
need for handling additional padding, packing, and alignment, when
posting Rx buffers to hardware.
net/cxgbe: fix Tx queue stuck with mbuf chain coalescing
When trying to coalesce mbufs with chain on Tx side, it is possible
to get stuck during queue wrap around. When coalescing this mbuf
chain fails, the Tx path returns EBUSY and when the same packet
is retried again, it couldn't get coalesced again, and the loop
repeats. Fix by pushing the packet through the normal Tx path.
Also use FW_ETH_TX_PKTS_WR to handle mbufs with chain for FW
to optimize.
Ke Zhang [Mon, 11 Apr 2022 05:40:03 +0000 (05:40 +0000)]
net/bonding: fix RSS key config with extended key length
When creating a bonding device, if the slave device's
RSS key length = standard_rss_key length + extended_hash_key length,
then bonding device will be same as slave,
in function bond_ethdev_configure(), the default_rss_key length is 40,
it is not matched, so it should calculate a new key for bonding device
if the default key could not be used.
Fixes: 6b1a001ec546 ("net/bonding: fix RSS key length") Cc: stable@dpdk.org Signed-off-by: Ke Zhang <ke1x.zhang@intel.com> Acked-by: Min Hu (Connor) <humin29@huawei.com>
Long Li [Thu, 24 Mar 2022 17:46:17 +0000 (10:46 -0700)]
net/netvsc: fix hot adding multiple VF PCI devices
This patch fixes two issues with hot removing/adding a VF PCI device:
1. The original device argument is lost when it's hot added
2. If there are multiple VFs hot adding at the same time, some of the
VFs may not get added successfully because only one single VF status
is stored in the netvsc.
Fix these by storing the original device arguments and maintain a list
of hot add contexts to deal with multiple VF devices.
Fixes: a2a23a794b ("net/netvsc: support VF device hot add/remove") Cc: stable@dpdk.org Signed-off-by: Long Li <longli@microsoft.com>
David Marchand [Thu, 5 May 2022 09:29:52 +0000 (11:29 +0200)]
ci: build some job with ASan
Enable ASan, this can greatly help identify leaks and buffer overflows.
Running unit tests relying on multiprocess is unreliable with ASan
enabled, so skip them.
Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>
David Marchand [Thu, 5 May 2022 09:29:51 +0000 (11:29 +0200)]
test/mem: disable ASan when accessing unallocated memory
As described in bugzilla, ASan reports accesses to all memory segment as
invalid, since those parts have not been allocated with rte_malloc.
Move __rte_no_asan to rte_common.h and disable ASan on a part of the test.
Bugzilla ID: 880 Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
test/hash: report non HTM numbers for single thread
In hash_readwrite_perf_autotest a single read and write operation is
benchmarked for both HTM and non HTM cases. However the result summary
only shows the HTM value. Therefore add the non HTM value for
completeness.
Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency") Signed-off-by: Stanislaw Kardach <kda@semihalf.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Raja Zidane [Thu, 7 Apr 2022 11:42:49 +0000 (14:42 +0300)]
examples/l2fwd-crypto: fix stats refresh rate
TIMER_MILLISECOND is defined as the number of cpu cycles per millisecond,
current definition is correct for cores with frequency of 2GHZ, for cores
with different frequency, it caused different periods between refresh,
(i.e. the definition is about 14ms on ARM cores).
The devarg that stated the period between stats print was not used,
instead, it was always defaulted to 10 seconds (on 2GHZ core).
Use DPDK API to get CPU frequency, to define TIMER_MILLISECOND.
Use the refresh period devarg instead of defaulting to 10s always.
Driver is creating a fle pool with a fixed number of
buffers for all queue pairs of a DPSECI object.
These fle buffers are equivalent to the number of descriptors.
In this patch, creating the fle pool for each queue pair
so that user can control the number of descriptors of a
queue pair using API rte_cryptodev_queue_pair_setup().
To perform crypto operations on DPAA platform,
QI interface of HW must be enabled.
Earlier DPAA crypto driver was dependent on
kernel for QI enable. Now with this patch
there is no such dependency on kernel.
crypto/dpaa_sec: fix chained FD length in raw datapath
DPAA sec raw driver is calculating the wrong lengths while
creating the FD for chain.
This patch fixes lengths for chain FD.
Fixes: 78156d38e112 ("crypto/dpaa_sec: support authonly and chain with raw API") Cc: stable@dpdk.org Signed-off-by: Gagandeep Singh <g.singh@nxp.com> Acked-by: Akhil Goyal <gakhil@marvell.com>
Driver allocates a fle buffer for each packet
before enqueue and free the buffer on dequeue. But in case if
there are enqueue failures, then code should free the fle buffers.
test/crypto-perf: extend asymmetric crypto throughput test
Extended support for asymmetric crypto perf throughput test.
Added support for new modulus lengths.
Added new parameter --modex-len.
Supported lengths are 60, 128, 255, 448. Default length is 128.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Acked-by: Akhil Goyal <gakhil@marvell.com>
crypto/cnxk: prevent out-of-bound access in capabilities
In a situation where crypto_caps elements are checked only for
RTE_CRYPTO_OP_TYPE_UNDEFINED until valid op defined, there is
possibility for an out of bound access. Add this array by one
element for current capabilities.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com> Acked-by: Anoob Joseph <anoobj@marvell.com>
Anoob Joseph [Mon, 25 Apr 2022 05:38:25 +0000 (11:08 +0530)]
crypto/cnxk: use set ctx operation for session destroy
Usage of flush and invalidate would involve delays to account
for flush delay. Use set_ctx operation instead. When set_ctx fails,
fall back to flush + invalidate scheme.
Signed-off-by: Anoob Joseph <anoobj@marvell.com> Acked-by: Akhil Goyal <gakhil@marvell.com>
David Marchand [Fri, 6 May 2022 11:57:36 +0000 (13:57 +0200)]
ci: add MinGW cross-compilation in GHA
Add mingw cross compilation in our public CI so that users with their
own github repository have a first level of checks for Windows compilation
before submitting to the mailing list.
This does not replace our better checks in other entities of the CI.
Only the helloworld example is compiled (same as what is tested in
test-meson-builds.sh).
Note: the mingw cross compilation toolchain (version 5.0) in Ubuntu
18.04 was broken (missing a ENOMSG definition).
Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>
David Marchand [Fri, 6 May 2022 11:57:35 +0000 (13:57 +0200)]
ci: switch to Ubuntu 20.04
Ubuntu 18.04 is now rather old.
Besides, other entities in our CI are also testing this distribution.
Switch to a newer Ubuntu release and benefit from more recent
tool(chain)s: for example, net/cnxk now builds fine and can be
re-enabled.
Note: Ubuntu 18.04 and 20.04 seem to preserve the same paths for the ARM
and PPC cross compilation toolchains, so we can use a single
configuration file (with the hope, future releases of Ubuntu will do the
same).
Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tianhao Chai [Thu, 5 May 2022 04:39:35 +0000 (23:39 -0500)]
eal: fix C++ include for device event and DMA
Currently the "extern C" section ends right before rte_dev_dma_unmap
and other DMA function declarations, causing some C++ compilers to
produce C++ mangled symbols to rte_dev_dma_unmap instead of C symbols.
This leads to build failures later when linking a final executable
against this object.
Anatoly Burakov [Wed, 4 May 2022 14:31:58 +0000 (14:31 +0000)]
malloc: fix ASan handling for unmapped memory
Currently, when we free previously allocated memory, we mark the area as
"freed" for ASan purposes (flag 0xfd). However, sometimes, freeing a
malloc element will cause pages to be unmapped from memory and re-backed
with anonymous memory again. This may cause ASan's "use-after-free"
error down the line, because the allocator will try to write into
memory areas recently marked as "freed".
To fix this, we need to mark the unmapped memory area as "available",
and fixup surrounding malloc element header/trailers to enable later
malloc routines to safely write into new malloc elements' headers or
trailers.
Bugzilla ID: 994 Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan") Cc: stable@dpdk.org Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
mem: skip attaching external memory in secondary process
Currently, EAL init in secondary processes will attach all fbarrays
in the memconfig to have access to the primary process's page tables.
However, fbarrays corresponding to external memory segments should
not be attached at initialization, because this will happen as part
of `rte_extmem_attach` [1] or `rte_malloc_heap_memory_attach` [2] calls.
Michael Baum [Mon, 25 Apr 2022 09:30:20 +0000 (12:30 +0300)]
net/mlx5: fix LRO configuration in drop Rx queue
The driver wrongly set the LRO configurations to the TIR of the DevX
drop queue even when LRO is not supported.
Actually, the LRO configuration is not relevant to the drop queue at
all.
This causes failure in the initialization of the device, which doesn't
support LRO where the drop queue is created.
Probably, the drop queue creation by DevX missed the fact that LRO is
set by default in the TIR creation function and didn't unset it in the
drop queue case like other cases that unset LRO.
Move the default LRO configuration to unset it and set it only in the
case of all the TIR queues configured with LRO.
Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
When an indirect action was created with an RSS action configured to
hash on both source and destination L3 addresses (or L4 ports), it caused
shared hrxq to be configured to hash only on destination address
(or port).
This patch fixes this behavior by refining RSS types specified in
configuration before calculating hash types used for hrxq. Refining RSS
types removes *_SRC_ONLY and *_DST_ONLY flags if they are both set.
Raja Zidane [Wed, 20 Apr 2022 15:32:17 +0000 (18:32 +0300)]
net/mlx5: fix Rx/Tx stats concurrency
Queue statistics are being continuously updated in Rx/Tx burst
routines while handling traffic. In addition to that, statistics
can be reset (written with zeroes) on statistics reset in other
threads, causing a race condition, which in turn could result in
wrong stats.
The patch provides an approach with reference values, allowing
the actual counters to be writable within Rx/Tx burst threads
only, and updating reference values on stats reset.
net/mlx5: fix GTP handling in header modify action
GTP items were ignored during conversion of modify header actions. This
caused modify TTL action to generate a wrong modify header command when
tunnel and inner headers used different IP versions.
This patch adds GTP item handling to modify header action conversion.
Fixes: 04233f36c712 ("net/mlx5: fix layer type in header modify action") Cc: stable@dpdk.org Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Mlx5Devx library has new API's for setting and getting MTU.
Added new glue functions that wrap the new mlx5devx lib API's.
Implemented the os_ethdev callbacks to use the new glue
functions in Windows.
Support of the set promiscuous modes by calling the new API
In Mlx5DevX Lib.
Added new glue API for Windows which will be used to communicate
with Windows driver to enable/disable PROMISC or ALLMC.
Michael Baum [Sun, 10 Apr 2022 09:25:28 +0000 (12:25 +0300)]
net/mlx5: remove redundant check for hairpin queue
The mlx5_rxq_is_hairpin() function checks whether RxQ type is Hairpin.
It is done by reading a flag in Rx control structure coming from
mlx5_rxq_ctrl_get() function.
The function verifies that the queue index is valid even though it has
been checked within the mlx5_rxq_ctrl_get() function.
This patch removes the redundant check.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
In rte_flow, if a counter action is before a meter which has
non-termination policy, the counter value only includes packets not
being dropped.
This patch fixes this issue by differentiating the order of counter and
non-termination meter:
1. counter + meter, counts all packets hitting this flow.
2. meter + counter, only counts packets not being dropped.
Fixes: 51ec04dc7bcf ("net/mlx5: connect meter policy to created flows") Cc: stable@dpdk.org Signed-off-by: Shun Hao <shunh@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Rongwei Liu [Wed, 6 Apr 2022 07:12:24 +0000 (10:12 +0300)]
net/mlx5: fix probing with secondary bonding member
Users can probe primary or secondary PCIe id when bonding is
configured.
1. -a 0a:00.0,representor=pf[0-1]vf[0-1], PMD probes 5 ports
totally: bonding device plus 4 representor ports.
2. -a 0a:00.1,representor=pf[0-1]vf[0-1], PMD only probes 2
representor ports.
Under the 2nd condition, bonding IB device doesn't have the same
PCIe id and PMD needs to check bonding relationship otherwise
probe failure.
Fixes: 6856efa54eea ("net/mlx5: fix PF leak on PCI probing failure") Cc: stable@dpdk.org Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Dmitry Kozlyuk [Thu, 31 Mar 2022 14:38:41 +0000 (17:38 +0300)]
net/mlx5: fix Tx when inlining is impossible
When txq_inline_max is too large and an mbuf is multi-segment
it may be impossible to inline data and build a valid WQE,
because WQE length would be larger then HW can represent.
It is impossible to detect misconfiguration at startup,
because the condition depends on the mbuf composition.
The check on the data path to prevent the error
treated the length limit as expressed in 64B units,
while the calculated length and limit are in 16B units.
Fix the condition to avoid subsequent TxQ failure and recovery.
Dmitry Kozlyuk [Thu, 31 Mar 2022 14:33:16 +0000 (17:33 +0300)]
common/mlx5: fix memory region range calculation
MR end for a mempool chunk may be calculated incorrectly.
For example, for chunk with addr=1.5M and len=1M with 2M page size
the range would be [0, 2M), while the proper result is [0, 4M).
Fix the calculation.
net/mlx5: handle MPRQ incompatibility with external buffers
Multi-Packet Rx queue uses PMD-managed buffers to store packets.
These buffers are externally attached to user mbufs.
This conflicts with the feature that allows using user-managed
externally attached buffers in an application.
Fall back to SPRQ in case external buffers mempool is configured.
The limitation is already documented in mlx5 guide.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Kathleen Capella [Thu, 24 Mar 2022 22:11:32 +0000 (22:11 +0000)]
net/iavf: remove extra copy step in Rx bulk path
In the Rx bulk path, packets which are taken from the HW ring, are first
copied to the stage data structure and then later copied from the stage
to the rx_pkts array. For the number of packets requested immediately
by the receiving function, this two-step process adds extra overhead
that is not necessary.
Instead, put requested number of packets directly into the rx_pkts array
and only stage excess packets. On N1SDP with 1 core/port, l3fwd saw up
to 4% performance improvement. On x86, no difference in performance was
observed.
When parsing raw flow pattern in FDIR, the input parameter spec and
mask are used directly and the original value will be changed. It
will cause error if these values are used in other functions. In this
patch, temporary variables are created to store the spec and mask.
Not necessary to create / destroy a parser instance for every raw packet
rule. A global parser instance will be created in ice_flow_init and be
destroyed in ice_flow_uninit.
Also, ice_dev_udp_tunnel_port_add has been hooked to perform corresponding
parser configure. This also fix the issue that RSS engine can't support
VXLAN inner through raw packet filter.
Mike Pattrick [Tue, 22 Mar 2022 03:19:37 +0000 (23:19 -0400)]
net/i40e: populate error in flow director parser
Errors from i40e_flow_parse_fdir_pattern() can bubble up to
rte_flow_create. If rte_flow_error is not initialized a caller may
dereference error->message. This may be uninitialized memory, leading
to a segemntation fault.
Fixes: 4a072ad43442 ("net/i40e: fix flow director config after flow validate") Cc: stable@dpdk.org Signed-off-by: Mike Pattrick <mkp@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com>
Wenjun Wu [Mon, 28 Feb 2022 07:36:07 +0000 (15:36 +0800)]
net/ice: improve performance of Rx timestamp offload
Previously, each time a burst of packets is received, SW reads HW
register and assembles it and the timestamp from descriptor together to
get the complete 64 bits timestamp.
This patch optimizes the algorithm. The SW only needs to check the
monotonicity of the low 32bits timestamp to avoid crossing borders.
Each time before SW receives a burst of packets, it should check the
time difference between current time and last update time to avoid
the low 32 bits timestamp cycling twice.
The patch proved a 50% ~ 70% single core performance improvement on a
main stream Xeon server, this fix the performance gap for some use cases.
Steve Yang [Mon, 14 Mar 2022 09:31:46 +0000 (09:31 +0000)]
net/iavf: fix HW ring scan method selection
When setup Rx queue, the rxdid would be changed if it's
"IAVF_RXDID_LEGACY_0/1", that caused the scan HW ring used the wrong
function 'iavf_rx_scan_hw_ring_flex_rxd()'.
Ignore the rxdid changed when equals "IAVF_RXDID_LEGACY_0/1".
Fixes: 0ed16e01313e ("net/iavf: fix function pointer in multi-process") Cc: stable@dpdk.org Signed-off-by: Steve Yang <stevex.yang@intel.com> Acked-by: Beilei Xing <beilei.xing@intel.com>
net/iavf: replace SMP barrier with thread fence in Rx
Replace the SMP barrier with atomic thread fence for iavf hw ring scan
in the bulk Rx path.
This patch introduces a change to the iavf driver that was already added
to the i40e driver [1] as part of the adoption of the use of compiler
atomics.
[1]Commit 8649e2356689 ("net/i40e: replace SMP barrier with thread fence
in Rx")
Stephen Douthit [Wed, 23 Mar 2022 20:03:46 +0000 (16:03 -0400)]
net/ixgbe: retry misbehaving SFP read
Some XGS-PON SFPs have been observed ACKing I2C reads and returning
uninitialized garbage while their uC boots. This can lead to the SFP ID
code marking an otherwise working SFP module as unsupported if a bogus
ID value is read while its internal PHY/microcontroller is still
booting.
Retry the ID read several times looking not just for NAK, but also for a
valid ID field.
Since the device isn't NAKing the transaction, the existing longer retry
code in ixgbe_read_i2c_byte_generic_int() doesn't apply here.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com> Signed-off-by: Jeff Daly <jeffd@silicom-usa.com> Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>