Harman Kalra [Fri, 22 Oct 2021 20:49:30 +0000 (02:19 +0530)]
interrupts: remove direct access to interrupt handle
Making changes to the interrupt framework to use interrupt handle
APIs to get/set any field.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
Harman Kalra [Fri, 22 Oct 2021 20:49:29 +0000 (02:19 +0530)]
interrupts: add allocator and accessors
Prototype/Implement get set APIs for interrupt handle fields.
User won't be able to access any of the interrupt handle fields
directly while should use these get/set APIs to access/manipulate
them.
Internal interrupt header i.e. rte_eal_interrupt.h is rearranged,
as APIs defined are moved to rte_interrupts.h and epoll specific
definitions are moved to a new header rte_epoll.h.
Later in the series rte_eal_interrupt.h will be removed.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
Dmitry Kozlyuk [Mon, 25 Oct 2021 12:20:52 +0000 (15:20 +0300)]
eal/windows: fix IOVA mode detection and handling
Windows EAL did not detect IOVA mode and worked incorrectly
if physical addresses could not be obtained
(if virt2phys driver was missing or inaccessible).
In this case, rte_mem_virt2iova() reported RTE_BAD_IOVA for any address.
Inability to obtain IOVA, be it PA or VA, should cause a failure
for the DPDK allocator, but it was hidden by the implementation,
so allocations did not fail when they should.
The mode when DPDK cannot obtain PA but can work is IOVA-as-VA mode.
However, rte_eal_iova_mode() always returned RTE_IOVA_DC
(while it should only ever return RTE_IOVA_PA or RTE_IOVA_VA),
because IOVA mode detection was not implemented.
Implement IOVA mode detection:
1. Always allow to force --iova-mode=va.
2. Allow to force --iova-mode=pa only if virt2phys is available.
3. If no mode is forced and virt2phys is available,
select the mode according to bus requests, default to PA.
4. If no mode is forced but virt2phys is unavailable, default to VA.
Fix rte_mem_virt2iova() by returning VA when using IOVA-as-VA.
Fix rte_eal_iova_mode() by returning the selected mode.
Fixes:
2a5d547a4a9b ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org
Reported-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
Harman Kalra [Fri, 8 Oct 2021 12:44:07 +0000 (18:14 +0530)]
mem: add telemetry infos
Registering new telemetry callbacks to list named (memzones)
and unnamed (malloc) memory reserved and return information
based on arguments provided by user.
Example:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384}
Connected to application: "dpdk-testpmd"
-->
--> /eal/memzone_list
{"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}
-->
-->
--> /eal/memzone_info,0
{"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data", \
"Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \
"Hugepage_size":
536870912, "Hugepage_base": "0x120000000", \
"Hugepage_used": 1}}
-->
-->
--> /eal/memzone_info,6
{"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0", \
"Length":
669918336, "Address": "0x15811db80", "Socket": 0, \
"Flags": 0, "Hugepage_size":
536870912, "Hugepage_base": "0x140000000", \
"Hugepage_used": 2}}
-->
-->
--> /eal/memzone_info,14
{"/eal/memzone_info": null}
-->
-->
--> /eal/heap_list
{"/eal/heap_list": [0]}
-->
-->
--> /eal/heap_info,0
{"/eal/heap_info": {"Head id": 0, "Name": "socket_0", \
"Heap_size":
1610612736, "Free_size":
927645952, \
"Alloc_size":
682966784, "Greatest_free_size":
529153152, \
"Alloc_count": 482, "Free_count": 2}}
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Vladimir Medvedkin [Mon, 6 Sep 2021 15:54:32 +0000 (16:54 +0100)]
rib: fix IPv6 depth mask
Fixes:
03b8372a9a73 ("rib: fix max depth IPv6 lookup")
Cc: stable@dpdk.org
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Vladimir Medvedkin [Thu, 21 Oct 2021 17:15:49 +0000 (18:15 +0100)]
lpm6: fix buffer overflow
This patch fixes buffer overflow reported by ASAN,
please reference https://bugs.dpdk.org/show_bug.cgi?id=819
The rte_lpm6 keeps routing information for control plane purpose
inside the rte_hash table which uses rte_jhash() as a hash function.
From the rte_jhash() documentation: If input key is not aligned to
four byte boundaries or a multiple of four bytes in length,
the memory region just after may be read (but not used in the
computation).
rte_lpm6 uses 17 bytes keys consisting of IPv6 address (16 bytes) +
depth (1 byte).
This patch increases the size of the depth field up to uint32_t
and sets the alignment to 4 bytes.
Bugzilla ID: 819
Fixes:
86b3b21952a8 ("lpm6: store rules in hash table")
Cc: stable@dpdk.org
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Vladimir Medvedkin [Mon, 6 Sep 2021 16:02:43 +0000 (17:02 +0100)]
hash: fix Doxygen comment of Toeplitz file
Fixes:
7574c3ef7428 ("hash: add toeplitz algorithm used by RSS")
Cc: stable@dpdk.org
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:37 +0000 (23:52 -0500)]
test/ring: relax memory ordering for stress test
wrk_cmd variable is used to signal the worker thread to start
or stop the stress test loop. Relaxed barriers are used
to achieve the same.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:36 +0000 (23:52 -0500)]
eal: fix memory ordering around lcore task accesses
Ensure that the memory operations before the call to
rte_eal_remote_launch are visible to the worker thread.
Use the function pointer to execute in worker thread
as the guard variable.
Ensure that the memory operations in worker thread, that happen
before it returns the status of the assigned function, are
visible to the main thread. Use the variable containing the
lcore's state as the guard variable.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:35 +0000 (23:52 -0500)]
eal: remove FINISHED lcore state
FINISHED state seems to be used to indicate that the worker's update
of the 'state' is not visible to other threads. There seems to be no
requirement to have such a state.
Since the FINISHED state is removed, the API rte_eal_wait_lcore
is updated to always return the status of the last function that
ran in the worker core.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
Honnappa Nagarahalli [Mon, 25 Oct 2021 04:52:34 +0000 (23:52 -0500)]
eal: reset lcore task callback and argument
In the rte_eal_remote_launch function, the lcore function
pointer is checked for NULL. However, the pointer is never
reset to NULL. Reset the lcore function pointer and argument
after the worker has completed executing the lcore function.
Fixes:
af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
Kefu Chai [Wed, 13 Oct 2021 20:54:18 +0000 (04:54 +0800)]
config: add option for atomic mbuf reference counting
RTE_MBUF_REFCNT_ATOMIC = 0 is not necessary for applications like
Seastar, where it's safe to assume that the mbuf refcnt is only
updated by a single core only.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Juraj Linkeš [Mon, 11 Oct 2021 13:40:41 +0000 (15:40 +0200)]
ci: update Meson option for generic build
The way we're building DPDK in CI, with -Dmachine=default, has not been
updated when the option got replaced to preserve a backwards-complatible
build call to facilitate ABI verification between DPDK versions. Update
the call to use -Dplatform=generic, which is the most up to date way to
execute the same build which is now present in all DPDK versions the ABI
check verifies.
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Acked-by: Aaron Conole <aconole@redhat.com>
Eli Britstein [Thu, 21 Oct 2021 08:51:32 +0000 (11:51 +0300)]
eal/x86: avoid cast-align warning in memcpy functions
Functions and macros in x86 rte_memcpy.h may cause cast-align warnings,
when using strict cast align flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static
For example:
In file included from main.c:24:
/dpdk/build/include/rte_memcpy.h: In function 'rte_mov16':
/dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases
required alignment of target type [-Wcast-align]
306 | xmm0 = _mm_loadu_si128((const __m128i *)src);
| ^
As the code assumes correct alignment, add first a (void *) or (const
void *) castings, to avoid the warnings.
Fixes:
9484092baad3 ("eal/x86: optimize memcpy for AVX512 platforms")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Eli Britstein [Thu, 21 Oct 2021 08:51:31 +0000 (11:51 +0300)]
mbuf: avoid cast-align warning in data offset macro
In rte_pktmbuf_mtod_offset macro, there is a casting from char * to type
't', which may cause cast-align warning when using strict cast align
flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static
main.c: In function 'l2fwd_mac_updating':
/dpdk/build/include/rte_mbuf_core.h:719:3: warning: cast increases
required alignment of target type [-Wcast-align]
719 | ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
| ^
/dpdk/build/include/rte_mbuf_core.h:733:32: note: in expansion of macro
'rte_pktmbuf_mtod_offset'
733 | #define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
| ^~~~~~~~~~~~~~~~~~~~~~~
As the code assumes correct alignment, add first a (void *) casting, to
avoid the warning.
Fixes:
af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Eli Britstein [Thu, 21 Oct 2021 08:51:30 +0000 (11:51 +0300)]
net: avoid cast-align warning in VLAN insert function
In rte_vlan_insert there is a casting of rte_pktmbuf_prepend returned
value to (struct rte_ether_hdr *), which causes cast-align warning when
using strict cast align flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static
In file included from main.c:35:
/dpdk/build/include/rte_ether.h:370:7: warning: cast increases required
alignment of target type [-Wcast-align]
370 | nh = (struct rte_ether_hdr *)
| ^
As the code assumes correct alignment, add first a (void *) casting, to
avoid the warning.
Fixes:
c974021a5949 ("ether: add soft vlan encap/decap")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
David Marchand [Fri, 15 Oct 2021 08:39:41 +0000 (10:39 +0200)]
doc: fix default mempool option in guides
This option should be prefixed with -- for consistency with others.
Fixes:
a103a97e7191 ("eal: allow user to override default mempool driver")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
David Marchand [Tue, 19 Oct 2021 12:52:30 +0000 (14:52 +0200)]
usertools/pmdinfo: fix plugin auto scan
Migration to argparse was incomplete.
$ dpdk-pmdinfo.py -p $(which dpdk-testpmd)
Traceback (most recent call last):
File "/usr/bin/dpdk-pmdinfo.py", line 626, in <module>
main()
File "/usr/bin/dpdk-pmdinfo.py", line 596, in main
exit(scan_for_autoload_pmds(args[0]))
TypeError: 'Namespace' object does not support indexing
Fixes:
81255f27c65c ("usertools: replace optparse with argparse")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Robin Jarry <robin.jarry@6wind.com>
Dmitry Kozlyuk [Fri, 22 Oct 2021 21:09:19 +0000 (00:09 +0300)]
mempool: fix non-IO flag inference
When mempool had been created with RTE_MEMPOOL_F_NO_IOVA_CONTIG flag
but later populated with valid IOVA, RTE_MEMPOOL_F_NON_IO was unset,
while it should be kept. The unit test did not catch this
because rte_mempool_populate_default() it used was populating
with RTE_BAD_IOVA.
Keep setting RTE_MEMPOOL_NON_IO at an empty mempool creation
and add an assert for it in the unit test (remove the separate case).
Do not reset the flag if RTE_MEMPOOL_F_ON_IOVA_CONTIG is set.
Fixes:
11541c5c81dd ("mempool: add non-IO flag")
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Aman Singh [Tue, 19 Oct 2021 10:48:41 +0000 (11:48 +0100)]
kni: fix build for SLES15-SP3
As suse version numbering is inconsistent to determine Linux kernel
API to be used. In this patch we check parameter of 'ndo_tx_timeout'
API directly from the kernel source. This is done only for suse build.
Bugzilla ID: 812
Cc: stable@dpdk.org
Signed-off-by: Aman Singh <aman.deep.singh@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Longfeng Liang <longfengx.liang@intel.com>
Jasvinder Singh [Wed, 1 Sep 2021 12:19:20 +0000 (13:19 +0100)]
sched: promote a function as stable
This API was introduced in 18.05, therefore removing
experimental tag to promote it to stable state
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Yogesh Jangra [Mon, 18 Oct 2021 01:22:53 +0000 (21:22 -0400)]
pipeline: support action annotations
Enable restricting the scope of an action to regular table entries or
to the table default entry in order to support the P4 language
tableonly or defaultonly annotations.
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Yogesh Jangra [Fri, 17 Sep 2021 10:32:05 +0000 (06:32 -0400)]
port: configure loop count for source port
Add support for configurable number of loops through the input PCAP
file for the source port. Added an additional parameter to source
port CLI command.
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Yogesh Jangra [Thu, 21 Oct 2021 03:23:32 +0000 (23:23 -0400)]
pipeline: fix instruction label check
The instruction_data array was incorrectly indexed, which resulted in
the array index getting out of bounds and sometimes segfault.
Fixes: a1711f (“pipeline: add SWX Rx and extract instructions“)
Cc: stable@dpdk.org
Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Shijith Thotton [Mon, 30 Aug 2021 20:12:59 +0000 (01:42 +0530)]
test/event: fix timer adapter creation test
Removed freeing of unallocated mempool in event timer adapter create
unit test.
Fixes:
d1f3385d0076 ("test: add event timer adapter auto-test")
Cc: stable@dpdk.org
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Xueming Li [Sat, 23 Oct 2021 12:17:55 +0000 (20:17 +0800)]
test/devargs: fix memory leak
In layer argument test function, kvargs are parsed and checked without
free. This patch calls rte_kvargs_free() function to avoid memory leak.
Coverity issue: 373631
Fixes:
a4975cd20dca ("test: add devargs test cases")
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
David Marchand [Sun, 24 Oct 2021 10:04:11 +0000 (12:04 +0200)]
net: fix build with pedantic for L2TPv2 definitions
Build is broken on RHEL7 following introduction of this new protocol.
Fixes:
3a929df1f286 ("ethdev: support L2TPv2 and PPP procotol")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
Olivier Matz [Fri, 15 Oct 2021 19:24:08 +0000 (21:24 +0200)]
mbuf: add namespace to offload flags
Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Olivier Matz [Fri, 15 Oct 2021 19:24:07 +0000 (21:24 +0200)]
devtools: add cocci script to rename mbuf offload flags
The mbuf offload flags do not match the DPDK namespace (they are not
prefixed by RTE_). This coccinelle script is used in the next commit to
do the replacement in the code.
A draft script was initially submitted [1] in commit
d7595795b760 ("doc:
announce renaming of mbuf offload flags"), but dropped by mistake at
commit.
1: http://inbox.dpdk.org/dev/
20210730155700.32574-1-olivier.matz@6wind.com
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Olivier Matz [Fri, 15 Oct 2021 19:24:06 +0000 (21:24 +0200)]
mbuf: mark old VLAN offload flags as deprecated
The flags PKT_TX_VLAN_PKT and PKT_TX_QINQ_PKT are
marked as deprecated since commit
380a7aab1ae2 ("mbuf: rename deprecated
VLAN flags") (2017). But they were not using the RTE_DEPRECATED
macro, because it did not exist at this time. Add it, and replace
usage of these flags.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Olivier Matz [Fri, 15 Oct 2021 19:24:05 +0000 (21:24 +0200)]
mbuf: remove duplicate definition of cksum offload flags
The flags PKT_RX_L4_CKSUM_BAD and PKT_RX_IP_CKSUM_BAD are defined
twice with the same value. Remove one of the occurrence, which was
marked as "deprecated".
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Raja Zidane [Wed, 15 Sep 2021 00:12:23 +0000 (00:12 +0000)]
compress/mlx5: support partial transformation
Currently compress, decompress and dma are allowed
only when all 3 capabilities are on.
A case where the user wants decompress offload, if
decompress capability is on but one of compress,
dma is off, is not allowed.
Split compress/decompress/dma support check to allow
partial transformations.
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Anoob Joseph [Mon, 18 Oct 2021 07:51:40 +0000 (13:21 +0530)]
crypto/cnxk: allow different cores in pending queue
Rework pending queue to allow producer and consumer cores to be
different.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Anoob Joseph [Mon, 18 Oct 2021 07:51:39 +0000 (13:21 +0530)]
common/cnxk: align CPT queue depth to power of 2
Use CPT LF queue depth as power of 2 to aid in masked checks for pending
queue.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Radu Nicolau [Tue, 19 Oct 2021 15:15:28 +0000 (16:15 +0100)]
ipsec: fix telemetry text
Set correct tunnel type telemetry text - tunnel type
was wrongly set as IPv4-UDP for all types.
Fixes:
bf5b65a8e781 ("ipsec: support SA telemetry")
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:54 +0000 (16:57 +0530)]
cryptodev: move device-specific structures
The device specific structures - rte_cryptodev
and rte_cryptodev_data are moved to cryptodev_pmd.h
to hide it from the applications.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:53 +0000 (16:57 +0530)]
cryptodev: use new flat array in fast path API
Rework fast-path cryptodev functions to use rte_crypto_fp_ops[].
While it is an API/ABI breakage, this change is intended to be
transparent for both users (no changes in user app is required) and
PMD developers (no changes in PMD is required).
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:52 +0000 (16:57 +0530)]
drivers/crypto: invoke probing finish function
Invoke event_dev_probing_finish() function at the end of probing,
this function sets the function pointers in the fp_ops flat array
in case of secondary process.
For primary process, fp_ops is updated in rte_cryptodev_start().
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:51 +0000 (16:57 +0530)]
cryptodev: add device probing finish function
Added a rte_cryptodev_pmd_probing_finish API which
need to be called by the PMD after the device is initialized
completely. This will set the fast path function pointers
in the flat array for secondary process. For primary process,
these are set in rte_cryptodev_start.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:50 +0000 (16:57 +0530)]
crypto/scheduler: use proper API for device start/stop
The worker PMDs were using direct device start/stop
functions rather than rte_cryptodev_start(),
so rte_crypto_fp_ops never get set. This patch calls
the rte_cryptodev_start and stop APIs which start and
stop devices properly and fp_ops get set.
Reported-by: Ciara Power <ciara.power@intel.com>
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:49 +0000 (16:57 +0530)]
cryptodev: move inline APIs into separate structure
Move fastpath inline function pointers from rte_cryptodev into a
separate structure accessed via a flat array.
The intention is to make rte_cryptodev and related structures private
to avoid future API/ABI breakages.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:48 +0000 (16:57 +0530)]
cryptodev: allocate max space for internal queue array
At queue_pair config stage, allocate memory for maximum
number of queue pair pointers that a device can support.
This will allow fast path APIs(enqueue_burst/dequeue_burst) to
refer pointer to internal QP data without checking for currently
configured QPs.
This is required to hide the rte_cryptodev and rte_cryptodev_data
structure from user.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Akhil Goyal [Wed, 20 Oct 2021 11:27:47 +0000 (16:57 +0530)]
cryptodev: separate out internal structures
A new header file rte_cryptodev_core.h is added and all
internal data structures which need not be exposed directly to
application are moved to this file. These structures are mostly
used by drivers, but they need to be in the public header file
as they are accessed by datapath inline functions for
performance reasons.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Kai Ji [Fri, 15 Oct 2021 14:39:57 +0000 (14:39 +0000)]
test/crypto: enable chacha_poly PMD
An autotest is added for the new chacha20_poly1305 PMD.
A new test case is also added for SGL test.
Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Kai Ji [Fri, 15 Oct 2021 14:39:56 +0000 (14:39 +0000)]
crypto/ipsec_mb: add chacha_poly PMD
Add in new chacha20_poly1305 PMD to the ipsec_mb framework.
Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Piotr Bronowski [Fri, 15 Oct 2021 14:39:55 +0000 (14:39 +0000)]
crypto/ipsec_mb: move zuc PMD
This patch removes the crypto/zuc folder and gathers all zuc PMD
implementation specific details into two files,
pmd_zuc.c and pmd_zuc_priv.h in the crypto/ipsec_mb folder.
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Piotr Bronowski [Fri, 15 Oct 2021 14:39:54 +0000 (14:39 +0000)]
crypto/ipsec_mb: support snow3g digest appended ops
This patch enables out-of-place auth-cipher operations where
digest should be encrypted along with the rest of raw data.
It also adds support for partially encrypted digest when using
auth-cipher operations.
Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Piotr Bronowski [Fri, 15 Oct 2021 14:39:53 +0000 (14:39 +0000)]
crypto/ipsec_mb: move snow3g PMD
This patch removes the crypto/snow3g folder and gathers all snow3g PMD
implementation specific details into a single file,
pmd_snow3g.c in the crypto/ipsec_mb folder.
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Piotr Bronowski [Fri, 15 Oct 2021 14:39:52 +0000 (14:39 +0000)]
crypto/ipsec_mb: move kasumi PMD
This patch removes the crypto/kasumi folder and gathers all kasumi PMD
implementation specific details into a single file,
pmd_kasumi.c in the crypto/ipsec_mb folder.
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Piotr Bronowski [Fri, 15 Oct 2021 14:39:51 +0000 (14:39 +0000)]
crypto/ipsec_mb: move aesni_gcm PMD
This patch removes the crypto/aesni_gcm folder and gathers all
aesni-gcm PMD implementation specific details into a single file,
pmd_aesni_gcm.c in the crypto/ipsec_mb folder.
A redundant check for iv length is removed.
GCM ops are stored in the queue pair for multi process support, they
are updated during queue pair setup for both primary and secondary
processes.
GCM ops are also set per lcore for the CPU crypto mode.
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Pablo de Lara [Fri, 15 Oct 2021 14:39:50 +0000 (14:39 +0000)]
test/crypto: add ZUC-256 vectors
Add extra ZUC-EIA3-256 and ZUC-EEA3-256 test vectors.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Pablo de Lara [Fri, 15 Oct 2021 14:39:49 +0000 (14:39 +0000)]
test/crypto: check auth parameters
Check for auth parameters in the transform to verify if a test case is
supported by the crypto device under test.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Pablo de Lara [Fri, 15 Oct 2021 14:39:48 +0000 (14:39 +0000)]
test/crypto: check cipher parameters
Check for cipher parameters in the transform to verify if a test case
is supported by the crypto device under test.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Pablo de Lara [Fri, 15 Oct 2021 14:39:47 +0000 (14:39 +0000)]
crypto/ipsec_mb: support ZUC-256 for aesni_mb
Add support for ZUC-EEA3-256 and ZUC-EIA3-256.
Only 4-byte tags supported for now.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Piotr Bronowski [Fri, 15 Oct 2021 14:39:46 +0000 (14:39 +0000)]
crypto/ipsec_mb: move aesni_mb PMD
This patch removes the crypto/aesni_mb folder and gathers all
aesni-mb PMD implementation specific details into a single file,
pmd_aesni_mb.c in crypto/ipsec_mb.
Now that intel-ipsec-mb v1.0 is the minimum supported version, old
macros can be replaced with the newer macros supported by this version.
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Ciara Power [Fri, 15 Oct 2021 14:39:45 +0000 (14:39 +0000)]
crypto/ipsec_mb: support multi-process
The ipsec_mb SW PMD now has multiprocess support.
The queue-pair IMB_MGR is stored in a memzone instead of being allocated
externally by the Intel IPSec MB library, when v1.1 is used.
If v1.0 is used, multi process is not supported, and allocation is
done as before.
The secondary process needs to reconfigure the queue-pair to allow for
IMB_MGR function pointers be updated.
Intel IPsec MB library version 1.1 is required for this support.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Fan Zhang [Fri, 15 Oct 2021 14:39:44 +0000 (14:39 +0000)]
crypto/ipsec_mb: introduce IPsec_mb framework
This patch introduces the new framework to share common code between
the SW crypto PMDs that depend on the intel-ipsec-mb library.
This change helps to reduce future effort on the code maintenance and
feature updates.
The PMDs that will be added to this framework in subsequent patches are:
- AESNI MB
- AESNI GCM
- CHACHA20_POLY1305
- KASUMI
- SNOW3G
- ZUC
The use of these PMDs will not change, they will still be supported for
x86, and will use the same EAL args as before.
The minimum required version for the intel-ipsec-mb library is now v1.0.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Andrew Rybchenko [Fri, 22 Oct 2021 11:20:16 +0000 (14:20 +0300)]
ethdev: avoid usage of ULL for 64-bit unsigned constants
Use UINT64_C() macro instead.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Fri, 22 Oct 2021 07:30:02 +0000 (10:30 +0300)]
ethdev: replace single bit masks with macros
The macros RTE_BIT32 and RTE_BIT64 are used to replace single bit masks.
Do not switch VLAN offload flags since type is not fixed size.
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ferruh Yigit [Fri, 22 Oct 2021 11:03:12 +0000 (12:03 +0100)]
ethdev: add namespace
Add 'RTE_ETH' namespace to all enums & macros in a backward compatible
way. The macros for backward compatibility can be removed in next LTS.
Also updated some struct names to have 'rte_eth' prefix.
All internal components switched to using new names.
Syntax fixed on lines that this patch touches.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Konstantin Ananyev [Fri, 22 Oct 2021 13:26:42 +0000 (14:26 +0100)]
test/bonding: fix after hiding ethdev internal structures
link bounding auto-test internally creates emulated ethdev.
Some tests change Rx/Tx functions of this emulated device on the fly:
by directly modifying rte_eth_dev fields and without doing stop/start
for these devices.
As now ethdev uses rte_eth_fp_ops[] for fast-path functions, these
direct changes doesn't make expected effect.
Fix the problem by guarding fast-path functions changes with
rte_eth_dev_stop()/rte_eth_dev_start().
Fixes:
7a0935239b9e ("ethdev: make fast-path functions to use new flat array")
Reported-by: Lewei Yang <leweix.yang@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ferruh Yigit [Fri, 22 Oct 2021 12:57:15 +0000 (13:57 +0100)]
drivers/net: fix removing jumbo offload flag
After DEV_RX_OFFLOAD_JUMBO_FRAME flag removed, drivers give jumbo frame
decisions based on MTU value checks, but some of the checks were wrong
by mistake, causing device initialization to fail, fixing them.
Fixes:
b563c1421282 ("ethdev: remove jumbo offload flag")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Yu Jiang <yux.jiang@intel.com>
Ferruh Yigit [Fri, 22 Oct 2021 12:57:14 +0000 (13:57 +0100)]
doc: remove jumbo offload feature
Jumbo offload is no more announced as capability, and
'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag is removed.
This patch is also removing 'Jumbo frame' feature from documentation.
Fixes:
b563c1421282 ("ethdev: remove jumbo offload flag")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Ciara Loftus [Fri, 22 Oct 2021 14:07:17 +0000 (14:07 +0000)]
net/af_xdp: fix max Rx packet length
Commit
1bb4a528c41f ("ethdev: fix max Rx packet length") clarified the
expected usage of the max_rx_pktlen and max_mtu values and implemented
some extra checks on these values to ensure they are sane. After this,
the AF_XDP PMD fails to initialise. The value for max_rx_pktlen which
represents the max size of the Ethernet frame was set to ETH_FRAME_LEN
(1514) and the max_mtu which represents the size of the payload was set
to the max size of the Ethernet frame. This did not make sense, as
naturally the maximum frame size should be greater than the payload
size.
Fix this by setting the max_rx_pktlen equal to the max size of the
Ethernet frame as expected, and the max MTU equal to the max_rx_pktlen
less the overhead which is set to the size of an Ethernet header plus
CRC.
Fixes:
1bb4a528c41f ("ethdev: fix max Rx packet length")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ivan Ilchenko [Fri, 22 Oct 2021 10:18:28 +0000 (13:18 +0300)]
ethdev: forbid MTU set before device configure
rte_eth_dev_configure() always sets MTU to either dev_conf.rxmode.mtu
or RTE_ETHER_MTU if application doesn't provide the value.
So, there is no point to allow rte_eth_dev_set_mtu() before since
set value will be overwritten on configure anyway.
Fixes:
1bb4a528c41f ("ethdev: fix max Rx packet length")
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Fri, 22 Oct 2021 07:22:14 +0000 (10:22 +0300)]
ethdev: remove unused L2 tunnel mask defines
Fixes:
cf47acc0f9ba ("ethdev: remove L2 tunnel offload control API")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Eli Britstein [Thu, 21 Oct 2021 13:20:02 +0000 (16:20 +0300)]
app/testpmd: fix packet burst spreading stats
RX/TX functions (rte_eth_rx_burst/rte_eth_tx_burst) get 'nb_pkts'
argument, which specifies the maximum number to receive/transmit.
It can be 0..nb_pkts, meaning nb_pkts+1 options.
Testpmd can provide statistics of the burst sizes ('set
record-burst-stats on') by incrementing an array cell of index
<burst-size>. This array is mistakenly [MAX_PKT_BURST] size. Receiving
the maximum burst will cause out of bound write.
Enlarge the spread stats array by one cell to fix it.
Fixes:
af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Chengchang Tang [Fri, 22 Oct 2021 01:38:40 +0000 (09:38 +0800)]
net/hns3: add runtime config for mailbox limit time
Current, the max waiting time for MBX response is 500ms, but in
some scenarios, it is not enough. Since it depends on the response
of the kernel mode driver, and its response time is related to the
scheduling of the system. In this special scenario, most of the
cores are isolated, and only a few cores are used for system
scheduling. When a large number of services are started, the
scheduling of the system will be very busy, and the reply of the
mbx message will time out, which will cause our PMD initialization
to fail.
This patch add a runtime config to set the max wait time. For the
above scenes, users can adjust the waiting time to a suitable value
by themselves.
Fixes:
463e748964f5 ("net/hns3: support mailbox")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Xueming Li [Thu, 21 Oct 2021 10:41:42 +0000 (18:41 +0800)]
app/testpmd: add forwarding engine for shared Rx queue
To support shared Rx queue, this patch introduces dedicate forwarding
engine. The engine groups received packets by mbuf->port into sub-group,
updates stream statistics and simply frees packets.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Xueming Li [Thu, 21 Oct 2021 10:41:41 +0000 (18:41 +0800)]
app/testpmd: force shared Rx queue polled on same core
Shared Rx queue must be polled on same core. This patch checks and stops
forwarding if shared RxQ being scheduled on multiple
cores.
It's suggested to use same number of Rx queues and polling cores.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Xueming Li [Thu, 21 Oct 2021 10:41:40 +0000 (18:41 +0800)]
app/testpmd: dump port info for shared Rx queue
In case of shared Rx queue, source port mbuf from polling result isn't
the Rx port of forwarding stream. To provide original port ID, this
patch dumps mbuf->port for each packet in verbose mode if shared Rx
queue enabled.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Xueming Li [Thu, 21 Oct 2021 10:41:39 +0000 (18:41 +0800)]
app/testpmd: add parameter for shared Rx queue
Adds "--rxq-share=X" parameter to enable shared RxQ.
Rx queue is shared if device supports, otherwise fallback to standard
RxQ.
Shared Rx queues are grouped per X ports. X defaults to UINT32_MAX,
implies all ports join share group 1. Queue ID is mapped equally with
shared Rx queue ID.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Xueming Li [Thu, 21 Oct 2021 10:41:38 +0000 (18:41 +0800)]
app/testpmd: dump device capability and Rx domain info
Dump device capability and Rx domain ID if shared Rx queue is supported
by device.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Xueming Li [Thu, 21 Oct 2021 10:41:37 +0000 (18:41 +0800)]
ethdev: get device capability name as string
This patch adds API to return name of device capability.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Xueming Li [Thu, 21 Oct 2021 10:41:36 +0000 (18:41 +0800)]
ethdev: introduce shared Rx queue
In current DPDK framework, each Rx queue is pre-loaded with mbufs to
save incoming packets. For some PMDs, when number of representors scale
out in a switch domain, the memory consumption became significant.
Polling all ports also leads to high cache miss, high latency and low
throughput.
This patch introduces shared Rx queue. Ports in same Rx domain and
switch domain could share Rx queue set by specifying non-zero sharing
group in Rx queue configuration.
Shared Rx queue is identified by share_rxq field of Rx queue
configuration. Port A RxQ X can share RxQ with Port B RxQ Y by using
same shared Rx queue ID.
No special API is defined to receive packets from shared Rx queue.
Polling any member port of a shared Rx queue receives packets of that
queue for all member ports, port_id is identified by mbuf->port. PMD is
responsible to resolve shared Rx queue from device and queue data.
Shared Rx queue must be polled in same thread or core, polling a queue
ID of any member port is essentially same.
Multiple share groups are supported. PMD should support mixed
configuration by allowing multiple share groups and non-shared Rx queue
on one port.
Example grouping and polling model to reflect service priority:
Group1, 2 shared Rx queues per port: PF, rep0, rep1
Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127
Core0: poll PF queue0
Core1: poll PF queue1
Core2: poll rep2 queue0
PMD advertise shared Rx queue capability via RTE_ETH_DEV_CAPA_RXQ_SHARE.
PMD is responsible for shared Rx queue consistency checks to avoid
member port's configuration contradict each other.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Huisong Li [Thu, 21 Oct 2021 02:24:21 +0000 (10:24 +0800)]
ethdev: fix PCI device release in secondary process
In secondary process, rte_eth_dev_close() doesn't clear eth_dev->data.
If calling rte_dev_remove() after rte_eth_dev_close(), in
rte_eth_dev_pci_generic_remove() function, the released eth device still
can be found by its name in shared memory. As a result, the eth device
will be released repeatedly. The state of the eth device is modified to
RTE_ETH_DEV_UNUSED after rte_eth_dev_close(). So this state can be used
to avoid this problem.
Fixes:
dcd5c8112bc3 ("ethdev: add PCI driver helpers")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Satheesh Paul [Thu, 21 Oct 2021 05:11:15 +0000 (10:41 +0530)]
net/cnxk: support port ID flow action
This patch adds support for rte flow action type port_id to
enable directing packets from an input port PF to an output
port which is a VF of the input port PF.
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Satheesh Paul [Thu, 21 Oct 2021 05:11:14 +0000 (10:41 +0530)]
common/cnxk: support port ID action
This patch adds ROC API to support flow port ID action type.
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Xuan Ding [Thu, 21 Oct 2021 14:25:40 +0000 (14:25 +0000)]
net/virtio: fix avail descriptor ID
Vhost will update desc’s Buffer ID advance to next used descriptor when
VIRTIO_F_IN_ORDER feature negotiated. When virtio reuses the descriptor,
the Buffer ID should be restored even VIRTQ_DESC_F_INDIRECT
feature negotiated.
Fixes:
b473061b0e1d ("net/virtio: fix indirect descriptors in packed datapaths")
Cc: stable@dpdk.org
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yong Liu <yong.liu@intel.com>
Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Gaoxiang Liu [Sun, 17 Oct 2021 23:19:52 +0000 (07:19 +0800)]
net/vhost: merge stats loop in datapath
To improve performance in vhost Tx/Rx, merge vhost stats loop.
eth_vhost_tx has 2 loop of send num iteraion.
It can be merge into one.
eth_vhost_rx has the same issue as Tx.
Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xuan Ding [Mon, 11 Oct 2021 07:59:42 +0000 (07:59 +0000)]
vhost: enable IOMMU for async vhost
The use of IOMMU has many advantages, such as isolation and address
translation. This patch extends the capability of DMA engine to use
IOMMU if the DMA engine is bound to vfio.
When set memory table, the guest memory will be mapped
into the default container of DPDK.
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xuan Ding [Mon, 11 Oct 2021 07:59:41 +0000 (07:59 +0000)]
vfio: allow partially unmapping adjacent memory
Currently, if we map a memory area A, then map a separate memory area B
that by coincidence happens to be adjacent to A, current implementation
will merge these two segments into one, and if partial unmapping is not
supported, these segments will then be only allowed to be unmapped in
one go. In other words, given segments A and B that are adjacent, it
is currently not possible to map A, then map B, then unmap A.
Fix this by adding a notion of "chunk size", which will allow
subdividing segments into equally sized segments whenever we are dealing
with an IOMMU that does not support partial unmapping. With this change,
we will still be able to merge adjacent segments, but only if they are
of the same size. If we keep with our above example, adjacent segments A
and B will be stored as separate segments if they are of different
sizes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xuan Ding [Wed, 13 Oct 2021 01:36:40 +0000 (01:36 +0000)]
net/virtio: fix indirect descriptor reconnection
Add initialization for packed ring indirect descriptors
in reconnection path.
Fixes:
381f39ebb78a ("net/virtio: fix packed ring indirect descricptors setup")
Cc: stable@dpdk.org
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Li Feng [Thu, 14 Oct 2021 12:40:08 +0000 (20:40 +0800)]
vhost: add sanity check on inflight last index
The index in rte_vhost_set_last_inflight_io_split is from
the frontend driver, check if it's in the virtqueue range.
Fixes:
bb0c2de9602b ("vhost: add APIs to operate inflight ring")
Cc: stable@dpdk.org
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Ivan Malov [Thu, 16 Sep 2021 18:49:55 +0000 (21:49 +0300)]
net/virtio: fix Tx checksum for tunnel packets
Tx prepare method calls rte_net_intel_cksum_prepare(), which
handles tunnel packets correctly, but Tx burst path does not
take tunnel presence into account when computing the offsets.
Fixes:
58169a9c8153 ("net/virtio: support Tx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Wenwu Ma [Fri, 24 Sep 2021 17:23:00 +0000 (17:23 +0000)]
examples/vhost: fix use after free on drain
When a vdev is removed in destroy_device function,
the corresponding vhost TX buffer will also be freed,
but the vhost TX buffer may still be used in the
drain_vhost function, which will cause an error of
heap-use-after-free. Therefore, before accessing
vhost TX buffer, we need to check whether the vdev
has been removed, if so, let's skip this vdev.
Fixes:
a68ba8e0a6b6 ("examples/vhost: refactor vhost data path")
Cc: stable@dpdk.org
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Marvin Liu [Sun, 26 Sep 2021 09:28:42 +0000 (17:28 +0800)]
net/virtio: fix oversized packets in vectorized Rx
If packed ring size is not power of two, it is possible that remained
number less than one batch and meanwhile batch operation can pass.
This will cause incorrect remained number calculation and then lead to
receiving oversized packets. The patch fixed the issue by added
remained number check before batch operation.
Fixes:
77d66da83834 ("net/virtio: add vectorized packed ring Rx")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xueming Li [Fri, 15 Oct 2021 15:05:45 +0000 (23:05 +0800)]
vdpa/mlx5: retry VAR allocation during vDPA restart
VAR is the device memory space for the virtio queues doorbells,
Qemu could mmap it to directly to speed up doorbell push.
On a busy system, Qemu takes time to release VAR resources during driver
shutdown. If vdpa restarted quickly, the VAR allocation failed with
error 28 since the VAR is singleton resource per device.
This patch adds retry mechanism for VAR allocation.
Fixes:
4cae722c1b06 ("vdpa/mlx5: move virtual doorbell alloc to probe")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xueming Li [Fri, 15 Oct 2021 15:05:44 +0000 (23:05 +0800)]
vdpa/mlx5: workaround FW first completion in start
After a vDPA application restart, Qemu restores VQ with used and
available index, new incoming packet triggers virtio driver to
handle buffers. Under heavy traffic, no available buffer for
firmware to receive new packets, no Rx interrupts generated,
driver is stuck on endless interrupt waiting.
As a firmware workaround, this patch sends a notification after
VQ setup to ask driver handling buffers and filling new buffers.
Fixes:
bff735011078 ("vdpa/mlx5: prepare virtio queues")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Zhihong Peng [Fri, 8 Oct 2021 05:49:45 +0000 (05:49 +0000)]
net/virtio: fix check scatter on all Rx queues
This patch fixes the wrong way to obtain virtqueue.
The end of virtqueue cannot be judged based on whether
the array is NULL.
Fixes:
4e8169eb0d2d ("net/virtio: fix Rx scatter offload")
Cc: stable@dpdk.org
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Ting Xu [Thu, 21 Oct 2021 05:54:07 +0000 (05:54 +0000)]
net/ice: fix TM hierarchy commit flag reset
After DCF commits TM hierarchy configuration, the commit flag is set to
avoid duplicated commit. But the flag is not reset after device stop,
which prevents the update of hierarchy configuration unless close the
device. It is not reasonable. This patch fix to reset the commit flag
after device stop. Then users can delete and add nodes to commit a new
TM hierarchy configuration.
Fixes:
3a6bfc37eaf4 ("net/ice: support QoS config VF bandwidth in DCF")
Cc: stable@dpdk.org
Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
William Tu [Wed, 20 Oct 2021 03:47:49 +0000 (20:47 -0700)]
net/e1000: build on Windows
This patch enables building the e1000 driver for Windows.
I tested using two Windows VM on top of VMware Fusion,
creating two e1000 devices with device ID 0x10D3 (8274L),
verifying rx/tx works correctly using dpdk-testpmd.exe
rxonly and txonly mode.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Tudor Cornea [Wed, 20 Oct 2021 18:13:46 +0000 (21:13 +0300)]
net/ixgbe: fix port initialization if MTU config fails
On a VMware ESXi 6.0 setup with an Intel 82599 NIC the ports don't
seem to initialize anymore, while running testpmd.
Configuring Port 0 (socket 0)
ixgbevf_dev_rx_init(): Set max packet length to 1518 failed.
ixgbevf_dev_start(): Unable to initialize RX hardware (-22)
Fail to start port 0: Invalid argument
Configuring Port 1 (socket 0)
ixgbevf_dev_rx_init(): Set max packet length to 1518 failed.
ixgbevf_dev_start(): Unable to initialize RX hardware (-22)
Fail to start port 1: Invalid argument
Please stop the ports first
If the call to ixgbevf_rlpml_set_vf fails and we return prematurely,
we will not be able to initialize the ports correctly.
The behavior seems to have changed since the following commit:
Fixes:
c77866a16904 ("net/ixgbe: detect failed VF MTU set")
Cc: stable@dpdk.org
We can make this particular use case work correctly if we don't
return an error, which seems to be consistent with the overall
kernel ixgbevf implementation.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c?h=v5.14#n2015
Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Rongwei Liu [Thu, 21 Oct 2021 08:56:36 +0000 (11:56 +0300)]
net/mlx5: set Tx queue affinity in round-robin
Previously, we set txq affinity to 0 and let firmware
to perform round-robin when bonding. Firmware uses a
global counter to assign txq affinity to different
physical ports accord to remainder after division.
There are three dis-advantages:
1. The global counter is shared between kernel and dpdk.
2. After restarting pmd or port, the previous counter value
is reused, so the new affinity is unpredictable.
3. There is no way to get what affinity is set by firmware.
In this update, we will create several TISs up to the
number of bonding ports and bind each TIS to one PF port.
For each port, it will start to pick up TIS using its port
index. Upper layer application can quickly calculate each txq's
affinity without querying.
At DPDK layer, when creating txq with 2 bonding ports, the
affinity is set like:
port 0: 1-->2-->1-->2
port 1: 2-->1-->2-->1
port 2: 1-->2-->1-->2
Note: Only applicable to DevX api.
This affinity subjects to HW hash.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Rongwei Liu [Thu, 21 Oct 2021 08:56:35 +0000 (11:56 +0300)]
common/mlx5: add LAG context query
Added a new function mlx5_devx_cmd_query_lag() to query LAG
property from firmware including state/affinity/mode etc.
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Dmitry Kozlyuk [Thu, 14 Oct 2021 08:55:28 +0000 (11:55 +0300)]
net/mlx5: close tools socket with last device
MLX5 PMD exposes a socket for external tools to dump port state.
Socket events are listened using an interrupt source of EXT type.
The socket was closed and the interrupt callback was unregistered
at program exit, which is incorrect because DPDK could be already
shut down at this point. Move actions performed at program exit
to the moment the last MLX5 port is closed. The socket will be opened
again if later a new MLX5 device is plugged in and probed.
Also fix comments that were decisively talking
about secondary processes instead of external tools.
Fixes:
e6cdc54cc0ef ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org
Reported-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Dmitry Kozlyuk [Mon, 18 Oct 2021 17:24:56 +0000 (20:24 +0300)]
net/mlx5: fix Rx queue resource cleanup
mlx5_rxq_start() allocates rxq_ctrl->obj and frees it on failure,
but did not set it to NULL. Later mlx5_rxq_release() could not recognize
this object is already freed and attempted to release its resources,
resulting in a crash:
Configuring Port 0 (socket 0)
mlx5_common: Failed to create RQ using DevX
mlx5_common: Can't create DevX RQ object.
mlx5_net: Port 0 Rx queue 0 RQ creation failure.
Segmentation fault
Set rxq_ctrl->obj to NULL after it is freed to skip resource release.
Fixes:
1260a87b2889 ("net/mlx5: share Rx control code")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Bing Zhao [Mon, 18 Oct 2021 13:53:05 +0000 (16:53 +0300)]
net/mlx5: fix meter yellow policy with RSS action
The RSS configuration in a policy action container was a pointer
inside a union, and the pointer area could be used as other fate
action. In the current implementation, the RSS of the green color
was prior to that of the yellow color. There was a high possibility
the pointer was considered as the RSS and result in a error flow
expansion when only the yellow color had the RSS action.
The check of the fate action type should also be done to get rid of
the misjudgment.
Fixes:
b38a12272b3a ("net/mlx5: split meter color policy handling")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Xueming Li [Tue, 19 Oct 2021 10:35:01 +0000 (18:35 +0800)]
net/mlx5: check DevX to support more Verbs ports
Verbs API doesn't support device port number larger than 255 by design.
To support more VF or SubFunction port representors, forces DevX API
check when max Verbs device link ports larger than 255.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Xueming Li [Tue, 19 Oct 2021 10:35:00 +0000 (18:35 +0800)]
net/mlx5: enable DevX Tx queue creation
Verbs API does not support Infiniband device port number larger 255 by
design. To support more representors on a single Infiniband device DevX
API should be engaged.
While creating Send Queue (SQ) object with Verbs API, the PMD assigned
IB device port attribute and kernel created the default miss flows in
FDB domain, to redirect egress traffic from the queue being created to
representor appropriate peer (wire, HPF, VF or SF).
With DevX API there is no IB-device port attribute (it is merely kernel
one, DevX operates in PRM terms) and PMD must create default miss flows
in FDB explicitly. PMD did not provide this and using DevX API for
E-Switch configurations was disabled.
The default miss FDB flow matches E-Switch manager vport (to make sure
the source is some representor) and SQn (Send Queue number - device
internal queue index). The root flow table managed by kernel/firmware
and it does not support vport redirect action, we have to split the
default miss flow into two ones:
- flow with lowest priority in the root table that matches E-Switch
manager vport ID and jump to group 1.
- flow in group 1 that matches E-Switch manager vport ID and SQn and
forwards packet to peer vport
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>