dpdk.git
Dmitry Kozlyuk [Thu, 3 Feb 2022 18:13:36 +0000 (20:13 +0200)]
eal: extend --huge-unlink for hugepage file reuse

Expose the Linux EAL ability to reuse existing hugepage files
via the --huge-unlink=never switch.
The default behavior is unchanged; it can also be specified
explicitly using --huge-unlink=existing for consistency.
The old --huge-unlink switch is kept
as an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Dmitry Kozlyuk [Thu, 3 Feb 2022 18:13:35 +0000 (20:13 +0200)]
eal/linux: allow hugepage file reuse

Linux EAL ensured that mapped hugepages are clean
by always mapping from newly created files:
existing hugepage backing files were always removed.
In this case, the kernel clears the page to prevent data leaks,
because the mapped memory may contain leftover data
from the previous process that was using this memory.
Clearing takes the bulk of the time spent in mmap(2),
increasing EAL initialization time.

Introduce a mode to keep existing files and reuse them
in order to speed up initial memory allocation in EAL.
Hugepages mapped from such files may contain data
left by the previous process that used this memory,
so RTE_MEMSEG_FLAG_DIRTY is set for their segments.
If multiple hugepages are mapped from the same file:
1. When fallocate(2) is used, all memory mapped from this file
   is considered dirty, because it is unknown
   which parts of the file are holes.
2. When ftruncate(2) is used, memory mapped from this file
   is considered dirty unless the file is extended
   to create a new mapping, which implies clean memory.
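
The two rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum, the function name and its parameters are hypothetical and are not the identifiers used inside EAL.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not EAL's actual identifiers. */
enum resize_method {
    RESIZE_FALLOCATE, /* backing file is resized with fallocate(2) */
    RESIZE_FTRUNCATE, /* backing file is resized with ftruncate() */
};

/*
 * Decide whether memory mapped from a hugepage backing file
 * has to be treated as dirty.
 *   file_existed:  the file was found and reused, not newly created
 *   file_extended: the file had to be extended to create this mapping
 */
static bool
mapping_is_dirty(enum resize_method method, bool file_existed,
                 bool file_extended)
{
    if (!file_existed)
        return false; /* new file: the kernel provides cleared pages */
    if (method == RESIZE_FALLOCATE)
        return true;  /* holes are unknown: assume everything is dirty */
    /* ftruncate: only a freshly extended region is known to be clean */
    return !file_extended;
}
```

With these assumptions, a reused file resized via fallocate is always reported dirty, while a reused file that was extended via ftruncate is reported clean.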

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Dmitry Kozlyuk [Thu, 3 Feb 2022 18:13:34 +0000 (20:13 +0200)]
eal: refactor --huge-unlink storage

In preparation for extending the --huge-unlink option semantics,
refactor how the option is stored in the internal configuration.
This makes future changes more isolated.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Dmitry Kozlyuk [Thu, 3 Feb 2022 18:13:33 +0000 (20:13 +0200)]
mem: add dirty malloc element support

The EAL malloc layer assumed that the content of all free elements
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:
1. EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before returning into the heap.

Clearing the memory can be as slow as around 14 GiB/s.
To avoid this cost, the memalloc layer is allowed to return dirty memory.
Such segments are marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using the new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:

a) If the joint element contains dirty parts, it is dirty:

    dirty + freed + dirty = dirty  =>  no need to clean
            freed + dirty = dirty      the freed memory

   Dirty parts may be large (e.g. initial allocation),
   so clearing them could create unpredictable slowdown.

b) If the only dirty part of the joint element
   is the freed memory, the joint element can be made clean:

    clean + freed + clean = clean  =>  freed memory
    clean + freed         = clean      must be cleared
            freed + clean = clean
            freed         = clean

   This logic naturally reproduces the old behavior
   and always applies in modes when EAL memalloc layer
   returns only clean segments.

As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.
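
Cases (a) and (b) can be sketched as follows. The structure and function below are hypothetical simplifications for illustration (the real element header and join code differ); only the dirty-flag rule is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified element; not the real malloc element header. */
struct elem {
    unsigned char *data;
    size_t len;
    bool dirty;
};

/*
 * Update the dirty flag when a freed element is joined with its free
 * neighbours (either neighbour may be NULL if it is absent or in use).
 * Returns the dirty flag of the joint element.
 */
static bool
free_and_join(struct elem *freed, const struct elem *prev,
              const struct elem *next)
{
    bool neighbours_dirty = (prev != NULL && prev->dirty) ||
                            (next != NULL && next->dirty);

    assert(freed != NULL && freed->data != NULL);
    if (neighbours_dirty) {
        /* case (a): the joint element is dirty anyway, skip clearing */
        freed->dirty = true;
    } else {
        /* case (b): the freed part is the only dirty part, clear it */
        memset(freed->data, 0, freed->len);
        freed->dirty = false;
    }
    return freed->dirty;
}
```

In this sketch the memset happens only in case (b), which matches the statement that memory is cleared either on free or on allocation, but never twice.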

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Dmitry Kozlyuk [Thu, 3 Feb 2022 18:13:32 +0000 (20:13 +0200)]
app/test: add allocator performance benchmark

Memory allocator performance is crucial to applications that deal
with large amounts of memory or allocate frequently. DPDK allocator
performance is affected by EAL options, the API used and, at least,
the allocation size. The new autotest is intended to be run with different
EAL options. It measures performance with a range of sizes
for different APIs: rte_malloc, rte_zmalloc, and rte_memzone_reserve.

Work distribution between allocation and deallocation depends on EAL
options. The test prints both times and total time to ease comparison.

Memory can be filled with zeroes at different points of the allocation path,
but it always takes a considerable fraction of the overall time. This is why
the test measures filling speed and prints how long clearing takes
for each size as a reference (for rte_memzone_reserve, estimates
are printed).

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Dmitry Kozlyuk [Thu, 3 Feb 2022 18:13:31 +0000 (20:13 +0200)]
doc: add hugepage mapping details

Hugepage mapping is the layer that EAL malloc builds upon.
There were implicit references to its details,
like mentions of segment file descriptors,
but no explicit description of its modes and operation.
Add an overview of the mechanics used on each supported OS.
Convert memory management subsections from list items
to level 4 headers: they are big and important enough.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Weiguo Li [Mon, 7 Feb 2022 12:37:01 +0000 (20:37 +0800)]
eventdev: remove useless C++ include guard

This private header contains an incomplete cplusplus guard;
just remove it.

Fixes: d35e61322de52 ("eventdev: move inline APIs into separate structure")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Weiguo Li [Mon, 7 Feb 2022 12:37:00 +0000 (20:37 +0800)]
eal/windows: remove useless C++ include guard

Remove the incomplete cplusplus guard in internal header.

Fixes: 6e1ed4cbbe99 ("eal/windows: add dirent implementation")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
Weiguo Li [Mon, 7 Feb 2022 12:36:59 +0000 (20:36 +0800)]
net/dpaa2: remove useless C++ include guard

Remove the incomplete cplusplus guard in internal headers.

Fixes: 72ec7a678e70 ("net/dpaa2: add soft parser driver")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Weiguo Li [Mon, 7 Feb 2022 12:36:58 +0000 (20:36 +0800)]
net/cxgbe: remove useless C++ include guard

Remove the incomplete cplusplus guard in internal header.

Fixes: 3bd122eef2cc ("cxgbe/base: add hardware API for Chelsio T5 series adapters")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Weiguo Li [Mon, 7 Feb 2022 12:36:57 +0000 (20:36 +0800)]
common/mlx5: remove useless C++ include guard

Remove the incomplete cplusplus guard in internal headers.

Fixes: 7525ebd8ebb0 ("common/mlx5: add glue functions on Windows")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Weiguo Li [Mon, 7 Feb 2022 12:36:56 +0000 (20:36 +0800)]
bus/dpaa: fix C++ include guard

Add the missing half of the braces for the extern "C" block,
or remove the incomplete guard in the internal header.

Fixes: 6d6b4f49a155 ("bus/dpaa: add FMAN hardware operations")
Fixes: 919eeaccb2ba ("bus/dpaa: introduce NXP DPAA bus driver skeleton")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:44 +0000 (21:10 -0800)]
test: enable subset of tests on Windows

Enable a subset of unit tests for Windows CI.

- For driver tests, driver owners should enable the corresponding tests when
  enabling a driver for Windows.
- For dump tests, the tests currently hang on Windows, which requires
  further investigation.
- For telemetry tests, the code is specific to POSIX sockets and needs a
  replacement for Windows. This will be investigated in a separate patch.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:43 +0000 (21:10 -0800)]
test: replace shell script with Python

- Add python script to check if system supports hugepages
- Remove corresponding .sh script
- Replace calling of .sh with corresponding .py in meson.build

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:42 +0000 (21:10 -0800)]
test: skip unsupported tests on Windows

Skip tests which are not yet supported on Windows:
- The libraries that the tests depend on are not enabled on Windows yet
- The tests compile but have issues still under investigation
    * test_func_reentrancy:
      Windows EAL has no protection against repeated calls.
    * test_lcores:
      Execution enters an infinite loop, requires investigation.
    * test_rcu_qsbr_perf:
      Execution hangs on Windows, requires investigation.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:41 +0000 (21:10 -0800)]
test: resolve name collision on Windows

Add prefix to resolve name collision on Windows.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:40 +0000 (21:10 -0800)]
test/alarm: disable bad time cases on Windows

Remove two alarm_autotest test cases which perform a bogus range check
on Windows.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:39 +0000 (21:10 -0800)]
eal: differentiate strerror message on Windows

On Windows, strerror returns just "Unknown error" for errnum greater
than MAX_ERRNO, while Linux and FreeBSD return "Unknown error <num>",
which is the current expectation for errno_autotest. Differentiate
the error string on Windows to remove a "duplicate error code" failure.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:38 +0000 (21:10 -0800)]
test/log: skip regex on Windows

DPDK logs_autotest on Windows failed at the "dynamic log types" tests.
The failures are in 2 test cases for the rte_log_set_level_regexp API,
because regular expressions are not yet supported on Windows in DPDK
and regcomp/regexec are just stubs on Windows (in regex.h).

In app/test/test_logs.c, exclude these two test cases with ifndef, and for the
rte_log_set_level_pattern validation case following these two cases,
differentiate the expected log level passed into the CHECK_LEVELS macro.

Now logs_autotest completes for all dynamic log types and static log types.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:37 +0000 (21:10 -0800)]
test/interrupts: skip on Windows

Even though test_interrupts.c can compile on Windows, skip interrupt
tests for now since the majority of eal_interrupt on Windows is stubs.
The skip will be removed once interrupts are fully enabled on Windows.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:36 +0000 (21:10 -0800)]
test/mem: fix error check

Fix incorrect errno variable used in memory autotest.
Use rte_errno instead.

Fixes: 086d426406bd ("test/mem: fix memory autotests on FreeBSD")
Cc: stable@dpdk.org
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:35 +0000 (21:10 -0800)]
test: remove POSIX-specific code

- Replace POSIX-specific code with DPDK equivalents or
  conditionally disable it on Windows
- Use NUL on Windows in place of /dev/null on Unix
- Exclude tests not supported on Windows yet
  * multi-process
  * PMD performance statistics display on signal

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Jie Zhou [Wed, 26 Jan 2022 05:10:34 +0000 (21:10 -0800)]
eal/windows: fix error code for not supported API

The memory_autotest unit test on Windows has 2 failed cases on the EAL APIs
eal_memalloc_get_seg_fd and eal_memalloc_get_seg_fd_offset. These 2
APIs are not supported on Windows yet. They should return ENOTSUP so that
in test_memory.c these 2 ENOTSUP cases are not marked as failures,
the same as other ENOTSUP cases.
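
The idea of treating "not supported" differently from a real failure can be sketched as below. Everything here is hypothetical: the stub, the result enum and the convention of returning a negative errno value are assumptions made for illustration, not the actual EAL or test_memory.c code.

```c
#include <errno.h>

/* Hypothetical stub for an API that a platform does not implement yet. */
static int
sketch_get_seg_fd(void)
{
    return -ENOTSUP; /* assumed convention: negative errno value */
}

enum sketch_result { SKETCH_OK, SKETCH_SKIPPED, SKETCH_FAILED };

/* Classify an API return value the way a test might. */
static enum sketch_result
sketch_classify(int ret)
{
    if (ret >= 0)
        return SKETCH_OK;
    if (ret == -ENOTSUP)
        return SKETCH_SKIPPED; /* unsupported is not a failure */
    return SKETCH_FAILED;
}
```

Under these assumptions, a stub that reports any other error code would be counted as a failure, which is the situation the fix avoids.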

Fixes: 2a5d547a4a9b ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Zhihong Wang [Tue, 14 Dec 2021 03:30:16 +0000 (11:30 +0800)]
ring: fix overflow in memory size calculation

Parameters count and esize are both unsigned int, and their product can
legally exceed the range of unsigned int and lead to a runtime access violation.
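
A minimal sketch of the overflow, assuming a platform where unsigned int is 32 bits wide. The function names are hypothetical and this is not the ring library's code; it only shows why the multiplication has to be widened before it is performed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Problematic form: both operands are unsigned int, so the product is
 * computed in unsigned int and wraps around before it is widened.
 */
static size_t
elems_size_narrow(unsigned int count, unsigned int esize)
{
    return count * esize;
}

/* Widened form: convert one operand first so the product is 64-bit. */
static uint64_t
elems_size_wide(unsigned int count, unsigned int esize)
{
    return (uint64_t)count * esize;
}
```

For example, with count = 2^20 and esize = 2^12 the narrow form wraps to 0 on such a platform, while the widened form yields 2^32.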

Fixes: cc4b218790f6 ("ring: support configurable element size")
Cc: stable@dpdk.org
Signed-off-by: Zhihong Wang <wangzhihong.wzh@bytedance.com>
Reviewed-by: Liang Ma <liangma@liangbit.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Robert Sanford [Wed, 22 Dec 2021 16:20:18 +0000 (11:20 -0500)]
ring: update ring size doxygen comments

- Add RING_F_EXACT_SZ description to rte_ring_init and
  rte_ring_create param comments.
- Fix ring size comments.

Signed-off-by: Robert Sanford <rsanford@akamai.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Yunjian Wang [Mon, 10 Jan 2022 09:23:03 +0000 (17:23 +0800)]
ring: fix error code when creating ring

The error value returned by rte_ring_create_elem() should be a positive
integer. However, if the rte_ring_get_memsize_elem() function fails,
a negative number is returned and is directly used as the return value.
As a result, checks of the return value in external callers
(such as rte_mempool_create()) fail.
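
The sign problem can be illustrated with a small sketch. All names are hypothetical, and the errno-like variable is an assumption standing in for however the real function reports its error; the point is only that a negative code from a helper must be negated before it is exposed as a positive error value.

```c
#include <errno.h>

/* Hypothetical stand-in for an errno-like error variable. */
static int sketch_errno;

/* Hypothetical size helper: returns a negative errno value on failure. */
static long
sketch_get_memsize(unsigned int count)
{
    /* in this sketch, count must be a non-zero power of two */
    if (count == 0 || (count & (count - 1)) != 0)
        return -EINVAL;
    return (long)(count * sizeof(void *));
}

/* Returns 0 on success, -1 on failure with sketch_errno set (positive). */
static int
sketch_create(unsigned int count)
{
    long size = sketch_get_memsize(count);

    if (size < 0) {
        sketch_errno = (int)-size; /* negate: callers expect a positive code */
        return -1;
    }
    sketch_errno = 0;
    return 0;
}
```

A caller comparing the error value against EINVAL only works if the negation is done; storing the helper's result unchanged would leave -EINVAL there.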

Fixes: a182620042aa ("ring: get size in memory")
Cc: stable@dpdk.org
Reported-by: Nan Zhou <zhounan14@huawei.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Andrzej Ostruszka [Tue, 11 Jan 2022 11:37:39 +0000 (12:37 +0100)]
ring: optimize corner case for enqueue/dequeue

When enqueueing/dequeueing to/from the ring we try to optimize by manual
loop unrolling.  The check for this optimization looks like:

if (likely(idx + n < size)) {

where 'idx' points to the first usable element (empty slot for enqueue,
data for dequeue).  The correct comparison here should be '<=' instead
of '<'.

This is not a functional error since we fall back to the loop with
correct checks on indexes.  Just a minor suboptimal behaviour for the
case when we want to enqueue/dequeue exactly the number of elements that
we have in the ring before wrapping to its beginning.
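
The corner case can be checked with two small predicates. The function names are hypothetical; only the two comparisons come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: the exact-fit case (idx + n == size) misses the fast path. */
static bool
fast_path_old(uint32_t idx, uint32_t n, uint32_t size)
{
    return idx + n < size;
}

/* Corrected check: n elements starting at idx fit without wrapping. */
static bool
fast_path_new(uint32_t idx, uint32_t n, uint32_t size)
{
    return idx + n <= size;
}
```

With size = 8, idx = 4 and n = 4, the elements occupy slots 4..7 and do not wrap, yet only the corrected check selects the fast path. For idx = 5 both checks correctly fall back to the wrapping loop.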

Fixes: cc4b218790f6 ("ring: support configurable element size")
Fixes: 286bd05bf70d ("ring: optimisations")

Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Pallavi Kadam [Fri, 21 Jan 2022 00:17:49 +0000 (16:17 -0800)]
eal/windows: set worker thread affinity at init

Sometimes the OS tries to switch the core, so bind the lcore thread
to a fixed core.
Implement affinity call on Windows similar to Linux.

Signed-off-by: Qiao Liu <qiao.liu@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
Morten Brørup [Mon, 24 Jan 2022 14:59:53 +0000 (15:59 +0100)]
mempool: test performance with constant n

"What gets measured gets done."

This patch adds mempool performance tests where the number of objects to
put and get is constant at compile time, which may significantly improve
the performance of these functions. [*]

Also, it is ensured that the array holding the object used for testing
is cache line aligned, for maximum performance.

And finally, the following entries are added to the list of tests:
- Number of kept objects: 512
- Number of objects to get and to put: The number of pointers fitting
  into a cache line, i.e. 8 or 16

[*] Some example performance test (with cache) results:

get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
get_bulk=4 put_bulk=4 keep=128 constant_n=true  rate_persec=622159462

get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
get_bulk=8 put_bulk=8 keep=128 constant_n=true  rate_persec=917582643

get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Haiyue Wang [Wed, 19 Jan 2022 12:26:14 +0000 (20:26 +0800)]
doc: fix KNI PMD name typo

The KNI PMD name should be "net_kni".

Fixes: 75e2bc54c018 ("net/kni: add KNI PMD")
Cc: stable@dpdk.org
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Markus Theil [Fri, 3 Dec 2021 07:19:07 +0000 (08:19 +0100)]
kni: fix ioctl signature

Fix kni's ioctl signature to correctly match the kernel's
structs. This shaves off the (void*) casts and uses struct file*
instead of struct inode*. With the correct signature, control flow
integrity checkers are no longer confused at this point.

Signed-off-by: Markus Theil <markus.theil@secunet.com>
Tested-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Tudor Cornea [Thu, 20 Jan 2022 12:41:34 +0000 (14:41 +0200)]
kni: allow configuring thread granularity

The KNI kthreads seem to be rescheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, the usleep_range() function seems to be
more appropriate for being able to introduce smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays
seems unfeasible with a call to schedule_timeout_interruptible().

#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
        usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
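
The precision limit can be illustrated with a userspace model of the microsecond-to-jiffy conversion. This is a sketch under assumptions: the tick rate of 1000 Hz is assumed (it is consistent with the roughly 1 ms granularity reported above, but the kernel configuration is not stated), the function names are hypothetical, and the model only reproduces the round-up behaviour of the conversion, not the kernel implementation.

```c
/* Assumed tick rate; the actual kernel HZ value is not given in the text. */
#define SKETCH_HZ 1000u

/* Model of a round-up conversion from microseconds to timer ticks. */
static unsigned long
sketch_usecs_to_jiffies(unsigned int usecs)
{
    const unsigned long usecs_per_jiffy = 1000000u / SKETCH_HZ;

    return (usecs + usecs_per_jiffy - 1) / usecs_per_jiffy;
}

/* Shortest delay, in microseconds, that a tick-based timeout can express. */
static unsigned long
sketch_timeout_usecs(unsigned int usecs)
{
    return sketch_usecs_to_jiffies(usecs) * (1000000u / SKETCH_HZ);
}
```

Under this model a requested interval of 5 us becomes one full tick, i.e. 1000 us, which is why a tick-based timeout cannot provide delays in the 5-10 us range.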

Below is a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible(), and usleep_range().

We measure the CPU usage and the RTT between two KNI interfaces,
which are created on top of vmxnet3 adapters, connected by a vSwitch.

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and CPU usage to the variant with schedule_timeout_interruptible(),
while usleep_range(100, 200) seems to give a decent tradeoff between
latency and CPU usage, and allows users to tweak the limits for
improved precision if they have such use cases.

Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a
soft lockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G        W  O 3.10 #1
 <IRQ>  [<ffffffff814f84de>] dump_stack+0x19/0x1b
 [<ffffffff814f7891>] panic+0xcd/0x1e0
 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

This patch also attempts to remove this option.

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Bruce Richardson [Mon, 24 Jan 2022 17:49:59 +0000 (17:49 +0000)]
build: remove deprecated Meson functions

Starting in Meson 0.56, the functions meson.source_root() and
meson.build_root() are deprecated, to be replaced by the [more
descriptive] functions project_source_root()/global_source_root() and
project_build_root()/global_build_root(). Unfortunately, these new
replacement functions were only added in the 0.56 release too, so to use
them we would need version checks for old/new functions to remove the
deprecation warnings.

However, the functions "current_build_dir()" and "current_source_dir()"
remain unaffected by all this, so we can bypass the versioning problem
by saving off these values to "dpdk_source_root" and "dpdk_build_root"
in the top-level meson.build file.

Bugzilla ID: 926
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Bruce Richardson [Fri, 21 Jan 2022 16:12:30 +0000 (16:12 +0000)]
build: fix warning about using -Wextra flag

On each build, Meson would issue a warning reporting that the
"warning_level" setting should be used in place of adding -Wextra
directly to our build commands. Testing with Meson 0.61 shows that the
only difference for gcc and clang builds between warning levels 1 and
2 is the addition of -Wextra, so we can remove the warning by deleting
our explicit setting of -Wextra and changing the build defaults to
warning_level 2.

Fixes: 524a0d5d66b9 ("build: enable extra warnings with meson")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Bruce Richardson [Thu, 20 Jan 2022 18:06:39 +0000 (18:06 +0000)]
build: fix warnings when running external commands

Meson 0.61.1 is giving warnings that the calls to run_command do not
always explicitly specify if the result is to be checked or not, i.e.
there is a missing "check" parameter. This is because the default
behaviour without the parameter is due to change in the future.

We can fix these warnings by explicitly adding into each call whether
the result should be checked by meson or not. This patch therefore
adds in "check: false" to each run_command call where the result is
being checked by the DPDK meson.build code afterwards, and adds in
"check: true" to any calls where the result is currently unchecked.

Bugzilla ID: 921
Cc: stable@dpdk.org
Reported-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Martijn Bakker [Mon, 31 Jan 2022 22:48:21 +0000 (22:48 +0000)]
pflock: fix header file installation

The generic header file was missing
in the list of files to install.

Fixes: 9667d97c2507 ("pflock: add phase-fair reader writer locks")
Cc: stable@dpdk.org
Signed-off-by: Martijn Bakker <gladdyu@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Qi Zhang [Tue, 25 Jan 2022 01:26:11 +0000 (09:26 +0800)]
doc: update matching versions in ice guide

Add recommended matching list for ice PMD in DPDK 21.08 and DPDK 21.11.

Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Junfeng Guo <junfeng.guo@intel.com>
Feifei Wang [Thu, 27 Jan 2022 07:40:01 +0000 (15:40 +0800)]
net/i40e: remove redundant reset operation

For the free buffer operation in the i40e vector path, it is unnecessary to
store NULL into txep.mbuf. When putting an mbuf into the Tx queue,
tx_tail is the sentinel, and when doing tx_free, tx_next_dd is
the sentinel. In neither path is mbuf == NULL checked as a condition,
so resetting mbuf is unnecessary and can be omitted.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Xiaoyu Min [Tue, 18 Jan 2022 11:38:50 +0000 (19:38 +0800)]
net/mlx5: reject jump to root table

Currently, the root table is not supported as a destination.
A jump action that would ultimately be translated to the underlying
root table in rdma-core should be rejected.

Fixes: f78f747f41d0 ("net/mlx5: allow jump to group lower than current")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bing Zhao [Mon, 17 Jan 2022 17:49:14 +0000 (19:49 +0200)]
common/mlx5: fix probing failure code

When probing a device with an unsupported class, the process should
fail because no appropriate driver is found. After traversing all
the drivers, an error value should be returned in this case.

In the previous implementation, a zero value indicating probing success
was wrongly returned.
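
The intended behaviour can be sketched as a loop that remembers whether any driver matched. The structure, the class bitmask and the choice of -ENOTSUP as the error value are all hypothetical assumptions for illustration; they are not taken from the mlx5 common code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver descriptor: a bitmask of the classes it supports. */
struct sketch_driver {
    unsigned int class_mask;
};

/*
 * Traverse all drivers; succeed only if at least one supports the
 * requested class. The error value used here is an assumption.
 */
static int
sketch_probe(const struct sketch_driver *drivers, size_t n,
             unsigned int wanted)
{
    int matched = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (drivers[i].class_mask & wanted)
            matched = 1;
    }
    /* returning 0 unconditionally here would report a false success */
    return matched ? 0 : -ENOTSUP;
}
```

In this sketch, a request for a class that no driver advertises yields a negative value instead of 0.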

Fixes: ad435d320473 ("common/mlx5: add bus-agnostic layer")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Raja Zidane [Sun, 16 Jan 2022 15:23:47 +0000 (15:23 +0000)]
net/mlx5: fix mark enabling for Rx

To optimize the datapath, the mlx5 PMD checked for a mark action on flow
creation and flagged the possible destination Rx queues (through queue/RSS
actions); it then enabled the mark action logic only for the flagged Rx queues.

The mark action didn't work if no queue/RSS action was in the same flow,
even when the user used multi-group logic to manage the flows.
So, if the mark action was performed in group X and the packet was moved to
group Y > X, then when the packet was forwarded to Rx queues, software did not
deliver the mark ID to the mbuf.

Flag the Rx datapath to report the mark action for any queue when the driver
detects the first mark action after the dev_start operation.

Fixes: 8e61555657b2 ("net/mlx5: fix shared RSS and mark actions combination")
Cc: stable@dpdk.org
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Dmitry Kozlyuk [Fri, 14 Jan 2022 10:52:17 +0000 (12:52 +0200)]
common/mlx5: fix MR lookup for non-contiguous mempool

Memory region (MR) lookup by address inside mempool MRs
was not accounting for the upper bound of an MR.
For mempools covered by multiple MRs this could return
a wrong MR LKey, typically resulting in an unrecoverable
TxQ failure:

    mlx5_net: Cannot change Tx QP state to INIT Invalid argument

Corresponding message from /var/log/dpdk_mlx5_port_X_txq_Y_index_Z*:

    Unexpected CQE error syndrome 0x04 CQN = 128 SQN = 4848
        wqe_counter = 0 wq_ci = 9 cq_ci = 122

This is likely to happen with --legacy-mem and IOVA-as-PA,
because EAL intentionally maps pages at non-adjacent PA
to non-adjacent VA in this mode, and MLX5 PMD works with VA.
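
The effect of ignoring the upper bound can be shown with a small sketch. The types, names and the "lower bound only" variant are hypothetical illustrations of the failure mode described above; they are not the driver's actual lookup code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LKEY_INVALID UINT32_MAX

/* Hypothetical memory region covering the address range [start, end). */
struct sketch_mr {
    uintptr_t start;
    uintptr_t end;
    uint32_t lkey;
};

/* Flawed lookup: only the lower bound is compared (MRs sorted by start). */
static uint32_t
sketch_lookup_lower_only(const struct sketch_mr *mrs, size_t n,
                         uintptr_t addr)
{
    uint32_t lkey = SKETCH_LKEY_INVALID;
    size_t i;

    for (i = 0; i < n; i++) {
        if (mrs[i].start <= addr)
            lkey = mrs[i].lkey;
    }
    return lkey;
}

/* Bounded lookup: the address must fall inside the MR's range. */
static uint32_t
sketch_lookup_bounded(const struct sketch_mr *mrs, size_t n, uintptr_t addr)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (addr >= mrs[i].start && addr < mrs[i].end)
            return mrs[i].lkey;
    }
    return SKETCH_LKEY_INVALID;
}
```

With two non-adjacent regions, an address in the gap between them is attributed to the first region by the flawed lookup, while the bounded lookup reports that no region covers it.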

Fixes: 690b2a88c2f7 ("common/mlx5: add mempool registration facilities")
Cc: stable@dpdk.org
Reported-by: Wang Yunjian <wangyunjian@huawei.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Maxime Coquelin [Wed, 26 Jan 2022 09:55:10 +0000 (10:55 +0100)]
vhost: use proper logging type for data path

This patch changes type from config to data for functions
called in the datapath.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: differentiate IOTLB logs
Maxime Coquelin [Wed, 26 Jan 2022 09:55:09 +0000 (10:55 +0100)]
vhost: differentiate IOTLB logs

The same logging messages were used for both IOTLB cache
insertion failure and IOTLB pending insertion failure.

This patch differentiates them to ease log analysis.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: remove multi-line logs
Maxime Coquelin [Wed, 26 Jan 2022 09:55:08 +0000 (10:55 +0100)]
vhost: remove multi-line logs

This patch replaces multi-line logs with multiple
single-line logs in order to ease log filtering based on
their socket path.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: improve virtio-net layer logs
Maxime Coquelin [Wed, 26 Jan 2022 09:55:07 +0000 (10:55 +0100)]
vhost: improve virtio-net layer logs

This patch standardizes logging done in Virtio-net, so that
the Vhost-user socket path is always prepended to the logs.
It will ease log analysis when multiple Vhost-user ports
are in use.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: improve socket layer logs
Maxime Coquelin [Wed, 26 Jan 2022 09:55:06 +0000 (10:55 +0100)]
vhost: improve socket layer logs

This patch adds the Vhost socket path whenever possible in
order to make debugging possible when multiple Vhost
devices are in use. Some vhost-user layer functions are
modified to pass the device path down to the socket layer.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: improve vhost-user layer logs
Maxime Coquelin [Wed, 26 Jan 2022 09:55:05 +0000 (10:55 +0100)]
vhost: improve vhost-user layer logs

This patch adds the Vhost-user socket path to Vhost-user
layer logs in order to ease logs filtering.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: improve vhost layer logs
Maxime Coquelin [Wed, 26 Jan 2022 09:55:04 +0000 (10:55 +0100)]
vhost: improve vhost layer logs

This patch prepends Vhost logs with the Vhost-user socket
path when available to ease filtering logs for a given port.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: improve vDPA registration failure log
Maxime Coquelin [Wed, 26 Jan 2022 09:55:03 +0000 (10:55 +0100)]
vhost: improve vDPA registration failure log

This patch adds name of the device failing vDPA registration.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: improve IOTLB logs
Maxime Coquelin [Wed, 26 Jan 2022 09:55:02 +0000 (10:55 +0100)]
vhost: improve IOTLB logs

This patch adds the IOTLB mempool name when logging debug
or error messages, and also prepends the socket path
to all the logs.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2 years agovhost: add log when setting vring base
Andy Pei [Fri, 14 Jan 2022 07:57:07 +0000 (15:57 +0800)]
vhost: add log when setting vring base

This patch adds log for vring related info in handling of vhost message
VHOST_USER_SET_VRING_BASE, which will be useful in live migration case.

Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years agonet/virtio: fix uninitialized RSS key
Yunjian Wang [Sat, 8 Jan 2022 08:14:21 +0000 (16:14 +0800)]
net/virtio: fix uninitialized RSS key

This patch fixes an issue where an uninitialized old_rss_key
is used for restoring the rss_key.

Coverity issue: 373866
Fixes: 0c9d66207054 ("net/virtio: support RSS")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years agonet/virtio-user: check FD flags getting failure
Yunjian Wang [Sat, 8 Jan 2022 07:52:31 +0000 (15:52 +0800)]
net/virtio-user: check FD flags getting failure

The function fcntl() can return an error,
so its return value needs to be checked.

Fixes: 6a84c37e3975 ("net/virtio-user: add vhost-user adapter layer")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years agonet/virtio-user: fix resource leak on probing failure
Harold Huang [Thu, 23 Dec 2021 04:42:37 +0000 (12:42 +0800)]
net/virtio-user: fix resource leak on probing failure

When eth_virtio_dev_init fails, the registered virtio-user memory
event callback is not released and the tap device created by the
backend is not destroyed. This leaves a residual tap device on the host,
and creating a new vdev can fail because the new virtio_user_dev
may get the same address, and registering a memory event callback
for the same address is not allowed.

Fixes: ca8326a94365 ("net/virtio_user: fix error management during init")
Cc: stable@dpdk.org
Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years agovdpa/ifc: fix log info mismatch
Andy Pei [Mon, 13 Dec 2021 07:00:40 +0000 (15:00 +0800)]
vdpa/ifc: fix log info mismatch

Fix log info mismatch.

Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
Cc: stable@dpdk.org
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2 years agonet/virtio: fix Tx queue 0 overridden by queue 128
Xueming Li [Thu, 2 Dec 2021 13:50:45 +0000 (21:50 +0800)]
net/virtio: fix Tx queue 0 overridden by queue 128

Both Rx and Tx queues are virtqueues (VQ) in virtio, so the VQ index
is 256 for Tx queue 128. The uint8 type of the TxQ VQ index overflows
and overwrites Tx queue 0 data.

This patch changes the VQ index type to uint16.
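
The truncation can be reproduced in isolation. This is only a sketch:
the helper names are hypothetical, and the mapping (queue pair q uses
VQ 2q for Rx and VQ 2q + 1 for Tx) is an assumption made here for
illustration, so the exact index may differ from the value quoted
above. Either way the index needs more than 8 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: queue pair q uses VQ 2q for Rx and VQ 2q + 1 for Tx. */
static uint16_t
tx_vq_index(uint16_t txq)
{
	assert(txq < 0x8000); /* keep 2q + 1 within 16 bits */
	return (uint16_t)(2u * txq + 1u);
}

/* The same index narrowed to 8 bits, as a uint8 variable would store it. */
static uint8_t
tx_vq_index_u8(uint16_t txq)
{
	return (uint8_t)tx_vq_index(txq);
}
```

With the 8-bit variant, Tx queue 128 and Tx queue 0 map to the same
slot, while the 16-bit variant keeps them distinct.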

Fixes: c1f86306a026 ("virtio: add new driver")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2 years agovdpa/mlx5: workaround queue stop with traffic
Matan Azrad [Mon, 22 Nov 2021 13:12:35 +0000 (15:12 +0200)]
vdpa/mlx5: workaround queue stop with traffic

When the event thread polls traffic and a virtq is stopping, the FW loses
synchronization in the virtq indexes.

This causes LM failure when synchronizing the HOST indexes with
the GUEST indexes.

Unset the event thread before the queue stop in the LM process.

Fixes: 31b9c29c86af ("vdpa/mlx5: support close and config operations")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2 years agonet/axgbe: alter port speed bit range
Selwin Sebastian [Tue, 25 Jan 2022 12:17:47 +0000 (17:47 +0530)]
net/axgbe: alter port speed bit range

Newer generation hardware uses slightly different
port speed bit widths, so alter the existing port speed
bit range to extend support to the newer generation hardware
while maintaining backward compatibility with older
generation hardware.

The previously reserved bits are now being used which
then requires the adjustment to the BIT values, e.g.:

Before:
   PORT_PROPERTY_0[22:21] - Reserved
   PORT_PROPERTY_0[26:23] - Supported Speeds

After:
   PORT_PROPERTY_0[21] - Reserved
   PORT_PROPERTY_0[26:22] - Supported Speeds

To make this backwards compatible, the existing BIT
definitions for the port speeds are incremented by one
to maintain the original position.
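
A small sketch of why the definitions shift by one. The macro and
helper names are hypothetical, not the driver's; only the bit ranges
come from the layout above. A register bit that sat at offset 0 of the
old field [26:23] sits at offset 1 of the new field [26:22]:

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the before/after layout above. */
#define SPEEDS_OLD_SHIFT 23u
#define SPEEDS_OLD_WIDTH 4u
#define SPEEDS_NEW_SHIFT 22u
#define SPEEDS_NEW_WIDTH 5u

/* Extract a 'width'-bit field starting at bit 'shift' from a 32-bit register. */
static uint32_t
get_bits(uint32_t reg, unsigned int shift, unsigned int width)
{
	assert(width > 0 && width < 32 && shift + width <= 32);
	return (reg >> shift) & ((1u << width) - 1u);
}
```

Reading register bit 23 through the old field yields bit 0 of the
field value, and through the new field yields bit 1, hence the
increment of the existing definitions.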

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2 years agonet/axgbe: support no-autoneg port mode
Selwin Sebastian [Tue, 25 Jan 2022 12:17:46 +0000 (17:47 +0530)]
net/axgbe: support no-autoneg port mode

Add support for a new port mode that is a backplane
connection without support for auto negotiation.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2 years agonet/axgbe: reset PHY Rx when mailbox command timeout
Selwin Sebastian [Tue, 25 Jan 2022 12:17:45 +0000 (17:47 +0530)]
net/axgbe: reset PHY Rx when mailbox command timeout

Sometimes mailbox commands time out when the RX data path becomes
unresponsive. This prevents the submission of new mailbox commands
to DXIO. This patch identifies the timeout and resets the RX data
path so that the next message can be submitted properly.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2 years agonet/axgbe: simplify rate change mailbox interface
Selwin Sebastian [Tue, 25 Jan 2022 12:17:44 +0000 (17:47 +0530)]
net/axgbe: simplify rate change mailbox interface

Simplify and centralize the mailbox command rate change interface by
having a single function perform the writes to the mailbox registers
to issue the request.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2 years agonet/axgbe: toggle PLL settings during rate change
Selwin Sebastian [Tue, 25 Jan 2022 12:17:43 +0000 (17:47 +0530)]
net/axgbe: toggle PLL settings during rate change

For each rate change command submission, the FW has to do a phy
power off sequence internally. For this to happen correctly, the
PLL re-initialization control setting has to be turned off before
sending mailbox commands and re-enabled once the command submission
is complete. Without the PLL control setting, link up takes
longer in a fixed phy configuration.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2 years agonet/axgbe: attempt always link training in KR mode
Selwin Sebastian [Tue, 25 Jan 2022 12:17:42 +0000 (17:47 +0530)]
net/axgbe: attempt always link training in KR mode

Link training is always attempted when in KR mode, but the code is
structured to check if link training has been enabled before attempting
to perform it. Since that check will always be true, simplify the code
to always enable and start link training during KR auto-negotiation.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2 years agonet/hns3: support indirect counter flow action
Chengwen Feng [Sat, 22 Jan 2022 01:51:42 +0000 (09:51 +0800)]
net/hns3: support indirect counter flow action

This patch supports the indirect counter action because the shared
counter attribute has been deprecated in DPDK 21.11.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: extract functions to create RSS and FDIR flow rule
Huisong Li [Sat, 22 Jan 2022 01:51:41 +0000 (09:51 +0800)]
net/hns3: extract functions to create RSS and FDIR flow rule

Extract two functions to create the RSS and FDIR flow rule for clearer
code logic.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: rename function
Chengwen Feng [Sat, 22 Jan 2022 01:51:40 +0000 (09:51 +0800)]
net/hns3: rename function

This patch renames hns3_parse_rss_key to hns3_adjust_rss_key to
improve readability.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: remove non re-entrant strerror call
Chengwen Feng [Sat, 22 Jan 2022 01:51:39 +0000 (09:51 +0800)]
net/hns3: remove non re-entrant strerror call

This patch deletes the strerror call, which is not re-entrant.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: replace single line functions
Chengwen Feng [Sat, 22 Jan 2022 01:51:38 +0000 (09:51 +0800)]
net/hns3: replace single line functions

This patch replaces single-line functions with the actual calls.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: extract common function to obtain revision ID
Huisong Li [Sat, 22 Jan 2022 01:51:37 +0000 (09:51 +0800)]
net/hns3: extract common function to obtain revision ID

The code logic of obtaining the revision ID of PCI device is the same
for PF and VF driver. This patch extracts a common interface to do it.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: remove logging memory addresses
Huisong Li [Sat, 22 Jan 2022 01:51:36 +0000 (09:51 +0800)]
net/hns3: remove logging memory addresses

Remove the printing of memory addresses.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: remove getting number of queue descriptors from FW
Huisong Li [Sat, 22 Jan 2022 01:51:35 +0000 (09:51 +0800)]
net/hns3: remove getting number of queue descriptors from FW

Applications can specify the number of Rx/Tx queue descriptors in DPDK.
So the driver no longer obtains the default value from firmware and PF.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: remove unused variables
Huisong Li [Sat, 22 Jan 2022 01:51:34 +0000 (09:51 +0800)]
net/hns3: remove unused variables

Remove unused variables.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: extract reset failure handling to function
Huisong Li [Sat, 22 Jan 2022 01:51:33 +0000 (09:51 +0800)]
net/hns3: extract reset failure handling to function

Extract a function to handle reset failure for clearer code logic.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: remove unnecessary blank lines
Huisong Li [Sat, 22 Jan 2022 01:51:32 +0000 (09:51 +0800)]
net/hns3: remove unnecessary blank lines

Remove unnecessary blank lines.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: make control plane function non-inline
Jie Hai [Sat, 22 Jan 2022 01:51:31 +0000 (09:51 +0800)]
net/hns3: make control plane function non-inline

This function is a control-plane interface and does
not need to use inline.

Signed-off-by: Jie Hai <haijie1@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: extract common function to initialize MAC address
Huisong Li [Sat, 22 Jan 2022 01:51:30 +0000 (09:51 +0800)]
net/hns3: extract common function to initialize MAC address

The code logic to initialize "data->mac_addrs" for PF and VF is similar.
This patch extracts a common API to initialize it to improve code
maintainability.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: fix using enum as boolean
Huisong Li [Sat, 22 Jan 2022 01:51:29 +0000 (09:51 +0800)]
net/hns3: fix using enum as boolean

Enum type variables should not be used as bool variables. This patch
fixes this for "with->func" in hns3_action_rss_same().

Fixes: eb158fc756a5 ("net/hns3: fix config when creating RSS rule after flush")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: remove unnecessary assignment
Huisong Li [Sat, 22 Jan 2022 01:51:28 +0000 (09:51 +0800)]
net/hns3: remove unnecessary assignment

Remove unnecessary assignment.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/nfp: free HW ring memzone on queue release
Heinrich Kuhn [Wed, 19 Jan 2022 11:48:00 +0000 (13:48 +0200)]
net/nfp: free HW ring memzone on queue release

During Rx/Tx queue setup, memory is reserved for the hardware rings.
This memory zone should subsequently be freed in the queue release
logic. This commit also adds a call to the release logic in the
dev_close() callback so that the ring memzone may be freed during port
close too.

Fixes: b812daadad0d ("nfp: add Rx and Tx")
Cc: stable@dpdk.org
Signed-off-by: Heinrich Kuhn <heinrich.kuhn@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2 years agonet/tap: forbid different Rx/Tx queue number
Nobuhiro Miki [Wed, 19 Jan 2022 07:43:16 +0000 (16:43 +0900)]
net/tap: forbid different Rx/Tx queue number

Users can create the desired number of RxQ and TxQ in DPDK. For
example, if the number of RxQ = 2 and the number of TxQ = 5,
a total of 8 file descriptors will be created for a tap device,
including RxQ, TxQ, and one for keepalive. The RxQ and TxQ
with the same ID are paired by dup(2).

In this scenario, the kernel will have 3 RxQ where packets are
incoming but not read. The reason for this is that there are only
2 RxQ that are polled by DPDK, while there are 5 queues in the kernel.
This patch adds a check that DPDK has matching numbers of Rx and Tx
queues to avoid unexpected packet drops.
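
The arithmetic of the example and the kind of check described can be
sketched as follows. The function names are hypothetical and the
-EINVAL return value is an assumption, not taken from the driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Reject configurations where the Rx and Tx queue counts differ. */
static int
tap_check_queue_counts(uint16_t nb_rx, uint16_t nb_tx)
{
	return nb_rx == nb_tx ? 0 : -EINVAL;
}

/* File descriptors in the example: one per RxQ, one per TxQ, one keepalive. */
static unsigned int
tap_fd_count(uint16_t nb_rx, uint16_t nb_tx)
{
	return (unsigned int)nb_rx + nb_tx + 1u;
}

/* Kernel queues that receive packets but are never polled (nb_tx >= nb_rx). */
static uint16_t
tap_unpolled_queues(uint16_t nb_rx, uint16_t nb_tx)
{
	assert(nb_tx >= nb_rx);
	return (uint16_t)(nb_tx - nb_rx);
}
```

For 2 RxQ and 5 TxQ this gives 8 file descriptors and 3 unpolled
kernel queues, and the check rejects the configuration.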

Signed-off-by: Nobuhiro Miki <nmiki@yahoo-corp.jp>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years agonet/bonding: fix RSS with early configure
Yu Wenjun [Tue, 18 Jan 2022 09:18:52 +0000 (17:18 +0800)]
net/bonding: fix RSS with early configure

RSS doesn't work when bond_ethdev_configure is called before
rte_eth_bond_slave_add.

This is because internals->rss_key_len is 0 in bond_ethdev_configure().
If internals->rss_key_len is 0, internals->rss_key can not be set
properly.

e.g.:
doesn't work (examples/bond/main.c):
rte_eth_bond_create()
rte_eth_dev_configure()
rte_eth_bond_slave_add()
rte_eth_dev_start()

works (testpmd):
rte_eth_bond_create()
rte_eth_bond_slave_add()
rte_eth_dev_configure()
rte_eth_dev_start()

Fix this by using 'default_rss_key' when 'internals->rss_key_len' is 0.
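
A minimal sketch of such a fallback. The struct layout, the 40-byte
key size and the key contents are placeholders chosen for illustration;
only the names default_rss_key and rss_key_len come from the text:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder contents; the real default key is defined by the driver. */
static const uint8_t default_rss_key[40] = { 0x6d, 0x5a };

/* Hypothetical stand-in for the bonding private data. */
struct bond_internals {
	uint8_t rss_key[40];
	uint8_t rss_key_len;
};

/* Use the default key when no key length is known yet. */
static void
bond_fill_rss_key(struct bond_internals *internals)
{
	assert(internals != NULL);
	if (internals->rss_key_len == 0) {
		/* Configure was called before any slave was added. */
		internals->rss_key_len = sizeof(default_rss_key);
		memcpy(internals->rss_key, default_rss_key,
		       internals->rss_key_len);
	}
}
```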

Fixes: 6b1a001ec546 ("net/bonding: fix RSS key length")
Cc: stable@dpdk.org
Signed-off-by: Yu Wenjun <yuwenjun@cmss.chinamobile.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: fix vector Rx/Tx when PTP enabled
Min Hu (Connor) [Mon, 17 Jan 2022 02:43:02 +0000 (10:43 +0800)]
net/hns3: fix vector Rx/Tx when PTP enabled

If hardware supports IEEE 1588 PTP, PTP capability will be set.
Currently, vec and sve burst are unsupported when PTP capability is set.

For the sake of Rx/Tx performance, IEEE 1588 PTP is not supported in sve
or vec burst mode. When enabling IEEE 1588 PTP, Rx/Tx burst mode should be
simple or common. Rx/Tx burst mode could be set like this, for example:
-a 0000:35:00.0,rx_func_hint=common,tx_func_hint=common

This patch supports vec and sve burst when PTP is disabled, and supports
only simple or common burst when PTP is enabled.

Fixes: 38b539d96eb6 ("net/hns3: support IEEE 1588 PTP")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/hns3: fix mailbox wait time
Huisong Li [Mon, 17 Jan 2022 02:43:01 +0000 (10:43 +0800)]
net/hns3: fix mailbox wait time

The mailbox wait time can be specified at runtime. But the variable that
controls this time is not initialized when the runtime parameter isn't
given or is given an invalid value, which causes device initialization
to fail.
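
One way to guarantee the variable always holds a usable value is to
fall back to a default whenever the parameter is absent or invalid.
This is a sketch only: the function name, the default of 500 ms and
the accepted range are assumptions, not the driver's actual values:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MBX_DEF_TIME_LIMIT_MS 500u /* hypothetical default */

/* Parse the wait time; fall back to the default on missing or bad input. */
static uint32_t
parse_mbx_time_limit(const char *value)
{
	char *end = NULL;
	unsigned long ms;

	if (value == NULL || *value == '\0')
		return MBX_DEF_TIME_LIMIT_MS;
	errno = 0;
	ms = strtoul(value, &end, 10);
	if (errno != 0 || *end != '\0' || ms == 0 || ms > UINT16_MAX)
		return MBX_DEF_TIME_LIMIT_MS;
	assert(ms <= UINT16_MAX); /* narrowing below is safe */
	return (uint32_t)ms;
}
```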

Fixes: 2fc3e696a7f1 ("net/hns3: add runtime config for mailbox limit time")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
2 years agonet/hns3: fix Rx/Tx functions update
Min Hu (Connor) [Mon, 17 Jan 2022 02:43:00 +0000 (10:43 +0800)]
net/hns3: fix Rx/Tx functions update

Since fast path operations were introduced, the Rx/Tx functions are
called through the 'rte_eth_fp_ops' object. So 'rte_eth_fp_ops' should
be updated whenever fast-path functions need to be changed, such as the
PMD receive function, prepare function and so on.

This patch fixes a packet receiving bug that appeared when fast path
operations were introduced.

Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations")
Fixes: 168b7d79dada ("net/hns3: support set link up/down for PF")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2 years agonet/memif: remove unnecessary Rx interrupt stub
Stephen Hemminger [Fri, 14 Jan 2022 20:46:44 +0000 (12:46 -0800)]
net/memif: remove unnecessary Rx interrupt stub

The code in memif driver to stub out rx_irq_enable is unnecessary
and causes different error returns than other drivers.
The core ethdev code will return -ENOTSUP if the driver has
a null rx_queue_intr_enable callback.
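
The dispatch pattern relied on here looks roughly like the sketch
below. The types and the wrapper are simplified stand-ins, not the
real ethdev structures or signatures; only the callback name and the
-ENOTSUP result come from the text:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a driver callback table. */
typedef int (*rx_intr_enable_t)(uint16_t queue_id);

struct dev_ops {
	rx_intr_enable_t rx_queue_intr_enable;
};

/* Generic layer: a NULL callback already means "not supported". */
static int
generic_rx_intr_enable(const struct dev_ops *ops, uint16_t queue_id)
{
	assert(ops != NULL);
	if (ops->rx_queue_intr_enable == NULL)
		return -ENOTSUP;
	return ops->rx_queue_intr_enable(queue_id);
}
```

A driver that leaves the callback unset therefore reports the same
error as every other driver without the feature, with no stub needed.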

Fixes: 09c7e63a71f9 ("net/memif: introduce memory interface PMD")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years agoraw/ifpga/base: fix port feature ID
Wei Huang [Tue, 25 Jan 2022 02:30:47 +0000 (21:30 -0500)]
raw/ifpga/base: fix port feature ID

Fix ID value of port features to match the definition from hardware.

Fixes: 473c88f9b391 ("drivers/raw: remove rawdev from directory names")
Cc: stable@dpdk.org
Signed-off-by: Wei Huang <wei.huang@intel.com>
Acked-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2 years agonet/bnxt: fix VF resource allocation strategy
Ajit Khaparde [Thu, 20 Jan 2022 09:12:28 +0000 (14:42 +0530)]
net/bnxt: fix VF resource allocation strategy

1. VFs need a notification queue to handle async messages.
But the current logic does not reserve a notification queue, leading
to initialization failure in some cases.
2. With the current logic, the DPDK PF driver reserves only one VNIC
for the VFs, leading to initialization failure with more than one RxQ.

Added logic to distribute the NQs and VNICs from the pool
across VFs and PF.

While reserving resources for the VFs, the strategy is to keep
both min & max values the same. This could result in a failure
when there aren't enough resources to satisfy the request.
Hence, instruct the FW not to reserve all minimum
resources requested for the VF. The VF driver can query the FW
for the allocated resources during probe.

Fixes: b7778e8a1c00 ("net/bnxt: refactor to properly allocate resources for PF/VF")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years agonet/bnxt: fix memzone allocation per VNIC
Kalesh AP [Thu, 20 Jan 2022 09:12:27 +0000 (14:42 +0530)]
net/bnxt: fix memzone allocation per VNIC

In the case of Thor, the RSS table size is too big. This could result in
memory allocation failure when the supported VNIC count is high.
Instead of allocating the memzone for all VNICs in one shot,
allocate for each VNIC individually.

Also, free the memzone in the uninit path.

Fixes: 9738793f28ec ("net/bnxt: add VNIC functions and structs")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2 years agonet/bnxt: handle ring cleanup in case of error
Kalesh AP [Thu, 20 Jan 2022 09:12:26 +0000 (14:42 +0530)]
net/bnxt: handle ring cleanup in case of error

In bnxt_alloc_mem(), after bnxt_alloc_async_ring_struct(),
the failure of any subsequent function causes an error:

bnxt_hwrm_ring_free(): hwrm_ring_free nq failed. rc:1

Fix this by initializing ring->fw_ring_id to INVALID_HW_RING_ID
in bnxt_alloc_async_ring_struct().

Fixes: bd0a14c99f65 ("net/bnxt: use dedicated CPR for async events")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years agonet/bnxt: fix check for autoneg enablement
Kalesh AP [Thu, 20 Jan 2022 09:12:25 +0000 (14:42 +0530)]
net/bnxt: fix check for autoneg enablement

HWRM_PORT_PHY_QCFG_OUTPUT response indicates the autoneg speed mask
supported by the FW. While enabling autoneg, driver should also check
the FW advertised PAM4 speeds supported in auto mode which is set
in the HWRM_PORT_PHY_QCFG_OUTPUT response.

Fixes: c23f9ded0391 ("net/bnxt: support 200G PAM4 link")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
2 years agoraw/ifpga: fix thread closing
Tianfei Zhang [Mon, 24 Jan 2022 03:50:05 +0000 (22:50 -0500)]
raw/ifpga: fix thread closing

When we want to close a thread, we should set a flag to notify
the thread handler function.

Fixes: 9c006c45d0c5 ("raw/ifpga: scan PCIe BDF device tree")
Cc: stable@dpdk.org
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2 years agonet/ice: fix link up when starting device
Yunjian Wang [Tue, 25 Jan 2022 01:39:07 +0000 (09:39 +0800)]
net/ice: fix link up when starting device

Currently, the link status may be incorrect after setting link up
(observed with device ID 159b). This is fixed by calling
ice_link_update() with the 'wait_to_complete' parameter set to true.
It is reasonable to wait for completion right after setting link up,
as this is not a link status change interrupt handling scenario.

Fixes: cf911d90e366 ("net/ice: support link update")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agonet/ice: support module EEPROM
Steve Yang [Thu, 20 Jan 2022 02:59:30 +0000 (02:59 +0000)]
net/ice: support module EEPROM

Add new callbacks for eth_dev_ops of ice to get the information
and data of plugin module EEPROM.

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agonet/ice: fix mbuf offload flag for Rx timestamp
Simei Su [Thu, 20 Jan 2022 10:21:52 +0000 (18:21 +0800)]
net/ice: fix mbuf offload flag for Rx timestamp

For received PTP packets, the flag "RTE_MBUF_F_RX_IEEE1588_TMST" was not
set, so received PTP packets were shown as not timestamped by hardware
in testpmd/ieee1588 fwd.

Fixes: 646dcbe6c701 ("net/ice: support IEEE 1588 PTP")
Cc: stable@dpdk.org
Signed-off-by: Simei Su <simei.su@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2 years agoraw/ifpga/base: fix SPI transaction
Tianfei Zhang [Wed, 19 Jan 2022 01:44:59 +0000 (20:44 -0500)]
raw/ifpga/base: fix SPI transaction

When EOP is detected, 2 more bytes should be received
(there may be a SPI_PACKET_ESC before the last valid byte),
then rx should be finished.

Fixes: 96ebfcf8125c ("raw/ifpga/base: add SPI and MAX10 device driver")
Cc: stable@dpdk.org
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2 years agonet/sfc: validate queue span when parsing flow action RSS
Ivan Malov [Mon, 10 Jan 2022 21:48:45 +0000 (00:48 +0300)]
net/sfc: validate queue span when parsing flow action RSS

The current code silently shrinks the value if it exceeds
the supported maximum. Do not do that. Validate the value.

Fixes: d77d07391d4d ("net/sfc: support flow API RSS action")
Cc: stable@dpdk.org
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2 years agoethdev: fix Rx queue telemetry memory leak on failure
Yunjian Wang [Sat, 8 Jan 2022 07:51:57 +0000 (15:51 +0800)]
ethdev: fix Rx queue telemetry memory leak on failure

eth_dev_handle_port_info() allocates memory for rxq_state;
it should be freed when an error occurs, otherwise the memory
is leaked.
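
The shape of such a fix, as a self-contained sketch. The allocation
wrappers, the second buffer and the failure switch are hypothetical
and exist only so the error path can be exercised; only the rxq_state
name comes from the text:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

static int live_allocs; /* number of buffers currently allocated */

static void *
tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p != NULL)
		live_allocs++;
	return p;
}

static void
tracked_free(void *p)
{
	if (p == NULL)
		return;
	assert(live_allocs > 0);
	live_allocs--;
	free(p);
}

/* Allocate two buffers; release the first if the second step fails. */
static int
collect_port_info(int later_step_fails)
{
	int *rxq_state = tracked_alloc(16 * sizeof(*rxq_state));
	int *txq_state;

	if (rxq_state == NULL)
		return -ENOMEM;
	txq_state = later_step_fails ? NULL :
		tracked_alloc(16 * sizeof(*txq_state));
	if (txq_state == NULL) {
		tracked_free(rxq_state); /* without this, rxq_state leaks */
		return -ENOMEM;
	}
	tracked_free(txq_state);
	tracked_free(rxq_state);
	return 0;
}
```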

Fixes: 58b43c1ddfd1 ("ethdev: add telemetry endpoint for device info")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2 years agocommon/cnxk: update NIX and NPA dump functions
Rahul Bhansali [Mon, 20 Dec 2021 13:26:25 +0000 (18:56 +0530)]
common/cnxk: update NIX and NPA dump functions

Updates nix_dump and npa_dump to use plt_dump function.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2 years agocommon/cnxk: fix error checking
Weiguo Li [Sat, 22 Jan 2022 06:49:04 +0000 (14:49 +0800)]
common/cnxk: fix error checking

Fixes: 804c108b039a ("common/cnxk: set BPHY IRQ handler")
Cc: stable@dpdk.org
Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>