dpdk.git
3 years agocommon/cnxk: support NIX MAC operations
Sunil Kumar Kori [Tue, 6 Apr 2021 14:41:13 +0000 (20:11 +0530)]
common/cnxk: support NIX MAC operations

Add support for different MAC-related operations such as
MAC address set/get, link set/get, link status callback,
etc.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add NIX Tx queue management API
Jerin Jacob [Tue, 6 Apr 2021 14:41:12 +0000 (20:11 +0530)]
common/cnxk: add NIX Tx queue management API

This patch adds support to init/modify/fini the NIX
SQ (send queue) for both CN9K and CN10K platforms.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add NIX Rx queue management API
Jerin Jacob [Tue, 6 Apr 2021 14:41:11 +0000 (20:11 +0530)]
common/cnxk: add NIX Rx queue management API

Add NIX Rx queue management API to init/modify/fini the
RQ context and also set up the CQ (completion queue) context.
Both CN9K and CN10K devices are currently supported.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: support NIX IRQ
Jerin Jacob [Tue, 6 Apr 2021 14:41:10 +0000 (20:11 +0530)]
common/cnxk: support NIX IRQ

Add support to register NIX error and completion
queue IRQs using the base device class IRQ helper APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Harman Kalra <hkalra@marvell.com>
3 years agocommon/cnxk: support NIX
Jerin Jacob [Tue, 6 Apr 2021 14:41:09 +0000 (20:11 +0530)]
common/cnxk: support NIX

Add base NIX support as ROC (Rest of Chip) API which will
be used by the generic ETHDEV PMD (net/cnxk).

This patch adds the device init, fini, resource
alloc and free API, which sets up an ETHDEV PCI device of either
CN9K or CN10K Marvell SoC.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
3 years agocommon/cnxk: support NPA batch alloc/free
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:41:08 +0000 (20:11 +0530)]
common/cnxk: support NPA batch alloc/free

Add APIs to do allocations/frees in batch from
NPA pool.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: support NPA performance counter
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:41:07 +0000 (20:11 +0530)]
common/cnxk: support NPA performance counter

Add APIs to read NPA performance counters.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: support NPA bulk alloc/free
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:41:06 +0000 (20:11 +0530)]
common/cnxk: support NPA bulk alloc/free

Add APIs to alloc/free in bulk from NPA pool.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add NPA pool HW operations
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:41:05 +0000 (20:11 +0530)]
common/cnxk: add NPA pool HW operations

Add APIs for creating, destroying, modifying
NPA pools.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: support NPA debug
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:41:04 +0000 (20:11 +0530)]
common/cnxk: support NPA debug

Add NPA debug APIs.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: support NPA IRQ
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:41:03 +0000 (20:11 +0530)]
common/cnxk: support NPA IRQ

Add support for NPA IRQs.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: support NPA device
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:41:02 +0000 (20:11 +0530)]
common/cnxk: support NPA device

Add base NPA device support. NPA, i.e. Network Pool Allocator, is
a HW block that provides HW mempool functionality on Marvell CN9K
and CN10K SoCs. By providing HW mempool support, NPA also
facilitates Rx and Tx packet alloc and packet free by HW without
SW intervention.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add VF support to base device class
Jerin Jacob [Tue, 6 Apr 2021 14:41:01 +0000 (20:11 +0530)]
common/cnxk: add VF support to base device class

Add VF specific handling such as BAR4 setup, forwarding
VF mbox messages to AF and vice-versa, VF FLR handling
etc.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add base device class
Jerin Jacob [Tue, 6 Apr 2021 14:41:00 +0000 (20:11 +0530)]
common/cnxk: add base device class

Introduce 'dev' class to hold cnxk PCIe device specific
information and operations.

All PCIe drivers (ethdev, mempool, cryptodev and eventdev) of cnxk
inherit this base object to use common functionality such
as mailbox creation, interrupt registration, LMT setup, VF message
mbox forwarding, etc.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add mailbox base infrastructure
Jerin Jacob [Tue, 6 Apr 2021 14:40:59 +0000 (20:10 +0530)]
common/cnxk: add mailbox base infrastructure

This patch adds mailbox infra APIs to communicate with the kernel AF
driver. These APIs will be used by all the other cnxk drivers
for mbox init/fini and send/recv functionality.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add mbox request and response definitions
Jerin Jacob [Tue, 6 Apr 2021 14:40:58 +0000 (20:10 +0530)]
common/cnxk: add mbox request and response definitions

The admin function (AF) driver sits in the Linux kernel as the mailbox
server. The DPDK AF mailbox client sends messages to the mailbox
server to complete administrative tasks such as getting the MAC
address.

This patch adds the request and response definitions of the
existing mailbox messages defined between the AF driver and the DPDK driver.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
3 years agocommon/cnxk: add interrupt helper API
Jerin Jacob [Tue, 6 Apr 2021 14:40:57 +0000 (20:10 +0530)]
common/cnxk: add interrupt helper API

Add interrupt helper APIs in common code to register and
unregister for specific interrupt vectors. These APIs
will be used by all cnxk drivers.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add roc plt init callback support
Ashwin Sekhar T K [Tue, 6 Apr 2021 14:40:56 +0000 (20:10 +0530)]
common/cnxk: add roc plt init callback support

Add support for registering callbacks for roc plt init.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add model init and IO handling API
Jerin Jacob [Tue, 6 Apr 2021 14:40:55 +0000 (20:10 +0530)]
common/cnxk: add model init and IO handling API

Add routines for SoC model identification and HW IO handling
specific to CN9K and CN10K Marvell SoCs.
These are based on the arm64 ISA and behaviour specific to
Marvell SoCs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
3 years agocommon/cnxk: add build infrastructure and HW definition
Jerin Jacob [Tue, 6 Apr 2021 14:40:54 +0000 (20:10 +0530)]
common/cnxk: add build infrastructure and HW definition

Add meson build infrastructure along with the HW definition
header file.

This patch also adds arm cross-compile configs
for the CN9K and CN10K series of Marvell SoCs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
3 years agodoc: add Marvell cnxk platform guide
Nithin Dabilpuram [Tue, 6 Apr 2021 14:40:53 +0000 (20:10 +0530)]
doc: add Marvell cnxk platform guide

Platform specific guide for Marvell OCTEON CN9K/CN10K SoC is added.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
3 years agobuild: alias default build as generic
Juraj Linkeš [Tue, 30 Mar 2021 06:40:19 +0000 (08:40 +0200)]
build: alias default build as generic

The current machine='default' build name is not descriptive. The actual
default build is machine='native'. Add an alternative string which does
the same build and better describes what we're building:
machine='generic'. Leave machine='default' for backwards compatibility.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agobuild: limit symbol checks to developer mode
Bruce Richardson [Thu, 25 Feb 2021 15:29:03 +0000 (15:29 +0000)]
build: limit symbol checks to developer mode

The checking of symbols within each library and driver is only of
interest to developers, so limit to developer mode only.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agobuild: hide debug messages in non-developer mode
Bruce Richardson [Thu, 25 Feb 2021 15:29:02 +0000 (15:29 +0000)]
build: hide debug messages in non-developer mode

The messages about what components have what dependency names, and
information about function versioning not being supported on windows are
only of interest to developers, so hide them when building in
non-developer mode.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agobuild: enable a developer mode setting
Bruce Richardson [Thu, 25 Feb 2021 15:29:01 +0000 (15:29 +0000)]
build: enable a developer mode setting

To allow support for additional build checks and tests only really
relevant for developers, we add support for a developer mode option to
DPDK. The default, "auto", value for this enables developer mode if a
".git" folder is found at the root of the source tree - as was the case
with the previous "make" build system. There is also support for
explicitly enabling or disabling this option using "meson configure" if
so desired.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agoeal: fix hang in control thread creation
Luc Pelletier [Wed, 7 Apr 2021 20:16:06 +0000 (16:16 -0400)]
eal: fix hang in control thread creation

The affinity of a control thread is set after it has been launched. If
setting the affinity fails, pthread_cancel is called followed by a call
to pthread_join, which can hang forever if the thread's start routine
doesn't call a pthread cancellation point.

This patch modifies the logic so that the control thread exits
gracefully if the affinity cannot be set successfully and removes the
call to pthread_cancel.

Fixes: 6383d2642b62 ("eal: set name when creating a control thread")
Cc: stable@dpdk.org
Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
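
Note: the "exit gracefully instead of pthread_cancel" idea above can be sketched as follows. This is a minimal illustration, not the EAL code: the names ctrl_params, ctrl_thread_entry, ctrl_thread_create and set_flag are hypothetical, the setup_ret argument stands in for the result of setting the thread affinity, and a mutex plus condition variable is used here in place of the pthread barrier that the EAL used at the time.

```c
#include <pthread.h>
#include <stddef.h>

enum launch_state { LAUNCHING, RUN, ABORT };

/* Hypothetical launch parameters shared by the creator and the new thread. */
struct ctrl_params {
	void *(*start_routine)(void *);
	void *arg;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	enum launch_state state;
};

static void *ctrl_thread_entry(void *p)
{
	struct ctrl_params *params = p;
	enum launch_state state;

	/* Wait until the creator has finished (or failed) the setup. */
	pthread_mutex_lock(&params->lock);
	while (params->state == LAUNCHING)
		pthread_cond_wait(&params->cond, &params->lock);
	state = params->state;
	pthread_mutex_unlock(&params->lock);

	if (state == ABORT)
		return NULL; /* leave voluntarily: nothing has to cancel us */
	return params->start_routine(params->arg);
}

/* setup_ret stands in for the return value of the affinity setup. */
static int ctrl_thread_create(pthread_t *thread, struct ctrl_params *params,
			      int setup_ret)
{
	int ret;

	pthread_mutex_init(&params->lock, NULL);
	pthread_cond_init(&params->cond, NULL);
	params->state = LAUNCHING;

	ret = pthread_create(thread, NULL, ctrl_thread_entry, params);
	if (ret != 0)
		return ret;

	/* Tell the thread whether to run the user routine or to exit. */
	pthread_mutex_lock(&params->lock);
	params->state = (setup_ret == 0) ? RUN : ABORT;
	pthread_cond_signal(&params->cond);
	pthread_mutex_unlock(&params->lock);

	if (setup_ret != 0)
		pthread_join(*thread, NULL); /* cannot hang: the thread returns */
	return setup_ret;
}

/* Example start routine used to observe whether the thread body ran. */
static void *set_flag(void *arg)
{
	*(int *)arg = 1;
	return NULL;
}
```

With a failing setup, ctrl_thread_create() joins a thread that is guaranteed to return, so the join does not depend on the start routine reaching a cancellation point.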
3 years agoeal: fix race in control thread creation
Luc Pelletier [Wed, 7 Apr 2021 20:16:04 +0000 (16:16 -0400)]
eal: fix race in control thread creation

The creation of control threads uses a pthread barrier for
synchronization. This patch fixes a race condition where the pthread
barrier could get destroyed while one of the threads has not yet
returned from the pthread_barrier_wait function, which could result in
undefined behaviour.

Fixes: 3a0d465d4c53 ("eal: fix use-after-free on control thread creation")
Cc: stable@dpdk.org
Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
3 years agovfio: reformat logs
Thomas Monjalon [Mon, 8 Mar 2021 22:24:10 +0000 (23:24 +0100)]
vfio: reformat logs

The log messages had various issues:
- split on 2 lines, making search (grep) difficult
- long lines (can be split after the string)
- indented for no good reason (parent message may have higher log level)
- inconsistent use of __func__, not meaningful context for user
- lack of context (general message not mentioning VFIO)
- log level too high (more below)

Message having its level decreased from WARNING to NOTICE:
"not managed by VFIO driver, skipping"
Message having its level decreased from INFO to DEBUG:
"Probing VFIO support..."

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
3 years agoapp/testpmd: fix usage text
Thomas Monjalon [Mon, 5 Apr 2021 19:33:25 +0000 (21:33 +0200)]
app/testpmd: fix usage text

The options help text was including an incomplete and redundant
summary of the options before explaining each. The summary is dropped.

The details of the option --hairpin-mode had an extra space,
breaking the alignment with the next line.

There were some mismatches between options in the usage text
sed -rn 's/.*\(" *--([a-z-]*)[=: ].*/\1/p' app/test-pmd/parameters.c
and the options declared in lgopts array
sed -rn 's/.*\{.*"(.*)",.*,.*,.*},.*/\1/p' app/test-pmd/parameters.c
The misses were:
--no-numa
--enable-scatter
--tx-ip
--tx-udp
--noisy-lkup-num-reads-writes
The option --ports was not implemented.

Fixes: 01817b10d27c ("app/testpmd: change hairpin queues setup")
Fixes: 3c156061b938 ("app/testpmd: add noisy neighbour forwarding mode")
Fixes: bf5b2126bf44 ("app/testpmd: add ability to set Tx IP and UDP parameters")
Fixes: 0499793854f5 ("app/testpmd: add scatter enabling option")
Fixes: 999b2ee0fe45 ("app/testpmd: enable NUMA support by default")
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Bing Zhao <bingz@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
3 years agoapp/regex: fix usage text
Thomas Monjalon [Mon, 5 Apr 2021 19:33:24 +0000 (21:33 +0200)]
app/regex: fix usage text

The usage syntax help includes the program name which was fake.
It is replaced with the real name from argv.

Fixes: de06137cb295 ("app/regex: add RegEx test application")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: David Marchand <david.marchand@redhat.com>
3 years agoapp: fix exit messages
Thomas Monjalon [Mon, 5 Apr 2021 19:33:23 +0000 (21:33 +0200)]
app: fix exit messages

Some applications were printing useless messages with rte_exit()
after showing the help. Using exit() is enough in this case.

Some applications were using a redundant printf or fprintf() before
calling rte_exit(). The messages are unified in a single rte_exit().

Some rte_exit() calls were missing a line feed or returning a wrong code.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: David Marchand <david.marchand@redhat.com>
3 years agoeal: fix evaluation of log level option
David Marchand [Fri, 9 Apr 2021 11:04:53 +0000 (13:04 +0200)]
eal: fix evaluation of log level option

--log-level option is handled early, no need to reevaluate it later in
EAL init.

Before:
$ echo quit | ./build/app/test/dpdk-test --no-huge -m 512 \
  --log-level=lib.eal:debug \
  --log-level=lib.ethdev:debug --log-level=lib.ethdev:info \
  |& grep -i log.level

EAL: lib.eal log level changed from info to debug
EAL: lib.ethdev log level changed from info to debug
EAL: lib.ethdev log level changed from debug to info
EAL: lib.ethdev log level changed from info to debug
EAL: lib.ethdev log level changed from debug to info
EAL: lib.telemetry log level changed from disabled to warning

After:
$ echo quit | ./build/app/test/dpdk-test --no-huge -m 512 \
  --log-level=lib.eal:debug \
  --log-level=lib.ethdev:debug --log-level=lib.ethdev:info \
  |& grep -i log.level

EAL: lib.eal log level changed from info to debug
EAL: lib.ethdev log level changed from info to debug
EAL: lib.ethdev log level changed from debug to info
EAL: lib.telemetry log level changed from disabled to warning

Fixes: 6c7216eefd63 ("eal: fix log level of early messages")
Fixes: 1c806ae5c3ac ("eal/windows: support command line options parsing")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
3 years agolog: track log level changes
David Marchand [Fri, 9 Apr 2021 11:04:52 +0000 (13:04 +0200)]
log: track log level changes

Add a log message when registering log types and changing log levels.

__rte_log_register previously handled both legacy and dynamic logtypes.
To simplify the code, __rte_log_register is reworked to only handle
dynamic logtypes and takes a log level.

Example:
$ DPDK_TEST=logs_autotest ./build/app/test/dpdk-test --no-huge -m 512 \
  --log-level=lib.eal:debug
...
RTE>>logs_autotest
== dynamic log types
EAL: logtype1 log level changed from disabled to info
EAL: logtype2 log level changed from disabled to info
EAL: logtype1 log level changed from info to error
EAL: logtype3 log level changed from error to emergency
EAL: logtype2 log level changed from info to emergency
EAL: logtype3 log level changed from emergency to debug
EAL: logtype1 log level changed from error to debug
EAL: logtype2 log level changed from emergency to debug
error message
critical message
critical message
error message
== static log types
TESTAPP1: error message
TESTAPP1: critical message
TESTAPP2: critical message
TESTAPP1: error message
Test OK

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
3 years agotest/log: check levels
David Marchand [Fri, 9 Apr 2021 11:04:51 +0000 (13:04 +0200)]
test/log: check levels

Add checks on log levels:
- default values for rte_log_register and RTE_LOG_REGISTER,
- level changes with rte_log_set_level and related functions

Signed-off-by: David Marchand <david.marchand@redhat.com>
3 years agolog: add option argument help
Thomas Monjalon [Thu, 8 Apr 2021 16:47:13 +0000 (18:47 +0200)]
log: add option argument help

The option --log-level was not completely described in the usage text,
and it was difficult to guess the names of the log types and levels.

A new value "help" is accepted after --log-level to give more details
about the syntax and to list the log types and levels.

The array "levels" used for level name parsing is replaced with
a (modified) existing function which was used in rte_log_dump().

The new function rte_log_list_types() is exported in the API
for allowing an application to give this info to the user
if not exposing the EAL option --log-level.
The list of log types cannot include all drivers if not linked in the
application (shared object plugin case).

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: David Marchand <david.marchand@redhat.com>
3 years agolog: catch invalid level option number
Thomas Monjalon [Thu, 8 Apr 2021 16:47:12 +0000 (18:47 +0200)]
log: catch invalid level option number

The parsing check for invalid log level was not trying to catch
irrelevant numeric values.
A log level 0 becomes a failure in parsing so it can be caught early.
A log level higher than the max (8) is accepted with a warning message.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: David Marchand <david.marchand@redhat.com>
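
Note: the numeric checks described above can be sketched as follows. This is a hedged illustration only: parse_numeric_level and LOG_LEVEL_MAX are hypothetical names, the value 8 is taken from the commit message, and rejecting values that do not fit in an int is an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define LOG_LEVEL_MAX 8 /* the "max (8)" mentioned in the commit message */

/* Return the parsed level, or -1 if the string is not a usable number. */
static int parse_numeric_level(const char *priority)
{
	char *end = NULL;
	unsigned long level;

	errno = 0;
	level = strtoul(priority, &end, 10);
	if (errno != 0 || end == priority || *end != '\0')
		return -1; /* empty string or trailing garbage */
	if (level == 0 || level > INT_MAX)
		return -1; /* level 0 is a parsing failure, caught early */
	if (level > LOG_LEVEL_MAX)
		fprintf(stderr, "warning: log level %lu is higher than the maximum (%d)\n",
			level, LOG_LEVEL_MAX);
	return (int)level; /* values above the maximum are still accepted */
}
```

For instance, "0" and "abc" are rejected, "4" is returned as 4, and "9" is returned as 9 after a warning on stderr.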
3 years agolog: introduce macro for maximum level
Thomas Monjalon [Thu, 8 Apr 2021 16:47:11 +0000 (18:47 +0200)]
log: introduce macro for maximum level

RTE_DIM(...) and RTE_LOG_DEBUG were used to get the highest log level.
For better clarity a new constant RTE_LOG_MAX is introduced
and mapped to RTE_LOG_DEBUG.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: David Marchand <david.marchand@redhat.com>
3 years agolog: move private functions
Thomas Monjalon [Thu, 8 Apr 2021 16:47:10 +0000 (18:47 +0200)]
log: move private functions

Some private log functions had a wrong "rte_" prefix.

All private log functions are moved from eal_private.h
to the new file eal_log.h:
rte_eal_log_init -> eal_log_init
rte_log_save_regexp -> eal_log_save_regexp
rte_log_save_pattern -> eal_log_save_pattern
eal_log_set_default

The static functions in the file eal_common_log.c are renamed:
rte_log_save_level -> log_save_level
rte_log_lookup -> log_lookup
rte_log_init -> log_init
__rte_log_register -> log_register

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: David Marchand <david.marchand@redhat.com>
3 years agotest: proceed if timer subsystem already initialized
Stanislaw Kardach [Fri, 26 Mar 2021 10:47:59 +0000 (11:47 +0100)]
test: proceed if timer subsystem already initialized

rte_timer_subsystem_init() may return -EALREADY if the timer subsystem
was already initialized. This can happen e.g. in PMD code (see
eth_ena_dev_init). This is not an error, rather a notification as the
initialization function simply returns without any action taken.

Fixes: 50247fe03fe0 ("test/timer: exercise new APIs in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
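
Note: the caller-side handling described above can be sketched as follows. To stay self-contained, fake_timer_subsystem_init() is a hypothetical stand-in that mimics the documented -EALREADY behaviour of rte_timer_subsystem_init(); init_timers() is likewise a hypothetical name.

```c
#include <errno.h>

static int timer_initialized;

/* Stand-in for rte_timer_subsystem_init(): -EALREADY on repeated calls. */
static int fake_timer_subsystem_init(void)
{
	if (timer_initialized)
		return -EALREADY;
	timer_initialized = 1;
	return 0;
}

/* Only real failures are propagated; -EALREADY is treated as success. */
static int init_timers(void)
{
	int ret = fake_timer_subsystem_init();

	if (ret != 0 && ret != -EALREADY)
		return ret;
	return 0;
}
```

Calling init_timers() twice returns 0 both times, even though the second underlying call reports -EALREADY.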
3 years agotimer: clarify error if subsystem already initialized
Stanislaw Kardach [Fri, 26 Mar 2021 10:47:58 +0000 (11:47 +0100)]
timer: clarify error if subsystem already initialized

rte_timer_subsystem_init() may return -EALREADY if it has been already
initialized. Therefore put explicitly into doxygen that this is not a
failure for the application.

Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
3 years agoapp/test-regex: add scattered mbuf input
Suanming Mou [Wed, 7 Apr 2021 07:21:40 +0000 (10:21 +0300)]
app/test-regex: add scattered mbuf input

This commit adds scattered mbuf input support.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
3 years agoregex/mlx5: add data path scattered mbuf process
Suanming Mou [Wed, 7 Apr 2021 07:21:39 +0000 (10:21 +0300)]
regex/mlx5: add data path scattered mbuf process

A UMR (User-Mode Memory Registration) WQE can present data buffers
scattered across multiple mbufs through a single indirect mkey. By
taking advantage of the UMR WQE, the scattered mbufs of one operation
can be presented as one indirect mkey. The RegEx engine, which only
accepts one mkey, can now process the whole scattered mbuf chain in
one operation.

The maximum number of scattered mbufs supported in one UMR WQE is now
defined as 64. The mbufs from multiple operations can also be combined
into one UMR WQE if there is enough space in the KLM array,
since each operation can address its own mbuf content by the
mkey's address and length. However, the scattered mbufs of one
operation cannot be split across the KLM arrays of two different UMR
WQEs. If the KLM array of the current UMR WQE does not have enough free
space for an operation, an extra UMR WQE is used.

In case the UMR WQE's indirect mkey would be overwritten when the SQ's
WQEs wrap around, the mkey index used by the UMR WQE should be the index
of the last RegEx WQE in the operations. As one operation consumes
one WQE set, building the RegEx WQEs in reverse order helps address the
mkey more efficiently. When the operations in one burst consume multiple
mkeys, then each time the mkey KLM array is full, the reversed WQE set
index is always the last one of the new mkey for the new UMR WQE.

In GGA mode, the SQ WQE memory layout becomes interleaved UMR/NOP and
RegEx WQEs. A UMR (or NOP) WQE together with a RegEx WQE is called a
WQE set. The SQ's pi and ci are also incremented per WQE set rather
than per WQE.

For operations without scattered mbufs, the mbuf's mkey is used
directly and the WQE set combination is NOP + RegEx.
For operations with scattered mbufs that share the UMR WQE with others,
the WQE set combination is NOP + RegEx.
For operations that complete the UMR WQE, the WQE set combination is
UMR + RegEx.

Signed-off-by: John Hurley <jhurley@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
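
Note: the KLM packing rule above (at most 64 entries per UMR WQE, and one operation never split across two UMR WQEs) can be sketched as follows. This is only an illustration and not the mlx5 driver code: count_umr_wqes and MAX_KLM_PER_UMR are hypothetical names, and returning -1 for an operation with more than 64 segments is an assumption of this sketch.

```c
#include <stddef.h>

#define MAX_KLM_PER_UMR 64 /* limit stated in the commit message */

/*
 * Count the UMR WQEs needed for a burst, given the number of mbuf
 * segments of each operation. Returns -1 if one operation alone
 * exceeds the KLM array size.
 */
static int count_umr_wqes(const unsigned int *nb_segs, size_t nb_ops)
{
	unsigned int used = 0; /* KLM entries used in the current UMR WQE */
	int umrs = 0;
	size_t i;

	for (i = 0; i < nb_ops; i++) {
		if (nb_segs[i] <= 1)
			continue; /* not scattered: the mbuf's own mkey is used */
		if (nb_segs[i] > MAX_KLM_PER_UMR)
			return -1;
		/* An operation may not straddle two KLM arrays. */
		if (umrs == 0 || used + nb_segs[i] > MAX_KLM_PER_UMR) {
			umrs++;
			used = 0;
		}
		used += nb_segs[i];
	}
	return umrs;
}
```

For example, a burst with segment counts {1, 30, 30, 10, 1, 64} needs three UMR WQEs: the two 30-segment operations share the first one, the 10-segment operation opens a second one, and the 64-segment operation needs a third.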
3 years agocommon/mlx5: add user memory registration bits
Suanming Mou [Wed, 7 Apr 2021 07:21:38 +0000 (10:21 +0300)]
common/mlx5: add user memory registration bits

This commit adds the UMR capability bits.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
3 years agodrivers: align log names
Thomas Monjalon [Tue, 6 Apr 2021 13:22:04 +0000 (15:22 +0200)]
drivers: align log names

The log levels are configured by using the name of the logs.
Some drivers are aligned to follow a common log name standard:
pmd.class.driver[.sub]
Some "common" drivers skip the "class" part:
pmd.driver.sub

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
3 years agodrivers: fix log level after loading
Thomas Monjalon [Tue, 6 Apr 2021 13:22:03 +0000 (15:22 +0200)]
drivers: fix log level after loading

When compiled as a shared object, and loaded at runtime as a plugin,
the drivers should get the log level set earlier at EAL init
by the user through --log-level option.

The function for applying the log level setting is
rte_log_register_type_and_pick_level().
It is called by most drivers via RTE_LOG_REGISTER().

The drivers common/mlx5, bcmfs and e1000 were missing,
so the user-specified log level was not applied when
those drivers were loaded as plugins.
The macro RTE_LOG_REGISTER() is used for those drivers.

The unnecessary protection for double registration
is removed from e1000.

Fixes: 9c99878aa1b1 ("log: introduce logtype register macro")
Fixes: c8e79da7c676 ("crypto/bcmfs: introduce BCMFS driver")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agoeal: fix telemetry log type on registration failure
David Marchand [Tue, 6 Apr 2021 09:25:45 +0000 (11:25 +0200)]
eal: fix telemetry log type on registration failure

rte_log_register_type_and_pick_level() returns an int.
Casting to a uint32_t will make us miss the -1 returned in case of failure.
Fall back to the EAL log type like RTE_LOG_REGISTER.

Fixes: 37b881a96194 ("telemetry: use log function from pointer")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
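
Note: the cast problem above can be reproduced in isolation. In this sketch, fake_register() is a hypothetical stand-in for a registration call that returns -1 on failure, and FALLBACK_LOGTYPE is a placeholder for the EAL log type; neither is a DPDK symbol.

```c
#include <stdint.h>

#define FALLBACK_LOGTYPE 0u /* placeholder for the EAL log type */

/* Stand-in for a registration call: a type id, or -1 on failure. */
static int fake_register(int fail)
{
	return fail ? -1 : 42;
}

/* Buggy pattern: the cast turns -1 into 0xFFFFFFFF, hiding the failure. */
static uint32_t pick_logtype_buggy(int fail)
{
	return (uint32_t)fake_register(fail);
}

/* Fixed pattern: check the signed result before converting it. */
static uint32_t pick_logtype_fixed(int fail)
{
	int type = fake_register(fail);

	if (type < 0)
		return FALLBACK_LOGTYPE;
	return (uint32_t)type;
}
```

On failure the buggy variant yields UINT32_MAX, which no later "< 0" check can detect, while the fixed variant falls back to the placeholder type.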
3 years agolog: choose EAL log type on registration failure
Thomas Monjalon [Tue, 6 Apr 2021 13:22:02 +0000 (15:22 +0200)]
log: choose EAL log type on registration failure

In the unlikely case where something goes wrong
while registering a log type,
the fallback is to use the EAL log type.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
3 years agobuild: remove Windows export symbol list
David Marchand [Tue, 6 Apr 2021 17:59:10 +0000 (19:59 +0200)]
build: remove Windows export symbol list

Rather than have two files that keep getting out of sync, let's
annotate the version.map to generate the Windows export file.

Some mlx5 symbols (haswell_broadwell_cpu, mlx5_glue, mlx5_os_*) were
only exported for Windows.
All of them are available and used by Linux too, so this patch adds
them in version.map.

Note: Existing version.map annotation achieved with:
$ for dir in lib/librte_eal drivers/common/mlx5; do
    ./buildtools/map-list-symbol.sh $dir/*.map |
    while read file version sym; do
      ! git grep -qw $sym $dir/*.def || continue;
      sed -i -e "s/$sym;/$sym; # WINDOWS_NO_EXPORT/" $dir/*.map;
    done;
  done

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
3 years agoservice: clean references to removed symbol
David Marchand [Wed, 7 Apr 2021 09:06:56 +0000 (11:06 +0200)]
service: clean references to removed symbol

rte_service_get_id() was removed in v17.11 but the API description
still referenced it and a version node was still present in EAL map.

Fixes: 8edc9aaaf217 ("service: use id in get by name function")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
3 years agoeal: add synchronous interrupt unregister
Renata Saiakhova [Tue, 6 Apr 2021 14:46:14 +0000 (16:46 +0200)]
eal: add synchronous interrupt unregister

Avoid a race when unregistering an interrupt handler while the
interrupt source has some active callbacks: use a wrapper
around rte_intr_callback_unregister() that checks for the -EAGAIN
return value and loops until rte_intr_callback_unregister()
succeeds.

Signed-off-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
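
Note: the retry loop described above can be sketched as follows. fake_unregister() is a hypothetical stand-in for rte_intr_callback_unregister() that reports -EAGAIN while a callback is still active; its success value of 1 (one callback removed) and the name unregister_sync are assumptions of this sketch.

```c
#include <errno.h>

static int busy_polls_left; /* how many calls still see an active callback */

/* Stand-in for rte_intr_callback_unregister(). */
static int fake_unregister(void)
{
	if (busy_polls_left > 0) {
		busy_polls_left--;
		return -EAGAIN; /* source is busy, try again later */
	}
	return 1; /* one callback removed */
}

/* Retry while the source is busy; any other result ends the loop. */
static int unregister_sync(void)
{
	int ret;

	while ((ret = fake_unregister()) == -EAGAIN)
		; /* real code would pause or yield between attempts */
	return ret;
}
```

With busy_polls_left set to 3, unregister_sync() makes four calls and returns the final result of 1.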
3 years agomem: fix freeing segments in --huge-unlink mode
Roy Shterman [Mon, 22 Feb 2021 10:41:31 +0000 (12:41 +0200)]
mem: fix freeing segments in --huge-unlink mode

When using huge_unlink we unlink the segment right
after allocation. Although we unlink the file we keep
the fd in fd_list so file still exist just the path deleted.
When freeing the hugepage we need to close the fd and assign
it with (-1) in fd_list for the page to be released.

The current flow fails rte_malloc in the following flow when working
with --huge-unlink option:
1. alloc_seg() for segment A -
    We allocate a segment, unlink the path to the segment
    and keep the file descriptor in fd_list.
2. free_seg() for segment A -
    We clear the segment metadata and return - without closing fd
    or assigning (-1) in fd list.
3. alloc_seg() for segment A again -
    We find segment A as available, try to allocate it,
    find the old fd in fd_list and try to unlink it
    as part of alloc_seg(), which fails because the path no longer exists.

The impact of this error is that rte_malloc() fails spuriously
even though hugepages are available.
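
The alloc/free/alloc flow above can be modelled with ordinary files. This is
a simplified sketch, not the EAL code: fd_list here is a small local array,
sketch_alloc_seg() and sketch_free_seg() are hypothetical helpers, and a
stale fd is modelled as an immediate allocation failure.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SKETCH_MAX_SEGS 4

/* One fd per segment; -1 means "no file backs this slot". */
static int fd_list[SKETCH_MAX_SEGS] = { -1, -1, -1, -1 };

/* Allocate: create a backing file, unlink its path right away (as
 * --huge-unlink does) and remember the still-open fd. */
static int
sketch_alloc_seg(int idx)
{
	char path[64];
	int fd;

	if (fd_list[idx] != -1)
		return -1; /* stale fd left behind by an incomplete free */

	snprintf(path, sizeof(path), "/tmp/sketch_seg_%d_%d",
		 (int)getpid(), idx);
	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return -1;
	unlink(path); /* the file lives on through the fd only */
	fd_list[idx] = fd;
	return 0;
}

/* Free: close the fd and reset the slot, so the file is really released
 * and the slot can be allocated again. */
static void
sketch_free_seg(int idx)
{
	if (fd_list[idx] >= 0) {
		close(fd_list[idx]);
		fd_list[idx] = -1;
	}
}
```

If sketch_free_seg() skipped the close and the reset, the second allocation
of the same slot would fail, which mirrors step 3 of the flow.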

Fixes: d435aad37da7 ("mem: support --huge-unlink mode")
Cc: stable@dpdk.org
Signed-off-by: Roy Shterman <roy.shterman@vastdata.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
3 years agomaintainers: update for OPDL
Liang Ma [Sat, 3 Apr 2021 10:35:11 +0000 (11:35 +0100)]
maintainers: update for OPDL

I would like to change my email to my personal email address.

Signed-off-by: Liang Ma <liangma@liangbit.com>
3 years agobus/pci: rename probe/remove operation types
Thomas Monjalon [Mon, 5 Apr 2021 09:15:05 +0000 (11:15 +0200)]
bus/pci: rename probe/remove operation types

The names of the prototypes pci_probe_t and pci_remove_t
are missing a prefix rte_.
These function types are simply renamed.

No compatibility break is expected for applications
because these are considered internal names in the driver interface.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
3 years agopci: rename catch-all ID
Thomas Monjalon [Mon, 5 Apr 2021 09:15:04 +0000 (11:15 +0200)]
pci: rename catch-all ID

The name of the constant PCI_ANY_ID was missing RTE_ prefix.
It is renamed, and the old name becomes a deprecated alias.

While renaming, the duplicate definitions in rte_bus_pci.h
are removed to keep only those in rte_pci.h.
Note: rte_pci.h is included in rte_bus_pci.h

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
3 years agopower: do not skip saving original P-state governor
Anatoly Burakov [Fri, 2 Apr 2021 09:26:45 +0000 (09:26 +0000)]
power: do not skip saving original P-state governor

Currently, when we set the pstate governor to "performance", we check if
it is already set to this value, and if it is, we skip setting it.

However, we never save this value anywhere, so that next time we come
back and request the governor to be set to its original value, the
original value is empty.

Fix it by saving the original pstate governor first. While we're at it,
replace `strlcpy` with `rte_strscpy`.
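
The ordering issue can be shown with a small model. This is a sketch under
assumptions: struct sketch_pstate and its helpers are hypothetical, and a
string field stands in for the sysfs governor file.

```c
#include <stdio.h>
#include <string.h>

#define GOV_LEN 32

struct sketch_pstate {
	char governor[GOV_LEN];     /* stands in for the sysfs governor file */
	char governor_ori[GOV_LEN]; /* saved value used for a later restore */
};

/* Bounded, always NUL-terminated copy. */
static void
copy_gov(char *dst, const char *src)
{
	snprintf(dst, GOV_LEN, "%s", src);
}

static void
sketch_set_performance(struct sketch_pstate *ps)
{
	/* Save first, so the early return below cannot skip it. */
	copy_gov(ps->governor_ori, ps->governor);

	if (strcmp(ps->governor, "performance") == 0)
		return; /* already set, nothing to write */

	copy_gov(ps->governor, "performance");
}

static void
sketch_restore(struct sketch_pstate *ps)
{
	copy_gov(ps->governor, ps->governor_ori);
}
```

If the save came after the early return, a governor that was already
"performance" would be restored to an empty string.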

Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
3 years agopower: fix P-state base frequency handling
Anatoly Burakov [Fri, 2 Apr 2021 09:26:44 +0000 (09:26 +0000)]
power: fix P-state base frequency handling

Previous fix for base frequency handling in pstate mode introduced a
couple of issues:

- When base_frequency file does not exist, it simply bails out because
  of what appears to be accidental addition of FOPEN_OR_ERR_RET. This is
  incorrect, as absence of this file is not fatal and is in fact
  expected on kernel versions earlier than 5.3
- When base_frequency file does exist, it gets opened, but never gets
  closed, resulting in a resource leak

Both issues also manifest themselves as Coverity defects (dead code, and
a resource leak), so this fix addresses both.
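
Both points (a missing file is not an error, and an opened file is always
closed) can be sketched with plain stdio. sketch_read_optional_u32() is a
hypothetical helper, not the function used by the power library, and it
treats any open failure as "file absent".

```c
#include <stdio.h>

/* Read an optional numeric file.
 * Returns 0 and leaves *value untouched when the file cannot be opened,
 * 0 with *value set when a number was read, and -1 on a parse error. */
static int
sketch_read_optional_u32(const char *path, unsigned int *value)
{
	FILE *f = fopen(path, "r");
	unsigned int v;
	int ret = 0;

	if (f == NULL)
		return 0; /* absence is expected on some kernels, not fatal */

	if (fscanf(f, "%u", &v) == 1)
		*value = v;
	else
		ret = -1;

	fclose(f); /* single exit point after open: no leaked handle */
	return ret;
}
```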

Coverity issue: 369693, 369694
Bugzilla ID: 668
Fixes: 4db9587bbf72 ("power: check sysfs base frequency")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
3 years agodoc: fix sphinx rtd theme import in GHA
David Marchand [Thu, 1 Apr 2021 19:58:42 +0000 (21:58 +0200)]
doc: fix sphinx rtd theme import in GHA

If the rtd theme is available, passing it by name is enough to select
it. Sphinx itself recognises the "sphinx_rtd_theme" name as a special
case and tries to find its path automatically.

On the other hand, passing a html_theme_path makes sphinx parse all
themes available in this path, which in some environments (like GHA) is
/usr/share and makes sphinx error on the first zipfile it finds (in GHA,
some Azure CLI thingy) that has no sphinx theme in it.

Fixes: 46562be65094 ("doc: import sphinx rtd theme when available")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
3 years agovdpa/mlx5: fix virtq cleaning
Matan Azrad [Mon, 1 Mar 2021 10:41:31 +0000 (10:41 +0000)]
vdpa/mlx5: fix virtq cleaning

The HW virtq object can be destroyed either when the device is closed or
when the state of the virtq becomes disabled.

Some parameters of the virtq should continue to be managed when the
virtq state is changed but all of them must be initialized when the
device is closed.

Wrongly, the enable parameter stayed on when the device was closed, which
might cause creation of an invalid virtq the next time a device is
assigned to the driver.

Clean all the virtq memory when the device is closed.

Fixes: c47d6e83334e ("vdpa/mlx5: support queue update")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agonet/virtio: remove duplicated port ID from virtio-user
David Marchand [Mon, 1 Feb 2021 17:46:01 +0000 (18:46 +0100)]
net/virtio: remove duplicated port ID from virtio-user

The private virtio_user_dev structure embeds a virtio_hw which itself
contains the ethdev port_id.
Make use of it and remove the duplicate port_id field.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agoexamples/vhost_crypto: remove unused short option
Ibtisam Tariq [Thu, 4 Feb 2021 08:05:42 +0000 (08:05 +0000)]
examples/vhost_crypto: remove unused short option

Short option "s" was passed to the getopt_long function, but the
option was never handled.

Fixes: f5188211c721 ("examples/vhost_crypto: add sample application")
Cc: stable@dpdk.org
Signed-off-by: Ibtisam Tariq <ibtisam.tariq@emumba.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: fix batch dequeue potential buffer overflow
Marvin Liu [Wed, 31 Mar 2021 06:49:39 +0000 (14:49 +0800)]
vhost: fix batch dequeue potential buffer overflow

As with single dequeue, accessing the descriptor length multiple times
poses a potential risk. Accessing the descriptor length only once
eliminates this risk.

Fixes: 75ed51697820 ("vhost: add packed ring batch dequeue")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: fix packed ring potential buffer overflow
Marvin Liu [Wed, 31 Mar 2021 06:49:38 +0000 (14:49 +0800)]
vhost: fix packed ring potential buffer overflow

As with split ring, accessing the descriptor length multiple times
poses a potential risk. Accessing the descriptor length only once
eliminates this risk.

Fixes: 2f3225a7d69b ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: fix split ring potential buffer overflow
Marvin Liu [Wed, 31 Mar 2021 06:49:37 +0000 (14:49 +0800)]
vhost: fix split ring potential buffer overflow

In the vhost datapath, a descriptor's length is mostly used in two
consecutive operations. The first step is address translation, the second
is the memory transaction from guest to host. The interval between the
two steps gives a malicious guest a window in which it can change the
descriptor length after vhost has calculated the buffer size, which may
lead to a buffer overflow on the vhost side. This potential risk can be
eliminated by accessing the descriptor length once.
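
The "read once" pattern can be sketched as below. The descriptor layout and
sketch_copy_desc() are simplified assumptions for illustration, not the
vhost library's structures.

```c
#include <stdint.h>
#include <string.h>

/* Simplified descriptor living in memory the guest can write to. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
};

/* Copy a descriptor payload into a host buffer.
 * The length is read exactly once into a local variable; the bounds
 * check and the copy both use that snapshot, so a concurrent change of
 * desc->len cannot make the copy larger than what was checked. */
static int
sketch_copy_desc(const volatile struct sketch_desc *desc,
		 const uint8_t *src, uint8_t *dst, uint32_t dst_size)
{
	uint32_t len = desc->len; /* single read of guest-owned memory */

	if (len > dst_size)
		return -1;

	memcpy(dst, src, len);
	return (int)len;
}
```

Re-reading desc->len in the memcpy() call instead of using the local would
reopen the window between check and use.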

Fixes: 1be4ebb1c464 ("vhost: support indirect descriptor in mergeable Rx")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agoexamples/vhost: check memory table query
Chenbo Xia [Fri, 19 Feb 2021 02:40:11 +0000 (10:40 +0800)]
examples/vhost: check memory table query

This patch fixes an unchecked return value of rte_vhost_get_mem_table(),
which was reported by Coverity.

Coverity issue: 364233
Fixes: ca059fa5e290 ("examples/vhost: demonstrate the new generic APIs")
Cc: stable@dpdk.org
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovdpa/ifc: check PCI config read
Xiao Wang [Tue, 9 Mar 2021 08:43:15 +0000 (16:43 +0800)]
vdpa/ifc: check PCI config read

The return value of rte_pci_read_config should be checked.

Coverity issue: 302860
Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
3 years agoexamples/vhost_blk: check features before inflight API
Keiichi Watanabe [Mon, 22 Mar 2021 07:22:57 +0000 (16:22 +0900)]
examples/vhost_blk: check features before inflight API

Avoid calling rte_vhost_get_vhost_ring_inflight() and
rte_vhost_get_vring_base_from_inflight() when
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is not set.
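
A minimal sketch of such a guard is a bit test on the negotiated protocol
feature mask. The names below are hypothetical, and the bit number 12 is
taken from the vhost-user protocol definition as an assumption of this
sketch; how the mask is obtained is left out.

```c
#include <stdint.h>

/* Assumed bit number of the inflight shmfd protocol feature. */
#define SKETCH_PROTOCOL_F_INFLIGHT_SHMFD 12

/* Returns 1 when the inflight feature bit is set in the mask, else 0.
 * Callers would skip the inflight helpers when this returns 0. */
static int
sketch_inflight_supported(uint64_t protocol_features)
{
	return (int)((protocol_features >>
		      SKETCH_PROTOCOL_F_INFLIGHT_SHMFD) & 1);
}
```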

Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: get negotiated protocol features
Keiichi Watanabe [Mon, 22 Mar 2021 07:22:56 +0000 (16:22 +0900)]
vhost: get negotiated protocol features

Add rte_vhost_get_negotiated_protocol_features, which returns a set of
enabled protocol features.

Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
3 years agovhost: optimize virtqueue structure
Maxime Coquelin [Tue, 23 Mar 2021 09:02:19 +0000 (10:02 +0100)]
vhost: optimize virtqueue structure

This patch moves vhost_virtqueue struct fields in order
to both optimize packing and move hot fields to the first
cachelines.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
3 years agovhost: move dirty logging cache out of virtqueue
Maxime Coquelin [Tue, 23 Mar 2021 09:02:18 +0000 (10:02 +0100)]
vhost: move dirty logging cache out of virtqueue

This patch moves the per-virtqueue dirty logging cache
out of the virtqueue struct, by allocating it dynamically
only when live-migration is enabled.

It saves 8 cachelines in vhost_virtqueue struct.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
3 years agovhost: remove unused virtqueue field
Maxime Coquelin [Tue, 23 Mar 2021 09:02:17 +0000 (10:02 +0100)]
vhost: remove unused virtqueue field

This patch removes the "backend" field of the
vhost_virtqueue struct, which is not used by the
library.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
3 years agonet/virtio: pack virtqueue structure
Maxime Coquelin [Tue, 16 Mar 2021 09:38:25 +0000 (10:38 +0100)]
net/virtio: pack virtqueue structure

This patch optimizes packing of the virtqueue
struct by moving fields around to fill holes.

The offset field is not used and so can be removed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
3 years agonet/virtio: allocate fake mbuf in Rx queue
Maxime Coquelin [Tue, 16 Mar 2021 09:38:24 +0000 (10:38 +0100)]
net/virtio: allocate fake mbuf in Rx queue

While it is worth clarifying whether the fake mbuf
in virtnet_rx struct is really necessary, it is sure
that it heavily impacts cache usage by being part of
the struct. Indeed, it uses two cachelines, and
requires alignment on a cacheline.

Before this series, it means it took 120 bytes in
virtnet_rx struct:

struct virtnet_rx {
 struct virtqueue *vq; /*0 8*/

 /* XXX 56 bytes hole, try to pack */

 /* --- cacheline 1 boundary (64 bytes) --- */
 struct rte_mbuf fake_mbuf __attribute__((__aligned__(64))); /*64 128*/
 /* --- cacheline 3 boundary (192 bytes) --- */

This patch allocates it using malloc in order to optimize
virtnet_rx cache usage and so virtqueue cache usage.
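
The layout effect can be reproduced with stand-in types. This is a sketch
assuming a GCC/Clang-style aligned attribute; struct sketch_mbuf is a
128-byte placeholder, not the real struct rte_mbuf.

```c
#include <stddef.h>

/* 128-byte placeholder standing in for struct rte_mbuf. */
struct sketch_mbuf {
	unsigned char bytes[128];
};

/* Before: the cacheline-aligned mbuf is embedded, which forces a hole
 * after the first pointer and pushes the struct to three cachelines. */
struct sketch_rx_embedded {
	void *vq;
	struct sketch_mbuf fake_mbuf __attribute__((__aligned__(64)));
};

/* After: only a pointer is stored; the mbuf is allocated separately. */
struct sketch_rx_pointer {
	void *vq;
	struct sketch_mbuf *fake_mbuf;
};
```

With these stand-ins the embedded variant places fake_mbuf at offset 64 and
is 192 bytes in total, while the pointer variant is just two pointers wide.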

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
3 years agonet/virtio: improve queue init error path
Maxime Coquelin [Tue, 16 Mar 2021 09:38:23 +0000 (10:38 +0100)]
net/virtio: improve queue init error path

This patch improves the error path of virtio_init_queue()
by cleaning up, in reverse order, all resources that have
been allocated.

Suggested-by: Chenbo Xia <chenbo.xia@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
3 years agonet/virtio: remove reference to virtqueue in vrings
Maxime Coquelin [Tue, 16 Mar 2021 09:38:22 +0000 (10:38 +0100)]
net/virtio: remove reference to virtqueue in vrings

Vrings are part of the virtqueues, so we don't need
to have a pointer to the virtqueue in the vring descriptions.

Instead, let's just subtract the vring's offset from its
address to calculate the virtqueue address.
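
This is the usual "container of" computation. A minimal sketch with
hypothetical struct names (the real virtqueue layout differs):

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_vring {
	unsigned int size;
};

struct sketch_virtqueue {
	uint16_t queue_index;
	struct sketch_vring ring; /* embedded, so no back-pointer is needed */
};

/* Recover the enclosing virtqueue from a pointer to its embedded ring by
 * subtracting the ring's offset within the virtqueue struct. */
static struct sketch_virtqueue *
sketch_vring_to_vq(struct sketch_vring *vr)
{
	return (struct sketch_virtqueue *)
		((char *)vr - offsetof(struct sketch_virtqueue, ring));
}
```

This only works for rings that really are embedded in a virtqueue; passing
a standalone ring would yield an invalid pointer.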

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
3 years agonet/qede: remove unnecessary field in Rx entry and simplify
Balazs Nemeth [Fri, 26 Mar 2021 11:01:30 +0000 (12:01 +0100)]
net/qede: remove unnecessary field in Rx entry and simplify

The member page_offset is always zero. Having this in the qede_rx_entry
makes it larger than it needs to be and this has cache performance
implications so remove that field. In addition, since qede_rx_entry only
has an rte_mbuf*, remove the definition of qede_rx_entry.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
3 years agonet/qede: prefetch next packet to free
Balazs Nemeth [Fri, 26 Mar 2021 11:01:29 +0000 (12:01 +0100)]
net/qede: prefetch next packet to free

While handling the current mbuf, pull the next mbuf into the cache. Note
that the last four mbufs pulled into the cache are not handled, but that
doesn't matter.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
3 years agonet/qede: prefetch hardware consumer
Balazs Nemeth [Fri, 26 Mar 2021 11:01:28 +0000 (12:01 +0100)]
net/qede: prefetch hardware consumer

Ensure that, while ecore_chain_get_cons_idx is running, txq->hw_cons_ptr
is prefetched. This shows a slight performance improvement.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
3 years agonet/qede: free packets in bulk
Balazs Nemeth [Fri, 26 Mar 2021 11:01:27 +0000 (12:01 +0100)]
net/qede: free packets in bulk

rte_pktmbuf_free_bulk calls rte_mempool_put_bulk with the number of
pending packets to return to the mempool. In contrast, rte_pktmbuf_free
calls rte_mempool_put that calls rte_mempool_put_bulk with one object.
An important performance-related downside of adding one packet at a time
to the mempool is that on each call, the per-core cache pointer needs to
be read from TLS, while a single rte_mempool_put_bulk only reads from
TLS once.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
3 years agonet/qede: assume mbuf to free is never null
Balazs Nemeth [Fri, 26 Mar 2021 11:01:26 +0000 (12:01 +0100)]
net/qede: assume mbuf to free is never null

The ring txq->sw_tx_ring is managed with txq->sw_tx_cons. As long as
txq->sw_tx_cons is correct, there is no need to check if
txq->sw_tx_ring[idx] is null explicitly.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
3 years agonet/qede: get consumer index once
Balazs Nemeth [Fri, 26 Mar 2021 11:01:25 +0000 (12:01 +0100)]
net/qede: get consumer index once

Calling ecore_chain_get_cons_idx repeatedly is slower than calling it
once and using the result for the remainder of qede_process_tx_compl.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
3 years agonet/qede: remove flags from Tx entry
Balazs Nemeth [Fri, 26 Mar 2021 11:01:24 +0000 (12:01 +0100)]
net/qede: remove flags from Tx entry

Each sw_tx_ring entry was of type struct qede_tx_entry:

struct qede_tx_entry {
       struct rte_mbuf *mbuf;
       uint8_t flags;
};

Leaving the unused flags member here has a few performance implications.
First, each qede_tx_entry takes up more memory, which has caching
implications as fewer entries fit in a cache line while multiple entries
are frequently handled in batches. Second, an array of qede_tx_entry
entries is incompatible with existing APIs that expect an array of
rte_mbuf pointers. Consequently, an extra array needs to be allocated
before calling such APIs and each entry needs to be copied over.

This patch omits the flags field and replaces the qede_tx_entry entry
with a simple rte_mbuf pointer.
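
Both effects can be seen with stand-in types. In this sketch, struct
sketch_mbuf is an opaque placeholder and sketch_bulk_consume() is a
hypothetical consumer that, like the APIs mentioned above, takes a plain
array of mbuf pointers.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_mbuf; /* opaque stand-in for struct rte_mbuf */

/* Old entry type: the one-byte flag is padded up to pointer alignment,
 * so each entry is larger than the pointer it carries. */
struct sketch_tx_entry {
	struct sketch_mbuf *mbuf;
	uint8_t flags;
};

/* Hypothetical bulk consumer expecting an array of mbuf pointers; here
 * it only counts the non-NULL entries. */
static unsigned int
sketch_bulk_consume(struct sketch_mbuf **mbufs, unsigned int n)
{
	unsigned int i, used = 0;

	for (i = 0; i < n; i++)
		if (mbufs[i] != NULL)
			used++;
	return used;
}
```

A ring declared as an array of struct sketch_mbuf pointers can be handed to
sketch_bulk_consume() directly, whereas an array of struct sketch_tx_entry
would first have to be copied into a temporary pointer array.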

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
3 years agonet/mlx5: reject tunnel ID modification
Alexander Kozyrev [Wed, 24 Mar 2021 15:04:39 +0000 (15:04 +0000)]
net/mlx5: reject tunnel ID modification

Modification of the 802.1Q Tag Identifier, VXLAN Network
Identifier or GENEVE Network Identifier is not supported.
Reject attempt to modify these fields via the MODIFY_FIELD
action and document this mlx5 driver limitation.

Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: allow modify field action on group 0
Alexander Kozyrev [Wed, 24 Mar 2021 15:04:37 +0000 (15:04 +0000)]
net/mlx5: allow modify field action on group 0

There is a limitation about copying one header field to another for
the Flow group 0. Such copy action is not allowed there. But setting
a header field with an immediate value is perfectly fine.
Allow the MODIFY_FIELD action on group 0 in case the source field
is an immediate value or a pointer to it.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: check extended metadata for mark modification
Alexander Kozyrev [Wed, 24 Mar 2021 15:04:36 +0000 (15:04 +0000)]
net/mlx5: check extended metadata for mark modification

The MODIFY_FIELD action requires extended metadata support
in order to manipulate the MARK register. Check if it is supported
and reject the MODIFY_FIELD action if it is not.

Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: adjust modify field action endianness
Alexander Kozyrev [Wed, 24 Mar 2021 15:04:35 +0000 (15:04 +0000)]
net/mlx5: adjust modify field action endianness

Masks that are used to modify a packet field must be in big-endian
format. Convert them to BE to ensure proper modification.
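
As an illustration of the conversion, the helper below produces a 32-bit
value whose in-memory byte order is big endian regardless of host
endianness. sketch_cpu_to_be32() is a hypothetical, self-contained stand-in;
the driver would use DPDK's own byte-order helpers instead.

```c
#include <stdint.h>
#include <string.h>

/* Return v laid out most-significant byte first in memory. */
static uint32_t
sketch_cpu_to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}
```

On a big-endian host this is an identity operation; on a little-endian host
it swaps the bytes, which is what a mask applied to packet data needs.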

Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: check field size in modify field action
Alexander Kozyrev [Wed, 24 Mar 2021 15:04:34 +0000 (15:04 +0000)]
net/mlx5: check field size in modify field action

Add a validation check to make sure that the specified width
for MODIFY_FIELD RTE action is not bigger than a field size.

Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: probe host PF representor with sub-function
Xueming Li [Sun, 28 Mar 2021 13:48:15 +0000 (13:48 +0000)]
net/mlx5: probe host PF representor with sub-function

To simplify BlueField HPF representor (vf[-1]) probing, this patch allows
probing it with "sf" syntax: "sf[-1]".

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: improve xstats of bonding port
Xueming Li [Sun, 28 Mar 2021 13:48:14 +0000 (13:48 +0000)]
net/mlx5: improve xstats of bonding port

In case of a kernel bonding device, counters were read from the first
bonding PF member only.

This patch reads all member PFs and sums them to get the bond xstats.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: fix setting VF default MAC through representor
Xueming Li [Sun, 28 Mar 2021 13:48:13 +0000 (13:48 +0000)]
net/mlx5: fix setting VF default MAC through representor

With kernel bonding, there was an error when setting the VF MAC address
through a representor. The Netlink API requires the ifindex of the owner
PF, not the bonding device ifindex.

Use the owner PF ifindex to modify the VF default MAC in case of a bonding
device.

Fixes: c21e5facf7d2 ("net/mlx5: use bond index for netdev operations")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: save bonding member ports information
Xueming Li [Sun, 28 Mar 2021 13:48:12 +0000 (13:48 +0000)]
net/mlx5: save bonding member ports information

Since the kernel bonding netdev doesn't provide statistics counters that
reflect all member ports, the PMD has to manually sum up counters from
each member port.

As a preparation, this patch collects bonding member port information
and saves it to the shared context data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: support list of representor PF
Xueming Li [Sun, 28 Mar 2021 13:48:11 +0000 (13:48 +0000)]
net/mlx5: support list of representor PF

To probe representors from different kernel bonding PFs, one had to specify
2 separate devargs like this:
    -a 03:00.0,representor=pf0vf[0-3] -a 03:00.0,representor=pf1vf[0-3]

This patch supports a range or list in the PF section of devargs, so the
short alternative to the above is:
    -a 03:00.0,representor=pf[0-1]vf[0-3]

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: refactor bonding representor probing
Xueming Li [Sun, 28 Mar 2021 13:48:10 +0000 (13:48 +0000)]
net/mlx5: refactor bonding representor probing

To probe a representor on the 2nd PF of a kernel bonding device, one had
to specify the PF1 BDF in devargs:
  <PF1_BDF>,representor=0
When closing a bonding device, all representors had to be closed together,
which implies that all representors have to use the primary PF of the
bonding device. So after probing a representor port on the 2nd PF, when
locating the newly probed device using the device argument, the filter
used the 2nd PF as PCI address and failed to locate the new device.

The conflict came from how the current representor devargs are used:
 - Use PCI BDF to specify representor owner PF
 - Use PCI BDF to locate probed representor device.
 - PMD uses primary PCI BDF as PCI device.

To resolve such conflicts, a new representor syntax is introduced here:
  <primary BDF>,representor=pfXvfY
All representors must use the primary PF as owner PCI device; the PMD
internally locates the owner PCI address by checking the representor "pfX"
part. To EAL, all representors are registered to the primary PCI device and
the 2nd PF is hidden, thus all searches should be consistent.

As with VF representors, HPF (host PF on BlueField) uses the same syntax
to probe, for example: representor=pf1vf[0-3,-1]

This patch also adds pf index into kernel bonding representor port name:
<BDF>_<ib_name>_representor_pf<X>vf<Y>

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: revert setting bonding representor to first PF
Xueming Li [Sun, 28 Mar 2021 13:48:09 +0000 (13:48 +0000)]
net/mlx5: revert setting bonding representor to first PF

With kernel bonding, representors on second PF are being probed by
devargs:
<primary_bdf>,representor=pf1vf<N>
There is no need to save the primary PF port ID and look it up when
probing sibling ports, so revert patch [1].

[1]:
commit e6818853c022 ("net/mlx5: set representor to first PF in bonding mode")

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: support sub-function representor
Xueming Li [Sun, 28 Mar 2021 13:48:08 +0000 (13:48 +0000)]
net/mlx5: support sub-function representor

This patch adds support for SF representors. Similar to VF representors,
the switch port name of an SF representor in the phys_port_name sysfs key
is "pf<x>sf<y>".

The device representor argument is "representor=sf[list]"; list members
can be a mix of single instances and ranges. Example:
  representor=sf[0,2,4,8-12,-1]

To probe both VF and SF representors, they need to be split into 2
devices:
  -a <BDF>,representor=vf[list] -a <BDF>,representor=sf[list]
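
For illustration, the bracketed list part of such an argument (single
values, ranges and -1) could be expanded as sketched below.
sketch_parse_list() is a hypothetical helper written for this example; it is
not the parser used by the PMD or by ethdev.

```c
#include <stdlib.h>

/* Expand a list such as "[0,2,4,8-12,-1]" into out[].
 * Returns the number of values written, or -1 on a syntax error or if
 * out[] is too small. */
static int
sketch_parse_list(const char *s, int *out, int max)
{
	int n = 0;
	char *end;

	if (*s++ != '[')
		return -1;
	for (;;) {
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			return -1; /* no number where one was expected */
		s = end;
		if (*s == '-') { /* range of the form "lo-hi" */
			hi = strtol(s + 1, &end, 10);
			if (end == s + 1 || hi < lo)
				return -1;
			s = end;
		}
		for (; lo <= hi; lo++) {
			if (n >= max)
				return -1;
			out[n++] = (int)lo;
		}
		if (*s == ']')
			return s[1] == '\0' ? n : -1;
		if (*s++ != ',')
			return -1;
	}
}
```

A leading minus sign is consumed by strtol() as part of the number, so "-1"
is read as a single value while "8-12" is read as a range.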

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agocommon/mlx5: support sub-function representor parsing
Xueming Li [Sun, 28 Mar 2021 13:48:07 +0000 (13:48 +0000)]
common/mlx5: support sub-function representor parsing

This patch supports representor name parsing for SF.
In sysfs, the representor name is stored under the "phys_port_name" key;
similar to VF representors, the switch port name of an SF representor is
"pf<x>sf<y>".

For netlink message, net SF type is supported.

Examples:

pf0sf1
pf0sf[0-3]

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/mlx5: fix using flow tunnel before null check
Yunjian Wang [Sat, 27 Mar 2021 02:44:09 +0000 (10:44 +0800)]
net/mlx5: fix using flow tunnel before null check

Coverity flags that 'ctx->tunnel' variable is used before
it's checked for NULL. This patch fixes this issue.

Coverity issue: 366201
Fixes: 868d2e342cf3 ("net/mlx5: fix tunnel offload hub multi-thread protection")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/ice: refine RSS configure
Qi Zhang [Thu, 25 Mar 2021 12:42:41 +0000 (20:42 +0800)]
net/ice: refine RSS configure

ICE_RSS_ANY_HEADERS will try to enable outer RSS for the
non-tunnel case and inner RSS for the tunnel case. This confuses
users.

As we already have ICE_RSS_INNER_HEADER for the tunnel case,
replace ICE_RSS_ANY_HEADERS with ICE_RSS_OUTER_HEADERS
for all existing flows which only specify the outer pattern.

To enable inner RSS for any tunnel case, a separate rule
should be enabled.

The patch also removes some unnecessary condition checks for GTPU
in base code, as we can already support outer RSS for GTPU.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Xuan Ding <xuan.ding@intel.com>
3 years agonet/ixgbe: fix RSS RETA being reset after port start
Murphy Yang [Mon, 29 Mar 2021 08:28:45 +0000 (08:28 +0000)]
net/ixgbe: fix RSS RETA being reset after port start

If one calls ‘rte_eth_dev_rss_reta_update’ with ixgbe before starting
the device (but after setting everything else), then RSS RETA
configuration will be zero after starting the device.

This patch gives a notification if the port is not started.

Bugzilla ID: 664
Fixes: 249358424eab ("ixgbe: RSS RETA configuration")
Cc: stable@dpdk.org
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
3 years agonet/ice: remove redundant function
Haiyue Wang [Mon, 29 Mar 2021 04:56:26 +0000 (12:56 +0800)]
net/ice: remove redundant function

The function 'ice_is_profile_rule' is defined as 'ice_is_prof_rule' in
base code, which has exactly the same function body.

So remove 'ice_is_profile_rule' and use 'ice_is_prof_rule' instead.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/iavf: fix TSO max segment size
Qi Zhang [Mon, 1 Mar 2021 07:57:14 +0000 (15:57 +0800)]
net/iavf: fix TSO max segment size

According to Intel® AVF spec
(https://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ethernet-adaptive-virtual-function-hardware-spec.pdf)
section 2.2.2.3:
The max segment size(MSS) of TSO should not be set lower than 88.

Fixes: a2b29a7733ef ("net/avf: enable basic Rx Tx")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>