git.droids-corp.org - dpdk.git/log

sched: update subport rate dynamically

Add support to update subport rate dynamically.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

sched: introduce subport profile add function

API to add new subport bandwidth profile.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

sched: add subport profile table

Add subport profile table to internal port data structure
and update the port config function.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

examples/cmdline: build on Windows

Enable cmdline sample application as all dependencies are met.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

cmdline: support Windows

Implement terminal handling, input polling, and vdprintf() for Windows.

Because Windows I/O model differs fundamentally from Unix and there is
no concept of character device, polling is simulated depending on the
underlying input device. Supporting non-terminal input is useful for
automated testing.

Windows emulation of VT100 uses "ESC [ E" for newline instead of
standard "ESC E", so add a workaround.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

eal/windows: improve compatibility networking headers

Extend compatibility header system to support librte_cmdline.

pthread.h has to include windows.h, which exposes struct in_addr, etc.
conflicting with compatibility headers. WIN32_LEAN_AND_MEAN macro
is required to disable this behavior. Use rte_windows.h to define
WIN32_LEAN_AND_MEAN for pthread library.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

cmdline: add internal wrapper for vdprintf

Add internal wrapper for vdprintf(3) that is only available on Unix.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

cmdline: add internal wrappers for character input

poll(3) is a purely Unix facility, so it cannot be directly used by
common code. read(2) is limited in device support outside of Unix.
Create wrapper functions and implement them for Unix.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

cmdline: add internal wrappers for terminal handling

Add functions that set up, save, and restore terminal parameters.
Use existing code as Unix implementation.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

cmdline: make implementation logically opaque

struct cmdline exposes platform-specific members it contains, most
notably struct termios that is only available on Unix. While ABI
considerations prevent from hinding the definition on already supported
platforms, struct cmdline is considered logically opaque from now on.
Add a deprecation notice targeted at 20.11.

* Remove tests checking struct cmdline content as meaningless.

* Fix missing cmdline_free() in unit test.

* Add cmdline_get_rdline() to access history buffer indirectly.
The new function is currently used only in tests.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

eal/windows: implement alarm API

Implementation is based on waitable timers Win32 API. When timer is set,
a callback and its argument are supplied to the OS, while timer handle
is stored in EAL alarm list. When timer expires, OS wakes up the
interrupt thread and runs the callback. Upon completion it removes the
alarm.

Waitable timers must be set from the thread their callback will run in,
eal_intr_thread_schedule() provides a way to schedule asyncronuous code
execution in the interrupt thread. Alarm module builds synchronous timer
setup on top of it.

Windows alarms are not a type of DPDK interrupt handle and do not
interact with interrupt module beyond executing in the same thread.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>

eal/windows: add interrupt thread skeleton

Windows interrupt support is based on IO completion ports (IOCP).
Interrupt thread would send the devices requests to notify about
interrupts and then wait for any request completion. Add skeleton code
of this model without any hardware support.

Another way to wake up the interrupt thread is APC (asynchronous procedure
call), scheduled by any other thread via eal_intr_thread_schedule().
This internal API is intended for alarm implementation.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>

bus/pci: support netuio on Windows

This patch adds implementations to probe PCI devices bound to netuio
with the help of "netuio" class device changes.
Now Windows will support both "netuio" and "net" device class and
can set kernel driver type based on the device class selection.

Note: Few definitions and structures have been copied from
netuio_interface.h file from
("[v5] windows/netuio: add Windows NetUIO kernel driver") series
and this will be fixed once the exact path for netuio source code is known.

Signed-off-by: John Alexander <john.alexander@datapath.co.uk>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Reviewed-by: Tal Shnaiderman <talshn@nvidia.com>
Reviewed-by: Narcisa Vasile <navasile@linux.microsoft.com>

table: fix hash for 32-bit

When create softnic hash table with 16 keys, it failed on 32-bit
environment, because the pointer field in structure rte_bucket_4_16
is only 32 bits. Add a padding field in 32-bit environment to keep
the structure to a multiple of 64 bytes. Apply this to 8-byte and
32-byte key hash function as well.

Fixes: 8aa327214c ("table: hash")
Cc: stable@dpdk.org
Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

acl: deduplicate AVX512 code

Current rte_acl_classify_avx512x32() and rte_acl_classify_avx512x16()
code paths are very similar. The only differences are due to
256/512 register/instrincts naming conventions.
So to deduplicate the code:
- Move common code into “acl_run_avx512_common.h”
- Use macros to hide difference in naming conventions

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

acl: optimize AVX512 classify with 4 bytes loads

With current ACL implementation first field in the rule definition
has always to be one byte long. Though for optimising classify
implementation it might be useful to do 4B reads
(as we do for rest of the fields).
So at build phase, check user provided field definitions to determine
is it safe to do 4B loads for first ACL field.
Then at run-time this information can be used to choose classify
behavior.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

acl: add 512-bit AVX512 classify method

Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x32() is able to process up to 32 flows in parallel.
It uses 512-bit width registers/instructions and provides higher
performance then rte_acl_classify_avx512x16(), but can cause
frequency level change.
Note that for now only 64-bit version is supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

acl: select 256-bit AVX512 classify method by default

On supported platforms, set RTE_ACL_CLASSIFY_AVX512X16 as
default ACL classify algorithm.
Note that AVX512X16 implementation uses 256-bit registers/instincts only
to avoid possibility of frequency drop.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

acl: add 256-bit AVX512 classify method

Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x16() is able to process up to 16 flows in parallel.
It uses 256-bit width registers/instructions only
(to avoid frequency level change).
Note that for now only 64-bit version is supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

acl: add infrastructure for AVX512 classify methods

Add necessary changes to support new AVX512 specific ACL classify
algorithm:
- changes in meson.build to check that build tools
(compiler, assembler, etc.) do properly support AVX512.
- run-time checks to make sure target platform does support AVX512.
- dummy rte_acl_classify_avx512() for targets where AVX512
implementation couldn't be properly supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

test/acl: expand classify test coverage

Make classify test to run for all supported methods.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

app/acl: enhance displayed statistics

- enhance output to print extra stats
- use rte_rdtsc_precise() for cycle measurements

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

acl: rework classify method selection

Right now ACL library determines best possible (default) classify method
on a given platform with special constructor function rte_acl_init().
This patch makes the following changes:
- Move selection of default classify method into a separate private
function and call it for each ACL context creation (rte_acl_create()).
- Remove library constructor function
- Make rte_acl_set_ctx_classify() to check that requested algorithm
is supported on given platform.

The purpose of these changes to improve and simplify algorithm selection
process and prepare ACL library to be integrated with the
max SIMD bitwidth series in discussion.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

acl: remove classify methods count enum

Removal of unused enum value (RTE_ACL_CLASSIFY_NUM).
This enum value is not used inside DPDK, while it prevents
to add new classify algorithms without causing an ABI breakage.

Note that this change introduce a formal ABI incompatibility
with previous versions of ACL library.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

doc: fix missing classify methods in ACL guide

Add brief description for missing ACL classify algorithms:
RTE_ACL_CLASSIFY_NEON and RTE_ACL_CLASSIFY_ALTIVEC.

Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
Fixes: 1d73135f9f1c ("acl: add AltiVec for ppc64")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

acl: fix x86 build for compiler without AVX2

Right now we define dummy version of rte_acl_classify_avx2()
when both X86 and AVX2 are not detected, though it should be
for non-AVX2 case only.

Fixes: e53ce4e41379 ("acl: remove use of weak functions")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

eal/x86: introduce type for AVX 512-bit

New data type to manipulate 512 bit AVX values.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

eal/windows: export all built functions for clang

export for clang build all the functions currently built
on Windows and listed in rte_eal_version.map by adding
them to rte_eal_exports.def.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

bus/pci: support segment as address domain on Windows

Set the domain value for rte_pci_addr probing on Windows
to the value of the PCI segment returned by SPDRP_BUSNUMBER.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>

usertools: add OCTEON TX2 REE device binding

Update the devbind script with new section of regex devices, also
added OCTEONTX2 REE device ID to regex device list

Signed-off-by: Guy Kaneti <guyk@marvell.com>

regex/octeontx2: introduce REE driver

Add meson based build infrastructure along with the
OTX2 regexdev (REE) device functions.
Add Marvell OCTEON TX2 regex guide.

Signed-off-by: Guy Kaneti <guyk@marvell.com>

common/octeontx2: add REE definitions and logging

Add REE mbox msg definitions, RVU and REE HW definitions

Signed-off-by: Guy Kaneti <guyk@marvell.com>

bus/pci: copy new id for inserted device on Linux

When a device is inserted into an existing BDF slot
that has not been probed, we must overwrite the old
PCI ID with the ID of the new function. Otherwise
we may not probe the function with the correct driver,
if at all.

Signed-off-by: Jim Harris <james.r.harris@intel.com>

net: add CRC AVX512 implementation

This patch enables the optimized calculation of CRC32-Ethernet and
CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
implementation is built if the compiler supports the required instruction
sets. It is selected at run-time if the host CPU, again, supports the
required instruction sets.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

net: add CRC implementation runtime selection

This patch adds support for run-time selection of the optimal
architecture-specific CRC path, based on the supported instruction set(s)
of the CPU.

The compiler option checks have been moved from the C files to the meson
script. The rte_cpu_get_flag_enabled function is called automatically by
the library at process initialization time to determine which
instructions the CPU supports, with the most optimal supported CRC path
ultimately selected.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

test/cpuflags: add new Arm flags

This patch adds new flags into the test_cpuflags() functions for ARM64
platform, such as RTE_CPUFLAG_SVE, etc.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

eal/arm64: update CPU flags

ARM64 Linux kernel updated the CPU flags using the HWCAP scheme.
The related marco definition can be found in linux kernel:
arch/arm64/include/uapi/asm/hwcap.h

This patch incorporates those changes to the EAL library.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

config: remap flags used for Arm platforms

RTE_ARCH_xx flags are used to distinguish platform architectures.
These flags can be used to pick different code paths for different
architectures at compile time.
For Arm platforms, there are 3 flags in use: RTE_ARCH_ARM,
RTE_ARCH_ARMv7 and RTE_ARCH_ARM64.
RTE_ARCH_ARM64 is for 64-bit aarch64 platforms,
and RTE_ARCH_ARM & RTE_ARCH_ARMv7 are for 32-bit platforms.
RTE_ARCH_ARMv7 is for ARMv7 platforms as its name suggested.

The issue is meaning of RTE_ARCH_ARM is not clear enough.
Because no info about platform word length is included in the name.
To make the flag names more clear, a naming scheme is proposed.

RTE_ARCH_ARM (all Arm platforms)
    |
    +----RTE_ARCH_32 (New. 32-bit platforms of all architectures)
    |        |
    |        +----RTE_ARCH_ARMv7 (ARMv7 platforms)
    |        |
    |        +----RTE_ARCH_ARMv8_AARCH32 (aarch32 state on aarch64 machine)
    |
    +----RTE_ARCH_64 (64-bit platforms of all architectures)
             |
             +----RTE_ARCH_ARM64 (64-bit Arm platforms)

RTE_ARCH_32 will be explicitly defined for 32-bit platforms.

To fit into the new naming scheme, current usage of RTE_ARCH_ARM in
project is mapped to (RTE_ARCH_ARM && RTE_ARCH_32).

Matching flags for other architectures are:
RTE_ARCH_X86
    |
    +----RTE_ARCH_32
    |        |
    |        +----RTE_ARCH_I686
    |        |
    |        +----RTE_ARCH_X86_X32
    |
    +----RTE_ARCH_64
             |
             +----RTE_ARCH_X86_64

RTE_ARCH_PPC_64 ---- RTE_ARCH_64

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>

config: add -moutline-atomics to default Arm build

-moutline-atomics allows LSE instructions to be used if available when
compiling for ARMv8.0 instruction set. It's enabled by default on newer
compilers, such as gcc-10.1. Enable the option in case an earlier
compiler version is used for the default build that lacks either -mcpu
or -mtune which would otherwise enable it.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>

net/ice: use write combining store for tail updates

Performance improvement: use a write combining store
instead of a regular mmio write to update queue tail
registers.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

net/ixgbe: use write combining store for tail updates

Performance improvement: use a write combining store
instead of a regular mmio write to update queue tail
registers.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

common/qat: use write combining store for tail updates

Performance improvement: use a write combining store
instead of a regular mmio write to update queue tail
registers.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>

net/i40e: use write combining store for tail updates

Performance improvement: use a write combining store
instead of a regular mmio write to update queue tail
registers.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

eal: add write combining store

Add rte_write32_wc and rte_write32_wc_relaxed functions
that implement 32bit stores using write combining memory protocol.
Provided generic stubs and x86 implementation.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

mem: fix allocation failure on non-NUMA kernel

Running dpdk-helloworld on Linux with lib numa present, but no kernel
support for NUMA (CONFIG_NUMA=n) causes rte_service_init() to fail with
EAL: error allocating rte services array.

alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.

The allocated socket should only be verified if check_numa() is true.

Fixes: 2a96c88be83e ("mem: ease init in a docker container")
Cc: stable@dpdk.org
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

maintainers: update for bonding

Adding Connor as additional maintainer to bonding.

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/virtio: check raw checksum failure

rte_raw_cksum_mbuf can fail, so we should check to see if it
has. If so, return with an error.

Fixes: 96cb6711939e ("net/virtio: support Rx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net: check segment pointer in raw checksum processing

If the overall pkt_len and segment lengths are out of agreement,
it is possible for the seg to be NULL after the loop. Add assert
to check this condition in debug builds. Otherwise, return failure.

Fixes: c442fed81bb9 ("net: add function to calculate checksum in mbuf")
Cc: stable@dpdk.org
Signed-off-by: Chas Williams <3chas3@gmail.com>

doc: fix diagram in dpaa2 guide

The diagram to show dpaa2 drivers brief
was missing in dpaa2.html file.

fix a typo in encoding for a literal block
to make it visible in generated doc file.

Fixes: 846a8305f277 ("doc: add DPAA2 NIC details")
Cc: stable@dpdk.org
Signed-off-by: Sachin Saxena <sachin.saxena@oss.nxp.com>

build: skip detecting libpcap via pcap-config

When compiling for a slightly different architecture, e.g. 32-bit on 64-bit
systems using CFLAGS rather than a cross-file, the pcap-config utility can
often return parameters that are unusable for the build in question, i.e.
providing the native 64-bit library paths rather than checking for 32-bit
equivalent.

Since many distros now include a version of libpcap with a
pkg-config file, and for those that don't find-library should work ok as a
fallback, we can explicitly just use pkg-config in the dependency search,
causing meson to skip trying to use pcap-config.

Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
Tested-by: David Marchand <david.marchand@redhat.com>

eal: fix doxygen for EAL cleanup

Align rte_eal_cleanup return codes description to the rest of dpdk.

Fixes: aec9c13c5257 ("eal: add function to release internal resources")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

raw/dpaa2_qdma: fix reset

This issue detected by coverity, CID#279443(Structurally dead code).

Coverity issue: 279443
Fixes: c22fab9a6c34 ("raw/dpaa2_qdma: support configuration APIs")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>

raw/octeontx2_dma: support multiple DPI blocks

This patch adds support for multiple DPI blocks by removing the fixed
macro that was writing to same sysfs entry for different DPI blocks.

Signed-off-by: Radha Mohan Chintakuntla <radhac@marvell.com>
Reviewed-by: Satananda Burla <sburla@marvell.com>
Acked-by: Satha Rao <skoteshwar@marvell.com>

raw/octeontx2_dma: assign PEM id for external transfer

DPI needs to know the PEM number for all external transfers.

Signed-off-by: Radha Mohan Chintakuntla <radhac@marvell.com>
Reviewed-by: Satananda Burla <sburla@marvell.com>
Acked-by: Satha Rao <skoteshwar@marvell.com>

app/testpmd: add FEC command

This commit adds testpmd capability to query and config FEC
function of device. This includes:
- show FEC capabilities, example:
testpmd> show port 0 fec capabilities
- show FEC mode, example:
testpmd> show port 0 fec_mode
- config FEC mode, example:
testpmd> set port <port_id> fec_mode auto|off|rs|baser

where:

auto|off|rs|baser are four kinds of FEC mode which dev
support according to MAC link speed.

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>

net/hns3: support FEC

Forward error correction (FEC) is a bit error correction mode.
It adds error correction information to data packets at the
transmit end, and uses the error correction information to correct
the bit errors generated during data packet transmission at the
receive end. This improves signal quality but also brings a delay
to signals. This function can be enabled or disabled as required.

This patch adds FEC support for ethdev.Introduce ethdev
operations which support query and config FEC information in
hardware.

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>

ethdev: introduce FEC API

This patch adds Forward error correction(FEC) support for ethdev.
Introduce APIs which support query and config FEC information in
hardware.

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/iavf: add extended stats

Add implementation of xstats() functions in iavf PMD.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: cleanup Tx buffers

Add support to the iavf driver for the API rte_eth_tx_done_cleanup
to force free consumed buffers on Tx ring.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: disable promiscuous mode on close

In scenario of Kernel Driver runs on PF and PMD runs on VF, PMD exit
doesn't disable promiscuous mode, this will cause vlan filter set by
Kernel Driver will not take effect.

This patch will fix it, add promiscuous disable at device disable.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: optimize promiscuous device operations

This patch is to improve efficiency and eliminate code
redundancy of promiscuous ops.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: re-program promiscuous mode on VF interface

During a kernel PF reset, this event is propagated to the VF.
The DPDK VF PMD will execute the reset task before the PF is done
with his. This results in the admin queue message not being responded
to leaving the port in "promiscuous" mode.

This patch makes sure the promiscuous mode is configured independently
of the current admin state.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: set min and max MTU for VF

This commit sets the min and max supported MTU values for iavf VF
devices via the iavf_dev_info_get() function. Min MTU supported
is set to RTE_ETHER_MIN_MTU and max MTU is calculated as the max
packet length supported minus the transport overhead.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: use link status helper functions

Use new rte_eth_linkstatus_get/set helper functions to handle link
status update.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: fix flow flush after PF reset

When VF begin reset after PF reset, VF will be uninitialized at first
and then be initialized, during the time any invalid cmd such as flow
flush should not be sent to PF until the uninitialization is finished.

Fixes: 1eab95fe2e36 ("net/iavf: fix command after PF reset")
Cc: stable@dpdk.org
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/fm10k: fix memory leak when Tx thresh check fails

In fm10k_tx_queue_setup(), we allocate memory for the queue
structure but not released when Tx thresh check fails.

Fixes: 98068e0e044e ("fm10k: add Tx queue setup/release")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>

common/mlx5: fix PCI address lookup

mlx5 PMDs use the mlx5_dev_to_pci_addr() routine to convert
Infiniband device name to the Bus-Device-Function location
on the PCI bus. The routine returned success even in case of
not found identification string. On caller side it likely
caused the wrong match with the BDF of previous device
resulting in wrong representor and master recognitions.

Fixes: 771fa900b73a ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: disable dump of Verbs flows

There was a segment fault when dump flows with device argument of
dv_flow_en=0. In such case, Verbs flow engine was enabled and fdb
resources were not initialized. It's suggested to use mlx_fs_dump
for Verbs flow dump.

This patch adds verbs engine check, prints warning message and return
gracefully.

Fixes: f6d7202402c9 ("net/mlx5: support flow dump API")
Cc: stable@dpdk.org
Reported-by: Jørgen Østergaard Sloth <jorgen.sloth@xci.dk>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/bnxt: support fast mbuf free

Add support for DEV_TX_OFFLOAD_MBUF_FAST_FREE to bnxt
vector mode transmit. This offload may be enabled
only when multi-segment transmit is not needed, all
transmitted mbufs for a given queue will be allocated
from the same pool, and all transmitted mbufs will
have a reference count of 1.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix link update

1. When port is stopped, we can forcibly set the link status for the
   device to down.
2. VFs and MH PFs do not have the privilege to bring the link down.
   As a result driver prints "Link Up" when port is stopped.
3. When driver receives link status/speed/config async event from fw,
   driver invokes bnxt_link_update() with exp_link_status as ETH_LINK_UP
   This is not logically correct as the async event could be for Link up
   or link down or for speed change.

Fixes: 074cacb9907a ("net/bnxt: fix link during port toggle")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/mlx5: remove Rx queue object type field

Once the separation between Verbs and DevX is done using function
pointers, the type field of the Rx queue object structure becomes
redundant and no more code is used.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: separate Rx queue state modification

Separate Rx state modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: remove Tx queue object type field

Once the separation between Verbs and DevX is done using function
pointers, the type field of the Tx queue object structure becomes
redundant and no more code is used.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: share Tx queue object modification

Use new modify_qp functions for Tx object creation in DevX and Verbs
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: separate Tx queue object modification

Separate Tx object modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: rearrange QP creation in Verbs module

1. Rename function to mention the internal resources.
2. Reduce the number of function arguments.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: rearrange SQ and CQ creation in DevX module

1. Rename functions to mention the internal resources.
2. Reduce the number of function arguments.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: share Tx control code

Move Tx object similar resources allocations and debug logs from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: separate Tx queue object creations

As an arrangement to Windows OS support, the Verbs operations should be
separated to another file.
By this way, the build can easily cut the unsupported Verbs APIs from
the compilation process.

Define operation structure and DevX module in addition to the existing
Linux Verbs module.
Separate Tx object creation into the Verbs/DevX modules and update the
operation structure according to the OS support and the user
configuration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: reposition event queue number field

The eqn field has become a field of sh directly since it is also
relevant for Tx and Rx.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: reorder Tx queue Verbs object creation

Move the creation of the completion queue from the mlx5_txq_obj_new
function into an auxiliary function.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: reorder Tx queue DevX object creation

Move the creation of the send queue and the completion queue resources
from the mlx5_txq_obj_devx_new function into auxiliary functions.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: mitigate Tx queue reference counters

The Tx queue structures manage 2 different reference counter per queue:
txq_ctrl reference counter and txq_obj reference counter.

There is no real need to use two different counters, it just complicates
the release functions.
Remove the txq_obj counter and use only the txq_ctrl counter.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: remove unused variable in Tx queue creation

When a CQ is not created by DevX, it be allocated by either DV function
or by regular Verbs function.

The CQ DV attributes variable was wrongly defined and initialized in Tx
queue creation while the CQ is created by the regular Verbs function
what remained the attributes variable unused.

Remove the unused variable.

Fixes: faf2667fe8d5 ("net/mlx5: separate DPDK from verbs Tx queue objects")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix send queue doorbell

As part of SQ creation for Tx queue objects, a HW doorbell memory should
be allocated and mapped to the HW.

The SQ doorbell handler was wrongly saved on the CQ fields what caused
wrong doorbell release in the Tx queue object destroy flow.

Save the SQ doorbell handler in the SQ fields.

Fixes: 3a87b964edd3 ("net/mlx5: create Tx queues with DevX")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

ethdev: fix data type in TC queues

Currently, base and nb_queue in the tc_rxq and tc_txq information
of queue and TC mapping on both TX and RX paths are uint8_t.
However, these data will be truncated when queue number under a TC
is greater than 256. So it is necessary for base and nb_queue to
change from uint8_t to uint16_t.

Fixes: 89d6728c7837 ("ethdev: get DCB information")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Dongdong Liu <liudongdong3@huawei.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: move RSS expansion code to mlx5 driver

Patch [1] added support for RSS flow expansion.
It was added in ethdev for public use, but until now it is used only
by MLX5 PMD.
To allow local changes in this code, this patch removes it from ethdev
and moves it to MLX5 PMD file.

[1] commit 4ed05fcd441b ("ethdev: add flow API to expand RSS flows")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: fix RSS flow expansion in case of mismatch

Function rte_flow_expand_rss() is used to expand a flow rule with
partial pattern into several rules, to ensure all relevant packets
are matched.
It uses utility function rte_flow_expand_rss_item_complete(), to check
if the last valid item in the flow rule pattern needs to be completed.
For example the pattern "eth / ipv4 proto is 17 / end" will be completed
with a "udp" item.
This function returns "void" item in two cases:
1) The last item has empty spec, for example "eth / ipv4 / end".
2) The last itme has spec that can't be expanded for RSS.
For example the pattern "eth / ipv4 proto is 47 / end" ends with IPv4
item that has next protocol GRE.

In both cases the flow rule may be expanded, but in the second case such
expansion may create rules with invalid pattern.
For example "eth / ipv4 proto is 47 / udp / end".
In such a case the flow rule should not be expanded.

This patch updates function rte_flow_expand_rss_item_complete().
Return value RTE_FLOW_ITEM_TYPE_END is used to indicate the flow rule
should not be expanded.
In such a case, rte_flow_expand_rss() will return with the original flow
rule only, without any expansion.

Fixes: fc2dd8dd492f ("ethdev: fix expand RSS flows")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>

ethdev: check if queues are allocated before getting info

A crash is detected when '--txpkts=#' parameter provided to the testpmd,
this is because queue information is requested before queues have been
allocated.

Adding check to queue info APIs
('rte_eth_rx_queue_info_get()' & 'rte_eth_tx_queue_info_get')
to protect against similar cases.

Fixes: ba2fb4f022fc ("ethdev: check if queue setup when getting queue info")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/qede: fix getting link details

This patch fixes get current link details, without this change the link
details can be inaccurate if proper lock is not acquired.

Fixes: 739a5b2f2b49 ("net/qede/base: use passed ptt handler")
Cc: stable@dpdk.org
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Rasesh Mody <rmody@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>

net/mlx5: fix Rx queue count calculation

There are a few discrepancies in the Rx queue count calculation.

The wrong index is used to calculate the number of used descriptors
in an Rx queue in case of the compressed CQE processing. The global
CQ index is used while we really need an internal index in a single
compressed session to get the right number of elements processed.

The total number of CQs should be used instead of the number of mbufs
to find out about the maximum number of Rx descriptors. These numbers
are not equal for the Multi-Packet Rx queue.

Allow the Rx queue count calculation for all possible Rx bursts since
CQ handling is the same for regular, vectorized, and multi-packet Rx
queues.

Fixes: 26f04883441a ("net/mlx5: support Rx queue count API")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: fix meter table definitions

As metering and metadata features were developed at the same time. The
metering and metadata tables are defined conflicted.

This cause the meter suffix flow jump to the same metadata table and
cause flow deadloop.

Adjust the metering table define to fix that issue.

Fixes: 46a5e6bc6a85 ("net/mlx5: prepare meter flow tables")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: fix DevX CQ attributes values

Previous patch wrongly used rdma-core defined values, when preparing
attributes for creating DevX CQ object.
This patch adds the correct value definition and uses them instead.

Fixes: 08d1838f645a ("net/mlx5: implement CQ for Rx using DevX API")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/bnxt: update HWRM structures

HWRM API to a newer 1.10.1.70 version.

Few fields have been renamed because of this.
rx_err_pkt -> rx_discard_pkts
rx_drop_pkts -> rx_error_pkts

tx_err_pkts -> tx_discard_pkts
tx_drop_pkts -> tx_error_pkts

link_signal_mode -> active_fec_signal_mode

tx_bd_long_hi.mss -> tx_bd_long_hi.kid_or_ts_high_mss
tx_bd_long_hi.hdr_size -> tx_bd_long_hi.kid_or_ts_low_hdr_size

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: support RSS hash selection

Add support to select RSS hash based on innermost or outermost
headers. If an application is started without any specific settings
the default mode configured by FW or HW shall be used.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

ethdev: use mbuf bulk free API

The mbuf library now has routine to free multiple buffers.
Loop is no longer needed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>

net/hns3: remove redundant return value assignment

When an error occurs in the reset process, -EIO is returned.
The assignment of ret here is redundant, so deleted it.

Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

net/hns3: check PCI config space reads

This patch add return value check when calling rte_pci_read_config
function.

Fixes: cea37e513329 ("net/hns3: fix FLR reset")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

net/hns3: support queue start and stop

The new generation hns3 network engine supports independent enabling and
disabling of a single Tx/Rx queue. So, it can support the queue start
and stop feature. In addition, when different numbers of Tx and Rx
queues need to be enabled in some applications, hns3 pmd does not need
to create fake queues to enable these scenarios.

This patch Add queue start and stop feature for the new generation hns3
networking engine. Cancel the creation of fake queue on the new
generation network engine. And the previously improperly named queue
related function was renamed to improve readability.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

net/hns3: set max scheduling rate based on actual board

Currently, max scheduling rates configuration of pg, pri and port are
set to 100000Mbps, which is the maximum bandwidth of hns3 network engine
with revision_id equals 0x21. However, max scheduling rate configuration
should be set to hardware based on the actual hardware board
environment.

The max_tm_rate in struct hns3_hw, meaning the rate, is obtained from
firmware. So we should use the variable to configure the max scheduling
rate.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>