git.droids-corp.org - dpdk.git/log

common/mlx5: update log for DevX general command failure

Application can fetch syndrome value after FW operation failure
starting from Mellanox OFED-5.6.
The patch updates log data issued after devx_general_cmd error.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: support field modification in meter rules

This patch introduces MODIFY_FIELD action support in meter. User can
create meter policy with MODIFY_FIELD action in green/yellow action.

For example:

testpmd> add port meter policy 0 21 g_actions modify_field op set
dst_type ipv4_ecn src_type value src_value 3 width 2 / ...

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support modifying ECN field

This patch is to support modify ECN field in IPv4/IPv6 header.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: check ECN modification capability

Flag outer_ip_ecn in header modify capabilities properties layout is
added in order to check if the firmware supports modification of ecn
field.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support represented port item in flow rules

Add support for represented_port item in pattern. And if the spec and mask
both are NULL, translate function will not add source vport to matcher.

For example, testpmd starts with PF, VF-rep0 and VF-rep1, below command
will redirect packets from VF0 and VF1 to wire:
testpmd> flow create 0 ingress transfer group 0 pattern eth /
represented_port / end actions represented_port ethdev_id is 0 / end

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

eal/linux: allocate worker lcore stacks in hugepages

Add support for using hugepages for worker lcore stack memory. The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.

EAL option '--huge-worker-stack[=stack-size-in-kbytes]' is added to allow
the feature to be enabled at runtime. If the size is not specified,
the system pthread stack size will be used.

Signed-off-by: Don Wallwork <donw@xsightlabs.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>

ip_frag: fix build with GCC 12

GCC 12 raises warnings on usage of rte_memcpy with IPv4 options handling
in fragments for both the ip_frag library and unit tests.

For example in the library:
In function ‘_mm256_storeu_si256’,
    inlined from ‘rte_mov32’ at
        ../lib/eal/x86/include/rte_memcpy.h:347:2,
    inlined from ‘rte_mov128’ at
        ../lib/eal/x86/include/rte_memcpy.h:369:2,
    inlined from ‘rte_memcpy_generic’
        at ../lib/eal/x86/include/rte_memcpy.h:445:4,
    inlined from ‘rte_memcpy’
        at ../lib/eal/x86/include/rte_memcpy.h:851:10,
    inlined from ‘__create_ipopt_frag_hdr’
        at ../lib/ip_frag/rte_ipv4_fragmentation.c:68:4,
    inlined from ‘rte_ipv4_fragment_packet’
        at ../lib/ip_frag/rte_ipv4_fragmentation.c:242:16:
/usr/lib/gcc/x86_64-redhat-linux/12/include/avxintrin.h:935:8: error:
    array subscript ‘__m256i_u[1]’ is partly outside array bounds of
    ‘uint8_t[60]’ {aka ‘unsigned char[60]’} [-Werror=array-bounds]
  935 |   *__P = __A;
      |   ~~~~~^~~~~
../lib/ip_frag/rte_ipv4_fragmentation.c: In function
    ‘rte_ipv4_fragment_packet’:
../lib/ip_frag/rte_ipv4_fragmentation.c:122:17: note: at offset [52, 60]
    into object ‘ipopt_frag_hdr’ of size 60
  122 |         uint8_t ipopt_frag_hdr[IPV4_HDR_MAX_LEN];
      |                 ^~~~~~~~~~~~~~

To resolve the compilation warning, replace the rte_memcpy with memcpy.

Fixes: b50a14a853aa ("ip_frag: add IPv4 options fragment")
Signed-off-by: Huichao Cai <chcchc88@163.com>

net/qede: fix build with GCC 12

The x86 version of rte_memcpy can cause warnings. The driver does
not need to use rte_memcpy for everything. Standard memcpy is
just as fast and safer; the compiler and static analysis tools
treat memcpy specially.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

net/ice/base: fix build with GCC 12

GCC 12 with -O2 flag would raise the following warning:
../drivers/net/ice/base/ice_switch.c:7220:61: error: writing 1 byte into a
region of size 0 [-Werror=stringop-overflow=]
7220 | buf[recps].content.lkup_indx[i + 1] = entry->fv_idx[i];
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~

This patch changed the type of fv_idx in struct ice_recp_grp_entry to
align with its callers which are also u8 type.

Fixes: 04b8ec1ea807 ("net/ice/base: add protocol structures and defines")
Cc: stable@dpdk.org
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: add basic NEON Rx

This patch adds the basic NEON Rx path to the iavf driver. It does not
include scatter or flex varieties.

Tested on N1SDP platform with Intel XL710 NIC and 40G connection.
Tested with a single core and testpmd rxonly mode. Saw no significant
performance difference between scalar and Arm vPMD paths using this test
in iavf and saw the same results when comparing scalar and Arm vPMD
path in i40e.

Signed-off-by: Kathleen Capella <kathleen.capella@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>

net/i40e: add outer VLAN processing

Outer VLAN processing is supported after firmware v8.4, kernel driver
also change the default behavior to support this feature. To align with
kernel driver, add support for outer VLAN processing in DPDK.

But it is forbidden for firmware to change the Inner/Outer VLAN
configuration while there are MAC/VLAN filters in the switch table.
Therefore, we need to clear the MAC table before setting config,
and then restore the MAC table after setting.

This will not impact on an old firmware.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Signed-off-by: Kevin Liu <kevinx.liu@intel.com>
Acked-by: Yuying Zhang <yuying.zhang@intel.com>

net/ice: add DDP runtime configuration dump

Dump DDP runtime configure into a binary (package) file from ice PF port.

Add command line:
    ddp dump <port_id> <config_path>

Parameters:
    <port_id>       the PF Port ID
    <config_path>   dumped runtime configure file, if not a absolute path,
                    it will be dumped to testpmd running directory.

For example:
testpmd> ddp dump 0 current.pkg

If you want to dump ice VF DDP runtime configure, you need bind other
unused PF port of the NIC first, and then dump the PF's runtime configure
as target output.

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/ice: fix race condition in Rx timestamp

In multi-cores cases for Rx timestamp offload, to avoid phc time being
frequently overwritten, move related variables from ice_adapter to
ice_rx_queue structure, and each queue will handle timestamp calculation
by itself.

Fixes: 953e74e6b73a ("net/ice: enable Rx timestamp on flex descriptor")
Fixes: 5543827fc6df ("net/ice: improve performance of Rx timestamp offload")
Cc: stable@dpdk.org
Signed-off-by: Simei Su <simei.su@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/qede: fix build with GCC 13

Reproduced with "gcc (GCC) 13.0.0 20220616 (experimental)"

Build error:
In file included from ../drivers/net/qede/qede_debug.c:9:
../drivers/net/qede/qede_debug.c: In function ‘qed_grc_dump_addr_range’:
../drivers/net/qede/base/ecore.h:95:17:
warning: overflow in conversion from ‘int’ to ‘u8’
{aka ‘unsigned char’} changes value from ‘(int)vf_id << 8 | 128’
to ‘128’ [-Woverflow]
   95 |                 ((_value & _name##_MASK) << _name##_SHIFT)
      |                 ^
../drivers/net/qede/qede_debug.c:1907:31:
note: in expansion of macro ‘FIELD_VALUE’
1907 |         fid = FIELD_VALUE(PXP_PRETEND_CONCRETE_FID_VFVALID, 1)
      |               ^~~~~~~~~~~

To prevent overflow converting 'fib' to uint16_t,
while updating it also updated 'vf_id' to 16 bit too.

Fixes: ec55c118792b ("net/qede: add infrastructure for debug data collection")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Acked-by: Devendra Singh Rawat <dsinghrawat@marvell.com>

net/cnxk: add SDP VF device IDs

Add SDP VF device ID in the table for probe matching.

Signed-off-by: Radha Mohan Chintakuntla <radhac@marvell.com>

net/cnxk: resize CQ for Rx security for errata

Resize CQ for Rx security offload in case of HW errata.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

net/cnxk: fix PFC class disabling

Disabling a specific PFC class on a SQ is resulting in disabling PFC
on the entire port.

Fixes: 9544713564f5 ("net/cnxk: support priority flow control")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

net/cnxk: remove restriction on VF for PFC config

Currently PFC configuration is not allowed on VFs.
Patch enables PFC configuration on VFs

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>

net/cnxk: add SDP link status

Add SDP link status reporting

Signed-off-by: Satananda Burla <sburla@marvell.com>

common/cnxk: fix mbox structs to avoid unaligned access

Fix mbox structs to avoid unaligned access as mbox
memory is from BAR space.

Fixes: 503b82de2cbf ("common/cnxk: add mbox request and response definitions")
Fixes: e746aec161cc ("common/cnxk: fix SQ flush sequence")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: enhance CPT parsing header dump

Enhance CPT parse header dump to dump fragment info
and swap pointers before printing.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: support same TC value across multiple queues

User may want to configure same TC value across multiple queues, but
for that all queues should have a common TL3 where this TC value will
get configured.

Changed the pfc_tc_cq_map/pfc_tc_sq_map array indexing to qid and store
TC values in the array. As multiple queues may have same TC value.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

common/cnxk: add PFC support for VF

Current PFC implementation does not support VFs.
This patch enables PFC on VFs too.

Also fix the config of aura.bp to be based on number
of buffers(aura.limit) and corresponding shift
value(aura.shift).

Fixes: cb4bfd6e7bdf ("event/cnxk: support Rx adapter")
Cc: stable@dpdk.org
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>

common/cnxk: avoid CPT backpressure due to errata

Avoid enabling CPT backpressure due to errata where
backpressure would block requests from even other
CPT LF's. Also allow CQ size >=16K.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: use computed value for WQE skip

Use computed value for WQE skip instead of a hard-coded value.
WQE skip needs to be number of 128B lines to accommodate rte_mbuf.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: add include for macro definition

The header file "roc_io.h" uses the "__plt_always_inline" macro but
don't include "roc_platform.h" to get the definition of it. This
inclusion is not necessary for compilation, but the lack of it can
confuse some indexers - such as those in eclipse, which reports the
lines:

"static __plt_always_inline uint64_t"

as possible definitions of a variable called "uint64_t". This confusion
leads to uint64_t being flagged as an unknown type in all other parts of
the project being indexed, e.g. across all of DPDK code.

Adding in the include of roc_platform.h makes it clear to the indexer
that those lines are part of a function definition, and that allows
eclipse to correctly recognise uint64_t as a type from stdint.h

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: add ROC API to free MCAM entry

Add ROC API to free the given MCAM entry. If the MCAM
entry has flow counter associated, this API will clear
and free the flow counter.

Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: support CNF10KB SoC

Support for CNF10KB SoC by adding its PCI device ID.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

common/cnxk: fix CN103XX subsystem device ID

Fix the subsystem device ID for CN103XX.

Fixes: dd462f68f04a ("common/cnxk: support CN103XX platform")
Cc: stable@dpdk.org
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>

common/cnxk: update extra stats for inline device

Inline device's NIX RX and RQ stats are updated
on ethdev extra stats

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>

mempool/cnxk: support optional wait when counting

When counting the batch allocated pointers in cnxk mempool driver,
currently it always waits for in-flight batch operations to finish.
Add a provision to make this waiting optional.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>

common/cnxk: handle ROC model init failure

Return with error on fail to initialize ROC model.

Fixes: 014a9e222bac ("common/cnxk: add model init and IO handling API")
Cc: stable@dpdk.org
Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

common/cnxk: print NIX inline outbound CPT LF registers

This add the support to dump NIX inline outbound CPT LF
registers.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>

common/cnxk: fix decrypt packet count register update

Corrects the CPT decrypt packet counter register.

Fixes: b1a22e5d4f ("common/cnxk: add CPT diagnostics")
Cc: stable@dpdk.org
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>

doc: add platform option in cnxk native build

Update cnxk platform documentation to use
-Dplatform meson option for native builds.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

common/cnxk: support dumping flow MCAM entry data

When dumping flow data, read hardware MCAM entry corresponding
to the flow and print that data also.

Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Kiran Kumar K <kirankumark@marvell.com>

net/cnxk: fix crash in IPsec telemetry

Calling this telemetry callback with no argument caused a crash.

Fixes: 41cc645c214f ("net/cnxk: add inline IPsec telemetry for CN9K")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>

common/cnxk: add macros to platform layer

Added new platform layer macros for pointer operations,
bitwise operations, spinlock and 32 bit read and write.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: fix channel number setting in MCAM entries

Adding changes to accommodate the following requirements
while masking the channel number.
1. For CN10K device, channel number should not be masked
   for first pass rules with RTE_FLOW_ACTION_TYPE_SECURITY
   action. And channel number should be masked for all
   other flow rules.
2. For CN9K device channel number should not be masked.

Fixes: 4968b362b639 ("common/cnxk: support CPT second pass flow rules")
Cc: stable@dpdk.org
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Kiran Kumar K <kirankumark@marvell.com>

net/thunderx: populate max and min MTU values

Populate maximum and minimum MTU values while retrieving
device information.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/thunderx: support attaching from secondary

Adding support for device hotplugging - attach and detach from
secondary

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support allmulticast

Implement allmulticast operations for octeontx driver:
rte_eth_allmulticast_enable()/rte_eth_allmulticast_disable().

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support xstats

Adding support for xstats eth operations.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/thunderx: support setting link attributes

Adding support to configure link attributes like speed,
duplex, negotiation.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/thunderx: reset Rx DMAC control register

During initialization, reset RX DMAC control register by
sending mbox message NIC_MBOX_MSG_RESET_XCAST to PF.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/thunderx: support polling of link state change

Moving the logic of link polling to VF from PF. Now VF
is supposed to poll for the link status, rather PF alerting
VF about any link change.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/octeontx: handle port reconfiguration

Adding support for port reconfiguration as user may require to
do so on a running system.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support setting link attributes

Adding support to configure link attributes like speed,
duplex, negotiation.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: fix port close

Segmentation fault has been observed while closing the ethernet
port. Reason for the segfault is, eth port close also shuts down
event device while other ethernet port is still using the event
device.

Fixes: da6c687471a3 ("net/octeontx: add start and stop support")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>

ci: enable C++ check for Arm and PPC

The crossbuild-essential-<arch> packages contain all necessary
dependencies to cross-compile binaries for a given architecture
including C and C++ compilers. Therefore use those instead of listing
packages directly. This way C++ compiler is also installed and C++
include checks will be checked in CI for ARM and PowerPC.

Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

config: fix C++ cross compiler for Arm and PPC

Through some mixup all cross-files for ARM and PowerPC platforms were
using C Preprocessor (cpp) instead of GCC (g++).
This caused meson to fail detecting the C++ compiler presence and
therefore disabling some targets (i.e. C++ include file checks).

Fixes: e53a5299d219 ("build: support vendor specific ARM cross builds")
Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

malloc: fix allocation of almost hugepage size

If called to allocate memory of size is between multiple of hugepage
size minus malloc_header_len and hugepage size, rte_malloc fails.

This fix replaces malloc_elem_trailer_len with malloc_elem_overhead in
try_expand_heap() to include malloc_elem_header_len when calculating
n_seg.

Bugzilla ID: 800
Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Fidaullah Noonari <fidaullah.noonari@emumba.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

eal/unix: make stack dump signal safe

rte_dump_stack() needs to be usable in situations when a bug is
encountered and from signal handlers (such as SEGV).

Glibc backtrace_symbols() calls malloc which makes it
dangerous in a signal handler that is handling errors that maybe
due to memory corruption. Additionally, rte_log() is unsafe because
syslog() is not signal safe; printf() is also documented as
not being safe.

This version formats message and uses writev for each line in a manner
similar to what glibc version of backtrace_symbols_fd() does. The
FreeBSD version of backtrace_symbols_fd() is not signal safe.

Sample output:

0: ./build/app/dpdk-testpmd (rte_dump_stack+0x2b) [560a6e9c002b]
1: ./build/app/dpdk-testpmd (main+0xad) [560a6decd5ad]
2: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xcd) [7fd43d3e27fd]
3: ./build/app/dpdk-testpmd (_start+0x2a) [560a6e83628a]

Bugzilla ID: 929

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vhost/crypto: fix descriptor processing

copy_data was returning a pointer to an increased (off by one) descriptor.
Subsequent calls to copy_data in the library were then failing.
Fix this by incrementing the descriptor only if there is some left data
to copy.

Fixes: 4414bb67010d ("vhost/crypto: fix build with GCC 12")
Cc: stable@dpdk.org
Reported-by: Jakub Poczatek <jakub.poczatek@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jakub Poczatek <jakub.poczatek@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>

net/virtio: unmap PCI device in secondary process

In multi-process, the secondary process will remap PCI during
initialization, but the mapping is not removed in the uninit path,
the device is not closed, and the device busy error will be reported
when the device is hotplugged.

This patch unmaps PCI device at secondary process uninitialization
based on virtio_rempa_pci.

Fixes: 36a7a2e7a53f ("net/virtio: move PCI device init in dedicated file")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

vhost/crypto: fix build with GCC 12

GCC 12 raises the following warning:

In file included from ../lib/mempool/rte_mempool.h:46,
                 from ../lib/mbuf/rte_mbuf.h:38,
                 from ../lib/vhost/vhost_crypto.c:7:
../lib/vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_fetch_requests’:
../lib/eal/x86/include/rte_memcpy.h:371:9: warning: array subscript 1 is
     outside array bounds of ‘struct virtio_crypto_op_data_req[1]’
     [-Warray-bounds]
  371 | rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/vhost/vhost_crypto.c:1178:42: note: while referencing ‘req’
1178 |         struct virtio_crypto_op_data_req req;
      |                                          ^~~

Split this function and separate the per descriptor copy.
This makes the code clearer, and the compiler happier.

Note: logs for errors have been moved to callers to avoid duplicates.

Fixes: 3c79609fda7c ("vhost/crypto: handle virtually non-contiguous buffers")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: prepare virtqueue resource creation

Split the virtqs virt-queue resource between
the configuration threads.
Also need pre-created virt-queue resource
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq sub-resources creation

pre-created virt-queue sub-resource in device probe stage
and then modify virtqueue in device config stage.
Steer table also need to support dummy virt-queue.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add device close task

Split the virtqs device close tasks after
stopping virt-queue between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq live migration log task

Split the virtqs LM log between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq creation task

The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the virtqs creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add VM memory registration task

The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtqs configurations.

This optimization accelerrate the LM process and
reduce its time by 5%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add task ring for multi-thread management

The configuration threads tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task to
a thread and enqueue it to the thread ring.
The thread polls its ring and dequeue tasks.
That’s why the ring should be in multi-producer
and single consumer mode.
Anatomic counter manages the tasks completion notification.
The threads report errors to the caller by
a dedicated error counter per task.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add multi-thread management for configuration

The LM process includes a lot of objects creations and
destructions in the source and the destination servers.
As much as LM time increases, the packet drop of the VM increases.
To improve LM time need to parallel the configurations for mlx5 FW.
Add internal multi-thread management in the driver for it.

A new devarg defines the number of threads and their CPU.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath events thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: optimize datapath-control synchronization

The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.

Replace the global lock with the following locks:

1.virtq locks(per virtq) synchronize datapath polling and
parallel configurations on the same virtq.
2.A doorbell lock synchronizes doorbell update,
which is shared for all the virtqs in the device.
3.A steering lock for the shared steering objects updates.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: pre-create virtq at probing time

dev_config operation is called in LM progress.
LM time is very critical because all
the VM packets are dropped directly at that time.

Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage using the new ability
to modify virtq.

This optimization accelerates the LM process and
reduces its time by 70%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: extend virtq modifiable fields

A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields:
1.address fields: desc_addr/used_addr/available_addr
2.hw_available_index
3.hw_used_index
4.virtio_q_type
5.version type
6.queue mkey
7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8.event mode: event_mode/event_qpn_or_msix

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: reuse event queues

To speed up queue creation time, event QP and CQ will create only once.
Each virtq creation will reuse same event QP and CQ.

Because FW will set event QP to error state during virtq destroy,
need modify event QP to RESET state, then modify QP to RTS state as
usual. This can save about 1.5ms for each virtq creation.

After SW QP reset, QP pi/ci all become 0 while CQ pi/ci keep as
previous. Add new variable qp_ci to save SW QP ci. Move QP pi
independently with CQ ci.

Add new function mlx5_vdpa_drain_cq to drain CQ CQE after virtq
release.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: add DevX API to move queues to reset state

Support set QP to RESET state.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: support pre-creation of virtq resource

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resource in vDPA device probe stage.

In VM live migration scenario, this can reduce 0.8ms for each queue
creation, thus reduce LM network downtime.

To create queue resource(umem/counter) in advance, we need to know
virtio queue depth and max number of queue VM will use.

Introduce two new devargs: queues(max queue pair number) and queue_size
(queue depth). Two args must be both provided, if only one argument
provided, the argument will be ignored and no pre-creation.

The queues and queue_size must also be identical to vhost configuration
driver later receive. Otherwise either the pre-create resource is wasted
or missing or the resource need destroy and recreate(in case queue_size
mismatch).

Pre-create umem/counter will keep alive until vDPA device removal.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: fix maximum number of virtqs

The driver wrongly takes the capability value for
the number of virtq pairs instead of just the number of virtqs.

Adjust all the usages of it to be the number of virtqs.

Fixes: c2eb33aaf967 ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix log message for async dequeue

Since the commit 02798b073520 ("vhost: improve virtio-net layer logs"),
vhost logs contain the socket path as a prefix.
Async dequeue path was copied from the sync dequeue path but a log
was incorrect.

Fixes: 84d5204310d7 ("vhost: support async dequeue for split ring")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix statistics update in async dequeue

This patch adds missing per-virtqueue statistics in async dequeue path.

Fixes: 84d5204310d7 ("vhost: support async dequeue for split ring")
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vhost: rename number of available entries

This patchs renames the local variables free_entries to
avail_entries in the dequeue path.

Indeed, this variable represents the number of new packets
available in the Virtio transmit queue, so these entries
are actually used, not free.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vdpa/mlx5: workaround VAR offset within page

vDPA driver first uses kernel driver to allocate doorbell (VAR) area for
each device. Then uses var->mmap_off and var->length to mmap uverbs device
file as doorbell userspace virtual address.

Current kernel driver provides var->mmap_off equal to page start of VAR.
It's fine with x86 4K page server, because VAR physical address is only 4K
aligned thus locate in 4K page start.

But with aarch64 64K page server, the actual VAR physical address has
offset within page (not located in 64K page start).
So the vDPA driver needs to add this within page offset
(caps.doorbell_bar_offset) to get the right VAR virtual address.

Fixes: 62c813706e4 ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: support async packed ring dequeue

This patch implements packed ring dequeue data path
for asynchronous vhost.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/ifc/base: fix null pointer dereference

Fix null pointer dereference reported in coverity scan.

Coverity issue: 378882
Fixes: 5d75517beffe ("vdpa/ifc/base: access block device registers")
Signed-off-by: Andy Pei <andy.pei@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

examples/vhost: support clear in-flight for async dequeue

This patch allows vring_state_changed() to clear in-flight
dequeue packets. It also clears the in-flight packets in
a thread-safe way in destroy_device().

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: support clear in-flight packets for async dequeue

rte_vhost_clear_queue_thread_unsafe() supports to clear
in-flight packets for async enqueue only. But after
supporting async dequeue, this API should support async dequeue too.

This patch also adds the thread-safe version of this API,
the difference between the two API is that thread safety uses lock.

These APIs maybe used to clean up packets in the async channel
to prevent packet loss when the device state changes or
when the device is destroyed.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

net/vhost: perform SW checksum in Tx path

Virtio specification supports guest checksum offloading
for L4, which is enabled with VIRTIO_NET_F_GUEST_CSUM
feature negotiation. However, the Vhost PMD does not
advertise Tx checksum offload capabilities.

Advertising these offload capabilities at the ethdev level
is not enough, because we could still end-up with the
application enabling these offloads while the guest not
negotiating it.

This patch advertises the Tx checksum offload capabilities,
and introduces a compatibility layer to cover the case
VIRTIO_NET_F_GUEST_CSUM has not been negotiated but the
application does configure the Tx checksum offloads. This
function performs the L4 Tx checksum in SW for UDP and TCP.
Compared to Rx SW checksum, the Tx SW checksum function
needs to compute the pseudo-header checksum, as we cannot
know whether it was done before.

This patch does not advertise SCTP checksum offloading
capability for now, but it could be handled later if the
need arises.

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Cheng Jiang <cheng1.jiang@intel.com>

net/vhost: perform SW checksum in Rx path

Virtio specification supports host checksum offloading
for L4, which is enabled with VIRTIO_NET_F_CSUM feature
negotiation. However, the Vhost PMD does not advertise
Rx checksum offload capabilities, so we can end-up with
the VIRTIO_NET_F_CSUM feature being negotiated, implying
the Vhost library returns packets with checksum being
offloaded while the application did not request for it.

Advertising these offload capabilities at the ethdev level
is not enough, because we could still end-up with the
application not enabling these offloads while the guest
still negotiate them.

This patch advertises the Rx checksum offload capabilities,
and introduces a compatibility layer to cover the case
VIRTIO_NET_F_CSUM has been negotiated but the application
does not configure the Rx checksum offloads. This function
performis the L4 Rx checksum in SW for UDP and TCP. Note
that it is not needed to calculate the pseudo-header
checksum, because the Virtio specification requires that
the driver do it.

This patch does not advertise SCTP checksum offloading
capability for now, but it could be handled later if the
need arises.

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Cheng Jiang <cheng1.jiang@intel.com>

net/vhost: make VLAN stripping flag a boolean

This trivial patch makes the vlan_strip field of the
pmd_internal struct a boolean, since it is handled as
such.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

net/vhost: enable compliant offloading mode

This patch enables the compliant offloading flags mode by
default, which prevents the Rx path to set Tx offload flags,
which is illegal. A new legacy-ol-flags devarg is introduced
to enable the legacy behaviour.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

vhost: fix missing enqueue pseudo-header calculation

The Virtio specification requires that in case of checksum
offloading, the pseudo-header checksum must be set in the
L4 header.

When received from another Vhost-user port, the packet
checksum might already contain the pseudo-header checksum
but we have no way to know it. So we have no other choice
than doing the pseudo-header checksum systematically.

This patch handles this using the rte_net_intel_cksum_prepare()
helper.

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

app/testpmd: revert MAC update in checksum forwarding

This patch reverts
commit 10f4620f02e1 ("app/testpmd: modify mac in csum forwarding"),
as the checksum forwarding is expected to only perform
checksum and not also overwrites the source and destination MAC addresses.

Doing so, we can test checksum offloading with real traffic
without breaking broadcast packets.

Fixes: 10f4620f02e1 ("app/testpmd: modify mac in csum forwarding")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Aman Singh <aman.deep.singh@intel.com>

net/ngbe: support YT PHY SGMII to RGMII mode

Add SGMII to RGMII mode for yt8521s and yt8531s PHY.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support autoneg on/off for external PHY SFI mode

Add support for external PHY to switch autoneg on/off on their SFI mode.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: fix YT PHY UTP mode to link up

Fix to read and write the correct register fields for yt8521s and
yt8531s PHY, since mode check was added.

Fixes: 1c44384fce76 ("net/ngbe: support custom PHY interfaces")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: add more packet statistics

Add more hardware extended statistics.

Fixes: 8b433d04adc9 ("net/ngbe: support device xstats")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/txgbe: fix register polling

Fix to poll some specific registers, which expect bit value 0.

'w32w' is used in registers where the write command bit is set and
waits for the bit clear to complete the write.

Fixes: 24a4c76aff4d ("net/txgbe: add error types and registers")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support OEM subsystem vendor ID

Add support for OEM subsystem vendor ID.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/txgbe: support OEM subsystem vendor ID

Add support for OEM subsystem vendor ID.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/i40e: move testpmd commands

Move related specific testpmd commands into this driver directory.
While at it, fix checkpatch warnings.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>

net/bonding: move testpmd commands

Move related specific testpmd commands into this driver directory.
While at it, fix checkpatch warnings.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>

net/nfp: fix initialization

When the testpmd start-up, it will check MTU range,
if MTU > flubfsz, it will lead testpmd start fail.
Because the hw->flbufsz doesn't have the initialized
value, so it will lead the bug.

Fixes: 417be15e5f11 ("net/nfp: make sure MTU is never larger than mbuf size")
Cc: stable@dpdk.org
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>

net/nfp: modify RSS logic

Now NFP NIC support two type of RSS logic, NFP_NET_CFG_CTRL_RSS and
NFP_NET_CFG_CTRL_RSS2, use NFP_NET_CFG_CTRL_RSS2 if NIC capability
support, otherwise use NFP_NET_CFG_CTRL_RSS.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: move round macros to header file

Move macro __round_mask, round_up and round_down from C file to
corresponding head file, will be used by TX function of nfp net
firmware with NFDk.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: add queue stop and close helper functions

This commit does not introduce new features, just integrate some common
logic into helper functions to reduce the same logic and increase code
reuse, include queue stop and queue close logic, will be used when NFP
net stop and close.

queue stop: reset queue
queue close: reset and release queue

Modify NFP net stop and close function, use helper function to stop
and close queue instead of before logic.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: add NFDk option and queue function

Add ethdev option for firmware with NFDk, implement tx_queue setup
function for firmware with NFDk.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: adjust structures

Add and modify the nfp PMD struct and macro that will be used by firmware
with NFDk.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>