git.droids-corp.org - dpdk.git/log

common/cnxk: fix channel number setting in MCAM entries

Adding changes to accommodate the following requirements
while masking the channel number.
1. For CN10K device, channel number should not be masked
   for first pass rules with RTE_FLOW_ACTION_TYPE_SECURITY
   action. And channel number should be masked for all
   other flow rules.
2. For CN9K device channel number should not be masked.

Fixes: 4968b362b639 ("common/cnxk: support CPT second pass flow rules")
Cc: stable@dpdk.org
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Kiran Kumar K <kirankumark@marvell.com>

net/thunderx: populate max and min MTU values

Populate maximum and minimum MTU values while retrieving
device information.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/thunderx: support attaching from secondary

Adding support for device hotplugging - attach and detach from
secondary

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support allmulticast

Implement allmulticast operations for octeontx driver:
rte_eth_allmulticast_enable()/rte_eth_allmulticast_disable().

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support xstats

Adding support for xstats eth operations.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/thunderx: support setting link attributes

Adding support to configure link attributes like speed,
duplex, negotiation.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/thunderx: reset Rx DMAC control register

During initialization, reset RX DMAC control register by
sending mbox message NIC_MBOX_MSG_RESET_XCAST to PF.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/thunderx: support polling of link state change

Moving the logic of link polling to VF from PF. Now VF
is supposed to poll for the link status, rather PF alerting
VF about any link change.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/octeontx: handle port reconfiguration

Adding support for port reconfiguration as user may require to
do so on a running system.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support setting link attributes

Adding support to configure link attributes like speed,
duplex, negotiation.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: fix port close

Segmentation fault has been observed while closing the ethernet
port. Reason for the segfault is, eth port close also shuts down
event device while other ethernet port is still using the event
device.

Fixes: da6c687471a3 ("net/octeontx: add start and stop support")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>

ci: enable C++ check for Arm and PPC

The crossbuild-essential-<arch> packages contain all necessary
dependencies to cross-compile binaries for a given architecture
including C and C++ compilers. Therefore use those instead of listing
packages directly. This way C++ compiler is also installed and C++
include checks will be checked in CI for ARM and PowerPC.

Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

config: fix C++ cross compiler for Arm and PPC

Through some mixup all cross-files for ARM and PowerPC platforms were
using C Preprocessor (cpp) instead of GCC (g++).
This caused meson to fail detecting the C++ compiler presence and
therefore disabling some targets (i.e. C++ include file checks).

Fixes: e53a5299d219 ("build: support vendor specific ARM cross builds")
Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

malloc: fix allocation of almost hugepage size

If called to allocate memory of size is between multiple of hugepage
size minus malloc_header_len and hugepage size, rte_malloc fails.

This fix replaces malloc_elem_trailer_len with malloc_elem_overhead in
try_expand_heap() to include malloc_elem_header_len when calculating
n_seg.

Bugzilla ID: 800
Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Fidaullah Noonari <fidaullah.noonari@emumba.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

eal/unix: make stack dump signal safe

rte_dump_stack() needs to be usable in situations when a bug is
encountered and from signal handlers (such as SEGV).

Glibc backtrace_symbols() calls malloc which makes it
dangerous in a signal handler that is handling errors that maybe
due to memory corruption. Additionally, rte_log() is unsafe because
syslog() is not signal safe; printf() is also documented as
not being safe.

This version formats message and uses writev for each line in a manner
similar to what glibc version of backtrace_symbols_fd() does. The
FreeBSD version of backtrace_symbols_fd() is not signal safe.

Sample output:

0: ./build/app/dpdk-testpmd (rte_dump_stack+0x2b) [560a6e9c002b]
1: ./build/app/dpdk-testpmd (main+0xad) [560a6decd5ad]
2: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xcd) [7fd43d3e27fd]
3: ./build/app/dpdk-testpmd (_start+0x2a) [560a6e83628a]

Bugzilla ID: 929

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vhost/crypto: fix descriptor processing

copy_data was returning a pointer to an increased (off by one) descriptor.
Subsequent calls to copy_data in the library were then failing.
Fix this by incrementing the descriptor only if there is some left data
to copy.

Fixes: 4414bb67010d ("vhost/crypto: fix build with GCC 12")
Cc: stable@dpdk.org
Reported-by: Jakub Poczatek <jakub.poczatek@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jakub Poczatek <jakub.poczatek@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>

net/virtio: unmap PCI device in secondary process

In multi-process, the secondary process will remap PCI during
initialization, but the mapping is not removed in the uninit path,
the device is not closed, and the device busy error will be reported
when the device is hotplugged.

This patch unmaps PCI device at secondary process uninitialization
based on virtio_rempa_pci.

Fixes: 36a7a2e7a53f ("net/virtio: move PCI device init in dedicated file")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

vhost/crypto: fix build with GCC 12

GCC 12 raises the following warning:

In file included from ../lib/mempool/rte_mempool.h:46,
                 from ../lib/mbuf/rte_mbuf.h:38,
                 from ../lib/vhost/vhost_crypto.c:7:
../lib/vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_fetch_requests’:
../lib/eal/x86/include/rte_memcpy.h:371:9: warning: array subscript 1 is
     outside array bounds of ‘struct virtio_crypto_op_data_req[1]’
     [-Warray-bounds]
  371 | rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/vhost/vhost_crypto.c:1178:42: note: while referencing ‘req’
1178 |         struct virtio_crypto_op_data_req req;
      |                                          ^~~

Split this function and separate the per descriptor copy.
This makes the code clearer, and the compiler happier.

Note: logs for errors have been moved to callers to avoid duplicates.

Fixes: 3c79609fda7c ("vhost/crypto: handle virtually non-contiguous buffers")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: prepare virtqueue resource creation

Split the virtqs virt-queue resource between
the configuration threads.
Also need pre-created virt-queue resource
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq sub-resources creation

pre-created virt-queue sub-resource in device probe stage
and then modify virtqueue in device config stage.
Steer table also need to support dummy virt-queue.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add device close task

Split the virtqs device close tasks after
stopping virt-queue between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq live migration log task

Split the virtqs LM log between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq creation task

The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the virtqs creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add VM memory registration task

The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtqs configurations.

This optimization accelerrate the LM process and
reduce its time by 5%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add task ring for multi-thread management

The configuration threads tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task to
a thread and enqueue it to the thread ring.
The thread polls its ring and dequeue tasks.
That’s why the ring should be in multi-producer
and single consumer mode.
Anatomic counter manages the tasks completion notification.
The threads report errors to the caller by
a dedicated error counter per task.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add multi-thread management for configuration

The LM process includes a lot of objects creations and
destructions in the source and the destination servers.
As much as LM time increases, the packet drop of the VM increases.
To improve LM time need to parallel the configurations for mlx5 FW.
Add internal multi-thread management in the driver for it.

A new devarg defines the number of threads and their CPU.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath events thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: optimize datapath-control synchronization

The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.

Replace the global lock with the following locks:

1.virtq locks(per virtq) synchronize datapath polling and
parallel configurations on the same virtq.
2.A doorbell lock synchronizes doorbell update,
which is shared for all the virtqs in the device.
3.A steering lock for the shared steering objects updates.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: pre-create virtq at probing time

dev_config operation is called in LM progress.
LM time is very critical because all
the VM packets are dropped directly at that time.

Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage using the new ability
to modify virtq.

This optimization accelerates the LM process and
reduces its time by 70%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: extend virtq modifiable fields

A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields:
1.address fields: desc_addr/used_addr/available_addr
2.hw_available_index
3.hw_used_index
4.virtio_q_type
5.version type
6.queue mkey
7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8.event mode: event_mode/event_qpn_or_msix

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: reuse event queues

To speed up queue creation time, event QP and CQ will create only once.
Each virtq creation will reuse same event QP and CQ.

Because FW will set event QP to error state during virtq destroy,
need modify event QP to RESET state, then modify QP to RTS state as
usual. This can save about 1.5ms for each virtq creation.

After SW QP reset, QP pi/ci all become 0 while CQ pi/ci keep as
previous. Add new variable qp_ci to save SW QP ci. Move QP pi
independently with CQ ci.

Add new function mlx5_vdpa_drain_cq to drain CQ CQE after virtq
release.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: add DevX API to move queues to reset state

Support set QP to RESET state.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: support pre-creation of virtq resource

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resource in vDPA device probe stage.

In VM live migration scenario, this can reduce 0.8ms for each queue
creation, thus reduce LM network downtime.

To create queue resource(umem/counter) in advance, we need to know
virtio queue depth and max number of queue VM will use.

Introduce two new devargs: queues(max queue pair number) and queue_size
(queue depth). Two args must be both provided, if only one argument
provided, the argument will be ignored and no pre-creation.

The queues and queue_size must also be identical to vhost configuration
driver later receive. Otherwise either the pre-create resource is wasted
or missing or the resource need destroy and recreate(in case queue_size
mismatch).

Pre-create umem/counter will keep alive until vDPA device removal.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: fix maximum number of virtqs

The driver wrongly takes the capability value for
the number of virtq pairs instead of just the number of virtqs.

Adjust all the usages of it to be the number of virtqs.

Fixes: c2eb33aaf967 ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix log message for async dequeue

Since the commit 02798b073520 ("vhost: improve virtio-net layer logs"),
vhost logs contain the socket path as a prefix.
Async dequeue path was copied from the sync dequeue path but a log
was incorrect.

Fixes: 84d5204310d7 ("vhost: support async dequeue for split ring")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix statistics update in async dequeue

This patch adds missing per-virtqueue statistics in async dequeue path.

Fixes: 84d5204310d7 ("vhost: support async dequeue for split ring")
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vhost: rename number of available entries

This patchs renames the local variables free_entries to
avail_entries in the dequeue path.

Indeed, this variable represents the number of new packets
available in the Virtio transmit queue, so these entries
are actually used, not free.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vdpa/mlx5: workaround VAR offset within page

vDPA driver first uses kernel driver to allocate doorbell (VAR) area for
each device. Then uses var->mmap_off and var->length to mmap uverbs device
file as doorbell userspace virtual address.

Current kernel driver provides var->mmap_off equal to page start of VAR.
It's fine with x86 4K page server, because VAR physical address is only 4K
aligned thus locate in 4K page start.

But with aarch64 64K page server, the actual VAR physical address has
offset within page (not located in 64K page start).
So the vDPA driver needs to add this within page offset
(caps.doorbell_bar_offset) to get the right VAR virtual address.

Fixes: 62c813706e4 ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: support async packed ring dequeue

This patch implements packed ring dequeue data path
for asynchronous vhost.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/ifc/base: fix null pointer dereference

Fix null pointer dereference reported in coverity scan.

Coverity issue: 378882
Fixes: 5d75517beffe ("vdpa/ifc/base: access block device registers")
Signed-off-by: Andy Pei <andy.pei@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

examples/vhost: support clear in-flight for async dequeue

This patch allows vring_state_changed() to clear in-flight
dequeue packets. It also clears the in-flight packets in
a thread-safe way in destroy_device().

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: support clear in-flight packets for async dequeue

rte_vhost_clear_queue_thread_unsafe() supports to clear
in-flight packets for async enqueue only. But after
supporting async dequeue, this API should support async dequeue too.

This patch also adds the thread-safe version of this API,
the difference between the two API is that thread safety uses lock.

These APIs maybe used to clean up packets in the async channel
to prevent packet loss when the device state changes or
when the device is destroyed.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

net/vhost: perform SW checksum in Tx path

Virtio specification supports guest checksum offloading
for L4, which is enabled with VIRTIO_NET_F_GUEST_CSUM
feature negotiation. However, the Vhost PMD does not
advertise Tx checksum offload capabilities.

Advertising these offload capabilities at the ethdev level
is not enough, because we could still end-up with the
application enabling these offloads while the guest not
negotiating it.

This patch advertises the Tx checksum offload capabilities,
and introduces a compatibility layer to cover the case
VIRTIO_NET_F_GUEST_CSUM has not been negotiated but the
application does configure the Tx checksum offloads. This
function performs the L4 Tx checksum in SW for UDP and TCP.
Compared to Rx SW checksum, the Tx SW checksum function
needs to compute the pseudo-header checksum, as we cannot
know whether it was done before.

This patch does not advertise SCTP checksum offloading
capability for now, but it could be handled later if the
need arises.

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Cheng Jiang <cheng1.jiang@intel.com>

net/vhost: perform SW checksum in Rx path

Virtio specification supports host checksum offloading
for L4, which is enabled with VIRTIO_NET_F_CSUM feature
negotiation. However, the Vhost PMD does not advertise
Rx checksum offload capabilities, so we can end-up with
the VIRTIO_NET_F_CSUM feature being negotiated, implying
the Vhost library returns packets with checksum being
offloaded while the application did not request for it.

Advertising these offload capabilities at the ethdev level
is not enough, because we could still end-up with the
application not enabling these offloads while the guest
still negotiate them.

This patch advertises the Rx checksum offload capabilities,
and introduces a compatibility layer to cover the case
VIRTIO_NET_F_CSUM has been negotiated but the application
does not configure the Rx checksum offloads. This function
performis the L4 Rx checksum in SW for UDP and TCP. Note
that it is not needed to calculate the pseudo-header
checksum, because the Virtio specification requires that
the driver do it.

This patch does not advertise SCTP checksum offloading
capability for now, but it could be handled later if the
need arises.

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Cheng Jiang <cheng1.jiang@intel.com>

net/vhost: make VLAN stripping flag a boolean

This trivial patch makes the vlan_strip field of the
pmd_internal struct a boolean, since it is handled as
such.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

net/vhost: enable compliant offloading mode

This patch enables the compliant offloading flags mode by
default, which prevents the Rx path to set Tx offload flags,
which is illegal. A new legacy-ol-flags devarg is introduced
to enable the legacy behaviour.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

vhost: fix missing enqueue pseudo-header calculation

The Virtio specification requires that in case of checksum
offloading, the pseudo-header checksum must be set in the
L4 header.

When received from another Vhost-user port, the packet
checksum might already contain the pseudo-header checksum
but we have no way to know it. So we have no other choice
than doing the pseudo-header checksum systematically.

This patch handles this using the rte_net_intel_cksum_prepare()
helper.

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

app/testpmd: revert MAC update in checksum forwarding

This patch reverts
commit 10f4620f02e1 ("app/testpmd: modify mac in csum forwarding"),
as the checksum forwarding is expected to only perform
checksum and not also overwrites the source and destination MAC addresses.

Doing so, we can test checksum offloading with real traffic
without breaking broadcast packets.

Fixes: 10f4620f02e1 ("app/testpmd: modify mac in csum forwarding")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Aman Singh <aman.deep.singh@intel.com>

net/ngbe: support YT PHY SGMII to RGMII mode

Add SGMII to RGMII mode for yt8521s and yt8531s PHY.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support autoneg on/off for external PHY SFI mode

Add support for external PHY to switch autoneg on/off on their SFI mode.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: fix YT PHY UTP mode to link up

Fix to read and write the correct register fields for yt8521s and
yt8531s PHY, since mode check was added.

Fixes: 1c44384fce76 ("net/ngbe: support custom PHY interfaces")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: add more packet statistics

Add more hardware extended statistics.

Fixes: 8b433d04adc9 ("net/ngbe: support device xstats")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/txgbe: fix register polling

Fix to poll some specific registers, which expect bit value 0.

'w32w' is used in registers where the write command bit is set and
waits for the bit clear to complete the write.

Fixes: 24a4c76aff4d ("net/txgbe: add error types and registers")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/ngbe: support OEM subsystem vendor ID

Add support for OEM subsystem vendor ID.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/txgbe: support OEM subsystem vendor ID

Add support for OEM subsystem vendor ID.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

net/i40e: move testpmd commands

Move related specific testpmd commands into this driver directory.
While at it, fix checkpatch warnings.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>

net/bonding: move testpmd commands

Move related specific testpmd commands into this driver directory.
While at it, fix checkpatch warnings.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>

net/nfp: fix initialization

When the testpmd start-up, it will check MTU range,
if MTU > flubfsz, it will lead testpmd start fail.
Because the hw->flbufsz doesn't have the initialized
value, so it will lead the bug.

Fixes: 417be15e5f11 ("net/nfp: make sure MTU is never larger than mbuf size")
Cc: stable@dpdk.org
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>

net/nfp: modify RSS logic

Now NFP NIC support two type of RSS logic, NFP_NET_CFG_CTRL_RSS and
NFP_NET_CFG_CTRL_RSS2, use NFP_NET_CFG_CTRL_RSS2 if NIC capability
support, otherwise use NFP_NET_CFG_CTRL_RSS.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: move round macros to header file

Move macro __round_mask, round_up and round_down from C file to
corresponding head file, will be used by TX function of nfp net
firmware with NFDk.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: add queue stop and close helper functions

This commit does not introduce new features, just integrate some common
logic into helper functions to reduce the same logic and increase code
reuse, include queue stop and queue close logic, will be used when NFP
net stop and close.

queue stop: reset queue
queue close: reset and release queue

Modify NFP net stop and close function, use helper function to stop
and close queue instead of before logic.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: add NFDk option and queue function

Add ethdev option for firmware with NFDk, implement tx_queue setup
function for firmware with NFDk.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: adjust structures

Add and modify the nfp PMD struct and macro that will be used by firmware
with NFDk.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: support firmware with NFDk

Modify nfp driver logic, add firmware version (NFD3 or NFDK) judgment, will
according to the firmware version, mount different driver functions.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: support NFP3800 card

Add support for a new type of NIC NFP3800 card, and update some
network card data acquisition interface functions.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: rename functions and structs

Add 'nfd3' into the firmware with NFD3 eth driver function name,
preparation for the next work, as we will support another version
firmware with NFDk.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: rename set MAC function

The NFP eth driver function name start with 'nfp_net', but set_mac
function start with 'nfp' only, rename it, be consistent with others.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: remove pessimistic limit

Multiple writes cause intermediate pointer values that do not
end on complete TX descriptors.

The QCP peripheral on the NFP provides a number of access
modes. In some access modes, the maximum amount to add must
be restricted to a 6bit value. The particular access mode
used by _nfp_qcp_ptr_add() has no such restrictions, so the
"NFP_QCP_MAX_ADD" test is unnecessary.

Note that trying to add more that the configured ring size
in a single add will cause a QCP overflow, caught and handled
by the QCP peripheral.

Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: remove unnecessary forward function declaration

This commit remove some unnecessary forward function
declarations.

Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

net/nfp: refactor coding style

Change the coding style of some logics, to make it more
compatible with the DPDK coding style.

Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Signed-off-by: Jin Liu <jin.liu@corigine.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>

app/testpmd: fix bonding slave devices not released

Currently, some eth devices are added to bond device, these devices are
not released when the quit command is executed in testpmd. This patch
adds the release operation for all active slaves under a bond device.

Fixes: 0e545d3047fe ("app/testpmd: check stopping port is not in bonding")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>

app/testpmd: add help messages for multi-process

This patch adds help messages for multi-process.
--num-procs=N: set the total number of multi-process instances.
--proc-id=id: set the id of the current process from multi-process
instances(0 <= id < num-procs).

Fixes: a550baf24af9 ("app/testpmd: support multi-process")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>

net/cxgbe: fix build with optimization=1

Initialize maddr and mtype to fix following warnings when
using optimization=1 compilation flag.

In file included from ../drivers/net/cxgbe/base/common.h:13,
                 from ../drivers/net/cxgbe/cxgbe_main.c:37:
../drivers/net/cxgbe/cxgbe_main.c: In function ‘cxgbe_probe’:
../drivers/net/cxgbe/base/t4fw_interface.h:656:7:
warning: ‘maddr’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
  ((x) << S_FW_CAPS_CONFIG_CMD_MEMADDR64K_CF)
       ^~
../drivers/net/cxgbe/cxgbe_main.c:1111:40:
note: ‘maddr’ was declared here
  u32 finiver, finicsum, cfcsum, mtype, maddr, param, val;
                                        ^~~~~
In file included from ../drivers/net/cxgbe/base/common.h:13,
                 from ../drivers/net/cxgbe/cxgbe_main.c:37:
../drivers/net/cxgbe/base/t4fw_interface.h:648:7:
warning: ‘mtype’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
  ((x) << S_FW_CAPS_CONFIG_CMD_MEMTYPE_CF)
       ^~
../drivers/net/cxgbe/cxgbe_main.c:1111:33: note: ‘mtype’ was declared here
  u32 finiver, finicsum, cfcsum, mtype, maddr, param, val;
                                 ^~~~~
Bugzilla ID: 1029
Fixes: 6d7d651bbc15 ("net/cxgbe: read firmware configuration file from filesystem")
Reported-by: Daxue Gao <daxuex.gao@intel.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>

net/hns3: fix TM capability

The TM capability should be bit-19 according to the user manual of
firmware.

Fixes: fc18d1b4b85f ("net/hns3: fix traffic management")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: fix crash from secondary process

If a hns3 device in the secondary process is attached to do probing
operation, 'rx_queues' and 'tx_queues' in dev->data are null in
eth_dev_fp_ops_setup when calling rte_eth_dev_probing_finish. The primary
process calls dev_start to re-setup their fp_ops. But the secondary process
can't call dev_start and has no chance to do it. If the application sends
and receives packets at this time, a segfault will occur. So this patch
uses the MP communication of the PMD to update the fp_ops of the device in
the secondary process.

Fixes: 96c33cfb06cf ("net/hns3: fix Rx/Tx functions update")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: unify wrapping style

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: modify a function name

The meaning of the "hns3_get_count" function is not precise enough.
Change from "hns3_get_count" to "hns3_fd_get_count".

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: fix return value for unsupported tuple

Driver should return false for unsupported tuple.

Fixes: 18a4b4c3fa80 ("net/hns3: add default to switch when parsing fd tuple")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: fix code check warning

In bitwise operation, "val" should be an unsigned type.

Fixes: 38b539d96eb6 ("net/hns3: support IEEE 1588 PTP")
Cc: stable@dpdk.org
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: remove duplicate definition

The default hash key array is defined twice. Remove the extra one.

Fixes: c37ca66f2b27 ("net/hns3: support RSS")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: fix an unreasonable memset

Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: adjust data type of some variables

Using the 'int' type and 'uint16_t' type to compare is insecure.
Make them consistent.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: remove redundant parentheses

Remove redundant parentheses.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

net/hns3: add check for deferred start queue when rollback

Driver doesn't allocate mbufs for the deferred start queues, so no need to
free it when rollback.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>

test/bonding: fix RSS test when disable RSS

The "test_rss_lazy" test is used for testing bonding RSS functions
when bonded port disable RSS. Currently, this test case can update
RSS functions of bonded and slave port if bonded port turns off RSS.
It is unreasonable and has been adjusted to be non-updateable in
following patch:
"93e1ea6dfa99 ethdev: fix RSS update when RSS is disabled"

So this patch fixes this test code.

Fixes: 43b630244e7e ("app/test: add dynamic bonding RSS configuration")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>

net/bonding: fix RSS inconsistency between ports

Currently, RSS configuration of slave is set only when RSS is enabled for
bonded port. If RSS is enabled for the slaves port before adding to the
bonded port with disabling RSS, it will run into that the RSS enabled state
of bonded and slaves port is inconsistent after starting bonded port.
So the RSS configuration of slave should also be set when RSS is disabled
for bonded port.

Fixes: 734ce47f71e0 ("bonding: support RSS dynamic configuration")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

app/eventdev: increase number of descriptors

Increase number of cryptodev queue pair descriptors by default. Current
size of 128 descriptors does not satisfying minimal requirements of crypto
drivers.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

app/eventdev: add null checks for crypto allocations

Crypto operation allocation may fail in case when total size of queue
pairs are bigger than the pool size.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

event/cnxk: initialize work slot read cache

Initialize gw_rdata with tag type EMPTY. Leaving tag type as
zero(ATOMIC) may cause some unnecessary head wait, if cache will be used
before the first update in dequeue/get_work functions.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>

eventdev/eth_tx: fix adapter creation

During adapter create, memory is allocated for storing event port
configuration which is freed during adapter free. The following
error is seen during free "EAL: Error: Invalid memory"

The service data pointer storage for txa_service_data_array is
allocated during adapter create with incorrect size which is less
than the required size.
Initialization of this memory causes buffer overflow and result in
metadata overwrite of event port config memory allocated above
and results in the above error message during free.

Allocating the correct size of memory for txa_service_data_array
prevents overwriting other memory areas like event port config
memory.

Fixes: a3bbf2e09756 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>

event/dlb2: support ldb port specific COS

DLB supports 4 class of service domains, to aid in managing the
device bandwidth across ldb ports. This commit allows specifying
which ldb ports will participate in the COS scheme, which class
they are a part of, and the specific bandwidth percentage
associated with each class. The cumulative bandwidth associated
with the 4 classes must not exceed 100%. This feature is enabled
on the command line, and will be documented in the DLB2 programmers
guide.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>

event/dlb2: support CQ weight

Enabling the weight limit on a CQ allows the enqueued QEs' 2-bit weight
value (representing weights of 1, 2, 4, and 8) to factor into whether a
CQ is full. If the sum of the weights of the QEs in the CQ meet or exceed
its weight limit, DLB will stop scheduling QEs to it (until software pops
enough QEs from the CQ to reverse that).

CQ weight support is enabled via the command line, and applies to
DLB 2.5 (and above) load balanced ports. The DLB2 documentation will
be updated with further details.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>

event/dlb2: support single 512B write of 4 QEs

On Xeon, 512b accesses are available, so movdir64 instruction is able to
perform 512b read and write to DLB producer port. In order for movdir64
to be able to pull its data from store buffers (store-buffer-forwarding)
(before actual write), data should be in single 512b write format.
This commit add change when code is built for Xeon with 512b AVX support
to make single 512b write of all 4 QEs instead of 4x64b writes.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Kent Wires <kent.wires@intel.com>

event/dlb2: fix advertized capabilities

This commit corrects the advertized capabilities reported by the DLB2 PMD.

Previously DLB2 reported supporting RTE_EVENT_DEV_CAP_QUEUE_QOS, but the
DLB2 hardware does not support such capability. This commit removes that
feature from the reported capabilities feature set.

Additionally, two capabilities that DLB2 does support were not being
reported in the capabilities feature set. This commit adds those.

RTE_EVENT_DEV_CAP_MULTIPLE_QUEUE_PORT = Event device is capable of
setting up the link between multiple queues and a single port. If the
flag is not set, the eventdev can only map a single queue to each
port or map a single queue to many port

RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK = Event device is capable of
configuring the queue/port link at runtime. If the flag is not set,
the eventdev queue/port link is only can be configured during
initialization

Finally, the file doc/guides/eventdevs/features/dlb2.ini has been updated
to match the capabilities actually reported by the PMD.

Fixes: e7c9971a857a ("event/dlb2: add probe-time hardware init")
Cc: stable@dpdk.org
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>

common/cnxk: allocate link map array if HWS is available

Link map array is required only if work slots are available.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>

app/eventdev: wait for workers before cryptodev destroy

Destroying cryptodev resources before exiting workers are not safe.
Moved cryptodev destroy after worker thread exit in main thread.

Fixes: de2bc16e1bd1 ("app/eventdev: add crypto producer mode")
Cc: stable@dpdk.org
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

event/cnxk: add Tx adapter freeing

Tx adapter allocate data during eth_tx_adapter_queue_add() call and
it's only cleaned but not freed during eth_tx_adapter_queue_del().
Implemented eth_tx_adapter_free() callback to free adapter data.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>

app/eventdev: add Tx first option to pipeline mode

Add Tx first support to pipeline mode tests, the transmission is done
on all the ethernet ports. This helps in testing eventdev performance
with standalone loopback interfaces.

Example:
./dpdk-test-eventdev ... -- ... --tx_first 512

512 defines the number of packets to transmit.
Add an option Tx packet size, the default packet size is 64.

Following example can change packet size value as 320.

Example:
./dpdk-test-eventdev ... -- ... --tx_first 512 --tx_pkt_sz 320

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

examples: use mempool cache for vector pool

Use mempool cache for vector mempool as vectors are freed by the Tx
routine, also increase the minimum pool size to 512 to avoid resource
contention on Rx.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

app/eventdev: use mempool cache for vector pool

Use mempool cache for vector mempool as vectors are freed by the Tx
routine, also increase the minimum pool size to 512 to avoid resource
contention on Rx.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

event/cnxk: fix Tx adapter enqueue return for CN10K

The `rte_event_eth_tx_adapter_enqueue()` function expects driver layer
to return the total number of events successfully transmitted.
Fix cn10k driver returning the number of packets transmitted in a
event vector instead of number of events.

Fixes: 761a321acf91 ("event/cnxk: support vectorized Tx event fast path")
Cc: stable@dpdk.org
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>