dpdk.git
3 years agonet/bnxt: fix device readiness check
Kalesh AP [Thu, 4 Mar 2021 09:07:27 +0000 (14:37 +0530)]
net/bnxt: fix device readiness check

Fix HWRM_VER_GET command to handle DEV_NOT_RDY state.

Driver should fail probe if the device is not ready.
Conversely, the HWRM_VER_GET poll after reset can safely
retry until the existing timeout is exceeded.

Fixes: 804e746c7b73 ("net/bnxt: add hardware resource manager init code")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: check flush status during ring free
Ajit Khaparde [Thu, 4 Mar 2021 09:07:26 +0000 (14:37 +0530)]
net/bnxt: check flush status during ring free

When host SW issues a HWRM_RING_FREE for Tx/Rx/AGG ring in HW,
the FW flushes the BDs associated with the ring and performs other
cleanup in the HW. The host software should ideally check for an
indication from the FW indicating this step has been completed
successfully to avoid unexpected errors during cleanup.

The FW issues a HWRM_DONE response to the RING_FREE request on
the corresponding CQ ring. Poll the CQs during cleanup and
ensure the HWRM_FREE command is completed not just based on the
value of valid bit but also the HWRM_DONE response for the ring.

If the HWRM_DONE response is not seen, force the cleanup to
complete just based on the valid bit.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
3 years agonet/bnxt: refactor mbuf pointer reset
Lance Richardson [Tue, 2 Mar 2021 15:16:08 +0000 (10:16 -0500)]
net/bnxt: refactor mbuf pointer reset

Remove code for setting consumed mbuf pointers to NULL from the
vector receive functions as a minor performance optimization.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
3 years agonet/bnxt: fix Rx descriptor status
Lance Richardson [Tue, 2 Mar 2021 14:29:27 +0000 (09:29 -0500)]
net/bnxt: fix Rx descriptor status

Fix a number of issues in the bnxt receive descriptor status
function, including:
   - Provide status of receive descriptor instead of completion
     descriptor.
   - Remove invalid comparison of raw ring index with masked ring
     index.
   - Correct misinterpretation of offset parameter as ring index.
   - Correct misuse of completion ring index for mbuf ring (the
     two rings have different sizes).

Fixes: 0fe613bb87b2 ("net/bnxt: support Rx descriptor status")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: fix PTP support for Thor
Kalesh AP [Wed, 24 Feb 2021 15:55:53 +0000 (21:25 +0530)]
net/bnxt: fix PTP support for Thor

On Thor, Rx timestamp is present in the Rx completion record.
Only 32 bits of the timestamp is present in the completion.
The driver needs to periodically poll the current 48 bit
free running timer using the HWRM_PORT_TS_QUERY command.
It can combine the upper 16 bits from the HWRM response
with the lower 32 bits in the Rx completion to produce
the 48 bit timestamp for the Rx packet.

This patch adds an alarm thread to periodically poll the current 48 bit
free running timer using the HWRM_PORT_TS_QUERY command.
This avoids issuing the hwrm command from the Rx handler.
This patch also handles the timer roll over condition.

Fixes: 6cbd89f9f3d8 ("net/bnxt: support PTP for Thor")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: fix FW readiness check during recovery
Kalesh AP [Wed, 24 Feb 2021 15:55:52 +0000 (21:25 +0530)]
net/bnxt: fix FW readiness check during recovery

Moved fw readiness check to a new routine bnxt_check_fw_ready().

During error recovery, driver needs to wait for fw readiness.
For that, it uses bnxt_hwrm_ver_get() function now and that
function does parsing of the VER_GET response as well.

Added a new lightweight function bnxt_hwrm_poll_ver_get() for polling
the firmware readiness which issues VER_GET and checks for success
without processing the command response.

Fixes: df6cd7c1f73a ("net/bnxt: handle reset notify async event from FW")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
3 years agonet/bnxt: fix firmware fatal error handling
Kalesh AP [Wed, 24 Feb 2021 15:55:51 +0000 (21:25 +0530)]
net/bnxt: fix firmware fatal error handling

During some fatal firmware error conditions, the PCI config space
register 0x2e which normally contains the subsystem ID will become
0xffff. This register will revert back to the normal value after
the chip has completed core reset. If we detect this condition,
we can poll this config register immediately for the value to revert.
Because we use config read cycles to poll this register, there is no
possibility of Master Abort if we happen to read it during core reset.
This speeds up recovery significantly as we don't have to wait for the
conservative min_time before polling to see if the firmware has come
out of reset. As soon as this register changes value we can proceed
to re-initialize the device.

Fixes: df6cd7c1f73a ("net/bnxt: handle reset notify async event from FW")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: handle echo request async message
Kalesh AP [Wed, 24 Feb 2021 15:55:50 +0000 (21:25 +0530)]
net/bnxt: handle echo request async message

This is a new async message that the firmware can send to check if it
can communicate with the driver. This is an added error detection
scheme that firmware can use if it suspects errors in the PCIe
interface. When the driver receives this async message, it will reply
back echoing some data in the async message. If the firmware is not
getting the reply with the proper data after some retries, error
recovery will kick in.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: log port id in async events
Kalesh AP [Wed, 24 Feb 2021 15:55:49 +0000 (21:25 +0530)]
net/bnxt: log port id in async events

1. Used port id in async event logs.
2. Added a debug log in bnxt_hwrm_func_driver_unregister().

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: update to new version of backing store
Ajit Khaparde [Wed, 24 Feb 2021 15:55:48 +0000 (21:25 +0530)]
net/bnxt: update to new version of backing store

Update HWRM headers to version 1.10.2.15
which  updates the backing store API for additional TQM rings.
Add support for 9th TQM ring using latest firmware interface.
Also make sure that we set only necessary bits in the enables
field in backing store request.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
3 years agonet/bnxt: update HWRM structures
Kalesh AP [Wed, 24 Feb 2021 15:55:47 +0000 (21:25 +0530)]
net/bnxt: update HWRM structures

Brought in the latest hsi_struct_def_dpdk.h.
HWRM API is now updated to version 1.10.2.15.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: fix queues per VNIC
Venkat Duvvuru [Wed, 24 Feb 2021 15:55:46 +0000 (21:25 +0530)]
net/bnxt: fix queues per VNIC

Update queues per VNIC in single queue mode.
bp->rx_num_qs_per_vnic is not initialized in the single queue mode.
As a result of this when an interface is reconfigured to single
queue mode from an existing multiqueue mode, bp->rx_num_qs_per_vnic
is not updated to the value of 1. Hence, the driver will try to
access more than one queue resulting in a crash.

This patch fixes it by initializing bp->rx_num_qs_per_vnic in the
single queue mode as well.

Fixes: 36024b2e7fe5 ("net/bnxt: allow dynamic creation of VNIC")
Cc: stable@dpdk.org
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: remove extra blank line
Kalesh AP [Wed, 24 Feb 2021 15:55:45 +0000 (21:25 +0530)]
net/bnxt: remove extra blank line

Removed an unnecessary extra blank line.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
3 years agonet/bnxt: fix VNIC configuration
Kalesh AP [Wed, 24 Feb 2021 15:55:44 +0000 (21:25 +0530)]
net/bnxt: fix VNIC configuration

PMD should not set any flags to receive RoCE traffic while
configuring the vnic. Since the PMD does not support RoCE
some of the flags and code is unused. Clean it up.

Fixes: b7778e8a1c00 ("net/bnxt: refactor to properly allocate resources for PF/VF")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agonet/bnxt: remove unused macro
Kalesh AP [Wed, 24 Feb 2021 15:55:43 +0000 (21:25 +0530)]
net/bnxt: remove unused macro

remove HWRM_SEQ_ID_INVALID macro.

Fixes: 804e746c7b73 ("net/bnxt: add hardware resource manager init code")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
3 years agodevtools: add acronyms in dictionary for commit checks
Ajit Khaparde [Wed, 10 Mar 2021 18:31:01 +0000 (10:31 -0800)]
devtools: add acronyms in dictionary for commit checks

Update word list with VNIC and Thor to catch errors in patch title.

Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agonet/sfc: update copyright year
Andrew Rybchenko [Thu, 11 Mar 2021 10:00:40 +0000 (13:00 +0300)]
net/sfc: update copyright year

Bump copyright year to 2021.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
3 years agocommon/sfc_efx: update copyright year
Andrew Rybchenko [Thu, 11 Mar 2021 10:00:39 +0000 (13:00 +0300)]
common/sfc_efx: update copyright year

Bump copyright year to 2021.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
3 years agoapp/testpmd: log reason of port start failure
Andrew Rybchenko [Thu, 11 Mar 2021 10:48:35 +0000 (13:48 +0300)]
app/testpmd: log reason of port start failure

Provide a bit more diagnostics information when port start fails.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
3 years agonet: fix comment in IPv6 header
Ivan Malov [Mon, 8 Mar 2021 06:51:04 +0000 (09:51 +0300)]
net: fix comment in IPv6 header

The comment got it wrong. The payload length field
does not include the fixed IPv6 header size.

Fixes: 7eca7f7fd09d ("net: add missing endianness annotations")
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agomaintainers: update for qede
Igor Russkikh [Wed, 10 Mar 2021 08:16:31 +0000 (09:16 +0100)]
maintainers: update for qede

Removing Shahed, adding me and Devendra

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Acked-by: Devendra Singh Rawat <dsinghrawat@marvell.com>
3 years agonet/af_xdp: prefer busy polling
Ciara Loftus [Wed, 10 Mar 2021 07:48:16 +0000 (07:48 +0000)]
net/af_xdp: prefer busy polling

This commit introduces support for preferred busy polling
to the AF_XDP PMD. This feature aims to improve single-core
performance for AF_XDP sockets under heavy load.

A new vdev arg is introduced called 'busy_budget' whose default
value is 64. busy_budget is the value supplied to the kernel
with the SO_BUSY_POLL_BUDGET socket option and represents the
busy-polling NAPI budget. To set the budget to a different value
eg. 256:

--vdev=net_af_xdp0,iface=eth0,busy_budget=256

Preferred busy polling is enabled by default provided a kernel with
version >= v5.11 is in use. To disable it, set the budget to zero.

The following settings are also strongly recommended to be used in
conjunction with this feature:

echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 200000 | sudo tee /sys/class/net/eth0/gro_flush_timeout

.. where eth0 is the interface being used by the PMD.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agonet/af_xdp: use recvfrom instead of poll syscall
Ciara Loftus [Wed, 10 Mar 2021 07:48:15 +0000 (07:48 +0000)]
net/af_xdp: use recvfrom instead of poll syscall

poll() is more expensive and requires more tuning
when used with the upcoming busy polling functionality.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agonet/af_xdp: allow bigger batch sizes
Ciara Loftus [Wed, 10 Mar 2021 07:48:14 +0000 (07:48 +0000)]
net/af_xdp: allow bigger batch sizes

Prior to this commit, the maximum batch sizes for zero-copy and
copy-mode rx and copy-mode tx were set to 32. Apart from zero-copy tx,
the user could never rx/tx any more than 32 packets at a time and
without inspecting the code the user wouldn't be aware of this.

This commit removes these upper limits placed on the user and instead
sets an internal batch size equal to the default ring size (2048).
Batches larger than this are still processed, however they are split
into smaller batches similar to how it's done in other drivers. This is
necessary because some arrays used during rx/tx need to be sized at
compile-time.

Allowing a larger batch size allows for fewer batches and thus larger
bulk operations, fewer ring accesses and fewer syscalls which should
yield improved performance.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agobus/pci: fix Windows kernel driver categories
Thomas Monjalon [Tue, 16 Mar 2021 22:09:38 +0000 (23:09 +0100)]
bus/pci: fix Windows kernel driver categories

In Windows probing, the value RTE_PCI_KDRV_NONE was used
instead of RTE_PCI_KDRV_UNKNOWN.
This value covers the mlx case where the kernel driver is in place,
offering a bifurcated mode to the userspace driver.
When the kernel driver is listed as unknown,
there is no special treatment in DPDK probing, contrary to UIO modes.

The value RTE_PCI_KDRV_NIC_UIO (FreeBSD) was re-used
instead of having a new RTE_PCI_KDRV_NET_UIO for Windows NetUIO.
While adding the new value RTE_PCI_KDRV_NET_UIO
(at the end for ABI compatibility),
the enum of kernel driver categories is annotated.

Fixes: b762221ac24f ("bus/pci: support Windows with bifurcated drivers")
Fixes: c76ec01b4591 ("bus/pci: support netuio on Windows")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
3 years agoeal: mark version parts API as experimental
Thomas Monjalon [Wed, 17 Mar 2021 14:34:02 +0000 (15:34 +0100)]
eal: mark version parts API as experimental

Some functions were introduced in DPDK 21.05 to query the version parts
(prefix, year, month, minor, suffix, release) at runtime.
Per guidelines, these new public functions must be marked with
__rte_experimental and ABI versioned as EXPERIMENTAL.

Fixes: 5b637a848195 ("eal: fix querying DPDK version at runtime")
Cc: stable@dpdk.org
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agoeal: fix version macro
Thomas Monjalon [Wed, 17 Mar 2021 09:18:22 +0000 (10:18 +0100)]
eal: fix version macro

The macro RTE_VERSION was broken since updated with function calls.
It is a build-time version number, and must be built with macros.
For a run-time version number, there is the function rte_version().

Fixes: 5b637a848195 ("eal: fix querying DPDK version at runtime")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
3 years agoexamples/ptpclient: enable Rx timestamp offload
Hemant Agrawal [Tue, 23 Feb 2021 06:14:12 +0000 (11:44 +0530)]
examples/ptpclient: enable Rx timestamp offload

This patch add support to enable Rx offload for timestamp.
It is required to be enabled for some PMDs e.g. dpaa2.

Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
3 years agodoc: update power management in doxygen API index
Anatoly Burakov [Wed, 3 Mar 2021 12:35:13 +0000 (12:35 +0000)]
doc: update power management in doxygen API index

The headers rte_power_intrinsics.h and rte_power_pmd_mgmt.h
were missing from the doxygen API index.

Fixes: cda57d9388c0 ("eal: add power management intrinsics")
Fixes: 682a645438c5 ("power: add ethdev power management")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
3 years agobus/pci: support allow/block lists on Windows
Khoa To [Mon, 1 Mar 2021 06:52:41 +0000 (22:52 -0800)]
bus/pci: support allow/block lists on Windows

EAL -a and -b options are used to specify which PCI devices are
explicitly allowed or blocked during PCI bus scan.  This evaluation
is missing in the Windows implementation of rte_pci_scan.

This patch provides this missing functionality, so that apps can specify
which devices to ignore during PCI bus scan.

Signed-off-by: Khoa To <khot@microsoft.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
3 years agobus/pci: set Windows device class and bus
Nick Connolly [Mon, 1 Mar 2021 09:56:44 +0000 (09:56 +0000)]
bus/pci: set Windows device class and bus

Attaching to an NVMe disk on Windows using SPDK requires the
PCI class ID and device.bus fields. Decode the class ID from the PCI
device info strings if it is present and set device.bus.

Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
3 years agobus/pci: skip probing some Windows NDIS devices
Pallavi Kadam [Wed, 10 Feb 2021 20:36:54 +0000 (12:36 -0800)]
bus/pci: skip probing some Windows NDIS devices

Implement rte_pci_map_device() to distinguish between the devices bound
to netuio and NDIS devices.
Only return success for the netuio devices.

Fixes: c76ec01b4591 ("bus/pci: support netuio on Windows")
Cc: stable@dpdk.org
Suggested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Tested-by: Narcisa Vasile <navasile@linux.microsoft.com>
3 years agoeal/windows: fix default thread priority
Tal Shnaiderman [Thu, 18 Feb 2021 11:40:58 +0000 (13:40 +0200)]
eal/windows: fix default thread priority

The hard-coded thread priority for Windows threads in EAL
is REALTIME_PRIORITY_CLASS/THREAD_PRIORITY_TIME_CRITICAL.

This results in issues with DPDK threads causing OS thread starvation
and eventually a bugcheck.

The fix reduce the thread priority to
NORMAL_PRIORITY_CLASS/THREAD_PRIORITY_NORMAL.

Bugzilla ID: 600
Fixes: 53ffd9f080f ("eal/windows: add minimum viable code")
Cc: stable@dpdk.org
Reported-by: Odi Assli <odia@nvidia.com>
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
3 years agoeal/windows: add missing SPDX license tag
Dmitry Kozlyuk [Sat, 27 Feb 2021 20:32:01 +0000 (23:32 +0300)]
eal/windows: add missing SPDX license tag

Fixes: c08bd191b13d ("eal/windows: initialize hugepage info")
Cc: stable@dpdk.org
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Nick Connolly <nick.connolly@mayadata.io>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
3 years agometrics: export telemetry stubs if no libjansson
Jie Zhou [Mon, 8 Mar 2021 18:05:39 +0000 (10:05 -0800)]
metrics: export telemetry stubs if no libjansson

This patch allows the same set of rte_metrics_tel_* functions to be
exported no matter JANSSON is available or not, by doing following:
1. Leverage dpdk_conf to set configuration flag RTE_HAS_JANSSON
when Jansson dependency is found.
2. In rte_metrics_telemetry.c, leverage RTE_HAS_JANSSON to handle the
case when JANSSON is not available by adding stubs for all the instances.
3. In meson.build, per dpdk/doc/guides/rel_notes/release_20_05.rst,
it is claimed that "Telemetry library is no longer dependent on the
external Jansson library, which allows Telemetry be enabled by default.",
thus make the deps and includes of Telemetry as not conditional anymore.

Signed-off-by: Jie Zhou <jizh@microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
3 years agolog/linux: make default output stderr
Ferruh Yigit [Tue, 9 Feb 2021 15:06:20 +0000 (15:06 +0000)]
log/linux: make default output stderr

In Linux by default DPDK log goes to stdout, as well as syslog.

It is possible for an application to change the library output stream
via 'rte_openlog_stream()' API, to set it to stderr, it can be used as:
rte_openlog_stream(stderr);

But still updating the default log output to 'stderr'.

Bugzilla ID: 8
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agobuild: support KNI cross-compilation
Juraj Linkeš [Thu, 11 Feb 2021 12:59:46 +0000 (13:59 +0100)]
build: support KNI cross-compilation

The KNI linux module is using a custom target for building, which
doesn't take into account any cross compilation arguments. The arguments
in question are ARCH, CROSS_COMPILE (for gcc, clang) and CC, LD (for
clang). Get those from the cross file and pass them to the custom
target.

The user supplied path may not contain the 'build' directory, such as
when using cross-compiled headers, so only append that in the default
case (when no path is supplied in native builds) and use the unmodified
path from the user otherwise. Also modify the install path accordingly.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agoeal: fix querying DPDK version at runtime
Bruce Richardson [Tue, 16 Feb 2021 15:13:29 +0000 (15:13 +0000)]
eal: fix querying DPDK version at runtime

For using a DPDK application, such as OVS, which is dynamically linked, the
DPDK version in use should always report the actual version, not the
version used at build time. This incorrect behaviour can be seen by
building OVS against one version of DPDK and running it against a later
one. Using "ovs-vsctl list Open_vSwitch" to query basic info, the
dpdk_version returned will be the build version not the currently running
one - which can be verified using the DPDK telemetry library client.

  $ sudo ovs-vsctl list Open_vSwitch | grep dpdk_version
  dpdk_version        : "DPDK 20.11.0-rc4"

  $ echo quit | sudo dpdk-telemetry.py
  Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
  {"version": "DPDK 21.02.0-rc2", "pid": 405659, "max_output_len": 16384}
  -->

To fix this, we need to convert the rte_version() function, and any other
necessary parts of the rte_version.h, to be actual functions in EAL, not
just inlines/macros. The only complication in doing so is that telemetry
library cannot call rte_version() directly, and instead needs the version
string passed in on init.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agobuild: exclude meson files from examples installation
Bruce Richardson [Fri, 12 Mar 2021 14:56:05 +0000 (14:56 +0000)]
build: exclude meson files from examples installation

The meson.build files in each example directory is simply to support
building the example as part of the main SDK build, and these should not
be installed with the example's source code and makefile. The exclude of
"meson.build" only filters out the top-level examples/meson.build file,
not the file in each subdirectory.

To fix this, we can build up the list of files to exclude based off the
list of all examples. With this change "find examples/ -name meson.build"
returns no hits when run on an installed instance.

Fixes: e5b95003f1df ("examples: fix flattening directory layout on install")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agobus/pci: support MMIO for ioport
Huawei Xie [Wed, 10 Mar 2021 17:36:30 +0000 (01:36 +0800)]
bus/pci: support MMIO for ioport

With I/O BAR, we get PIO (port-mapped I/O) address.
With MMIO (memory-mapped I/O) BAR, we get mapped virtual address.
We distinguish PIO and MMIO by their address range like how kernel does,
i.e, address below 64K is PIO.
ioread/write8/16/32 is provided to access PIO/MMIO.
By the way, for virtio on arch other than x86, BAR flag indicates PIO
but is mapped.

Signed-off-by: Huawei Xie <huawei.xhw@alibaba-inc.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
3 years agobus/pci: use Linux PCI sysfs to get PIO address
Huawei Xie [Wed, 10 Mar 2021 17:36:29 +0000 (01:36 +0800)]
bus/pci: use Linux PCI sysfs to get PIO address

Currently virtio PMD assumes legacy device uses PIO bar.
There are three ways to get PIO (port-mapped I/O) address for virtio
legacy device.
1) under igb_uio
   - get PIO address from uio/uio# sysfs attribute, for instance:
     /sys/bus/pci/devices/0000:00:09.0/uio/uio0/portio/port0/start
2) under uio_pci_generic
   - for X86, get PIO address from /proc/ioport
   - for other ARCH, get PIO address from standard PCI sysfs attribute,
     for instance: /sys/bus/pci/devices/0000:00:09.0/resource

Actually, "port0/start" in igb_uio and "resource" point to exactly the
same thing, i.e, pci_dev->resource[0] in kernel source code.

This patch refactors these messy things, and uses standard PCI sysfs
attribute "resource".

Signed-off-by: Huawei Xie <huawei.xhw@alibaba-inc.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
3 years agonet/octeontx2: fix VLAN filter
Satheesh Paul [Tue, 9 Feb 2021 10:01:13 +0000 (15:31 +0530)]
net/octeontx2: fix VLAN filter

This patch fixes incorrect MCAM key preparation when creating
MCAM entry to allow VLAN IDs after vlan filtering is enabled on port.

Fixes: ba1b3b081edf ("net/octeontx2: support VLAN offloads")
Cc: stable@dpdk.org
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
3 years agonet/octeontx2: support flow API dump
Satheesh Paul [Mon, 8 Feb 2021 13:01:57 +0000 (18:31 +0530)]
net/octeontx2: support flow API dump

Add support to dump hardware internal representation information of
rte flow to file.

Every flow rule added will be dumped in the below format.

MCAM Index:1881
Interface :NIX-RX (0)
Priority  :1
NPC RX Action:0X00000000404001
        ActionOp:NIX_RX_ACTIONOP_UCAST (1)
        PF_FUNC: 0X400
        RQ Index:0X004
        Match Id:0000
        Flow Key Alg:0
NPC RX VTAG Action:0X00000000008100
        VTAG0:relptr:0
        lid:0X1
        type:0
Patterns:
        NPC_PARSE_NIBBLE_CHAN:000
        NPC_PARSE_NIBBLE_LA_LTYPE:LA_ETHER
        NPC_PARSE_NIBBLE_LB_LTYPE:NONE
        NPC_PARSE_NIBBLE_LC_LTYPE:LC_IP
        NPC_PARSE_NIBBLE_LD_LTYPE:LD_TCP
        NPC_PARSE_NIBBLE_LE_LTYPE:NONE
        LA_ETHER, hdr offset:0, len:0X6, key offset:0X8,\
Data:0X4AE124FC7FFF, Mask:0XFFFFFFFFFFFF
        LA_ETHER, hdr offset:0XC, len:0X2, key offset:0X4, Data:0XCA5A,\
Mask:0XFFFF
        LC_IP, hdr offset:0XC, len:0X8, key offset:0X10,\
Data:0X0A01010300000000, Mask:0XFFFFFFFF00000000
        LD_TCP, hdr offset:0, len:0X4, key offset:0X18, Data:0X03450000,\
Mask:0XFFFF0000
MCAM Raw Data :
        DW0     :0000CA5A01202000
        DW0_Mask:0000FFFF0FF0F000
        DW1     :00004AE124FC7FFF
        DW1_Mask:0000FFFFFFFFFFFF
        DW2     :0A01010300000000
        DW2_Mask:FFFFFFFF00000000
        DW3     :0000000003450000
        DW3_Mask:00000000FFFF0000
        DW4     :0000000000000000
        DW4_Mask:0000000000000000
        DW5     :0000000000000000
        DW5_Mask:0000000000000000
        DW6     :0000000000000000
        DW6_Mask:0000000000000000

Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
3 years agonet/mlx5: fix Rx segmented packets on mbuf starvation
Jiawei Zhu [Mon, 1 Mar 2021 17:19:50 +0000 (12:19 -0500)]
net/mlx5: fix Rx segmented packets on mbuf starvation

The issue occurred if mbuf starvation happened
in the middle of segmented packet reception.
In such a situation, after release the segments of
packet being received, code did not advance the
consumer index to the next stride. This caused
the receiving of the wrong segmented packet data.

The possible error scenario:
- we assume segs_n is 4 and we are receiving 4
  segments of multi-segment packet.
- we fail to allocate mbuf while receiving the 3rd segment,
  and this frees the mbufs of the packet chain we have built.
  There are the 1st and 2nd segments in the chain.
- the 1st and the 2nd segments of this stride of Rx queue
  are filled up (in elts array) with the new allocated
  mbufs and their data are random (the 3rd and 4th
  segments still contain the valid data of the packet though).
- on the next iteration of stride processing we get
  the wrong two segments of the multi-segment packet.

Hence, we should skip these mbufs in the stride and
we should advance the consumer index on loop exit.

Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx path")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Zhu <zhujiawei12@huawei.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/i40e: fix IPv4 fragment offload
Xiaoyun Li [Tue, 2 Mar 2021 07:03:20 +0000 (15:03 +0800)]
net/i40e: fix IPv4 fragment offload

IPv4 fragment_offset mask was required to be 0 no matter what the
spec value was. But zero mask means not caring about fragment_offset
field then both non-frag and frag packets should hit the rule.

But the actual fragment rules should be like the following:
Only non-fragment packets can hit Rule 1:
Rule 1: mask=0x3fff, spec=0
Only fragment packets can hit rule 2:
Rule 2: mask=0x3fff, spec=0x8, last=0x2000

This patch allows the above rules.

Fixes: 42044b69c67d ("net/i40e: support input set selection for FDIR")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
3 years agoraw/ifpga: add miscellaneous APIs
Wei Huang [Wed, 3 Mar 2021 02:34:31 +0000 (21:34 -0500)]
raw/ifpga: add miscellaneous APIs

Below miscellaneous APIs are used to implement OPAE application.
1. rte_pmd_ifpga_get_pci_bus() get PCI bus ifpga driver registered.
2. rte_pmd_ifpga_partial_reconfigure() do partial reconfiguration.
3. rte_pmd_ifpga_cleanup() free software resources allocated by driver.
4. rte_pmd_ifpga_set_rsu_status() set status of rsu process.

Signed-off-by: Wei Huang <wei.huang@intel.com>
Acked-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
3 years agoraw/ifpga: add APIs to get FPGA information
Wei Huang [Wed, 3 Mar 2021 02:34:30 +0000 (21:34 -0500)]
raw/ifpga: add APIs to get FPGA information

There are some information data can be got from FPGA, they are
implemented in below APIs:
1. rte_pmd_ifpga_get_property() get properties of FPGA (include BMC).
2. rte_pmd_ifpga_get_phy_info() get information of PHY connect to FPGA.
3. rte_pmd_ifpga_get_rsu_status() get status of rsu process.

Signed-off-by: Wei Huang <wei.huang@intel.com>
Acked-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
3 years agoraw/ifpga: add FPGA RSU APIs
Wei Huang [Wed, 3 Mar 2021 02:34:29 +0000 (21:34 -0500)]
raw/ifpga: add FPGA RSU APIs

RSU (Remote System Update) depends on secure manager which may be
different on various implementations, so a new secure manager device
is implemented for adapting such difference.
There are five APIs added:
1. rte_pmd_ifpga_get_dev_id() get raw device ID of ifpga device from PCI
   address like 'Domain:Bus:Dev.Func'.
2. rte_pmd_ifpga_update_flash() update flash with specific image file.
3. rte_pmd_ifpga_stop_update() abort flash update process.
4. rte_pmd_ifpga_reboot_try() check current ifpga status and change it
   to reboot status if it is idle.
5. rte_pmd_ifpga_reload() trigger full reconfiguration of ifpga device.

Signed-off-by: Wei Huang <wei.huang@intel.com>
Acked-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
3 years agonet/i40evf: fix packet loss for X722
Beilei Xing [Wed, 24 Feb 2021 02:09:00 +0000 (10:09 +0800)]
net/i40evf: fix packet loss for X722

When Tx queue number is more than Rx queue number, and RSS is
enabled, there'll be packet loss with X722.
The root cause is the lookup table is not configured correctly,
since it uses VF's queue pair number but not Rx queue number.

Fixes: 2da3ba746795 ("net/i40e: fix VF runtime queues RSS config")
Cc: stable@dpdk.org
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Signed-off-by: Hengjian Zhang <hengjianx.zhang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
3 years agonet/ice: clean GTPU flow type for flow director
Zhirun Yan [Tue, 2 Mar 2021 02:54:07 +0000 (10:54 +0800)]
net/ice: clean GTPU flow type for flow director

Currently, FDIR only support GTPU outer fields in PF. Clean the
redundant GTPU inner info in flow type definition and align with
shared code.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: distinguish input set outer fields
Zhirun Yan [Tue, 2 Mar 2021 02:54:06 +0000 (10:54 +0800)]
net/ice: distinguish input set outer fields

Distinguish input_set_mask to inner and outer part. Use
input_set_mask_o for tunnel outer or non-tunnel input set.
input_set_mask_i is used for tunnel inner fields only.

Adjust indentation of ice_pattern_match_item list in switch, ACL, RSS
and FDIR for easy review.

For switch, ACL and RSS, only use
input_set_mask_o and set the input_set_mask_i all none.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: refactor input set config
Zhirun Yan [Tue, 2 Mar 2021 02:54:05 +0000 (10:54 +0800)]
net/ice: refactor input set config

For tunnel or non-tunnel packet, the input set is in outer_input_set
and use seg_tun[0]. seg_tun[1] is only used for tunnel inner fields.
This patch make align with input_set inner/outer with seg_tun[] and
simplify it.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: refactor flow pattern parser
Zhirun Yan [Tue, 2 Mar 2021 02:54:04 +0000 (10:54 +0800)]
net/ice: refactor flow pattern parser

Distinguish inner/outer input_set. And avoid too many nested
conditionals in each type's parser. input_set_o is used for
tunnel outer fields or non-tunnel fields , input_set_i is only
used for inner fields.

For GTPU, store the outer IP fields in inner part to align with
shared code behavior.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: refactor flow director filter structure
Zhirun Yan [Tue, 2 Mar 2021 02:54:03 +0000 (10:54 +0800)]
net/ice: refactor flow director filter structure

This patch use input_set_o and input_set_i to distinguish inner/outer
input set. input_set_i is only used for inner field.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice: clean input set macro definition
Zhirun Yan [Tue, 2 Mar 2021 02:54:02 +0000 (10:54 +0800)]
net/ice: clean input set macro definition

Currently, the macro of input set use 2 bits, one bit for protocol and
inner/outer, another bit for src/dst field. But this could not
distinguish a rule with inner and outer fields for tunnel packet.
Redefine input set macro to make it clear. Only use these two bits for
protocol and field. Ignore the redundant inner/outer info.

ICE_INSET_TUN_* is used by switch module, should be removed after
switch refactor.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agonet/ice/base: cleanup filter list on error
Qi Zhang [Tue, 2 Mar 2021 07:23:57 +0000 (15:23 +0800)]
net/ice/base: cleanup filter list on error

When ice_remove_vsi_lkup_fltr is called, by calling
ice_add_to_vsi_fltr_list local copy of vsi filter list
is created. If any issues during creation of vsi filter
list occurs it up for the caller to free already
allocated memory. This patch ensures proper memory
deallocation in these cases.

Fixes: c7dd15931183 ("net/ice/base: add virtual switch code")
Cc: stable@dpdk.org
Signed-off-by: Robert Malz <robertx.malz@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: fix uninitialized struct
Qi Zhang [Tue, 2 Mar 2021 07:23:56 +0000 (15:23 +0800)]
net/ice/base: fix uninitialized struct

One of the structs being used for ACL counter rules was allocated on
the stack and left uninitialized.  Rather than depending on
undefined behavior around the .amount member during rule removal,
just leave a comment and initialize the struct to zero, as this is a
slow path call anyway. This bug could have caused silent failures
during counter removal.

Fixes: f3202a097f12 ("net/ice/base: add ACL module")
Cc: stable@dpdk.org
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: update GTPU EH dummy packets for FDIR
Qi Zhang [Tue, 2 Mar 2021 07:23:55 +0000 (15:23 +0800)]
net/ice/base: update GTPU EH dummy packets for FDIR

Update GTPU EH dummy pkts for FDIR, including EH/DL/UL.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: update boost TCAM for DVM
Qi Zhang [Tue, 2 Mar 2021 07:23:54 +0000 (15:23 +0800)]
net/ice/base: update boost TCAM for DVM

Add code to update boost TCAM entries to enable DVM. This requires
enabled DVM entries, and disabling SVM entries.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: mark ptype 2 as reserved
Qi Zhang [Tue, 2 Mar 2021 07:23:53 +0000 (15:23 +0800)]
net/ice/base: mark ptype 2 as reserved

The entry for PTYPE 2 in the ice_ptype_lkup table incorrectly states
that this is an L2 packet with no payload. According to the datasheet,
this PTYPE is actually unused and reserved.

Modify the lookup entry to indicate this is an unused entry that is
reserved.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: fix payload indicator on ptype
Qi Zhang [Tue, 2 Mar 2021 07:23:52 +0000 (15:23 +0800)]
net/ice/base: fix payload indicator on ptype

The entry for PTYPE 90 indicates that the payload is layer 3. This does
not match the specification in the datasheet which indicates the packet
is a MAC, IPv6, UDP packet, with a payload in layer 4.

Fix the lookup table to match the data sheet.

Fixes: 64e9587d5629 ("net/ice/base: add structures for Rx/Tx queues")
Cc: stable@dpdk.org
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: support GTPU IP inner IPv6 for flow director
Qi Zhang [Tue, 2 Mar 2021 07:23:51 +0000 (15:23 +0800)]
net/ice/base: support GTPU IP inner IPv6 for flow director

Support IPV4_GTPU with inner IPV6/UDP/TCP for FDIR.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: support switch filter (GTP tunnel+IP flow)
Qi Zhang [Tue, 2 Mar 2021 07:23:50 +0000 (15:23 +0800)]
net/ice/base: support switch filter (GTP tunnel+IP flow)

Enabled support for advanced switch filter to satisfy match criteria
such as: GTP tunnel + Inner IPv4[6]

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: enable more GTPU inner L3 fields for FDIR
Qi Zhang [Tue, 2 Mar 2021 07:23:49 +0000 (15:23 +0800)]
net/ice/base: enable more GTPU inner L3 fields for FDIR

Add support for FDIR filter by GTPU inner L3 fields
(i.e., tos, ttl, proto).

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: expose link configuration error
Qi Zhang [Tue, 2 Mar 2021 07:23:48 +0000 (15:23 +0800)]
net/ice/base: expose link configuration error

Store the link_cfg_err byte in order to determine whether an unsupported
power configuration is preventing link establishment.

Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: enable GTPU inner L3/L4 for flow director
Qi Zhang [Tue, 2 Mar 2021 07:23:47 +0000 (15:23 +0800)]
net/ice/base: enable GTPU inner L3/L4 for flow director

For FDIR, GTPU with inner L3/L4 layers should only support inner
L3/L4 addrs/ports, instead of outer fields. Thus, we use TUN offsets
for GTPU IP/EH to insert inner L3/L4 addrs/ports fields.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: indicate double reset solution restriction
Qi Zhang [Tue, 2 Mar 2021 07:23:46 +0000 (15:23 +0800)]
net/ice/base: indicate double reset solution restriction

Add capability which indicates double reset solution restriction.
Added "Post-update EMPR enabled" field to "Response Flags" field
(byte 19 in the response structure).

Signed-off-by: Amir Shay <shay.amir@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: support external device secure programming
Qi Zhang [Tue, 2 Mar 2021 07:23:45 +0000 (15:23 +0800)]
net/ice/base: support external device secure programming

External topology devices (e.g. PHYs) connected to controller or to SoC
might have a firmware engine within the device and the firmware is
usually loaded from NVM connected to the topology device.

In some cases, those firmware packages might need to be regularly
updated in a secure way to prevent malicious user to burn malicious
firmware into the topology device. In other cases, the topology device
firmware might be burned independently, as burning the NVM attached to
the device might cause the device to stop function but could be fixed
without permanent damage.
SoC topologies also enable mezzanine card, with an ID EEPROM
within it. This ID EEPROM might need an update also.
This patch provides these abilities.

Signed-off-by: Amir Shay <shay.amir@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/ice/base: support firmware log
Qi Zhang [Tue, 2 Mar 2021 07:23:44 +0000 (15:23 +0800)]
net/ice/base: support firmware log

Currently we do not provide full end-to-end solution for system level
debug and diagnostics. This change purpose is to fulfill design and
implementation gaps to provide full end-to-end (HW-FW-SW) diagnostic
solution. In addition to functional improvements, it will provide
feasible, user-friendly Debug information.

Signed-off-by: Amir Shay <shay.amir@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
3 years agonet/e1000: remove MTU setting limitation
Dapeng Yu [Fri, 19 Feb 2021 10:03:23 +0000 (18:03 +0800)]
net/e1000: remove MTU setting limitation

Currently, if requested MTU is bigger than mbuf size and scattered
receive is not enabled, setting MTU to that value fails.

This patch allows setting this special MTU when device is stopped,
because scattered_rx will be re-configured during next port start
and driver may enable scattered receive according new MTU value.

After this patch, driver may select different receive function
automatically after MTU set, according MTU values selected.

Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
3 years agonet/igc: remove MTU setting limitation
Dapeng Yu [Fri, 19 Feb 2021 10:01:07 +0000 (18:01 +0800)]
net/igc: remove MTU setting limitation

Currently, if requested MTU is bigger than mbuf size and scattered
receive is not enabled, setting MTU to that value fails.

This patch allows setting this special MTU when device is stopped,
because scattered_rx will be re-configured during next port start
and driver may enable scattered receive according new MTU value.

After this patch, driver may select different receive function
automatically after MTU set, according MTU values selected.

Fixes: a5aeb2b9e225 ("net/igc: support Rx and Tx")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
3 years agonet/ice: fix VLAN filter with PF
Alvin Zhang [Fri, 19 Feb 2021 05:13:46 +0000 (13:13 +0800)]
net/ice: fix VLAN filter with PF

The macro flag DEV_RX_OFFLOAD_VLAN_FILTER is used to enable/disable
Rx VLAN filter, but not Tx VLAN filter. Therefore, Tx VLAN filter
should not be enabled/disabled in function ice_vsi_config_vlan_filter
called after checking DEV_RX_OFFLOAD_VLAN_FILTER flag.

In addition, the kernel driver doesn't enable/disable the TX VLAN
filter in the similar function ice_cfg_vlan_pruning.

This patch removes the setting about the TX VLAN filter in function
ice_vsi_config_vlan_filter.

Fixes: e0dcf94a0d7f ("net/ice: support VLAN ops")
Cc: stable@dpdk.org
Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Tested-by: Zhimin Huang <zhiminx.huang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
3 years agoethdev: document generic modify flow action
Alexander Kozyrev [Wed, 3 Mar 2021 22:19:04 +0000 (22:19 +0000)]
ethdev: document generic modify flow action

Field IDs for the MODIFY_FIELD action lack doxygen comments
and not visible in online DPDK documentation because of that.
Provide a meaningful description for every Field ID for the
rte_flow_field_id enumeration.

Fixes: 73b68f4c54a0 ("ethdev: introduce generic modify flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agonet/ring: support secondary process
Ferruh Yigit [Wed, 30 Sep 2020 11:02:40 +0000 (12:02 +0100)]
net/ring: support secondary process

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
3 years agoapp/testpmd: support forced ethernet speed
Ajit Khaparde [Fri, 5 Mar 2021 19:42:32 +0000 (11:42 -0800)]
app/testpmd: support forced ethernet speed

Add support for forced ethernet speed setting.
Currently testpmd tries to configure the Ethernet port in autoneg mode.
It is not possible to set the Ethernet port to a specific speed while
starting testpmd. In some cases capability to configure a forced speed
for the Ethernet port during initialization may be necessary. This patch
tries to add this support.

The patch assumes full duplex setting and does not attempt to change that.
So speeds like 10M, 100M are not configurable using this method.

The command line to configure a forced speed of 10G:
dpdk-testpmd -c 0xff  -- -i  --eth-link-speed  10000

The command line to configure a forced speed of 50G:
dpdk-testpmd -c 0xff  -- -i  --eth-link-speed  50000

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
3 years agodoc: update release notes for dpaax
Hemant Agrawal [Fri, 5 Mar 2021 05:36:14 +0000 (11:06 +0530)]
doc: update release notes for dpaax

This patch updates the release notes for recently submitted changes.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
3 years agonet/txgbe: fix adding crypto SA
Jiawen Wu [Fri, 5 Mar 2021 02:14:38 +0000 (10:14 +0800)]
net/txgbe: fix adding crypto SA

By register definition, Ipsec Rx IPv4 address should to be written
in the reg(0).

Fixes: 07cafb2adbc5 ("net/txgbe: add security session create operation")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/txgbe: update packet type
Jiawen Wu [Fri, 5 Mar 2021 02:14:37 +0000 (10:14 +0800)]
net/txgbe: update packet type

Update the packet type lookup table according to the HW design.
Fix the bug that inner L3 and L4 type can not be parsed when
QINQ insert in tunnel packet.

Fixes: 9e30b88f60b2 ("net/txgbe: support packet type")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/txgbe: fix Rx missed packet counter
Jiawen Wu [Fri, 5 Mar 2021 02:14:36 +0000 (10:14 +0800)]
net/txgbe: fix Rx missed packet counter

Add the Rx dropped packet counter into stats->imissed, to ensure the
stats correct.

Fixes: c9bb590d4295 ("net/txgbe: support device statistics")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/txgbe: remove unused functions
Jiawen Wu [Fri, 5 Mar 2021 02:14:35 +0000 (10:14 +0800)]
net/txgbe: remove unused functions

Remove unused functions for EEPROM read and write.

Fixes: 35c90ecccfd4 ("net/txgbe: add EEPROM functions")
Cc: stable@dpdk.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
3 years agonet/bonding: fix LACP system address check
Vadim Podovinnikov [Wed, 17 Feb 2021 16:26:55 +0000 (16:26 +0000)]
net/bonding: fix LACP system address check

In bond (LACP) we have several NICs (ports), when we have negotiation
with peer about what port we prefer, we send information about what
system we preferred in partner system name field. Peer also sends us
what partner system name it prefer.

When we receive a message from it we must compare its preferred system
name with our system name, but not with our port mac address

In my test I have several problems with that:
1. If master port (mac address same as system address) shuts down (I
   have two ports) I loose connection
2. If secondary port (mac address not same as system address) receives
   message before master port, my connection is not established.

Fixes: 56cbc0817399 ("net/bonding: fix LACP negotiation")
Cc: stable@dpdk.org
Signed-off-by: Vadim Podovinnikov <podovinnikov@protei.ru>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
3 years agonet/hns3: fix imprecise statistics
Chengchang Tang [Thu, 4 Mar 2021 07:44:54 +0000 (15:44 +0800)]
net/hns3: fix imprecise statistics

Currently, the hns3 statistics may be inaccurate due to the
following two problems:

1. Queue-level statistics are read from the firmware, and only one Rx or
   Tx can be read at a time. This results in a large time interval
   between reading multiple queues statistics in a stress scenario, such
   as 1280 queues used by a PF or 256 functions used at the same time.
   Especially when the 256 functions are used at the same time, the
   interval between every two firmware commands in a function can be
   huge, because the scheduling mechanism of the firmware is similar to
   RR.

2. The current statistics are read by type. The HW statistics are read
   first, and then the software statistics are read. Due to preceding
   reasons, HW reading may be time-consuming, which cause a
   synchronization problem between SW and HW statistics of the same
   queue.

In this patch, queue-level statistics are directly read from the bar
instead of the firmware, and all the statistics of a queue include HW
and SW are read at a time to reduce inconsistency.

Fixes: 8839c5e202f3 ("net/hns3: support device stats")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: process MAC interrupt
Hongbo Zheng [Thu, 4 Mar 2021 07:44:53 +0000 (15:44 +0800)]
net/hns3: process MAC interrupt

TNL is the abbreviation of tunnel, which means port
here. MAC TNL interrupt indicates the MAC status
report of the network port, which will be generated
when the MAC status changes.

This patch enables MAC TNL interrupt reporting, and
queries and prints the corresponding MAC status when
the interrupt is received, then clear the MAC interrupt
status. Because this interrupt uses the same interrupt
as RAS, the interrupt log is adjusted.

Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: fix mbuf leakage
Huisong Li [Thu, 4 Mar 2021 07:44:52 +0000 (15:44 +0800)]
net/hns3: fix mbuf leakage

The mbufs of rx queue will be allocated in "hns3_do_start" function.
But these mbufs are not released when "hns3_dev_start" executes
failed.

Fixes: c4ae39b2cfc5 ("net/hns3: fix Rx interrupt after reset")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: remove unused parameter markers
Huisong Li [Thu, 4 Mar 2021 07:44:51 +0000 (15:44 +0800)]
net/hns3: remove unused parameter markers

All input parameters in the "hns3_dev_xstats_get_by_id" API are used,
so the rte_unused flag of some variables should be deleted.

Fixes: 3213d584b698 ("net/hns3: fix xstats with id and names")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: fix HW buffer size on MTU update
Chengchang Tang [Thu, 4 Mar 2021 07:44:50 +0000 (15:44 +0800)]
net/hns3: fix HW buffer size on MTU update

After MTU changed, the buffer used to store packets in HW should be
reallocated. And buffer size is allocated based on the maximum frame
size in the PF struct. However, the value of maximum frame size  is
not updated in time when MTU is changed. This would lead to a packet
loss for not enough buffer.

This patch update the maximum frame size before reallocating the HW
buffer. And a rollback operation is added to avoid the side effects
of buffer reallocation failures.

Fixes: 1f5ca0b460cd ("net/hns3: support some device operations")
Fixes: d51867db65c1 ("net/hns3: add initialization")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: support Rx descriptor advanced layout
Chengwen Feng [Thu, 4 Mar 2021 07:44:49 +0000 (15:44 +0800)]
net/hns3: support Rx descriptor advanced layout

Currently, the driver get packet type by parse the
L3_ID/L4_ID/OL3_ID/OL4_ID from Rx descriptor and then lookup multiple
tables, it's time consuming.

Now Kunpeng930 support advanced RXD layout, which:
1. Combine OL3_ID/OL4_ID to 8bit PTYPE filed, so the driver get packet
   type by lookup only one table.  Note: L3_ID/L4_ID become reserved
   fields.
2. The 1588 timestamp located at Rx descriptor instead of query from
   firmware.
3. The L3E/L4E/OL3E/OL4E will be zero when L3L4P is zero, so driver
   could optimize the good checksum calculations (when L3E/L4E is zero
   then mark PKT_RX_IP_CKSUM_GOOD/PKT_RX_L4_CKSUM_GOOD).

Considering compatibility, the firmware will report capability of
RXD advanced layout, the driver will identify and enable it by default.

This patch only provides basic function: identify and enable the RXD
advanced layout, and lookup ptype table if supported.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: support PF device with copper PHYs
Huisong Li [Thu, 4 Mar 2021 07:44:48 +0000 (15:44 +0800)]
net/hns3: support PF device with copper PHYs

The normal operation of devices with copper phys depends on the
initialization and configuration of the PHY chip. The task of
driving the PHY chip is implemented in some firmware versions.
If firmware supports the phy driver, it will report a capability
flag to driver in probing process. The driver determines whether
to support PF device with copper phys based on the capability bit.
If supported, the driver set a flag indicating that the firmware
takes over the PHY, and then the firmware initializes the PHY.

This patch supports the query of link status and link info, and
existing basic features for PF device with copper phys.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: fix device capabilities for copper media type
Huisong Li [Thu, 4 Mar 2021 07:44:47 +0000 (15:44 +0800)]
net/hns3: fix device capabilities for copper media type

The configuration operation for PHY is implemented by firmware. And
a capability flag will be report to driver, which means the firmware
supports the PHY driver.  However, the current implementation only
supports obtaining the capability bit, but some basic functions of
copper ports in driver, such as, the query of link status and link
info, are not supported.

Therefore, it is necessary for driver to set the copper capability
bit to zero when the firmware supports the configuration of the PHY.

Fixes: 438752358158 ("net/hns3: get device capability from firmware")
Fixes: 95e50325864c ("net/hns3: support copper media type")
Cc: stable@dpdk.org
Signed-off-by: Huisong Li <lihuisong@huawei.com>
3 years agonet/hns3: encapsulate port shaping interface
Huisong Li [Thu, 4 Mar 2021 07:44:46 +0000 (15:44 +0800)]
net/hns3: encapsulate port shaping interface

When rate of port changes, the rate limit of the port needs to
be updated. So it is necessary to encapsulate an interface that
configures the rate limit based on the rate.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: add imissed packet stats
Min Hu (Connor) [Thu, 4 Mar 2021 07:44:45 +0000 (15:44 +0800)]
net/hns3: add imissed packet stats

This patch implement Rx imissed stats by querying cmdq.

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: add bytes stats
Min Hu (Connor) [Thu, 4 Mar 2021 07:44:44 +0000 (15:44 +0800)]
net/hns3: add bytes stats

In current HNS3 PMD, Rx/Tx bytes from packet stats are not
implemented.

This patch implemented Rx/Tx bytes using soft counters.

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: implement Tx mbuf free on demand
Chengwen Feng [Thu, 4 Mar 2021 07:44:43 +0000 (15:44 +0800)]
net/hns3: implement Tx mbuf free on demand

This patch add support tx_done_cleanup ops, which could support for
the API rte_eth_tx_done_cleanup to free consumed mbufs on Tx ring.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: add more registers to dump
Chengchang Tang [Thu, 4 Mar 2021 07:44:42 +0000 (15:44 +0800)]
net/hns3: add more registers to dump

This patch makes more registers dumped in the dump_reg API to help
locate the fault.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/hns3: support module EEPROM dump
Chengchang Tang [Thu, 4 Mar 2021 07:44:41 +0000 (15:44 +0800)]
net/hns3: support module EEPROM dump

This patch add support for dumping module EEPROM.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
3 years agonet/mlx5: fix imissed statistics
Matan Azrad [Thu, 25 Feb 2021 10:45:01 +0000 (10:45 +0000)]
net/mlx5: fix imissed statistics

The imissed port statistic counts packets that were dropped by the
device Rx queues.

In mlx5, the imissed counter summarizes 2 counters:
- packets dropped by the SW queue handling counted by SW.
- packets dropped by the HW queues due to "out of buffer" events
  detected when no SW buffer is available for the incoming
  packets.

There is HW counter object that should be created per device, and all
the Rx queues should be assigned to this counter in configuration time.

This part was missed when the Rx queues were created by DevX what
remained the "out of buffer" counter clean forever in this case.

Add 2 options to assign the DevX Rx queues to queue counter:
- Create queue counter per device by DevX and assign all the
  queues to it.
- Query the kernel counter and assign all the queues to it.

Use the first option by default and if it is failed, fallback to the
second option.

Fixes: e79c9be91515 ("net/mlx5: support Rx hairpin queues")
Fixes: dc9ceff73c99 ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agocommon/mlx5: add DevX commands for queue counters
Matan Azrad [Thu, 25 Feb 2021 10:45:00 +0000 (10:45 +0000)]
common/mlx5: add DevX commands for queue counters

A queue counter set is an HW object that can be assigned to any RQ\QP
and it counts HW events on the assigned QPs\RQs.

Add DevX API to allocate and query queue counter set object.

The only used counter event is the "out of buffer" where the queue
drops packets when no SW buffer is available to receive it.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agocommon/mlx5: add DevX command to query WQ
Matan Azrad [Thu, 25 Feb 2021 10:44:59 +0000 (10:44 +0000)]
common/mlx5: add DevX command to query WQ

Add a DevX command to query Rx queues attributes created by VERBS.

Currently support only counter_set_id attribute.

This counter ID is managed by the kernel driver and being assigned to
any queue created by the kernel.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agocommon/mlx5/linux: add glue function to query WQ
Matan Azrad [Thu, 25 Feb 2021 10:44:58 +0000 (10:44 +0000)]
common/mlx5/linux: add glue function to query WQ

When Rx queue is created by VERBS API ibv_create_wq there is a dedicated
rdma-core API to query an information about this WQ(Work Queue).

VERBS WQ querying is needed for PMD cases which combine VERBS objects
with DevX objects.

Next feature to use this glue function is the HW queue counters.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
3 years agonet/pcap: fix file descriptor leak on close
Tengfei Zhang [Tue, 2 Mar 2021 16:51:30 +0000 (16:51 +0000)]
net/pcap: fix file descriptor leak on close

pcap fd was opend when vdev probed,
but not closed when vdev removed.

Fixes: c956caa6eabf ("pcap: support port hotplug")
Cc: stable@dpdk.org
Signed-off-by: Tengfei Zhang <zypscode@outlook.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>