dpdk.git
4 years agocommon/mlx5: share kernel interface name getter
Matan Azrad [Thu, 18 Jun 2020 19:06:02 +0000 (19:06 +0000)]
common/mlx5: share kernel interface name getter

Some configuration of the mlx5 port are done by the kernel net device
associated to the IB device represents the PCI device.

The DPDK mlx5 driver uses Linux system calls, for example ioctl, in
order to configure per port configurations requested by the DPDK user.

One of the basic knowledges required to access the correct kernel net
device is its name.

Move function to get interface name from IB device path to the common
library.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agovdpa/mlx5: adjust virtio queue protection domain
Matan Azrad [Tue, 2 Jun 2020 15:51:44 +0000 (15:51 +0000)]
vdpa/mlx5: adjust virtio queue protection domain

In other to fill the new requirement for virtq
configuration, set the single PD managed by the driver for
all the virtqs.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agocommon/mlx5: add virtio queue protection domain
Matan Azrad [Tue, 2 Jun 2020 15:51:43 +0000 (15:51 +0000)]
common/mlx5: add virtio queue protection domain

Starting from FW version 22.27.4002, it is required to
configure protection domain (PD) for each virtq created by
DevX.

Add PD requirement in virtq DevX APIs.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agoexamples/vdpa: add statistics show command
Matan Azrad [Thu, 18 Jun 2020 18:59:44 +0000 (18:59 +0000)]
examples/vdpa: add statistics show command

A new vDPA driver feature was added to query the virtq
statistics from the HW.

Use this feature to show the HW queues statistics for the virtqs.

Command description: stats X Y.
X is the device ID.
Y is the queue ID, Y=0xffff to show all the virtio queues
statistics of the device X.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agovdpa/mlx5: support virtio queue statistics get
Matan Azrad [Thu, 18 Jun 2020 18:59:43 +0000 (18:59 +0000)]
vdpa/mlx5: support virtio queue statistics get

Add support for statistics operations.

A DevX counter object is allocated per virtq in order to
manage the virtq statistics.

The counter object is allocated before the virtq creation
and destroyed after it, so the statistics are valid only in
the life time of the virtq.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agocommon/mlx5: support DevX virtq stats operations
Matan Azrad [Thu, 18 Jun 2020 18:59:42 +0000 (18:59 +0000)]
common/mlx5: support DevX virtq stats operations

Add DevX API to create and query virtio queue statistics
from the HW. The next counters are supported by the HW per
virtio queue:
received_desc.
completed_desc.
error_cqes.
bad_desc_errors.
exceed_max_chain.
invalid_buffer.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agovhost: introduce operation to get vDPA queue stats
Matan Azrad [Thu, 18 Jun 2020 18:59:41 +0000 (18:59 +0000)]
vhost: introduce operation to get vDPA queue stats

The vDPA device offloads all the datapath of the vhost
device to the HW device.

In order to expose to the user traffic information this
patch introduces new 3 APIs to get traffic statistics, the
device statistics name and to reset the statistics per
virtio queue.

The statistics are taken directly from the vDPA driver
managing the HW device and can be different for each vendor
driver.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agovhost: enable reply-ack systematically
Maxime Coquelin [Thu, 28 May 2020 09:03:47 +0000 (11:03 +0200)]
vhost: enable reply-ack systematically

As announced during v20.05 release cycle, this
patch makes reply-ack protocol feature to be enabled
unconditionally.

This protocol feature makes the communication between the
master and the slave more robust, avoiding for example
possible undefined behaviour with VHOST_USER_SET_MEM_TABLE.

Also, reply-ack support will be required for upcoming
VHOST_USER_SET_STATUS request.

Note that this protocol feature was disabled by default
because Qemu version 2.7.0 to 2.9.0 had a bug causing a
deadlock when reply-ack was negotiated and multiqueue
enabled. These Qemu version are now very old and no more
maintained, so we can reasonably consider we no more
support them.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
4 years agonet/ice/base: replace RSS profile locks
Qi Zhang [Thu, 11 Jun 2020 08:43:30 +0000 (16:43 +0800)]
net/ice/base: replace RSS profile locks

Replacing flow profile locks with RSS profile locks in the function to
remove all RSS rules for a given VSI. This is to align the locks used
for RSS rule addition to VSI and removal during VSI teardown to avoid
a race condition owing to several iterations of the above operations.
In function to get RSS rules for given VSI and protocol header replacing
the pointer reference of the RSS entry with a copy of hash value to
ensure thread safety.

Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: fix VSI ID mask to 10 bits
Qi Zhang [Thu, 11 Jun 2020 08:43:29 +0000 (16:43 +0800)]
net/ice/base: fix VSI ID mask to 10 bits

set_rss_lut failed due to incorrect vsi_id mask. vsi_id is 10 bit
but mask was 0x1FF whereas it should be 0x3FF.

For vsi_num >= 512, FW set_rss_lut has been failing with return code
EACCESS (vsi ownership issue) because software was providing
incorrect vsi_num (dropping 10th bit due to incorrect mask) for
set_rss_lut admin command

Fixes: a90fae1d0755 ("net/ice/base: add admin queue structures and commands")
Cc: stable@dpdk.org
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: choose TCP dummy packet by protocol
Qi Zhang [Thu, 11 Jun 2020 08:43:28 +0000 (16:43 +0800)]
net/ice/base: choose TCP dummy packet by protocol

In order to find proper dummy packets for switch filter,
it need to check ipv4 next protocol number, if it is 0x06,
which means next payload is TCP, we need to use TCP
format dummy packet.

Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: get tunnel type for recipe
Qi Zhang [Thu, 11 Jun 2020 08:43:27 +0000 (16:43 +0800)]
net/ice/base: get tunnel type for recipe

This patch add support to get tunnel type of recipe
after get recipe from FW. This will fix the issue in
function ice_find_recp() for tunnel type comparing.

Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: support flow director for GTPU with outer IPv6
Qi Zhang [Thu, 11 Jun 2020 08:43:26 +0000 (16:43 +0800)]
net/ice/base: support flow director for GTPU with outer IPv6

Add FDIR support for MAC_IPV6_GTPU type with outer IPv6 address, teid
and qfi fields matching.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: rename misleading variable
Qi Zhang [Thu, 11 Jun 2020 08:43:25 +0000 (16:43 +0800)]
net/ice/base: rename misleading variable

The grst_delay variable in ice_check_reset contains the maximum time
(in 100 msec units) that the driver will wait for a reset event to
transition to the Device Active state. The value is the sum of three
separate components:
1) The maximum time it may take for the firmware to process its
outstanding command before handling the reset request.
2) The value in RSTCTL.GRSTDEL (the delay firmware inserts between first
seeing the driver reset request and the actual hardware assertion).
3) The maximum expected reset processing time in hardware.

Referring to this total time as "grst_delay" is misleading and
potentially confusing to someone checking the code and cross-referencing
the hardware specification.

Fix this by renaming the variable to "grst_timeout", which is more
descriptive of its actual use.

Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: add commands for system diagnostic
Qi Zhang [Thu, 11 Jun 2020 08:43:24 +0000 (16:43 +0800)]
net/ice/base: add commands for system diagnostic

System diagnostic solution extend the ability to fetch FW
internal status data and error indication.

Signed-off-by: Sharon Haroni <sharon.haroni@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: support flow director for outer IP of GTPU
Qi Zhang [Thu, 11 Jun 2020 08:43:23 +0000 (16:43 +0800)]
net/ice/base: support flow director for outer IP of GTPU

Add outer IP address fields while generating the training packets for
GTPU, so that we can support FDIR based on outer IP of GTPU.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: refactor to avoid need to retry
Qi Zhang [Thu, 11 Jun 2020 08:43:22 +0000 (16:43 +0800)]
net/ice/base: refactor to avoid need to retry

The ice_discover_caps function is used to read the device and function
capabilities, updating the hardware capabilities structures with
relevant data.

The exact number of capabilities returned by the hardware is unknown
ahead of time. The AdminQ command will report the total number of
capabilities in the return buffer.

The current implementation involves requesting capabilities once,
reading this returned size, and then re-requested with that size.

This isn't really necessary. The firmware interface has a maximum size
of ICE_AQ_MAX_BUF_LEN. Firmware can never return more than
ICE_AQ_MAX_BUF_LEN / sizeof(struct ice_aqc_list_caps_elem) capabilities.

Avoid the retry loop by simply allocating a buffer of size
ICE_AQ_MAX_BUF_LEN. This is significantly simpler than retrying. The
extra allocation isn't a big deal, as it will be released after we
finish parsing the capabilities.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: adjust profile ID map locks
Qi Zhang [Thu, 11 Jun 2020 08:43:21 +0000 (16:43 +0800)]
net/ice/base: adjust profile ID map locks

The profile id map lock should be held till the caller completes
all references of that profile entries.

The current code releases the lock right after the match search.
This caused a driver issue when the profile map entries were
referenced after it was freed in other thread after the lock was
released earlier.

Also return type of get/set profile functions were changed to
return the ice status instead of the profile entry pointer.
This will prevent the caller referencing the profile fields
outside the lock.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agobuild: replace meson OS detection with variable
Thomas Monjalon [Mon, 29 Jun 2020 20:31:19 +0000 (22:31 +0200)]
build: replace meson OS detection with variable

Some places were calling the meson function host_machine.system()
instead of the variables is_windows and is_linux defined
in config/meson.build.

At the same time, the missing "Linux restriction" reason is added to
pfe and octeontx2 crypto PMDs.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
4 years agoapp/flow-perf: use macro for cache alignment
Thomas Monjalon [Mon, 29 Jun 2020 20:57:50 +0000 (22:57 +0200)]
app/flow-perf: use macro for cache alignment

The macro __rte_cache_aligned is better suited for aligning
a structure on a cache line (of any size).

Fixes: 15c431864000 ("app/flow-perf: add packet forwarding support")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Wisam Jaddo <wisamm@mellanox.com>
4 years agoring: enable for Windows
Fady Bader [Sun, 21 Jun 2020 12:06:19 +0000 (15:06 +0300)]
ring: enable for Windows

Building ring on Windows.

Signed-off-by: Fady Bader <fady@mellanox.com>
4 years agodevtools: add Windows cross-build test with MinGW
Thomas Monjalon [Sun, 14 Jun 2020 21:58:45 +0000 (23:58 +0200)]
devtools: add Windows cross-build test with MinGW

The Meson cross file is renamed from meson_mingw.txt to cross-mingw,
and is added to test-meson-builds.sh.

The only example supported on Windows so far is "helloworld",
that's why the default list of examples is overridden.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
4 years agodevtools: add ppc64 in meson build test
Thomas Monjalon [Sun, 14 Jun 2020 22:01:44 +0000 (00:01 +0200)]
devtools: add ppc64 in meson build test

Add cross-compilation support of a PPC target in the build test matrix.
The CPU is defined as Power8, running as little endian.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
4 years agodevtools: allow non-standard toolchain in meson test
Thomas Monjalon [Sun, 14 Jun 2020 22:18:44 +0000 (00:18 +0200)]
devtools: allow non-standard toolchain in meson test

If a compiler is not found in $PATH, the compilation test is skipped.
In some cases, the compiler could be found after extending $PATH
in an environment configuration script (called by load-devel-config).

The decision to skip is deferred to a later stage, after loading the
configuration script.

In such case, the variable DPDK_TARGET, used by the configuration script
as input, is the compiler name.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
4 years agodevtools: shrink cross-compilation test definition
Thomas Monjalon [Sun, 14 Jun 2020 22:03:29 +0000 (00:03 +0200)]
devtools: shrink cross-compilation test definition

Each cross-compilation case needs to define the target compiler
and the meson cross file.
Given the compiler is already defined in the cross file,
the latter is enough.

The function "build" is changed to accept a cross file alternatively
to the compiler name. In the case of a file (detected if readable),
the compiler is extracted with sed and tr, and the option --cross-file
is automatically added.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
4 years agoeal/windows: fix thread handle
Tasnim Bashar [Thu, 25 Jun 2020 19:25:39 +0000 (12:25 -0700)]
eal/windows: fix thread handle

Casting thread ID to handle is not accurate way to get thread handle.
Need to use OpenThread function to get thread handle from thread ID.

pthread_setaffinity_np and pthread_getaffinity_np functions
for Windows are affected because of it.

Signed-off-by: Tasnim Bashar <tbashar@mellanox.com>
4 years agobus/pci: support Windows with bifurcated drivers
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:40 +0000 (15:37 +0300)]
bus/pci: support Windows with bifurcated drivers

Uses SetupAPI.h functions to scan PCI tree.
Uses DEVPKEY_Device_Numa_Node to get the PCI NUMA node.
Uses SPDRP_BUSNUMBER and SPDRP_BUSNUMBER to get the BDF.
scanning currently supports types RTE_KDRV_NONE.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agobus/pci: introduce Windows support with stubs
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:39 +0000 (15:37 +0300)]
bus/pci: introduce Windows support with stubs

Addition of stub eal and bus/pci functions to compile
bus/pci for Windows.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agopci: fix address domain format size
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:36 +0000 (15:37 +0300)]
pci: fix address domain format size

the struct rte_pci_addr defines domain as uint32_t variable however
the PCI_PRI_FMT macro used for logging the struct sets the format
of domain to uint16_t.

The mismatch causes the following warning messages
in Windows clang build:

format specifies type 'unsigned short' but the argument
has type 'uint32_t' (aka 'unsigned int') [-Wformat]

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agopci: build on Windows
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:35 +0000 (15:37 +0300)]
pci: build on Windows

Added <sys/types.h> in rte_pci header file
to include off_t type since it is missing for Windows.

Define the implementation of the Linux function rte_pci_get_sysfs_path
in pci_common.c for Linux OS only as it is unneeded for other OSs
and to avoid the warning on deprecated call to getenv() on Windows:

"warning: 'getenv' is deprecated: This function or variable may be unsafe.
Consider using _dupenv_s instead."

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agopci: use OS generic memory mapping functions
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:34 +0000 (15:37 +0300)]
pci: use OS generic memory mapping functions

Changing all of PCIs Unix memory mapping to the
new memory allocation API wrapper.

Change all of PCI mapping function usage in
bus/pci to support the new API.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agoeal: move OS common options usage functions
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:33 +0000 (15:37 +0300)]
eal: move OS common options usage functions

Move common functions between Unix and Windows to eal_common_options.c.

Those functions are getter functions for rte_application_usage_hook.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agoeal: move OS common config objects
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:32 +0000 (15:37 +0300)]
eal: move OS common config objects

Move common functions between Unix and Windows to eal_common_config.c.

Those functions are getter functions for IOVA,
configuration, Multi-process.

Move rte_config, internal_config, early_mem_config and runtime_dir
to be defined in the common file with getter functions.

Refactor the users of the config variables above to use
the getter functions.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agobuild: generate version map file for MinGW
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:41 +0000 (15:37 +0300)]
build: generate version map file for MinGW

The MinGW build for Windows has special cases where exported
function contain additional prefix:

__emutls_v.per_lcore__*

To avoid adding those prefixed functions to the version.map file
the map_to_def.py script was modified to create a map file for MinGW
with the needed changed.

The file name was changed to map_to_win.py and lib/meson.build map output
was unified with drivers/meson.build output

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agobuild: fix drivers library path on Windows
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:38 +0000 (15:37 +0300)]
build: fix drivers library path on Windows

import library (/IMPLIB) in meson.build should use
the 'drivers' and not 'libs' folder.

The error is: fatal error LNK1149: output filename matches input filename.
The fix uses the correct folder.

Fixes: 5ed3766981 ("drivers: process shared link dependencies as for libs")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agobuild: skip pmdinfogen on Windows
Tal Shnaiderman [Mon, 29 Jun 2020 12:37:37 +0000 (15:37 +0300)]
build: skip pmdinfogen on Windows

pmdinfogen generation is currently unsupported for Windows.
The relevant part in meson.build is skipped.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agomk: add a paused deprecation warning before each build
Thomas Monjalon [Wed, 17 Jun 2020 23:52:32 +0000 (01:52 +0200)]
mk: add a paused deprecation warning before each build

DPDK 20.05 had some deprecation notes after "make config"
and after the build.
For DPDK 20.08, the config note is replaced with a warning
before the config and before the build.
After the warning, there is a pause which can be skipped
with the variable MAKE_PAUSE.

This deprecation process was discussed in the Technical Board:
http://mails.dpdk.org/archives/dev/2020-April/162839.html

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
4 years agodoc: update build instructions in the Linux guide
Thomas Monjalon [Wed, 17 Jun 2020 21:41:25 +0000 (23:41 +0200)]
doc: update build instructions in the Linux guide

Before removing the "make" build system completely,
the Linux guide instructions are made more concise and accurate.
Some detailed explanations are also available in
doc/guides/prog_guide/dev_kit_root_make_help.rst

This is the swan song for makefile system,
in order to have accurate information backported in LTS.

Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
4 years agodoc: remove some build instructions where unneeded
Thomas Monjalon [Wed, 17 Jun 2020 21:37:51 +0000 (23:37 +0200)]
doc: remove some build instructions where unneeded

The build should be described only in few places,
in order to maintain up-to-date, accurate and detailed instructions.
This change is removing some of the unneeded repetitions.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
4 years agodoc: remove outdated guidelines for library addition
Thomas Monjalon [Wed, 17 Jun 2020 21:29:32 +0000 (23:29 +0200)]
doc: remove outdated guidelines for library addition

There was a doc about how to extend DPDK by adding a library.
It could have been useful but was never updated,
so it is lacking a lot of explanations about doxygen,
meson, versioning, maintainership, etc.

Anyway such guidelines should fit in the contributors guide.
Better to completely remove this obsolete document.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
4 years agoapp/flow-perf: add packet forwarding support
Wisam Jaddo [Thu, 4 Jun 2020 13:35:02 +0000 (13:35 +0000)]
app/flow-perf: add packet forwarding support

Introduce packet forwarding support to the app to do
some performance measurements.

The measurements are reported in term of packet per
second unit. The forwarding will start after the end
of insertion/deletion operations.

The support has single and multi performance measurements.

Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
4 years agoapp/flow-perf: add memory dump to app
Wisam Jaddo [Thu, 4 Jun 2020 13:35:01 +0000 (13:35 +0000)]
app/flow-perf: add memory dump to app

Introduce new feature to dump memory statistics of each socket
and a total for all before and after the creation.

This will give two main advantage:
1- Check the memory consumption for large number of flows
"insertion rate scenario alone"

2- Check that no memory leackage after doing insertion then
deletion.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
4 years agoapp/flow-perf: add deletion rate calculation
Wisam Jaddo [Thu, 4 Jun 2020 13:35:00 +0000 (13:35 +0000)]
app/flow-perf: add deletion rate calculation

Add the ability to test deletion rate for flow performance
application.

This feature is disabled by default, and can be enabled by
add "--deletion-rate" in the application command line options.

Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
4 years agoapp/flow-perf: add insertion rate calculation
Wisam Jaddo [Thu, 4 Jun 2020 13:34:59 +0000 (13:34 +0000)]
app/flow-perf: add insertion rate calculation

Add insertion rate calculation feature into flow
performance application.

The application now provide the ability to test
insertion rate of specific rte_flow rule, by
stressing it to the NIC, and calculate the
insertion rate.

The application offers some options in the command
line, to configure which rule to apply.

After that the application will start producing
rules with same pattern but increasing the outer IP
source address by 1 each time, thus it will give
different flow each time, and all other items will
have open masks.

The current design have single core insertion rate.

Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
4 years agoapp/flow-perf: add flow performance skeleton
Wisam Jaddo [Thu, 4 Jun 2020 13:34:58 +0000 (13:34 +0000)]
app/flow-perf: add flow performance skeleton

Add flow performance application skeleton.

Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
4 years agombuf: add dump of free dynamic flags
Xiaolong Ye [Sat, 13 Jun 2020 15:49:21 +0000 (23:49 +0800)]
mbuf: add dump of free dynamic flags

Add support to dump free_flags as below format:

Free bit in mbuf->ol_flags (0 = occupied, 1 = free):
  0000: 0 0 0 0 0 0 0 0
  0008: 0 0 0 0 0 0 0 0
  0010: 0 0 0 0 0 0 0 1
  0018: 1 1 1 1 1 1 1 1
  0020: 1 1 1 1 1 1 1 1
  0028: 1 0 0 0 0 0 0 0
  0030: 0 0 0 0 0 0 0 0
  0038: 0 0 0 0 0 0 0 0

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
4 years agombuf: fix dynamic field dump log
Xiaolong Ye [Sat, 13 Jun 2020 15:49:20 +0000 (23:49 +0800)]
mbuf: fix dynamic field dump log

For each mbuf byte, free_space[i] == 0 means the space is occupied,
free_space[i] != 0 means space is free.

Fixes: 4958ca3a443a ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
4 years agombuf: fix free space update for dynamic field
Xiaolong Ye [Sat, 13 Jun 2020 15:49:19 +0000 (23:49 +0800)]
mbuf: fix free space update for dynamic field

The value free_space[i] is used to save the size of biggest aligned
element that can fit in the zone, current implementation has one flaw,
for example, if user registers dynfield1 (size = 4, align = 4, req = 124)
first, the free_space would be as below after registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 08 08 08 08 00 00 00 00

Then if user continues to register dynfield2 (size = 4, align = 4),
free_space would become:

  0070: 00 00 00 00 04 04 04 04
  0078: 04 04 04 04 00 00 00 00

Further request dynfield3 (size = 8, align = 8) would fail to register
due to alignment requirement can't be satisfied, though there is enough
space remained in mbuf.

This patch fixes above issue by saving alignment only in aligned zone,
after the fix, above registrations order can be satisfied, free_space
would be like:

After dynfield1 registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 04 04 04 04 00 00 00 00

After dynfield2 registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 00 00 00 00 00 00 00 00

After dynfield3 registration:

  0070: 00 00 00 00 00 00 00 00
  0078: 00 00 00 00 00 00 00 00

This patch also reduces iterations in process_score() by jumping align
steps in each loop.

Fixes: 4958ca3a443a ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
4 years agombuf: fix error code in dynamic field/flag registration
Xiaolong Ye [Sat, 13 Jun 2020 15:49:18 +0000 (23:49 +0800)]
mbuf: fix error code in dynamic field/flag registration

Set rte_errno as ENOMEM when allocation failure.

Fixes: 4958ca3a443a ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
4 years agombuf: fix boundary check at dynamic field registration
Xiaolong Ye [Sat, 13 Jun 2020 15:49:17 +0000 (23:49 +0800)]
mbuf: fix boundary check at dynamic field registration

We should make sure off + size < sizeof(struct rte_mbuf) to avoid
possible out-of-bounds access of free_space array, there is no issue
currently due to the low bits of free_flags (which is adjacent to
free_space) are always set to 0. But we shouldn't rely on it since it's
fragile and layout of struct mbuf_dyn_shm may be changed in the future.
This patch adds boundary check explicitly to avoid potential risk of
out-of-bounds access.

Fixes: 4958ca3a443a ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
4 years agobus/pci: fix VF memory access
Haiyue Wang [Thu, 25 Jun 2020 03:50:46 +0000 (11:50 +0800)]
bus/pci: fix VF memory access

To fix CVE-2020-12888, the linux vfio-pci module will invalidate mmaps
and block MMIO access on disabled memory, it will send a SIGBUS to the
application:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abafbc551fdd

When the application opens the vfio PCI device, the vfio-pci module will
enable the bus memory space through PCI read/write access. According to
the PCIe specification, the 'Memory Space Enable' is always zero for VF:

             Table 9-13 Command Register Changes

Bit Location | PF and VF Register Differences | PF         | VF
             | From Base                      | Attributes | Attributes
-------------+--------------------------------+------------+-----------
             | Memory Space Enable - Does not |            |
             | apply to VFs. Must be hardwired|  Base      |  0b
     1       | to 0b for VFs. VF Memory Space |            |
             | is controlled by the VF MSE bit|            |
             | in the VF Control register.    |            |
-------------+--------------------------------+------------+-----------

Afterwards the vfio-pci will initialize its own virtual PCI config space
data ('vconfig') by reading the VF's physical PCI config space, then the
'Memory Space Enable' bit in vconfig will always be 0b value. This will
make the vfio-pci treat the BAR memory space as disabled, and the SIGBUS
will be triggered if access these BARs.

By investigation, the VF PCI device *passthrough* into the Guest OS by
QEMU has the 'Memory Space Enable' with 1b value. That's because every
PCI driver will start to enable the memory space, and this action will
be hooked by vfio-pci virtual PCI read/write to set the 'Memory Space
Enable' in vconfig space to 1b. So VF runs in guest OS has 'Mem+', but
VF runs in host OS has 'Mem-'.

Align with PCI working mode in Guest/QEMU/Host, in DPDK, enable the PCI
bus memory space explicitly to avoid access on disabled memory.

Fixes: 33604c31354a ("vfio: refactor PCI BAR mapping")
Cc: stable@dpdk.org
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Harman Kalra <hkalra@marvell.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Tested-by: Thierry Martin <thierry.martin.public@gmail.com>
4 years agotest/bitops: fix command name
David Marchand [Fri, 19 Jun 2020 09:58:28 +0000 (11:58 +0200)]
test/bitops: fix command name

Caught by code review, bitops test name is incorrect.

Fixes: 7660614c11e2 ("test/bitops: add bit operations test case")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Phil Yang <phil.yang@arm.com>
4 years agoeal: remove redundant newline in alert message
David Marchand [Wed, 10 Jun 2020 14:30:24 +0000 (16:30 +0200)]
eal: remove redundant newline in alert message

rte_eal_init_alert() already appends a newline.

Fixes: 0a529578f162 ("eal: clean up unused files on initialization")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agousertools: fix telemetry user socket path
Ciara Power [Wed, 10 Jun 2020 13:30:33 +0000 (14:30 +0100)]
usertools: fix telemetry user socket path

The path to the socket when running the script as a regular user needed
to be updated to match the logic in EAL.

Fixes: 6a2967c112a3 ("usertools: add new telemetry script")
Cc: stable@dpdk.org
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
4 years agobus/vmbus: fix ring buffer mapping
Long Li [Fri, 12 Jun 2020 00:48:25 +0000 (17:48 -0700)]
bus/vmbus: fix ring buffer mapping

vmbus_map_addr is used as the next start virtual address for mapping ring
buffer. However it's updated based on ring_buf, which is a pointer to an
address on the stack. The next ring buffer may be mapped to an unexpected
address.

Fix this by calculating vmbus_map_addr based on returned virtual address.

Fixes: 3f9277031a2e ("bus/vmbus: fix check for mmap failure")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agosched: fix 64-bit rate
Archit Pandey [Tue, 2 Jun 2020 08:55:28 +0000 (14:25 +0530)]
sched: fix 64-bit rate

64-bit support was missing from the functions pipe_profile_check
and rte_sched_subport_config_pipe_profile_table.

Fixes: 68c1f26d4236 ("sched: support 64-bit values")
Cc: stable@dpdk.org
Signed-off-by: Archit Pandey <architpandeynitk@gmail.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
4 years agosched: fix subport freeing
Hrvoje Habjanic [Tue, 26 May 2020 17:24:55 +0000 (19:24 +0200)]
sched: fix subport freeing

In function rte_sched_subport_free, there is code to free all allocated
stuff related to scheduler subport.
First there are some checks, and in the end, rte_bitmap_free is called.

Now, rte_bitmap_free is a dummy function, and it just checks if
provided pointer to bitmap is valid or not. So, actual memory for
subport is not freed.

This patch fixes this by removing call to rte_bitmap_free, and
instead calling rte_free.

Fixes: d9213b829a31 ("sched: remove pipe params config from port level")
Cc: stable@dpdk.org
Signed-off-by: Hrvoje Habjanic <hrvoje.habjanic@zg.ht.hr>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
4 years agoexamples: add flush after stats printing
Georgiy Levashov [Tue, 28 Apr 2020 13:27:41 +0000 (14:27 +0100)]
examples: add flush after stats printing

When printf()'s stdout is line-buffered for terminal, it is fully
buffered for pipes. So, stdout listener can only get the output
when it is flushed (on program termination, when buffer is filled or
manual flush).

stdout buffer might fill slowly since every stats report could be small.

Also when it is fully filled it might contain a part of the last stats
report which makes it very inconvenient for any automation which reads
and parses the output.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Georgiy Levashov <georgiy.levashov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
4 years agobus/pci: optimize bus scan
Jerin Jacob [Wed, 24 Jun 2020 11:46:17 +0000 (17:16 +0530)]
bus/pci: optimize bus scan

In order to optimize the PCI management, RTE_KDRV_NONE based
device driver probing removed by not adding them to list in
the scan phase.

The legacy virtio is the only consumer of RTE_KDRV_NONE based device
driver probe scheme. The legacy virtio support will be available
through the existing VFIO/UIO based kernel driver scheme.

This patch also removes the deprecation notice for the same.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
4 years agonet: fix IPv4 checksum
Hongzhi Guo [Tue, 26 May 2020 10:08:05 +0000 (18:08 +0800)]
net: fix IPv4 checksum

0xffff is invalid for IPv4 checksum (RFC1624)

Fixes: 6006818cfb26 ("net: new checksum functions")
Cc: stable@dpdk.org
Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
4 years agobpf/x86: support packet data load instructions
Konstantin Ananyev [Wed, 27 May 2020 14:16:53 +0000 (15:16 +0100)]
bpf/x86: support packet data load instructions

Make x86 JIT to generate native code for
(BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD)
instructions.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
4 years agobpf: support packet data load instructions
Konstantin Ananyev [Wed, 27 May 2020 14:16:51 +0000 (15:16 +0100)]
bpf: support packet data load instructions

To fill the gap with linux kernel eBPF implementation,
add support for two non-generic instructions:
(BPF_ABS | <size> | BPF_LD) and (BPF_IND | <size> | BPF_LD)
which are used to access packet data.
These instructions can only be used when BPF context is a pointer
to 'struct rte_mbuf' (i.e: RTE_BPF_ARG_PTR_MBUF type).

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
4 years agobpf: fix add/sub min/max estimations
Konstantin Ananyev [Wed, 27 May 2020 14:16:50 +0000 (15:16 +0100)]
bpf: fix add/sub min/max estimations

eval_add()/eval_sub() not always correctly estimate
minimum and maximum possible values of add/sub operations.

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
4 years agotest/bpf: fix few small issues
Konstantin Ananyev [Wed, 27 May 2020 14:16:49 +0000 (15:16 +0100)]
test/bpf: fix few small issues

Address for few small issues:
 - unreachable return statement
 - failed test-case can finish with 'success' status

Also use unified cmp_res() function to check return value.

Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
4 years agoeal/windows: support exit and panic
Tal Shnaiderman [Tue, 23 Jun 2020 20:57:21 +0000 (23:57 +0300)]
eal/windows: support exit and panic

Support the debug functions in eal_common_debug.c for Windows.

Implementation of rte_dump_stack to get a backtrace similarly to Unix
and of rte_eal_cleanup in eal.c.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
4 years agoeal: move OS common debug functions to single file
Tal Shnaiderman [Tue, 23 Jun 2020 20:57:20 +0000 (23:57 +0300)]
eal: move OS common debug functions to single file

Move common functions between Unix and Windows to eal_common_debug.c.

Those functions are rte_exit, __rte_panic and rte_dump_registers
which has the same implementation on Unix and Windows.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
4 years agoeal/linux: fix epoll fd list rebuild for interrupts
Harman Kalra [Fri, 19 Jun 2020 13:59:28 +0000 (19:29 +0530)]
eal/linux: fix epoll fd list rebuild for interrupts

An issue has been observed where epoll file descriptor
list rebuilds every time an interrupt/alarm event is
received.

eal_intr_process_interrupts() should notify pipe fd only
if any source is removed from the source list i.e (rv > 0)

Fixes: 0c7ce182a760 ("eal: add pending interrupt callback unregister")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>
4 years agomaintainers: update for interrupt subsystem
Harman Kalra [Wed, 17 Jun 2020 12:28:07 +0000 (17:58 +0530)]
maintainers: update for interrupt subsystem

Updating MAINTAINERS file for interrupt subsystem.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
4 years agometer: remove inline functions from export list
Fady Bader [Wed, 17 Jun 2020 08:24:31 +0000 (11:24 +0300)]
meter: remove inline functions from export list

The code didn't compile when using exported meter functions under Windows.

error LNK2001: unresolved external symbol
rte_meter_srtcm_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_srtcm_color_blind_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_color_blind_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_rfc4115_color_aware_check
error LNK2001: unresolved external symbol
rte_meter_trtcm_rfc4115_color_blind_check

The cause was that there were some inline functions that were included in
the export list.
To solve this the functions were removed from rte_meter_version.map export
list which are implemented in the header and shouldn't be exported.

Fixes: 655796d2b5fb ("meter: support RFC4115 trTCM")
Fixes: 9d41beed24b0 ("lib: provide initial versioning")
Cc: stable@dpdk.org
Signed-off-by: Fady Bader <fady@mellanox.com>
4 years agotimer: support EAL functions on Windows
Fady Bader [Thu, 18 Jun 2020 06:55:22 +0000 (09:55 +0300)]
timer: support EAL functions on Windows

Implemented the needed Windows eal timer functions.

Signed-off-by: Fady Bader <fady@mellanox.com>
Reviewed-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
4 years agotimer: move from common to Unix directory
Fady Bader [Thu, 18 Jun 2020 06:55:21 +0000 (09:55 +0300)]
timer: move from common to Unix directory

EAL common timer doesn't compile under Windows.

Compilation log:
error LNK2019:
unresolved external symbol nanosleep referenced in function
rte_delay_us_sleep
error LNK2019:
unresolved external symbol get_tsc_freq referenced in function set_tsc_freq
error LNK2019:
unresolved external symbol sleep referenced in function set_tsc_freq

The reason was that some functions called POSIX functions.
The solution was to move POSIX dependent functions from common to Unix.

Signed-off-by: Fady Bader <fady@mellanox.com>
Reviewed-by: Tal Shnaiderman <talshn@mellanox.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
4 years agodoc: clarify compilation with MinGW-w64
Dmitry Kozlyuk [Sat, 20 Jun 2020 22:35:44 +0000 (01:35 +0300)]
doc: clarify compilation with MinGW-w64

Provide a more direct link for installer download and clarify thread
model choice during installation. As pthread is not a requirement,
remove notice about its possible runtime dependency.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
4 years agoconfig/x86: remove path for MinGW-w64 cross toolchain
Dmitry Kozlyuk [Sat, 20 Jun 2020 22:35:43 +0000 (01:35 +0300)]
config/x86: remove path for MinGW-w64 cross toolchain

Absolute paths in Meson cross-file impose unnecessary limitation
on build environment, remove them.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
4 years agoconfig: never link with pthread on Windows
Dmitry Kozlyuk [Sat, 20 Jun 2020 22:35:42 +0000 (01:35 +0300)]
config: never link with pthread on Windows

Even if pthread is provided by the toolchain, it is not needed for DPDK
on Windows, because internal shim is used. As a side-effect, this
enables cross-build with MinGW configured with non-POSIX thread library,
e.g. mcfgthread, which is the default on some distributions.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
4 years agonet/cxgbe: always enable HASH filter support
Karra Satwik [Fri, 12 Jun 2020 22:10:20 +0000 (03:40 +0530)]
net/cxgbe: always enable HASH filter support

Disable all unused firmware resources during init time to give
more resources for HASH (exact-match) filter region and always
request firmware to enable HASH filter support when resources
are available.

Signed-off-by: Karra Satwik <kaara.satwik@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
4 years agonet/mlx5/linux: add memory region callbacks to Verbs
Ophir Munk [Tue, 16 Jun 2020 09:44:46 +0000 (09:44 +0000)]
net/mlx5/linux: add memory region callbacks to Verbs

Create a set of verbs callbacks in 'struct mlx5_verbs_ops'
and add MR operations to it (file net/mlx5/linux/mlx5_verbs.c).

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/mlx5: add memory region callbacks in per-device cache
Ophir Munk [Tue, 16 Jun 2020 09:44:45 +0000 (09:44 +0000)]
net/mlx5: add memory region callbacks in per-device cache

Prior to this commit MR operations were verbs based and hard coded under
common/mlx5/linux directory. This commit enables upper layers (e.g.
net/mlx5) to determine which MR operations to use. For example the net
layer could set devx based MR operations in non-Linux environments. The
reg_mr and dereg_mr callbacks are added to the global per-device MR
cache 'struct mlx5_mr_share_cache'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agocommon/mlx5: export memory region Verbs operations
Ophir Munk [Tue, 16 Jun 2020 09:44:44 +0000 (09:44 +0000)]
common/mlx5: export memory region Verbs operations

The glue verbs operations reg_mr and dereg_mr are wrapped and exported
in functions mlx5_common_verbs_reg_mr and mlx5_common_verbs_dereg_mr
respectively.  The exported functions are added to a new file
linux/mlx5_common_verbs.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agocommon/mlx5: remove memory region dependency on Verbs
Ophir Munk [Tue, 16 Jun 2020 09:44:43 +0000 (09:44 +0000)]
common/mlx5: remove memory region dependency on Verbs

Replace 'struct ibv_mr *' (in 'struct mlx5_mr') with a new 'struct
mlx5_pmd_mr'.  The new struct contains the required MR field: lkey,
addr, len and is independent of ibv.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/cxgbe: ignore flow default masks for unrequested fields
Rahul Lakkireddy [Fri, 12 Jun 2020 22:07:27 +0000 (03:37 +0530)]
net/cxgbe: ignore flow default masks for unrequested fields

commit 536db938a444 ("net/cxgbe: add devargs to control filtermode and
filtermask") allows configuring hardware to select specific combination
of header fields to match in the incoming packets. However, the default
mask is set for all fields in the requested pattern items, even if the
field is not explicitly set in the combination and results in
validation errors. To prevent this, ignore setting the default masks
for the unrequested fields and the hardware will also ignore them in
validation, accordingly. Also, tweak the filter spec before finalizing
the masks.

Fixes: 536db938a444 ("net/cxgbe: add devargs to control filtermode and filtermask")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
4 years agonet/cxgbe: fix SMT leak in filter error and free path
Rahul Lakkireddy [Fri, 12 Jun 2020 22:07:26 +0000 (03:37 +0530)]
net/cxgbe: fix SMT leak in filter error and free path

Free up Source MAC Table (SMT) entry properly during filter create
failure and filter delete.

Fixes: 993541b2fa4f ("net/cxgbe: support flow API for source MAC rewrite")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
4 years agonet/cxgbe: fix double MPS alloc by flow validate and create
Rahul Lakkireddy [Fri, 12 Jun 2020 22:07:25 +0000 (03:37 +0530)]
net/cxgbe: fix double MPS alloc by flow validate and create

The Multi Port Switch (MPS) entry is allocated twice when both
flow validate and create are invoked, but only freed once during
flow destroy. Avoid double alloc by moving MPS entry allocation
closer to when the filter create request is sent to hardware and
will be ignored for filter validate request.

Fixes: fefee7a619a4 ("net/cxgbe: add flow ops to match based on dest MAC")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
4 years agonet/cxgbe: fix L2T leak in filter error and free path
Rahul Lakkireddy [Fri, 12 Jun 2020 22:07:24 +0000 (03:37 +0530)]
net/cxgbe: fix L2T leak in filter error and free path

Free up Layer 2 Table (L2T) entry properly during filter create
failure and filter delete.

Fixes: 1decc62b1cbe ("net/cxgbe: add flow operations to offload VLAN actions")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
4 years agonet/cxgbe: fix CLIP leak in filter error path
Rahul Lakkireddy [Fri, 12 Jun 2020 22:07:23 +0000 (03:37 +0530)]
net/cxgbe: fix CLIP leak in filter error path

Free up Compressed Local IP (CLIP) entry properly during filter
creation failure path. Also consolidate all various tables
cleanup to a common function and invoke it from both wild-card
and exact-match filter paths.

Fixes: af44a577988b ("net/cxgbe: support to offload flows to HASH region")
Cc: stable@dpdk.org
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
4 years agonet/ice/base: remove PPPoD from PPPoE bitmap
Qi Zhang [Mon, 15 Jun 2020 02:05:15 +0000 (10:05 +0800)]
net/ice/base: remove PPPoD from PPPoE bitmap

Remove PPPoD's packet type from PPPoE's ptype bitmap.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: update IPv4 and IPv6 flow packet type masks
Qi Zhang [Mon, 15 Jun 2020 02:05:14 +0000 (10:05 +0800)]
net/ice/base: update IPv4 and IPv6 flow packet type masks

In the flow API, add ability to add IPV4/IPV6 rules that match on
packets with or without inner L4 protocols.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: add 1G SGMII PHY type
Qi Zhang [Mon, 15 Jun 2020 02:05:13 +0000 (10:05 +0800)]
net/ice/base: add 1G SGMII PHY type

There isn't a case for 1G SGMII in ice_get_media_type() so add
the handling for it.

Also handle the special case where some direct attach
cables may report that they support 1G SGMII, but
that is erroneous since SGMII is supposed to be a
backplane media type (between a MAC and a PHY). If
the driver doesn't handle this special case then a
user could see the 'Port' in ethtool change from
'Direct attach Copper' to 'Backplane' when they have
forced the speed to 1G, but the cable hasn't changed.

Lastly, change ice_aq_get_phy_caps() to save the
module_type info if the function was called with
ICE_AQC_REPORT_TOPO_CAP. This call uses the media
information to populate the module_type. If no
media is present then the values in module_type
will be 0.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: support E823L devices
Qi Zhang [Mon, 15 Jun 2020 02:05:12 +0000 (10:05 +0800)]
net/ice/base: support E823L devices

Add support for E823L devices.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: fix initializing resource for field vector
Qi Zhang [Mon, 15 Jun 2020 02:05:11 +0000 (10:05 +0800)]
net/ice/base: fix initializing resource for field vector

This patch add initialization for prof_res_bm_init flag
to zero in order that the possible resource for field vector
in the package file can be initialized.(in ice_init_prof_result_bm)

Fixes: 453d087ccaff ("net/ice/base: add common functions")
Cc: stable@dpdk.org
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: add more tunnel type for IPv4 and IPv6
Qi Zhang [Mon, 15 Jun 2020 02:05:10 +0000 (10:05 +0800)]
net/ice/base: add more tunnel type for IPv4 and IPv6

This patch add more tunnel type definition ipv4/ipv6 packet,
it enable tcp/udp layer of ipv4/ipv6 as L4 payload but without
L4 dst/src port number as input set for switch filter rule.

For example:
we can download a switch rule to direct ipv4 packet with specific
source and destination ip address to queue index 1.
"eth / ipv4 src is 192.168.0.1 dst is 192.168.0.2 / udp /
end actions queue index 1 / end"
this type of rule will be matched with ipv4/udp file only.

Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: fix reference count on VSI list update
Qi Zhang [Mon, 15 Jun 2020 02:05:09 +0000 (10:05 +0800)]
net/ice/base: fix reference count on VSI list update

The parameter ref_cnt is used for tracking how many
rules are reusing this VSI list, so it can only be
updated when a rule which using this list be deleted.

Fixes: f89aa3affa9e ("net/ice/base: support removing advanced rule")
Cc: stable@dpdk.org
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: remove unused code for VSI list free
Qi Zhang [Mon, 15 Jun 2020 02:05:08 +0000 (10:05 +0800)]
net/ice/base: remove unused code for VSI list free

When free vsi list resource after vsi list update to empty, some
useless code in function ice_remove_vsi_list_rule() should be deleted.

Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: add command to LLDP
Qi Zhang [Mon, 15 Jun 2020 02:05:07 +0000 (10:05 +0800)]
net/ice/base: add command to LLDP

Add support for LLDP forwarding to SW programming in FW
LLDP Filter Control is 0x0A0A.

Signed-off-by: Sharon Haroni <sharon.haroni@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: distribute Tx queues evenly
Qi Zhang [Mon, 15 Jun 2020 02:05:06 +0000 (10:05 +0800)]
net/ice/base: distribute Tx queues evenly

Distribute the tx queues evenly across all queue groups. This will
help the queues to get more equal sharing among the queues when all
are in use.

In the previous algorithm, the next queue group node will be picked up
only after the previous one filled with max children.
For example: if VSI is configured with 9 queues, the first 8 queues
will be assigned to queue group 1 and the 9th queue will be assigned to
queue group 2.

The 2 queue groups split the bandwidth between them equally (50:50).
The first queue group node will share the 50% bandwidth with all of
its children (8 queues). And the second queue group node will share
the entire 50% bandwidth with its only children.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: adjust scheduler default bandwidth weight
Qi Zhang [Mon, 15 Jun 2020 02:05:05 +0000 (10:05 +0800)]
net/ice/base: adjust scheduler default bandwidth weight

By default the queues are configured in legacy mode. The default
bandwidth settings for legacy/advanced modes are different. The existing
code was using the advanced mode default value of 1 which was
incorrect. This caused the unbalanced BW sharing among siblings.
The recommended default value is applied.

Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: clear error status before set FC
Qi Zhang [Mon, 15 Jun 2020 02:05:04 +0000 (10:05 +0800)]
net/ice/base: clear error status before set FC

ice_set_fc takes a u8 pointer 'aq_failures' as an input parameter. If
this function encounters an error, in addition to returning an
appropriate ice_status enum code, it also populates aq_failures with a
link specific error value.

If the caller does not initialize this variable to 0 before calling
ice_set_fc, it would appear as if ice_set_fc returned an error code in
this variable. So initialize it to 0.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: avoid PPPoE IPv4 overlap
Qi Zhang [Mon, 15 Jun 2020 02:05:03 +0000 (10:05 +0800)]
net/ice/base: avoid PPPoE IPv4 overlap

When PPPoE header is not selected, PPPoE should not be included in
ipv4 ptype bitmaps.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: support checking all autoneg enable bits
Qi Zhang [Mon, 15 Jun 2020 02:05:02 +0000 (10:05 +0800)]
net/ice/base: support checking all autoneg enable bits

struct ice_aqc_get_phy_caps_data has multiple autoneg enable bits.
ice_is_phy_caps_an_enabled checks all bits and returns true if any
autoneg enable bits are set.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: remove unimplemented function prototypes
Qi Zhang [Mon, 15 Jun 2020 02:05:01 +0000 (10:05 +0800)]
net/ice/base: remove unimplemented function prototypes

There are no implementations for these two functions so remove the
prototypes.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
4 years agonet/ice/base: add entries in profile TCAM with priority
Qi Zhang [Mon, 15 Jun 2020 02:05:00 +0000 (10:05 +0800)]
net/ice/base: add entries in profile TCAM with priority

The profile TCAM tables are implemented such that entries with a smaller
index in the table have a higher priority. When records to be added to
the table have flags to differentiate between standard PTG and VSIG
records, then these entries need to have higher priority in order to be
found and processed first.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>