dpdk.git
Qi Zhang [Mon, 25 Mar 2019 05:44:29 +0000 (13:44 +0800)]
net/ice/base: add helper functions for flow management

1. ice_rem_all_sw_rules_info - remove all switch rules.
2. ice_replay_all_fltr - replay all filters stored in the bookkeeping list.

These APIs will be used when the switch rule feature is enabled.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:28 +0000 (13:44 +0800)]
net/ice/base: add MAC filter with marker and counter

1. ice_add_mac_with_sw_marker - add filter with software marker.
2. ice_add_mac_with_counter - add filter with counter enabled.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:27 +0000 (13:44 +0800)]
net/ice/base: add functions to get VSI promiscuous mode

1. ice_get_vsi_promisc - get promiscuous mode of given VSI.
2. ice_get_vsi_vlan_promisc - get VLAN promiscuous mode of given VSI.

The PMD may use these APIs to check the real HW status instead of
relying on a software flag when something abnormal happens.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:26 +0000 (13:44 +0800)]
net/ice/base: add functions for resource counter

1. ice_alloc_res_cntr - allocate resource counter
2. ice_free_res_cntr - free resource counter
3. ice_alloc_vlan_res_counter - allocate vlan resource counter
4. ice_free_vlan_res_counter - free vlan resource counter

These APIs will be used to count the number of times a flow is
hit.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:25 +0000 (13:44 +0800)]
net/ice/base: add functions to get allocated resources

1. ice_aq_get_res_alloc - get allocated resources.
2. ice_aq_get_res_descs - get allocated resource descriptors.

These APIs may help the PMD enable debug utilities that
dump the resource allocation status.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:24 +0000 (13:44 +0800)]
net/ice/base: add functions for ethertype filter

Add the APIs ice_remove_eth_mac and ice_add_eth_mac to support
adding and removing ethertype (or MAC) based filter rules.

The PMD can use these APIs to enable the related rte_flow rules.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:23 +0000 (13:44 +0800)]
net/ice/base: add VSI queue context framework

Add code to allocate VSI queue contexts that save queue-specific
information such as bandwidth.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:22 +0000 (13:44 +0800)]
net/ice/base: add more APIs in switch module

Add the following APIs to the switch module:

1. ice_aq_get_vsi_params - get VSI context info
2. ice_aq_add_update_mir_rule - add/update mirror rule
3. ice_aq_delete_mir_rule - delete mirror rule
4. ice_aq_set_storm_ctrl - set storm control configuration
5. ice_aq_get_storm_ctrl - get storm control configuration

The PMD can use these APIs to enable mirror rule and storm control
features.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:21 +0000 (13:44 +0800)]
net/ice/base: declare functions as external

Remove the static qualifier from the functions below and declare them
as external APIs.

ice_aq_add_vsi
ice_aq_free_vsi
ice_aq_update_vsi
ice_aq_add_lan_txq
ice_init_pkg

So far the purpose is just to sync with the kernel driver.
They are reserved for future use.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:20 +0000 (13:44 +0800)]
net/ice/base: remove unnecessary code

Remove unnecessary macro and data structure.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:19 +0000 (13:44 +0800)]
net/ice/base: allow package copy to be used after resets

For components that make a copy of an external pipeline package file
(i.e. the Linux and FreeBSD drivers), save the size of the package
file along with the copy so that both can be used when calling
ice_init_pkg() after a CORER/GLOBR reset.
Also, do not free the copy of the package file in ice_init_pkg()
since it is needed afterward for subsequent resets.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:18 +0000 (13:44 +0800)]
net/ice/base: add helper macros

1. Add macro ice_for_each_traffic_class to loop for each
traffic class.
2. Add macro MIN_T to wrap min with type conversion.
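
As a rough illustration of what such helper macros can look like, here is a
minimal sketch. The names EXAMPLE_MAX_TRAFFIC_CLASS,
example_for_each_traffic_class and EXAMPLE_MIN_T are hypothetical stand-ins,
and the bodies are assumptions rather than the driver's actual definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's traffic class count. */
#define EXAMPLE_MAX_TRAFFIC_CLASS 8

/* Loop a counter over every traffic class index. */
#define example_for_each_traffic_class(i) \
	for ((i) = 0; (i) < EXAMPLE_MAX_TRAFFIC_CLASS; (i)++)

/*
 * min() that converts both operands to `type` before comparing.
 * Note: each argument is evaluated more than once.
 */
#define EXAMPLE_MIN_T(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Sum the per-class weights, clamping each one to `limit`. */
static uint32_t example_sum_clamped(const uint16_t *weights, uint32_t limit)
{
	uint32_t total = 0;
	int i;

	example_for_each_traffic_class(i)
		total += EXAMPLE_MIN_T(uint32_t, weights[i], limit);
	return total;
}
```

The explicit type in the min macro avoids surprises when the two operands
have different widths or signedness.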

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:17 +0000 (13:44 +0800)]
net/ice/base: add two helper functions

Add two helper functions to the common module:
1. ice_aq_set_mac_cfg - configure the maximum frame size with an AQ
command.
2. ice_get_ctx - extract context bits from a packet structure.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:16 +0000 (13:44 +0800)]
net/ice/base: improve comments

Improve comments to follow naming rules.
The patch also includes some minor cleanup.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Qi Zhang [Mon, 25 Mar 2019 05:44:15 +0000 (13:44 +0800)]
net/ice/base: add switch resource allocation and free

Add two APIs, ice_alloc_sw and ice_free_sw, to support
allocating and freeing switch-related resources.

These APIs are required when switch flow is enabled.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:51 +0000 (14:16 +0800)]
net/ice: support vector AVX2 in Tx

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:50 +0000 (14:16 +0800)]
net/ice: support Rx scatter AVX2 vector

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:49 +0000 (14:16 +0800)]
net/ice: support Rx AVX2 vector

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:48 +0000 (14:16 +0800)]
net/ice: support Tx SSE vector

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:47 +0000 (14:16 +0800)]
net/ice: support Rx scatter SSE vector

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:46 +0000 (14:16 +0800)]
net/ice: support vector SSE in Rx

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:45 +0000 (14:16 +0800)]
net/ice: add pointer for queue buffer release

Add function pointers for releasing the buffers of RX and
TX queues, because vector functions will be added for RX
and TX.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Tue, 26 Mar 2019 06:16:44 +0000 (14:16 +0800)]
net/ice: fix Tx function setting

The TX setting function is not called.

Fixes: 17c7d0f9d6a4 ("net/ice: support basic Rx/Tx")
Cc: stable@dpdk.org
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Natanael Copa [Wed, 13 Mar 2019 17:06:50 +0000 (18:06 +0100)]
app/test: fix build with musl libc

Fix the following build error with musl libc:

app/test/test_eal_flags.c:152:55: error:
'O_RDONLY' undeclared (first use in this function)
      fd = openat(dirfd(hugepage_dir), dirent->d_name, O_RDONLY);
                                                       ^~~~~~~~
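
A minimal sketch of the kind of code that triggers this error, with the
header that declares O_RDONLY and openat() included explicitly. The helper
name example_can_open_in_dir is hypothetical, and this is not the code from
test_eal_flags.c.

```c
#include <dirent.h>  /* opendir, dirfd, closedir */
#include <fcntl.h>   /* O_RDONLY, openat: do not rely on indirect inclusion */
#include <stddef.h>  /* NULL */
#include <unistd.h>  /* close */

/* Return 0 if `name` inside `dir_path` can be opened read-only, else -1. */
static int example_can_open_in_dir(const char *dir_path, const char *name)
{
	DIR *dir = opendir(dir_path);
	int fd;

	if (dir == NULL)
		return -1;
	fd = openat(dirfd(dir), name, O_RDONLY);
	closedir(dir);
	if (fd < 0)
		return -1;
	close(fd);
	return 0;
}
```

Some C libraries pull in <fcntl.h> through other headers, which hides the
missing include until the code is built against a stricter libc.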

Fixes: 45f1b6e8680a ("app: add new tests on eal flags")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Natanael Copa [Wed, 13 Mar 2019 17:06:51 +0000 (18:06 +0100)]
app/test: fix flags with meson

In app/test/meson.build the default_cflag is never used, so
-D_GNU_SOURCE was never passed as intended.

Fix the following build error with musl libc:

lib/librte_eal/common/include/rte_lcore.h:26:9: error:
unknown type name 'cpu_set_t'
 typedef cpu_set_t rte_cpuset_t;
         ^~~~~~~~~

The problem is that cpu_set_t is only defined when _GNU_SOURCE is set.

Fixes: 5d7b673d5fd6 ("mk: build with _GNU_SOURCE defined by default")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Natanael Copa [Wed, 13 Mar 2019 17:06:56 +0000 (18:06 +0100)]
net/netvsc: fix include of fcntl.h

Fix the following warning when building with musl libc:

In file included from ../drivers/net/netvsc/hn_vf.c:14:
/usr/include/sys/fcntl.h:1:2: warning: #warning redirecting
incorrect #include <sys/fcntl.h> to <fcntl.h> [-Wcpp]

Fixes: dc7680e8597c ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Natanael Copa [Wed, 13 Mar 2019 17:06:49 +0000 (18:06 +0100)]
net/nfp: fix build with musl libc

Fix the following build error on systems without execinfo.h:

drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c:19:10: fatal error:
execinfo.h: No such file or directory
 #include <execinfo.h>
          ^~~~~~~~~~~~

Fixes: c7e9729da6b5 ("net/nfp: support CPP")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Natanael Copa [Wed, 13 Mar 2019 17:06:47 +0000 (18:06 +0100)]
bus/fslmc: fix build with musl libc

This fixes the following compile error with musl libc:

drivers/bus/fslmc/qbman/include/compat.h:41:10: error:
'stdout' undeclared (first use in this function)
   fflush(stdout); \
          ^~~~~~

Fixes: 531b17a780dc ("bus/fslmc: add QBMAN driver to bus")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Natanael Copa [Wed, 13 Mar 2019 17:06:48 +0000 (18:06 +0100)]
bus/fslmc: remove unused include of error.h

Fix the following build error with musl libc:

In file included from drivers/bus/fslmc/qbman/qbman_debug.c:6:
drivers/bus/fslmc/qbman/include/compat.h:21:10: fatal error:
error.h: No such file or directory
 #include <error.h>
          ^~~~~~~~~

Apparently it is not used anywhere in qbman, so simply remove the include.

Fixes: 531b17a780dc ("bus/fslmc: add QBMAN driver to bus")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Natanael Copa [Wed, 13 Mar 2019 17:06:57 +0000 (18:06 +0100)]
eal/linux: remove thread ID from debug message

There is no guarantee that pthread_self() returns the thread ID or that
pthread_t is an integer. The thread ID is not that useful so simply
remove it.
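
Because pthread_t is an opaque type, portable code avoids casting it to an
integer and uses pthread_equal() when it needs to compare thread handles.
A minimal sketch (the helper name example_is_current_thread is
hypothetical):

```c
#include <pthread.h>

/*
 * Return 1 if `t` refers to the calling thread, 0 otherwise.
 * pthread_t may be a pointer or a struct, so neither a cast to int
 * nor the == operator is portable; pthread_equal() is.
 */
static int example_is_current_thread(pthread_t t)
{
	return pthread_equal(t, pthread_self()) != 0;
}
```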

This fixes the following warning when building with musl libc:

lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'sigbus_handler':
lib/librte_eal/linuxapp/eal/eal_dev.c:70:3: warning:
cast from pointer to integer of different size [-Wpointer-to-int-cast]
   (int)pthread_self(), info->si_addr);
   ^

Fixes: 0fc54536b14a ("eal: add failure handling for hot-unplug")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Shahaf Shuler [Sun, 10 Mar 2019 08:28:03 +0000 (10:28 +0200)]
doc: announce deprecation of VFIO DMA map functions

The VFIO DMA map functions should be replaced by the rte_dev_dma_map
and rte_dev_dma_unmap APIs.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Shahaf Shuler [Sun, 10 Mar 2019 08:28:02 +0000 (10:28 +0200)]
net/mlx5: support PCI device DMA map and unmap

The implementation reuses the external memory registration work done by
commit[1].

Note about representors:

The current representor design will not work
with those map and unmap functions. The reason is that for representors
multiple IB devices share the same PCI function, so mapping will
happen only on one of the representors and not all of them.

While it is possible to implement such support, the IB representor
design is going to be changed during DPDK 19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.

[1]
commit 7e43a32ee060
("net/mlx5: support externally allocated static memory")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Shahaf Shuler [Sun, 10 Mar 2019 08:28:01 +0000 (10:28 +0200)]
net/mlx5: refactor external memory registration

Move the memory region creation to a separate function so that
it can be reused by the PCI driver map and unmap
functions.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Shahaf Shuler [Sun, 10 Mar 2019 08:28:00 +0000 (10:28 +0200)]
bus: introduce device level DMA memory mapping

The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.

2. Use memory allocated by the user and registered with the DPDK memory
system. Upon registration of memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.

3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want to have tight control over this
memory (e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to
register such memory to the different devices.

This patch focuses on #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).

The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Those APIs are
currently implemented only for PCI devices.

For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.

Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Shahaf Shuler [Sun, 10 Mar 2019 08:27:59 +0000 (10:27 +0200)]
vfio: skip DMA map failure if already mapped

Currently the vfio DMA map function will fail in case the same memory
segment is mapped twice.

This is too strict, as it is not an error to map the same memory
twice.

Instead, use the kernel return value to detect such a state and have
the DMA function return success.

For type1 mapping the kernel driver returns EEXIST.
For spapr mapping EBUSY is returned since kernel 4.10.
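
The idea can be sketched as a wrapper that treats the "already mapped"
errno as success. Everything here is hypothetical: example_map_fn stands in
for the real mapping call, and the two stub mappers exist only to exercise
the wrapper. The errno to tolerate is passed in, since it differs per IOMMU
type (EEXIST for type1, EBUSY for spapr).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mapper: returns 0 on success, -1 with errno set on failure. */
typedef int (*example_map_fn)(void *addr, size_t len);

/* Map a region, treating "already mapped" as success. */
static int example_dma_map(example_map_fn map, int already_mapped_errno,
			   void *addr, size_t len)
{
	if (map(addr, len) == 0)
		return 0;
	/* Mapping the same memory twice is not an error. */
	if (errno == already_mapped_errno)
		return 0;
	return -1;
}

/* Stub that behaves like a kernel reporting an existing mapping. */
static int example_map_exists(void *addr, size_t len)
{
	(void)addr;
	(void)len;
	errno = EEXIST;
	return -1;
}

/* Stub that behaves like a genuine failure. */
static int example_map_nomem(void *addr, size_t len)
{
	(void)addr;
	(void)len;
	errno = ENOMEM;
	return -1;
}
```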

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Shahaf Shuler [Sun, 10 Mar 2019 08:27:58 +0000 (10:27 +0200)]
vfio: allow DMA map to the default container

Allow users to call rte_vfio_dma_map with a request to map
to the default vfio fd.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Bruce Richardson [Wed, 27 Mar 2019 13:58:05 +0000 (13:58 +0000)]
examples: detect default build directory

Most examples have in their makefiles a default RTE_TARGET directory to be
used in case RTE_TARGET is not set. Rather than just using a hard-coded
default, we can instead detect what the build directory is relative to
RTE_SDK directory.

This fixes a potential issue for anyone who continues to build using
"make install T=x86_64-native-linuxapp-gcc" and skips setting RTE_TARGET
explicitly, instead relying on the fact that they were building in a
directory which corresponded to the example default path - which was
changed to "x86_64-native-linux-gcc" by commit 218c4e68c1d9 ("mk: use
linux and freebsd in config names").

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Liron Himi [Tue, 26 Mar 2019 18:40:10 +0000 (20:40 +0200)]
kni: calculate MTU from mbuf size

- mbuf_size and mtu are now calculated according
to the given mb-pool.

- max_mtu is now set according to the given mtu.

These two changes make it possible to work with jumbo frames.

Signed-off-by: Liron Himi <lironh@marvell.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Anatoly Burakov [Fri, 29 Mar 2019 14:01:29 +0000 (14:01 +0000)]
mem: warn user when running without NUMA support

Running in non-legacy mode on a NUMA-enabled system without libnuma
is unsupported, so explicitly print out a warning when trying to
do so.

Running in legacy mode without libnuma is still supported whether or
not we are running with libnuma support enabled, so also fix init to
allow that scenario.

Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
David Marchand [Fri, 29 Mar 2019 15:55:38 +0000 (16:55 +0100)]
ci: fix arm64 config filename

The ARM64 config file has been renamed in the commit
ae2f2fee247a ("build: rename linuxapp to linux in meson cross files").

Fixes: 99889bd85228 ("ci: introduce Travis builds for GitHub repositories")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Lukasz Krakowiak [Thu, 28 Mar 2019 16:11:54 +0000 (17:11 +0100)]
power: add some logs on requests

Extend debug logs for power instruction and policy destroy command
requests.

Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Lukasz Krakowiak [Thu, 28 Mar 2019 15:55:07 +0000 (16:55 +0100)]
power: update error handling

Handle negative status values returned from function
calls.

Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Kevin Traynor [Wed, 6 Feb 2019 12:19:06 +0000 (12:19 +0000)]
power: fix frequency list buffer validation

The frequency list buffer was already validated in
power_acpi_cpufreq_freqs(), so the newly added check was redundant.
To keep consistency with power_pstate_cpufreq_freqs(), remove the
original check and update the log message.

Fixes: 2e6ccdb4e088 ("power: fix frequency list to handle null buffer")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Hemant Agrawal [Tue, 26 Mar 2019 12:01:48 +0000 (12:01 +0000)]
net/dpaa2: support flow table flush

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Hemant Agrawal [Tue, 26 Mar 2019 12:01:46 +0000 (12:01 +0000)]
bus/dpaa: delay fman device list to bus probe

The fman device list needs to be accessed across processes.
The hw device structures should be allocated with rte_calloc
instead of calloc. rte_calloc is not available at the
time of bus scan, so it is better to prepare the device list at probe.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Akhil Goyal [Tue, 26 Mar 2019 12:01:45 +0000 (12:01 +0000)]
mempool/dpaa: allocate bp info for multiprocess

rte_dpaa_bpid_info must be allocated from hugepage memory,
which can be shared across processes.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Akhil Goyal [Tue, 26 Mar 2019 12:01:44 +0000 (12:01 +0000)]
bus/dpaa: save fq lookup table for secondary process

A reference to qman_fq_lookup_table needs to be saved in each
fq so that it can be retrieved when running in a secondary process.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Shreyansh Jain [Fri, 22 Feb 2019 10:09:44 +0000 (10:09 +0000)]
bus/dpaa: fix Rx discard register mask

Current value of 'fmbm_rfsdm' register (0x010CE3F0) doesn't include
the bit to drop colored (red) packets. New value (0x010EE3F0) fixes
this.
Check with 'fmbm_rffc' register of fm_port_bmi_regs.
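
The difference between the two register values can be checked with a small
bit computation. This is only a worked example of the arithmetic; the helper
name example_added_bits is hypothetical.

```c
#include <stdint.h>

/* Return the bits that are set in `new_mask` but not in `old_mask`. */
static uint32_t example_added_bits(uint32_t old_mask, uint32_t new_mask)
{
	return new_mask & ~old_mask;
}
```

For the old value 0x010CE3F0 and the new value 0x010EE3F0 the result is
0x00020000, so the new mask adds exactly one bit (bit 17) and removes none.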

Fixes: 6d6b4f49a155 ("bus/dpaa: add FMAN hardware operations")
Cc: stable@dpdk.org
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Stephen Hemminger [Tue, 12 Mar 2019 17:11:52 +0000 (10:11 -0700)]
bus/fslmc: remove unneeded strdup

The fslmc bus code was duplicating the device name and
doing extra initialization. The code can be simplified
to just use the device name directly.

Compile tested only; do not have this hardware.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Stephen Hemminger [Tue, 12 Mar 2019 17:11:51 +0000 (10:11 -0700)]
bus/fslmc: decrease log level for unsupported devices

When fslmc is built as part of a general distribution, the
bus code will log errors when other devices are present.

This could confuse users, since it is not an error.

Fixes: 50245be05d1a ("bus/fslmc: support device blacklisting")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Stephen Hemminger [Fri, 8 Feb 2019 03:44:07 +0000 (19:44 -0800)]
net/netvsc: remove unnecessary format of MAC address

The ethernet address was being converted to a string but
the code using that is no longer present.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 8 Feb 2019 03:44:06 +0000 (19:44 -0800)]
bus/vmbus: refactor secondary mapping

The secondary mapping function was duplicating the code
used to search the uio_resource list.

Skip the unwinding, since a map failure already makes the device
unusable.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 8 Feb 2019 03:44:05 +0000 (19:44 -0800)]
bus/vmbus: map ring in secondary process

The secondary process needs to remember the primary channel
and then use it to iterate over subchannels during
mapping setup.

Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 8 Feb 2019 03:44:04 +0000 (19:44 -0800)]
bus/vmbus: stop mapping if empty resource found

If vmbus is run on an older kernel (without all the uio mappings),
then the bus driver should stop when it hits the missing mappings
rather than recording the empty values.

Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 8 Feb 2019 03:44:03 +0000 (19:44 -0800)]
bus/vmbus: fix check for mmap failure

The code was testing the result of mmap incorrectly.
I.e., the test that a local pointer is not MAP_FAILED would
always succeed and therefore hid any potential problems.

Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 8 Feb 2019 03:44:02 +0000 (19:44 -0800)]
net/netvsc: fix VF support with secondary process

The VF device management in netvsc was using a pointer into
rte_eth_devices. But the rte_eth_devices array is likely to be
placed at a different address in the secondary process, which
causes a crash.

The solution is to record the port of the VF (instead of a pointer)
and find the device in the per process array as needed.
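
The pattern of storing an index instead of a pointer can be sketched as
follows. All names (example_eth_dev, example_devices, example_shared_state,
example_vf_dev) are hypothetical; this is not the netvsc code.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_PORTS 4

struct example_eth_dev {
	int started;
};

/* Each process has its own copy of this table, possibly at another address. */
static struct example_eth_dev example_devices[EXAMPLE_MAX_PORTS];

/* State kept in memory shared between processes: a port index, not a pointer. */
struct example_shared_state {
	uint16_t vf_port;
};

/* Resolve the VF device in the calling process, or NULL if out of range. */
static struct example_eth_dev *
example_vf_dev(const struct example_shared_state *st)
{
	if (st->vf_port >= EXAMPLE_MAX_PORTS)
		return NULL;
	return &example_devices[st->vf_port];
}
```

An index stays valid in every process, whereas a pointer is only meaningful
in the address space that produced it.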

Fixes: dc7680e8597c ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 8 Feb 2019 03:44:01 +0000 (19:44 -0800)]
bus/vmbus: fix secondary process setup

The secondary process doesn't correctly map the second
and later resources because it doesn't change the offset.
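
As an illustration, and assuming the usual Linux UIO convention in which
mapping N of a device is selected by passing N times the page size as the
mmap() offset, the per-resource offset could be computed like this. The
helper name is hypothetical and the actual vmbus code may compute the
offset differently.

```c
#include <sys/types.h>

/* Offset to pass to mmap() for UIO mapping number `index`. */
static off_t example_uio_map_offset(unsigned int index, long page_size)
{
	return (off_t)index * page_size;
}
```

Using offset 0 for every resource would map the first resource repeatedly.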

Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Anatoly Burakov [Fri, 29 Mar 2019 10:56:15 +0000 (10:56 +0000)]
malloc: fix IPC message initialization

The memset size for an IPC message is set incorrectly. Fix it to
cover the entire IPC message.
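
One common way a memset size goes wrong is taking the size of a pointer or
of a single member instead of the whole structure. The commit does not say
which mistake was made, so the following is only a generic sketch with a
hypothetical structure.

```c
#include <string.h>

/* Hypothetical message layout, for illustration only. */
struct example_ipc_msg {
	char name[64];
	int len_param;
	char param[256];
};

/* Zero the entire message before filling it in. */
static void example_msg_init(struct example_ipc_msg *msg)
{
	/* sizeof(*msg) covers the whole struct; sizeof(msg) would not. */
	memset(msg, 0, sizeof(*msg));
}
```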

Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Fri, 29 Mar 2019 10:57:08 +0000 (10:57 +0000)]
fbarray: fix init unlock without lock

Certain failure paths of rte_fbarray_init() will unlock the
mem area lock without locking it first. Fix this by properly
handling the failures.

Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Darek Stojaczyk [Fri, 29 Mar 2019 09:52:39 +0000 (10:52 +0100)]
fbarray: fix attach deadlock

rte_fbarray_attach() currently locks its internal
spinlock, but never releases it. Secondary processes
won't even start if there is more than one fbarray
to be attached to - the second rte_fbarray_attach()
would be just stuck.

Fix it by releasing the lock at the end of
rte_fbarray_attach(). I believe this was the original
intention.
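
A minimal sketch of the single-exit pattern that keeps lock and unlock
paired on every path. It uses a pthread mutex in place of the DPDK
spinlock, and the names example_area_lock and example_attach are
hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t example_area_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return 0 on success, -1 on failure; the lock is released either way. */
static int example_attach(int should_fail)
{
	int ret = -1;

	pthread_mutex_lock(&example_area_lock);
	if (should_fail)
		goto out;
	/* ... work that needs the lock would go here ... */
	ret = 0;
out:
	pthread_mutex_unlock(&example_area_lock);
	return ret;
}
```

Routing both the success and failure paths through one unlock makes a
second call safe; without it, the next caller would block forever.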

Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Stephen Hemminger [Wed, 6 Feb 2019 22:27:57 +0000 (14:27 -0800)]
drivers: fix SPDX license id consistency

All drivers should have SPDX on the first line of the source
files in the format
  /* SPDX-License-Identifier: ...

Several files used minor variations which were inconsistent
with the pattern. Fix them to make things easier for scanning tools.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Anatoly Burakov [Wed, 27 Feb 2019 15:41:24 +0000 (15:41 +0000)]
vfio: document multiprocess limitation for container API

Currently, there is no support for sharing custom VFIO containers
between multiple processes, but it is not documented.

Document this limitation.

Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Thomas Monjalon [Tue, 19 Mar 2019 21:16:00 +0000 (22:16 +0100)]
eal: remove redundant atomic API description

Atomic functions are described in doxygen of the file
lib/librte_eal/common/include/generic/rte_atomic.h
The copies in arch-specific files are redundant
and confuse readers about the genericity of the API.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agoeal/ppc: fix global memory barrier
Dekel Peled [Mon, 18 Mar 2019 12:58:13 +0000 (14:58 +0200)]
eal/ppc: fix global memory barrier

From previous patch description: "to improve performance on PPC64,
use light weight sync instruction instead of sync instruction."

Excerpt from IBM doc [1], section "Memory barrier instructions":
"The second form of the sync instruction is light-weight sync,
or lwsync.
This form is used to control ordering for storage accesses to system
memory only. It does not create a memory barrier for accesses to
device memory."

This patch removes the use of lwsync, so calls to rte_wmb() and
rte_rmb() will provide correct memory barrier to ensure order of
accesses to system memory and device memory.

[1] https://www.ibm.com/developerworks/systems/articles/powerpc.html

Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
5 years agomem: count overcommit hugepages as available
Michał Mirosław [Mon, 25 Feb 2019 20:57:28 +0000 (21:57 +0100)]
mem: count overcommit hugepages as available

With nr_overcommit_hugepages > 0, an application may be able to allocate
hugepages even when free_hugepages == 0. Take this into account when
counting available hugepages.
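
The counting rule can be sketched as below. This is an illustration under
assumptions, not the patch itself: available_hugepages() is a hypothetical
helper, and subtracting surplus pages already handed out from the overcommit
limit is an assumption of this sketch that the message above does not state.

```c
#include <stdint.h>

/*
 * Inputs mirror the per-size sysfs counters free_hugepages and
 * nr_overcommit_hugepages. surplus_pages (overcommit pages already in
 * use) is an assumption of this sketch.
 */
uint64_t available_hugepages(uint64_t free_pages, uint64_t overcommit_pages,
		uint64_t surplus_pages)
{
	uint64_t headroom = 0;

	/* Only the unused part of the overcommit limit can still be taken. */
	if (overcommit_pages > surplus_pages)
		headroom = overcommit_pages - surplus_pages;

	return free_pages + headroom;
}
```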

Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
5 years agomem: attempt multiple hugepage allocations at init
Anatoly Burakov [Fri, 22 Feb 2019 16:14:03 +0000 (16:14 +0000)]
mem: attempt multiple hugepage allocations at init

When requesting memory with ``-m`` or ``--socket-mem`` flags,
currently the init will fail if the requested memory amount was
bigger than any one memseg list, even if total amount of
available memory was sufficient.

Fix this by making EAL attempt to allocate pages multiple
times, until we either fulfill our memory requirements or run
out of hugepages to allocate.
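
As a rough model of the retry loop, assuming each memseg list is reduced to
a count of free pages (struct page_list and alloc_across_lists() are
hypothetical names, not EAL symbols):

```c
#include <stddef.h>

/* Hypothetical model: a list is just a number of pages it can still give. */
struct page_list {
	size_t free_pages;
};

/* Returns the number of pages still missing; 0 means the request was met. */
size_t alloc_across_lists(struct page_list *lists, size_t n_lists,
		size_t wanted)
{
	for (size_t i = 0; i < n_lists && wanted > 0; i++) {
		/* Take as much as this list offers, then move to the next. */
		size_t take = lists[i].free_pages < wanted ?
				lists[i].free_pages : wanted;

		lists[i].free_pages -= take;
		wanted -= take;
	}
	return wanted;
}
```

A request larger than any single list can then succeed as long as the lists
together hold enough pages.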

Bugzilla ID: 95

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
5 years agomem: improve best-effort allocation
Anatoly Burakov [Fri, 22 Feb 2019 16:14:02 +0000 (16:14 +0000)]
mem: improve best-effort allocation

Previously, when using non-exact allocation, we were requesting
N pages to be allocated, but allowed the memory subsystem to
allocate fewer than requested. However, we were still expecting
to see N contiguous free pages in the memseg list.

This presents a problem because there is no way to try to
allocate as many pages as possible when there aren't
enough contiguous free entries in the list.

To address this, use the new "find biggest" fbarray APIs when
allocating non-exact number of pages. This way, we will first
check how many entries in the list are actually available, and
then try to allocate up to that number.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
5 years agofbarray: add API to find biggest used or free chunks
Anatoly Burakov [Fri, 22 Feb 2019 16:14:01 +0000 (16:14 +0000)]
fbarray: add API to find biggest used or free chunks

Currently, while there is a way to find the total amount of used/free
space in an fbarray, there is no way to find the biggest contiguous
chunk. Add such an API, as well as unit tests for it.
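
The idea of a "find biggest free chunk" query can be sketched on a plain
array of used flags. This is only an illustration: struct chunk and
find_biggest_free() are hypothetical names, and the real fbarray works on
its own bitmask rather than on a bool array.

```c
#include <stdbool.h>
#include <stddef.h>

struct chunk {
	size_t start; /* index of the first free entry in the run */
	size_t len;   /* number of consecutive free entries */
};

/* Linear scan that remembers the longest run of free (false) entries. */
struct chunk find_biggest_free(const bool *used, size_t n)
{
	struct chunk best = { 0, 0 };
	size_t run_start = 0, run_len = 0;

	for (size_t i = 0; i < n; i++) {
		if (used[i]) {
			run_len = 0; /* a used entry ends the current run */
			continue;
		}
		if (run_len == 0)
			run_start = i;
		run_len++;
		if (run_len > best.len) {
			best.start = run_start;
			best.len = run_len;
		}
	}
	return best;
}
```

A best-effort allocator can call such a query first and then request at most
best.len entries.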

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
5 years agofbarray: add internal tailq for mapped areas
Anatoly Burakov [Tue, 26 Feb 2019 17:13:11 +0000 (17:13 +0000)]
fbarray: add internal tailq for mapped areas

Currently, there are numerous reliability issues with fbarray,
such as:
- There is no way to prevent attaching to overlapping memory
  areas
- There is no way to prevent double-detach
- Failed destroy leaves fbarray in an invalid state (fbarray
  itself is valid, but its backing memory area is already
  detached)

In addition, on FreeBSD, doing mmap() on a file descriptor
does not keep the lock, so we also need to store the fd
in order to keep the lock.

This patch improves upon fbarray to address both of these
issues by adding an internal tailq to track allocated areas
and their respective file descriptors.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
5 years agoservice: fix parameter type for attribute
Nikhil Rao [Thu, 28 Mar 2019 06:29:03 +0000 (11:59 +0530)]
service: fix parameter type for attribute

The type of value parameter to rte_service_attr_get
should be uint64_t *, since the attributes
are of type uint64_t.

Fixes: 4d55194d76a4 ("service: add attribute get function")

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
5 years agohash: optimize signature compare for Arm NEON
Ruifeng Wang [Tue, 12 Feb 2019 07:01:04 +0000 (15:01 +0800)]
hash: optimize signature compare for Arm NEON

Implemented signature compare function based on NEON intrinsics.
Hash bulk lookup gained 3% - 6% in performance after the optimization.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
5 years agotest/timer: replace config macro with runtime log level
Dharmik Thakkar [Tue, 26 Feb 2019 23:02:29 +0000 (17:02 -0600)]
test/timer: replace config macro with runtime log level

This patch replaces the macro with a log-level based approach to printing
debug information. To see the debug output, set the timer log type to debug
using the following EAL parameter: --log-level=test.timer:debug

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
5 years agotest/efd: replace config macro with runtime log level
Dharmik Thakkar [Tue, 26 Feb 2019 23:02:28 +0000 (17:02 -0600)]
test/efd: replace config macro with runtime log level

This patch makes print_key_info() always compiled, using a
log-level based approach instead of a macro. To print debug information,
set the efd log type to debug using the following EAL parameter:
--log-level=test.efd:debug

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
5 years agotest/hash: replace config macro with runtime log level
Dharmik Thakkar [Tue, 26 Feb 2019 23:02:27 +0000 (17:02 -0600)]
test/hash: replace config macro with runtime log level

To print debug information, set the hash log type to debug using the
following EAL parameter: --log-level=test.hash:debug

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
5 years agotest/ticketlock: add test cases
Joyce Kong [Mon, 25 Mar 2019 11:11:09 +0000 (19:11 +0800)]
test/ticketlock: add test cases

Add test cases for ticket lock, recursive ticket lock,
and ticket lock performance.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agoticketlock: enable generic ticketlock on all arch
Joyce Kong [Mon, 25 Mar 2019 11:11:08 +0000 (19:11 +0800)]
ticketlock: enable generic ticketlock on all arch

Let all architectures use generic ticketlock implementation.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agoticketlock: introduce fair ticket based locking
Joyce Kong [Mon, 25 Mar 2019 11:11:07 +0000 (19:11 +0800)]
ticketlock: introduce fair ticket based locking

The spinlock implementation is unfair: some threads may take locks
aggressively while leaving the other threads starving for a long time.

This patch introduces ticketlock, which gives each waiting thread a
ticket so that threads take the lock one by one. First come, first served.
This avoids long starvation and is more predictable.
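
A minimal ticket lock sketch using C11 atomics, to illustrate the scheme
described above. The type and function names are hypothetical and the code
is not claimed to match the rte_ticketlock implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

struct sketch_ticketlock {
	_Atomic uint16_t next;    /* next ticket to hand out */
	_Atomic uint16_t current; /* ticket currently being served */
};

void sketch_ticketlock_lock(struct sketch_ticketlock *tl)
{
	/* Take a ticket; arrival order decides the order of service. */
	uint16_t me = atomic_fetch_add_explicit(&tl->next, 1,
			memory_order_relaxed);

	/* Wait for our turn; acquire pairs with the release in unlock. */
	while (atomic_load_explicit(&tl->current,
			memory_order_acquire) != me)
		;
}

void sketch_ticketlock_unlock(struct sketch_ticketlock *tl)
{
	uint16_t cur = atomic_load_explicit(&tl->current,
			memory_order_relaxed);

	/* Serve the next ticket; 16-bit wrap-around is intentional. */
	atomic_store_explicit(&tl->current, (uint16_t)(cur + 1),
			memory_order_release);
}
```

Because each waiter spins on its own ticket number, a thread that arrived
earlier cannot be overtaken by a later one.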

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agotest/rwlock: amortize the cost of getting time
Joyce Kong [Mon, 25 Mar 2019 09:14:59 +0000 (17:14 +0800)]
test/rwlock: amortize the cost of getting time

Instead of getting a timestamp on every iteration, amortize its
overhead to get more precise benchmarking results.
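
The amortization can be sketched as follows. Everything here is
hypothetical: read_clock() is a stand-in that advances one tick per call so
the example is self-contained, locked_op() represents one lock/unlock pair,
and TIME_CHECK_INTERVAL is an arbitrary batch size.

```c
#include <stdint.h>

#define TIME_CHECK_INTERVAL 1024u /* operations between two clock reads */

/* Stand-in clock: a real test would read a hardware timer instead. */
static uint64_t fake_clock;
static uint64_t read_clock(void) { return fake_clock++; }

/* Stand-in for the measured operation (one lock/unlock pair). */
static uint64_t ops_done;
static void locked_op(void) { ops_done++; }

/* Runs batches until `budget` ticks have elapsed; returns operations run. */
uint64_t bench_amortized(uint64_t budget)
{
	const uint64_t start = read_clock();
	uint64_t count = 0;

	for (;;) {
		for (unsigned int i = 0; i < TIME_CHECK_INTERVAL; i++)
			locked_op();
		count += TIME_CHECK_INTERVAL;
		/* One clock read per batch instead of one per operation. */
		if (read_clock() - start >= budget)
			break;
	}
	return count;
}
```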

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agotest/rwlock: benchmark on all available cores
Joyce Kong [Mon, 25 Mar 2019 09:14:58 +0000 (17:14 +0800)]
test/rwlock: benchmark on all available cores

Add performance test on all available cores to benchmark
the scaling up performance of rw_lock.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Suggested-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agorwlock: reimplement with atomic builtins
Joyce Kong [Mon, 25 Mar 2019 09:14:57 +0000 (17:14 +0800)]
rwlock: reimplement with atomic builtins

The __sync builtin based implementation generates full memory
barriers ('dmb ish') on Arm platforms. Use C11 atomic builtins instead
to generate one-way barriers.

Here is the assembly code of the __sync_bool_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4
<rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0
<rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Tested-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agospinlock: reimplement with atomic one-way barrier
Gavin Hu [Fri, 8 Mar 2019 07:56:37 +0000 (15:56 +0800)]
spinlock: reimplement with atomic one-way barrier

The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Use C11 atomic builtins instead to generate
one-way barriers.

Here is the assembly code of the __sync_bool_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4
<rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0
<rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none

The benchmarking results showed consistent improvements on all available
platforms:
1. Cavium ThunderX2: 126% performance;
2. Hisilicon 1616: 30%;
3. Qualcomm Falkor: 13%;
4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%

Here is the example test result on TX2:
$sudo ./build/app/test -l 16-27 -- i
RTE>>spinlock_autotest

*** spinlock_autotest without this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 53886 us
Core [17] Cost Time = 53605 us
Core [18] Cost Time = 53163 us
Core [19] Cost Time = 49419 us
Core [20] Cost Time = 34317 us
Core [21] Cost Time = 53408 us
Core [22] Cost Time = 53970 us
Core [23] Cost Time = 53930 us
Core [24] Cost Time = 53283 us
Core [25] Cost Time = 51504 us
Core [26] Cost Time = 50718 us
Core [27] Cost Time = 51730 us
Total Cost Time = 612933 us

*** spinlock_autotest with this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 18808 us
Core [17] Cost Time = 29497 us
Core [18] Cost Time = 29132 us
Core [19] Cost Time = 26150 us
Core [20] Cost Time = 21892 us
Core [21] Cost Time = 24377 us
Core [22] Cost Time = 27211 us
Core [23] Cost Time = 11070 us
Core [24] Cost Time = 29802 us
Core [25] Cost Time = 15793 us
Core [26] Cost Time = 7474 us
Core [27] Cost Time = 29550 us
Total Cost Time = 270756 us

In the tests on ThunderX2, with more cores contending, the performance gain
was even higher, indicating the __atomic implementation scales up better
than __sync.
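
For illustration, a spinlock built on the GCC/Clang __atomic builtins with
acquire ordering on lock and release ordering on unlock might look like the
sketch below. The sketch_spinlock names are hypothetical; this is a
simplified example, not a copy of rte_spinlock.

```c
typedef struct {
	volatile int locked; /* 0 = free, 1 = held */
} sketch_spinlock_t;

void sketch_spinlock_lock(sketch_spinlock_t *sl)
{
	int exp = 0;

	/* Acquire on success: later accesses cannot move above the lock. */
	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
		/* Spin on a plain load until the lock looks free. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
		exp = 0; /* a failed exchange overwrote exp */
	}
}

int sketch_spinlock_trylock(sketch_spinlock_t *sl)
{
	int exp = 0;

	return __atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

void sketch_spinlock_unlock(sketch_spinlock_t *sl)
{
	/* Release: earlier accesses cannot move below the unlock. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}
```

The one-way orderings let the compiler avoid a trailing full barrier such as
the 'dmb ish' shown in the listing above.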

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agotest/spinlock: amortize the cost of getting time
Gavin Hu [Fri, 8 Mar 2019 07:56:36 +0000 (15:56 +0800)]
test/spinlock: amortize the cost of getting time

Instead of getting a timestamp on every iteration, amortize its overhead
to get more precise benchmarking results.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agotest/spinlock: remove delay for correct benchmarking
Gavin Hu [Fri, 8 Mar 2019 07:56:35 +0000 (15:56 +0800)]
test/spinlock: remove delay for correct benchmarking

The test benchmarks the performance of spinlock by counting the
number of spinlock acquire and release operations within the specified
time.
A typical pair of lock and unlock operations costs tens or hundreds of
nanoseconds. In comparison, delaying 1 us outside of the locked
region is too much and compromises the goal of benchmarking the lock and
unlock performance.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agoring: enforce reading tail before slots
Gavin Hu [Tue, 12 Mar 2019 16:58:53 +0000 (00:58 +0800)]
ring: enforce reading tail before slots

In weak memory models, like arm64, reading the prod.tail may get
reordered after reading the ring slots, so that stale data is read
from the ring and the ring is effectively corrupted.

This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
likely caused by missing the acquire semantics when reading
prod.tail (in SC dequeue) which makes it possible to read a
stale value from the ring slots.

For the MP (and MC) case, rte_atomic32_cmpset() already provides the required
ordering. For the SP case, the control dependency between the if-statement
(which depends on the read of r->cons.tail) and the later stores to the ring
slots makes an RMB unnecessary. About the control dependency, read more at:
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf

This patch adds the required read barrier for the SC case, to prevent the
reads of the ring slots from being reordered before the read of prod.tail.
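
The ordering requirement can be illustrated with a small single-producer,
single-consumer ring. Note the substitution: where the patch adds a read
barrier, this sketch uses a C11 acquire load of prod_tail, which provides
the same guarantee for the slot read that follows. All names (struct
sketch_ring, RING_SIZE and the two functions) are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* must be a power of two */

struct sketch_ring {
	_Atomic uint32_t prod_tail; /* written by producer, read by consumer */
	_Atomic uint32_t cons_tail; /* written by consumer, read by producer */
	void *slots[RING_SIZE];
};

int sketch_sp_enqueue(struct sketch_ring *r, void *obj)
{
	uint32_t tail = atomic_load_explicit(&r->prod_tail,
			memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&r->cons_tail,
			memory_order_acquire);

	if (tail - cons == RING_SIZE)
		return -1; /* full */
	r->slots[tail & (RING_SIZE - 1)] = obj;
	/* Release: the slot write is visible before the new tail. */
	atomic_store_explicit(&r->prod_tail, tail + 1, memory_order_release);
	return 0;
}

int sketch_sc_dequeue(struct sketch_ring *r, void **obj)
{
	uint32_t head = atomic_load_explicit(&r->cons_tail,
			memory_order_relaxed);
	/* Acquire: the slot read below cannot be hoisted above this load. */
	uint32_t tail = atomic_load_explicit(&r->prod_tail,
			memory_order_acquire);

	if (tail == head)
		return -1; /* empty */
	*obj = r->slots[head & (RING_SIZE - 1)];
	atomic_store_explicit(&r->cons_tail, head + 1, memory_order_release);
	return 0;
}
```

Without the acquire (or an equivalent read barrier), a weakly ordered CPU
may read the slot before it observes the updated tail and return stale data.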

Fixes: c9fb3c62896f ("ring: move code in a new header file")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
5 years agoeal: roundup TSC frequency when estimating
Pavan Nikhilesh [Sat, 16 Mar 2019 19:01:54 +0000 (19:01 +0000)]
eal: roundup TSC frequency when estimating

When estimating the TSC frequency using sleep/gettime, round it to the
nearest multiple of 10 MHz for more accuracy.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
5 years agoeal: add macro to align value to the nearest multiple
Pavan Nikhilesh [Sat, 16 Mar 2019 19:01:50 +0000 (19:01 +0000)]
eal: add macro to align value to the nearest multiple

Add a macro to align a value to the nearest multiple of the given value.
The result may be greater than or less than the first parameter,
whichever is closer.
Update the unit test to include the new macro.
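
A sketch of the rounding rule as a plain function. The name
align_mul_near() is hypothetical, and resolving a tie toward the upper
multiple is an assumption of this sketch, since the message above does not
say which way ties go.

```c
#include <stdint.h>

/* Round v to the nearest multiple of mul (mul must be non-zero). */
uint64_t align_mul_near(uint64_t v, uint64_t mul)
{
	uint64_t floor = (v / mul) * mul;
	uint64_t ceil = (floor == v) ? v : floor + mul;

	/* Pick whichever multiple is closer; ties go to the upper one. */
	return (ceil - v) > (v - floor) ? floor : ceil;
}
```

For example, an estimated frequency of 2593000000 Hz with a 10 MHz step
(10000000) rounds to 2590000000 Hz, while 2597000000 Hz rounds to
2600000000 Hz.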

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
5 years agouse appropriate EAL macro for constructors
Jerin Jacob [Mon, 18 Mar 2019 04:15:56 +0000 (04:15 +0000)]
use appropriate EAL macro for constructors

Use EAL's RTE_INIT abstraction for defining constructors.
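
As an illustration of what such an abstraction wraps, the sketch below
defines a simplified, hypothetical SKETCH_INIT macro on top of the GCC/Clang
constructor attribute. It is not the RTE_INIT definition, which also deals
with priorities.

```c
/* Hypothetical simplified wrapper around the compiler attribute. */
#define SKETCH_INIT(func) \
	static void __attribute__((constructor, used)) func(void)

static int driver_registered;

/* Runs automatically before main(), with no explicit call site. */
SKETCH_INIT(register_my_driver)
{
	driver_registered = 1;
}
```

Hiding the attribute behind one macro keeps compiler-specific syntax in a
single place.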

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
5 years agoeal: add pending interrupt callback unregister
Jakub Grajciar [Fri, 22 Mar 2019 11:52:36 +0000 (12:52 +0100)]
eal: add pending interrupt callback unregister

Use case: if a callback is used to receive messages from a socket,
and the message received is a disconnect/error, this callback needs
to be unregistered, but cannot be because it is still active.

With this patch it is possible to mark the callback to be
unregistered once the interrupt thread is done with this
interrupt source.
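
The deferred removal can be modelled as below. This is a hypothetical
single-threaded model, not the EAL interrupt code: struct sketch_cb, its
flags and both functions are invented for the illustration.

```c
#include <stdbool.h>

struct sketch_cb {
	void (*fn)(void *arg);
	void *arg;
	bool active;         /* callback is currently executing */
	bool pending_delete; /* remove once execution has finished */
	bool registered;
};

/* Returns 1 if removed now, 0 if removal was deferred, -1 if unknown. */
int sketch_cb_unregister_pending(struct sketch_cb *cb)
{
	if (!cb->registered)
		return -1;
	if (cb->active) {
		cb->pending_delete = true; /* cannot remove while running */
		return 0;
	}
	cb->registered = false;
	return 1;
}

void sketch_dispatch(struct sketch_cb *cb)
{
	if (!cb->registered)
		return;
	cb->active = true;
	cb->fn(cb->arg);
	cb->active = false;
	/* Complete any removal requested from inside the callback. */
	if (cb->pending_delete) {
		cb->registered = false;
		cb->pending_delete = false;
	}
}

/* Example callback that asks for its own removal, e.g. on disconnect. */
void on_disconnect(void *arg)
{
	sketch_cb_unregister_pending(arg);
}
```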

Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
5 years agoeal/linux: fix log levels for pagemap reading failure
Kevin Traynor [Thu, 14 Feb 2019 17:56:56 +0000 (17:56 +0000)]
eal/linux: fix log levels for pagemap reading failure

Commit cdc242f260e7 says:
    For Linux kernel 4.0 and newer, the ability to obtain
    physical page frame numbers for unprivileged users from
    /proc/self/pagemap was removed. Instead, when an IOMMU
    is present, simply choose our own DMA addresses instead.

In this case the user still sees error messages, so adjust
the log levels. Later, other checks will ensure that errors
are logged in the appropriate cases.

Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
5 years agodoc: deprecate KNI ethtool support
Ferruh Yigit [Mon, 18 Feb 2019 12:30:02 +0000 (12:30 +0000)]
doc: deprecate KNI ethtool support

Announce removal of KNI ethtool support.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
5 years agodoc: add deprecation marker usage
Ferruh Yigit [Fri, 1 Mar 2019 17:32:50 +0000 (17:32 +0000)]
doc: add deprecation marker usage

Define the '__rte_deprecated' usage process.

Suggest keeping old APIs with the '__rte_deprecated' marker up to and
including the next LTS; they will be removed just after the LTS release.
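
For readers unfamiliar with such markers, here is a sketch of how a
deprecation attribute is typically applied. sketch_deprecated, old_api() and
new_api() are hypothetical; the sketch assumes a GCC/Clang compatible
compiler.

```c
/* Hypothetical marker in the spirit of '__rte_deprecated'. */
#define sketch_deprecated __attribute__((deprecated))

/* Replacement API. */
int new_api(int x)
{
	return x * 2;
}

/* Old API kept for compatibility; callers get a compile-time warning. */
sketch_deprecated int old_api(int x);

int old_api(int x)
{
	return new_api(x);
}
```

Existing callers keep building, but each use of old_api() is flagged until
the symbol is finally removed.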

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
5 years agodoc: make RTE_NEXT_ABI optional in guidelines
Ferruh Yigit [Fri, 1 Mar 2019 17:32:49 +0000 (17:32 +0000)]
doc: make RTE_NEXT_ABI optional in guidelines

The initial process required that upcoming changes described in a
deprecation notice be implemented in a RTE_NEXT_ABI gated way.

This has been discussed in the technical board. Since it can lead to
multiple #ifdef blocks in multiple locations of the code, it can be
confusing, especially for modifications that require data structure
changes. In any case, this was not happening in practice.

Making RTE_NEXT_ABI usage more optional based on techboard decision:
http://mails.dpdk.org/archives/dev/2019-January/123519.html

The intention of using RTE_NEXT_ABI was to provide more information
to the user about planned changes, and to force the developer to think
them through at the code level. Since RTE_NEXT_ABI has become optional, the
preferred way to do this now is, if possible, to send the changes described
in the deprecation notice as a separate patch and reference it in the notice.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
5 years agodoc: clean ABI/API policy guide
Ferruh Yigit [Fri, 1 Mar 2019 17:32:48 +0000 (17:32 +0000)]
doc: clean ABI/API policy guide

The original document was written from the point of view of ABI versioning,
but later additions made it confusing. Convert the document into ABI/API
policy documentation and organize it in subsections:
- ABI/API Deprecation
- Experimental APIs
- Library versioning
- ABI versioning

The aim is to clear up the confusion between deprecation of a versioned ABI
and overall ABI/API deprecation, and between ABI versioning and library
versioning, by organizing the sections.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
5 years agodoc: update DPDK LTS versions
Kevin Traynor [Fri, 22 Mar 2019 11:29:35 +0000 (11:29 +0000)]
doc: update DPDK LTS versions

Support for 16.11 has ended. 17.11 and 18.11 are the current LTSs.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
5 years agomalloc: fix documentation of realloc function
Anatoly Burakov [Fri, 22 Feb 2019 15:29:29 +0000 (15:29 +0000)]
malloc: fix documentation of realloc function

The documentation for rte_realloc claims that the resized area
will always reside on the same NUMA node. This is not actually
the case - while *resized* area will be on the same NUMA node,
if resizing the area is not possible, then the memory will be
reallocated using rte_malloc(), which can allocate memory on
another NUMA node, depending on which lcore rte_realloc() was
called from and which NUMA nodes have memory available.

Fix the API doc to match the actual code of rte_realloc().

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
5 years agomem: poison memory when freed
Stephen Hemminger [Sat, 16 Feb 2019 01:50:16 +0000 (17:50 -0800)]
mem: poison memory when freed

The DPDK malloc library allows broken programs to work because
the semantics of zmalloc and malloc are the same.

This patch enables a more secure model which will catch
(and crash) programs that reuse memory already freed if
RTE_MALLOC_DEBUG is enabled.
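
The effect of poisoning can be shown with a toy allocator that owns a single
static block. Everything here is hypothetical: the arena, the function names
and the 0x6b fill value are choices of this sketch, made so that the freed
memory can be inspected safely.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_POISON 0x6b /* fill pattern chosen for this sketch */

/* Toy arena: one fixed block, so reads after "free" stay well-defined. */
static unsigned char arena[64];
static int arena_in_use;

void *sketch_alloc(void)
{
	if (arena_in_use)
		return NULL;
	arena_in_use = 1;
	return arena;
}

void sketch_free(void *p)
{
	if (p != arena)
		return;
	/* Overwrite the payload so stale users see garbage, not old data. */
	memset(arena, SKETCH_POISON, sizeof(arena));
	arena_in_use = 0;
}
```

A program that keeps using a pointer after sketch_free() now reads the
poison pattern instead of silently seeing its old (or zeroed) contents.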

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
5 years agoacl: fix compiler flags with meson and AVX2 runtime
Andrius Sirvys [Mon, 11 Mar 2019 15:18:11 +0000 (15:18 +0000)]
acl: fix compiler flags with meson and AVX2 runtime

When compiling the ACL library on a system without AVX2 support,
the flags used to compile the AVX2-specific code for later run-time
use were not based on the regular cflags for the rest of the library.
This can cause errors due to symbols being missed/undefined
due to incorrect flags. For example,
when testing compilation on Alpine linux, we got:
error: unknown type name 'cpu_set_t'
due to _GNU_SOURCE not being defined in the cflags.

This issue can be fixed by appending "-mavx2" to
the cflags rather than replacing them with it.

Fixes: 5b9656b157d3 ("lib: build with meson")
Cc: stable@dpdk.org
Signed-off-by: Andrius Sirvys <andrius.sirvys@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
5 years agoeal: remove unneeded version logic
Bruce Richardson [Fri, 15 Mar 2019 18:20:22 +0000 (18:20 +0000)]
eal: remove unneeded version logic

The version number in the DPDK_VERSION file will never have an offset
that needs to be subtracted, so remove that logic from the version
string generation.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
5 years agobuild: use version number from config file
Bruce Richardson [Fri, 15 Mar 2019 18:20:21 +0000 (18:20 +0000)]
build: use version number from config file

Since we have the version number in a separate file at the root level,
we should not need to duplicate this in rte_version.h too. Best
approach here is to move the macros for specifying the year/month/etc.
parts from the version header file to the build config file - leaving
the other utility macros for e.g. printing the version string, where they
are.

For "make", this is done by having a little bit of awk parse the version
file and pass the results through to the preprocessor for the config
generation stage.

For "meson", this is done by parsing the version and adding it to the
standard dpdk_conf object.

In both cases, we need to append a large number - in this case "99",
previously 16 in original code - to the version number when we want to do
version number comparisons. Without this, the release version e.g. 19.05.0
will compare as less than its RCs e.g. 19.05.0-rc4. With it, the
comparison is correct as "19.05.0.99 > 19.05.0-rc4.99".

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
5 years agobuild: move meson version handling to config directory
Bruce Richardson [Fri, 15 Mar 2019 18:20:20 +0000 (18:20 +0000)]
build: move meson version handling to config directory

To keep the top-level meson.build file as clean and clear as possible, we
move the version handling to the config/meson.build file, where the rest of
the build configuration is already being set up.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>