git.droids-corp.org - dpdk.git/log

compress/qat: create FW request and process response

Add functions to create the request message to send to
firmware and to process the firmware response.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>

compress/qat: add xform processing

Add code to process compressdev rte_comp_xforms, creating
private qat_comp_xforms with prepared firmware message templates.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>

compress/qat: add empty driver

Add Makefiles, meson files, and empty source files for compression PMD.
Handle cases for building either symmetric crypto PMD
or compression PMD or both and the common files both depend on.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>

common/qat: update firmware headers

Updated to latest firmware headers files for QuickAssist devices.
Includes updates for symmetric crypto, PKE and Compression services.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>

examples/vm_power: make branch ratio configurable

For different workloads and poll loops, the threshold
may be different for when you want to scale up and down.

This patch allows changing of the default branch ratio
by using the -b command line argument (or --branch-ratio=).

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: add options to guest app

Add new command line arguments to the guest app to make
    testing and validation of the policy usage easier.
    These arguments are mainly around setting up the power
    management policy that is sent from the guest vm
    to the vm_power_manager in the host.

    New command line parameters:
    -n or --vm-name
       sets the name of the vm to be used by the host OS.
    -b or --busy-hours
       sets the list of hours that are predicted to be busy
    -q or --quiet-hours
       sets the list of hours that are predicted to be quiet
    -l or --vcpu-list
       sets the list of vcpus to monitor
    -p or --port-list
       sets the list of ports to monitor when using a
       workload policy.
    -o or --policy
       sets the default policy type
          TIME
          WORKLOAD
          TRAFFIC
          BRANCH_RATIO

    The format of the hours or list parameters is a comma-separated
    list of integers, which can take the form of
       a. x    e.g. --vcpu-list=1
       b. x,y  e.g. --quiet-hours=3,4
       c. x-y  e.g. --busy-hours=9-12
       d. combination of above (e.g. --busy-hours=4,5-7,9)
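
The list format above can be sketched as a small parser. This is a minimal
illustration only: parse_int_list() is a hypothetical helper, not the parser
used by the guest app, and it assumes non-negative decimal values with no
surrounding whitespace.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: expand a list such as "4,5-7,9" into out[].
 * Returns the number of values written, or -1 if the list is malformed
 * or would produce more than max values. An empty string yields 0.
 */
static int
parse_int_list(const char *s, int *out, int max)
{
	int n = 0;

	while (*s != '\0') {
		char *end;
		long lo, hi;

		errno = 0;
		lo = strtol(s, &end, 10);
		if (end == s || errno != 0 || lo < 0 || lo > INT_MAX)
			return -1;
		hi = lo;
		if (*end == '-') {		/* range form x-y */
			s = end + 1;
			errno = 0;
			hi = strtol(s, &end, 10);
			if (end == s || errno != 0 || hi < lo || hi > INT_MAX)
				return -1;
		}
		for (long v = lo; v <= hi; v++) {
			if (n >= max)
				return -1;
			out[n++] = (int)v;
		}
		if (*end == ',') {
			end++;
			if (*end == '\0')	/* reject a trailing comma */
				return -1;
		} else if (*end != '\0') {
			return -1;		/* unexpected character */
		}
		s = end;
	}
	return n;
}
```

With this sketch, "4,5-7,9" expands to 4, 5, 6, 7, 9 and "9-12" to 9, 10, 11, 12.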

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: add branch ratio policy type

Add the capability for the vm_power_manager to receive
a policy of type BRANCH_RATIO. This will add any vcpus
in the policy to the oob monitoring thread.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: add --port-list option

Add the long form of -p, which is --port-list.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: add thread for oob core monitor

Change the app to now require three cores, as the third core
will be used to run the oob monitoring thread.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: allow greater than 64 cores

To facilitate more info per core, change the global_cpu_mask
from a uint64_t to an array. This also removes the limit on
64 cores, allocating the array at run-time based on the number of
cores found in the system.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: add oob monitoring functions

This patch introduces the out-of-band (oob) core monitoring
functions.

The functions are similar to the channel manager functions.
There are functions to add and remove cores from the
list of cores being monitored. There is a function to initialise
the monitor setup, run the monitor thread, and exit the monitor.

The monitor thread runs in its own lcore, and is separate
functionality to the channel monitor which is epoll based.
This thread is timer based. It loops through all monitored cores,
calculates the branch ratio, scales up or down the core, then
sleeps for an interval (~250 us).

The method it uses to read the branch counters is a pread on the
/dev/cpu/x/msr file, so the 'msr' kernel module needs to be loaded.
Also, since the msr.h file has been made unavailable in recent
kernels, we have #defines for the relevant MSRs included in the
code.
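
As a rough sketch of the pread-based access described above: the file offset
selects the register. read_msr_file() and msr_path() are hypothetical names
for illustration, the register number is left to the caller, and reading the
real device normally needs root and the 'msr' module.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read one 64-bit register from an msr-style device file, where the file
 * offset selects the register. Returns 0 on success, -1 on failure.
 * Hypothetical helper, not the function used by the example app.
 */
static int
read_msr_file(const char *path, uint32_t reg, uint64_t *val)
{
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = pread(fd, val, sizeof(*val), (off_t)reg);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Build the per-core path named in the commit message. */
static int
msr_path(char *buf, size_t len, unsigned int core)
{
	int n = snprintf(buf, len, "/dev/cpu/%u/msr", core);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```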

The makefile has a switch for x86 and non-x86 platforms,
and compiles stub functions for non-x86 platforms.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: add core list parameter

Add in the '-l' command line parameter (also --core-list)
so the user can now pass --core-list=4,6,8-10 and it will
expand out to 4,6,8,9,10 using the parse function provided
in parse.c (parse_set).

This list of cores is then used to enable out-of-band monitoring
to scale up and down these cores based on the ratio of branch
hits versus branch misses. The ratio will be low when a poll
loop is spinning with no packets being received, so the frequency
will be scaled down.
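
The scaling decision can be sketched as follows. This is an illustration
under stated assumptions, not the code in the patch: the ratio is taken here
as misses divided by hits over one sampling interval, and both the function
name and the caller-supplied threshold are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a core looks idle from two samples of its branch counters.
 * Assumption for this sketch: ratio = misses / hits over the interval,
 * compared against a tunable threshold supplied by the caller.
 */
static bool
core_looks_idle(uint64_t hits_prev, uint64_t hits_now,
		uint64_t miss_prev, uint64_t miss_now, double threshold)
{
	/* Counter deltas over the sampling interval. */
	uint64_t hits = hits_now - hits_prev;
	uint64_t miss = miss_now - miss_prev;

	if (hits == 0)
		return false;	/* no data: do not scale down */
	return ((double)miss / (double)hits) < threshold;
}
```

A caller would scale the core down when this returns true and up otherwise.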

Also, as part of this change, we introduce a core_info struct
which keeps information on each core in the system, and whether
we're doing out of band monitoring on them.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

examples/vm_power: add check for port count

If we don't pass any ports to the app, we don't need to create
any mempools, and we don't need to init any ports.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

devtools: fix checkpatch for filename with space

If the patch filename or the temporary file path have a space
in their name, the script checkpatches.sh does not work.
The variables for the filenames must be enclosed in quotes
in order to preserve spaces.

Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

vfio: remove unnecessary IPC for group fd clear

Clearing vfio_group_fd does not need to involve any IPC.
Also, the current IPC implementation for SOCKET_CLR_GROUP is not
correct: rte_vfio_clear_group on a secondary will always fail,
which prevents a device from being detached correctly on a secondary
process. The patch simply removes all IPC-related code from
rte_vfio_clear_group.

Fixes: 83a73c5fef66 ("vfio: use generic multi-process channel")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

vfio: enable unmapping resource for secondary

The subroutine to unmap a VFIO resource is shared by secondary and
primary processes, and it does not work on the secondary process.
For a secondary process, it is not necessary to close the interrupt
handler, set pci bus mastering, or remove vfio_res from
vfio_res_list. So, the patch adds a dedicated function to handle
the situation when a device is unmapped on a secondary process.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

vfio: fix PCI address comparison

When memcmp is used to compare two PCI addresses, sizeof(struct rte_pci_addr)
is 8 because the struct is 4-byte aligned, while only 7 bytes of
struct rte_pci_addr are valid. Comparing the 8th byte gives unexpected
results, which happens when repeatedly attaching/detaching a device.
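
A minimal sketch of the safer comparison, assuming a stand-in struct with the
layout the message describes (a 32-bit domain plus three 8-bit fields). The
names pci_addr and pci_addr_equal() are hypothetical and this is not the DPDK
header or the code in the patch.

```c
#include <stdint.h>

/*
 * Stand-in with 7 meaningful bytes; compilers typically pad it to 8.
 * The padding byte has no defined value, so memcmp() over the whole
 * struct can report a difference between two equal addresses.
 */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Compare only the meaningful fields, never the padding byte. */
static int
pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->devid == b->devid && a->function == b->function;
}
```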

Fixes: 94c0776b1bad ("vfio: support hotplug")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

eal: fix hotplug add and remove

If hotplug add is called for an already plugged PCI device, it will
cause rte_pci_device->device.name to be corrupted due to an unexpected
rte_devargs_remove. Also, an attempt to hotplug remove an already
unplugged device will cause a segmentation fault due to an unexpected
bus->unplug on a rte_device whose driver is NULL.
The patch fixes these issues.

Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

devtools: fix symbol check for filename with space

If the patch filename or the temporary file path have a space
in their name, the script check-symbol-change.sh does not work.
The variables for the filenames must be enclosed in quotes
in order to preserve spaces.

Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

mem: add logic check for static analyzer

Technically, single file segments codepath will never get
triggered when using in-memory mode, because EAL prohibits
mixing these two options at initialization time. However,
code analyzers do not know that, and some will complain
about either using uninitialized variables, or trying to
do operations on an already closed descriptor.

Fix this by assuring the compiler or code analyzer that
in-memory mode code never gets triggered when using
single-file segments mode.

Coverity issue: 302847
Fixes: 72b49ff623c4 ("mem: support --in-memory mode")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

malloc: do not skip pad on free

Previously, we were skipping erasing pad because we were
expecting it to be freed when we were merging adjacent
segments. However, if there were no adjacent segments to
merge, we would've skipped erasing the pad, leaving non-zero
memory in our free space.

Fix this by including pad in the erasing unconditionally.

Fixes: e43a9f52b7ff ("malloc: fix pad erasing")
Cc: stable@dpdk.org
Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>

devargs: fix parsing truncation when using format

Space for string terminating NUL character should be provided to
snprintf() to avoid the last symbol truncation.
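
The general pattern can be sketched like this: measure the formatted length
first, then allocate length + 1 bytes so the terminating NUL fits.
format_alloc() is a hypothetical helper for illustration, not the devargs
code itself.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly allocated string. The first vsnprintf() call only
 * measures; the buffer then gets len + 1 bytes so the NUL fits and the
 * last character is not cut off. The caller frees the result.
 */
static char *
format_alloc(const char *fmt, ...)
{
	va_list ap;
	char *buf;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (buf == NULL)
		return NULL;

	va_start(ap, fmt);
	vsnprintf(buf, (size_t)len + 1, fmt, ap);
	va_end(ap);
	return buf;
}
```

Passing only len as the size would make vsnprintf() write len - 1 characters
plus the NUL, which is the truncation described above.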

Fixes: a23bc2c4e01b ("devargs: add non-variadic parsing function")
Reported-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

eal: fix dependency in multi-process detection

Currently, we need runtime dir to put all of our runtime info in,
including the DPDK shared config. However, we use the shared
config to determine our proc type, and this happens earlier than
we actually create the config dir and thus can know where to
place the config file.

Fix this by moving runtime dir creation right after the EAL
arguments parsing, but before proc type autodetection. Also,
previously we were creating the config file unconditionally,
even if we specified no_shconf - fix it by only creating
the config file if no_shconf is not set.

Fixes: adf1d867361c ("eal: move runtime config file to new location")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>

mem: fix alignment of requested virtual areas

The original code did not align any addresses that were requested as
page-aligned, but were different because addr_is_hint was set.

The fix by Dariusz referenced below introduced an issue where all unaligned
addresses were left as unaligned.

This patch is a partial revert of
commit 7fa7216ed48d ("mem: fix alignment of requested virtual areas")

and implements a proper fix for this issue, by asking for alignment in all
but the following two cases:

1) page size is equal to system page size, or
2) we got an aligned requested address, and will not accept a different one

This ensures that alignment is performed in all cases, except for those we
can guarantee that the address will not need alignment.
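
The two exceptions can be expressed as a single predicate. The sketch below
is an illustration with hypothetical names, not the function from the patch,
and it assumes page_sz is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when the mapping code should ask for extra alignment.
 * All names are hypothetical; page_sz is assumed to be a power of two.
 */
static bool
need_alignment(const void *requested, bool addr_is_hint,
	       size_t page_sz, size_t system_page_sz)
{
	bool aligned_fixed_request = requested != NULL &&
		((uintptr_t)requested & (page_sz - 1)) == 0 &&
		!addr_is_hint;

	/* Case 1: page size equals the system page size.
	 * Case 2: aligned request that must be honoured exactly. */
	if (page_sz == system_page_sz || aligned_fixed_request)
		return false;
	return true;
}
```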

Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
Fixes: 7fa7216ed48d ("mem: fix alignment of requested virtual areas")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>

devargs: fix build with gcc 4.7

Fixed possible out-of-bounds issue:

lib/librte_eal/common/eal_common_devargs.c:
In function ‘rte_devargs_layers_parse’:
lib/librte_eal/common/eal_common_devargs.c:121:7:
error: array subscript is above array bounds

Bugzilla ID: 71
Fixes: 338327d731e6 ("devargs: add function to parse device layers")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

version: 18.08-rc1

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

devtools: add checks for ABI symbol addition

Recently, some additional patches were added to allow for programmatic
marking of C symbols as experimental.  The addition of these markers is
dependent on the manual addition of exported symbols to the EXPERIMENTAL
section of the corresponding library's version map file.  The consensus
on review is that, in addition to mandating the addition of symbols to
the EXPERIMENTAL version in the map, we need a mechanism to enforce our
documented process of mandating that addition when they are introduced.
To that end, I am proposing this change.  It is an addition to the
checkpatches script, which scans incoming patches for additions and
removals of symbols to the map file, and warns the user appropriately.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

app/testpmd: fix typo in setting Tx offload command

udp_cksum is duplicated, second one should be tcp_cksum

Fixes: c73a9071877a ("app/testpmd: add commands to test new offload API")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

app/testpmd: set keep CRC offload flag

If the "--disable-crc-strip" testpmd parameter is issued, it removes the
DEV_RX_OFFLOAD_CRC_STRIP flag.
With the introduction of the new DEV_RX_OFFLOAD_KEEP_CRC offload flag, this
flag should also be set when this parameter is issued.

Fixes: 70815c9ecadd ("ethdev: add new offload flag to keep CRC")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

kvargs: add generic string matching callback

This function can be used as a callback to
rte_kvargs_process.

This should reduce code duplication.
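
A callback of this kind might look like the sketch below. It follows the
(key, value, opaque) handler shape that rte_kvargs_process() expects, but
match_string() is a hypothetical stand-in and its return convention is an
assumption, not the function added by the commit.

```c
#include <string.h>

/*
 * Handler with the (key, value, opaque) shape used by kvargs callbacks.
 * "opaque" carries the string to match against.
 * Returns 0 on a match and -1 otherwise (assumed convention).
 */
static int
match_string(const char *key, const char *value, void *opaque)
{
	const char *expected = opaque;

	(void)key;	/* the key is not needed for a plain comparison */
	if (value == NULL || expected == NULL)
		return -1;
	return strcmp(value, expected) == 0 ? 0 : -1;
}
```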

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

eal: implement device iteration

Use the iteration hooks in the abstraction layers to perform the
requested filtering on the internal device lists.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

eal: implement device iteration initialization

Parse a device description.
Split this description into the relevant part for each layer.
No dynamic allocation is performed.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

eal: add device iterator interface

A device iterator allows iterating over a set of devices.
This set is defined by the two descriptions offered,

* rte_bus
* rte_class

Only one description can be provided, or both. It is not allowed to
provide no description at all.

Each layer of abstraction then performs a filter based on the
description provided. This filtering allows iterating on their internal
set of devices, stopping when a match is valid and returning the current
iteration context.

This context allows starting the next iteration from the same point and
going forward.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

devargs: add function to parse device layers

This function is private to the EAL.
It is used to parse each layer in a device description string,
and store the result in an rte_devargs structure.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

eal: introduce device class abstraction

This abstraction has existed since the infancy of DPDK.
It needs to be fleshed out however, to allow a generic
description of devices properties and capabilities.

A device class is the northbound interface of the device, intended
for applications to know what it can be used for.

It is conceptually just above buses.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

eal: introduce destructor macros

This macro adds symbols to the .fini section using the global
RTE priorities, to ensure consistency.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

kvargs: introduce a more flexible parsing function

This function permits defining additional terminating characters,
ending the parsing at arbitrary delimiters.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

kvargs: build before EAL

This library will be used by the EAL to parse parameters.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

kvargs: remove error logs

Error logs in kvargs parsing should be better handled in components
calling the library.

This library must be as lean as possible.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

devargs: add non-variadic parsing function

rte_devargs_parse becomes non-variadic,
rte_devargs_parsef becomes the variadic version, to be used to compose
device strings.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

devargs: use log functions

Use the standard EAL logging functions in rte_devargs.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

bus/vmbus: fix build without libuuid

The dependency on libuuid is useless because the required code
is embedded in EAL, see commit 6bc67c497a51 ("eal: add uuid API").

Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

ethdev: check queue stats mapping input arguments

With the current implementation, we are not checking the queue_id range
and stat_idx range in the stats mapping function. This patch adds
checks for the queue_id and stat_idx ranges.
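
The kind of range check described can be sketched as below. The function
name and MAX_QUEUE_STAT_COUNTERS are hypothetical stand-ins for illustration,
the limit of 16 is an arbitrary example value, and the error code is an
assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical limit standing in for the build-time counter count. */
#define MAX_QUEUE_STAT_COUNTERS 16

/* Reject out-of-range inputs before any mapping is attempted. */
static int
check_stats_mapping(uint16_t queue_id, uint16_t nb_queues, uint8_t stat_idx)
{
	if (queue_id >= nb_queues)
		return -EINVAL;
	if (stat_idx >= MAX_QUEUE_STAT_COUNTERS)
		return -EINVAL;
	return 0;
}
```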

Fixes: 5de201df892 ("ethdev: add stats per queue")
Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

net/netvsc: add documentation

Matching documentation for new netvsc device.
Includes a brief note about the restart issue.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

net/netvsc: add Hyper-V network device

The driver supports Hyper-V networking directly like
virtio for KVM or vmxnet3 for VMware.

This code is based off of the FreeBSD driver. The file and variable
names are kept the same to help with understanding (with most of the
BSD style warts removed).

This version supports the latest NetVSP 6.1 version and
older versions.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

bus/vmbus: add Hyper-V virtual bus support

This patch adds support for an additional bus type Virtual Machine BUS
(VMBUS) on Microsoft Hyper-V in Windows 10, Windows Server 2016
and Azure. Most of this code was extracted from FreeBSD and some of
this is from earlier code donated by Brocade.

Only Linux is supported at present, but the code is split
to allow future FreeBSD and Windows support.

The bus support relies on the uio_hv_generic driver from Linux
kernel 4.16. Multiple queue support requires additional sysfs
interfaces which is in kernel 5.0 (a.k.a 4.17).

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

eal: add uuid API

Since uuid functions may not be available everywhere, implement
uuid functions in DPDK. These are based off the BSD licensed
libuuid in util-linux.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

vhost/crypto: use function to access mbuf private area

Use rte_mbuf_to_priv() to access the private data area in the mbuf.

Signed-off-by: Dan Gora <dg@adax.com>

examples/ipsec-secgw: use function to access mbuf private

Update get_priv() to use rte_mbuf_to_priv() to access the private
area in the mbuf.

In inbound_sa_check(), use the application's get_priv() function to
access the private area in the mbuf.

Signed-off-by: Dan Gora <dg@adax.com>

mbuf: add accessor function for private data area

Add an inline accessor function to return the starting address of
the private data area in the supplied mbuf.

This allows applications to easily access the private data area between
the struct rte_mbuf and the data buffer in the specified mbuf without
creating private macros or accessor functions.
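
The idea behind such an accessor can be sketched with a reduced stand-in
structure. mbuf_hdr and mbuf_hdr_to_priv() are hypothetical names; the real
struct rte_mbuf is much larger and this is not the DPDK implementation.

```c
#include <stddef.h>

/* Reduced stand-in for the mbuf header, for illustration only. */
struct mbuf_hdr {
	void *buf_addr;
	unsigned short priv_size;	/* bytes reserved after the header */
};

/*
 * Return the first byte after the header, where private data would start.
 * As the commit message notes for the real accessor, there is no check
 * that a private area actually exists.
 */
static inline void *
mbuf_hdr_to_priv(struct mbuf_hdr *m)
{
	return (char *)m + sizeof(*m);
}
```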

No checks are made to ensure that a private data area actually exists
in the buffer.

Signed-off-by: Dan Gora <dg@adax.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

net/mlx5: support 32-bit systems

This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes, each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
of an on-going write of a DoorBell over a given UAR page.

The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.
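
The two rules quoted above can be sketched as follows. This is an
illustration only: uar_lock and doorbell_write_2x32() are hypothetical names,
a C11 spinlock stands in for whatever lock the driver uses, and a real driver
would use its platform's I/O write barrier rather than a C11 fence.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One lock per UAR page; a simple C11 spinlock is assumed here. */
struct uar_lock {
	atomic_flag busy;
};

/*
 * Ring a 64-bit doorbell as two 32-bit stores, lower address first, while
 * holding the page lock so no other doorbell on that page interleaves.
 * lo is stored at db[0] and hi at db[1]; which half of the 64-bit value
 * each one carries is left to the caller.
 */
static void
doorbell_write_2x32(struct uar_lock *lock, volatile uint32_t *db,
		    uint32_t lo, uint32_t hi)
{
	while (atomic_flag_test_and_set_explicit(&lock->busy,
						 memory_order_acquire))
		;	/* spin until the page is free */
	db[0] = lo;
	/* Stand-in for an I/O write barrier between the two stores. */
	atomic_thread_fence(memory_order_release);
	db[1] = hi;
	atomic_flag_clear_explicit(&lock->busy, memory_order_release);
}
```

On a 64-bit target the lock is unnecessary because the doorbell can be rung
with one 64-bit store.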

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix build with rdma-core v19

The flow counter support introduced by
commit 9a761de8ea14 ("net/mlx5: flow counter support") was intended to
work only with MLNX_OFED_4.3 as the upstream rdma-core
libraries lacked such support.

On rdma-core v19 the support for the flow counters was added but with
different user APIs, hence causing compilation issues on the PMD.

This patch fixes the compilation errors by forcing the flow counters
to be enabled only with MLNX_OFED APIs.
Once the MLNX_OFED and rdma-core APIs are aligned, a proper patch to
support the new API will be submitted.

Fixes: 9a761de8ea14 ("net/mlx5: flow counter support")
Cc: stable@dpdk.org
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>

net/mlx5: add count flow action

This is only supported by Mellanox OFED.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow MPLS item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow GRE item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow VXLAN-GPE item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow VXLAN item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: support inner RSS computation

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: remove useless arguments in hrxq API

The RSS level only needs to add a bit in the hash_fields, which are already
provided in this API. For the tunnel, it is necessary to request such a
queue to compute the checksum on the innermost headers; this last one should
always be activated.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add RSS flow action

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: use a macro for the RSS key size

ConnectX 4-5 support only 40 bytes of RSS key, using a compiled size
hash key is not necessary.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add mark/flag flow action

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow TCP item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow UDP item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow IPv6 item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow IPv4 item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow VLAN item

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow stop/start

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add flow queue action

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: support flow Ethernet item along with drop action

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: replace verbs priorities by flow

Previous work introduced Verbs priorities, whereas the PMD is translating
Flow priorities into Verbs ones. Rename this to better reflect what the
PMD has to translate.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: handle drop queues as regular queues

Drop queues are essentially used in flows due to the Verbs API. The
information on whether the fate of the flow is a drop or not is already
present in the flow. Due to this, drop queues can be fully mapped on regular
queues.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: remove flow support

This starts a series to re-work the flow engine in mlx5 to easily support
flow conversion to Verbs or TC. This is necessary to handle both regular
flows and representor flows.

As the full file needs to be cleaned up to re-write all items/actions
processing, this patch starts by disabling the regular code and only lets the
PMD start in isolated mode.

After this patch flow API will not be usable.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add parameter for port representors

Prior to this patch, all port representors detected on a given device were
probed and Ethernet devices instantiated for each of them.

This patch adds support for the standard "representor" parameter, which
implies that port representors are not probed by default anymore, except
for the list provided through device arguments.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>

net/mlx5: probe port representors in natural order

Port representors are probed in whatever unspecified order
ibv_get_device_list() returns them.

This is counterintuitive to users since DPDK port IDs assignment almost
never follows the same sequence as representor IDs. Additionally, the
master device does not necessarily inherit the lowest DPDK port ID.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

net/mlx5: probe all port representors

Probe existing port representors in addition to their master device and
associate them automatically.

To avoid collision between Ethernet devices, they are named as follows:

- "{DBDF}" for master/switch devices.
- "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
representors.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>

net/mlx5: add port representor awareness

The current PCI probing method is not aware of Verbs port representors,
which appear as standard Verbs devices bound to the same PCI address and
cannot be distinguished.

Problem is that more often than not, the wrong Verbs device is used,
resulting in unexpected traffic.

This patch makes the driver discard representors to only use the master
device. If unable to identify it (e.g. kernel drivers not recent enough),
either:

- There is only one matching device which isn't identified as a
representor, in that case use it.
- Otherwise log an error and do not probe the device.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>

net/mlx5: re-indent generic probing function

Since commit "net/mlx5: drop useless support for several Verbs ports"
removed an inner loop, mlx5_dev_spawn() is left with an unnecessary indent
level.

This patch eliminates a block, moves its local variables to function scope,
and re-indents its contents (diff best viewed with --ignore-all-space).

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>

net/mlx5: split PCI from generic probing

All the generic probing code needs is an IB device. While this device is
currently supplied by a PCI lookup, other methods will be added soon.

This patch divides the original function, which has become huge over time,
as follows:

1. PCI-specific (mlx5_pci_probe()).
2. Verbs device (mlx5_dev_spawn()).

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>

net/mlx5: drop useless support for several Verbs ports

Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.

While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

net/mlx5: remove redundant objects in probe function

This patch gets rid of redundant calls to open the device and query its
attributes in order to simplify the code.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>

net/mlx5: rename confusing object in probe function

There are several attribute objects in this function:

- IB device attributes (struct ibv_device_attr_ex device_attr).
- Direct Verbs attributes (struct mlx5dv_context attrs_out).
- Port attributes (struct ibv_port_attr).
- IB device attributes again (struct ibv_device_attr_ex device_attr_ex).

"attrs_out" is both odd and initialized using a nonstandard syntax. Rename
it "dv_attr" for consistency.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>

net/mlx4: support hardware TSO

Implement support for hardware TSO.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>

test/power: fix 32-bit build

Compilation issue:

test/test/test_power_acpi_cpufreq.c:556:31:
error: format ‘%lx’ expects argument of type ‘long unsigned int’,
but argument 2 has type ‘uint64_t {aka long long unsigned int}’

  printf("ACPI: Capabilities %lx\n", caps.capabilities);
                             ~~^     ~~~~~~~~~~~~~~~~~
                             %llx
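
For illustration only, a portable way to print a uint64_t is the PRIx64 macro from <inttypes.h>, which avoids picking between %lx and %llx by hand. The commit message does not show the actual patch, so the sketch below is an assumption about the general shape of such a fix; format_caps() is a hypothetical helper, not a DPDK function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit capability mask portably.
 * PRIx64 expands to the correct conversion for uint64_t on both
 * 32-bit and 64-bit targets, unlike a hard-coded %lx or %llx. */
int
format_caps(char *buf, size_t len, uint64_t capabilities)
{
	assert(buf != NULL);
	return snprintf(buf, len, "ACPI: Capabilities %" PRIx64 "\n",
			capabilities);
}
```

The same format string then builds without warnings whether uint64_t is "unsigned long" or "unsigned long long".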

Fixes: 39e38d583075 ("test/power: add unit test for get capabilities API")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>

ethdev: fix missing function in map file

Add rte_flow_expand_rss in map file and tag it as experimental.

Fixes: 4ed05fcd441b ("ethdev: add flow API to expand RSS flows")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

doc: fix lists in release notes

Some blank lines and hyphens are missing, so lists were badly
interpreted and rendered.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

mem: support --in-memory mode

Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.

To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files or lock any directories).

Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.

Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.
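
As a rough sketch of the mechanism described above (not the EAL code itself), anonymous hugepage memory can be requested on Linux with mmap() and MAP_HUGETLB, with no hugetlbfs mountpoint involved. The helper name map_anon_prefer_huge() is hypothetical, and the fallback to regular pages is a choice made for this example only. The size flags mentioned above (such as MAP_HUGE_2MB) are left out for brevity.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of anonymous memory, preferring
 * hugepages. The MAP_HUGETLB request fails when no hugepages are
 * reserved, so this sketch falls back to regular pages in that case.
 * *got_huge reports which kind of mapping was obtained. */
void *
map_anon_prefer_huge(size_t len, int *got_huge)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *addr = MAP_FAILED;

	assert(got_huge != NULL);
#ifdef MAP_HUGETLB
	/* The #ifdef mirrors the compile-time decision mentioned above. */
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			flags | MAP_HUGETLB, -1, 0);
#endif
	*got_huge = (addr != MAP_FAILED);
	if (addr == MAP_FAILED)
		addr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
	return addr == MAP_FAILED ? NULL : addr;
}
```

The first write to the returned memory is what triggers the page fault, which is why anything that depends on the physical page has to happen after that first touch.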

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

eal: add --in-memory option

This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.

Currently, this option acts as a strict superset of the --no-shconf and
--huge-unlink options.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

mem: support --huge-unlink mode

Unlink hugepages after creating them, to honor the hugepage-unlink mode.
We cannot resize non-existing files, so make single file segments
explicitly unsupported.
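
The unlink-after-create pattern can be sketched as follows. This is a minimal illustration using a regular temporary file as a stand-in for a hugetlbfs file; map_then_unlink() is a hypothetical name, not an EAL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file from a mkstemp() template, size
 * it, map it shared, then unlink it. The mapping stays valid after the
 * name is removed; the backing pages are released once the last
 * mapping and descriptor are gone. */
void *
map_then_unlink(char *path_template, size_t len)
{
	void *addr = MAP_FAILED;
	int fd;

	assert(path_template != NULL);
	fd = mkstemp(path_template); /* fills in the XXXXXX suffix */
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) == 0)
		addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_SHARED, fd, 0);
	/* Remove the name; the mapping keeps the pages alive. */
	unlink(path_template);
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}
```

Once the name is gone there is no path left to reopen, which is consistent with the restriction on single file segments noted above.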

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

eal: do not create runtime dir in --no-shconf mode

Now that the rest of the EAL is adjusted to not create any shared
files, prevent the runtime directory from ever being created.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

eal: support --no-shconf in hugepage data file

Do not create a shared hugepage data file if we were asked to
not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

eal: support --no-shconf for hugepage info

Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

ipc: support --no-shconf mode

IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no-shconf mode, IPC will be useless, so
do not enable it in the first place. In the interests of API usage
convenience, we will still allow registering callbacks, but obviously
they won't ever be triggered.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

fbarray: support --no-shconf mode

When using the --no-shconf option, the expectation is that multiprocess
will not be supported, as no shared files are created. However, fbarray still
creates some shared files that prevent multiple processes with the same
prefix from starting.

Fix this by not creating shared files whenever the --no-shconf option is
specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.
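
One way to make a read-only reservation writable is to map over it in place with MAP_FIXED. The sketch below assumes this approach and does not reproduce the fbarray code. reserve_readonly() merely stands in for eal_get_virtual_area(), and both function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Stand-in for an address-space reservation: mapped read-only. */
void *
reserve_readonly(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return addr == MAP_FAILED ? NULL : addr;
}

/* Replace the reservation in place with a writable anonymous mapping.
 * MAP_FIXED makes the new mapping land at exactly the same address. */
void *
remap_writable(void *area, size_t len)
{
	void *addr = mmap(area, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return addr == MAP_FAILED ? NULL : addr;
}
```

Because the new mapping is private and anonymous, no file is created or shared, which matches the intent of the no-shconf mode.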

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

eal: move runtime config file to new location

As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice and
update release notes to indicate the changes.

[1] http://dpdk.org/patch/40418

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

doc: add IPC callback limitations

For asynchronous requests, the user callback may be triggered either from
the IPC thread or from the interrupt thread. Because of this, delivery of
other interrupt-based events such as alarms may not be possible inside
the asynchronous IPC request callback handler. Document this
limitation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

ipc: remove thread for async requests

Previously, we were using two IPC threads - one to handle messages
and synchronous requests, and another to handle asynchronous requests.
To handle replies for an async request, rte_mp_handle used a pthread_cond
variable to wake up the rte_mp_handle_async thread to process them.

Change it to handle asynchronous messages within the main IPC thread.
To handle timeout events, for each async request which is sent,
we set an alarm for it. If its reply is received before timeout,
we will cancel the alarm when we handle the reply; otherwise,
the alarm will invoke async_reply_handle() as its callback.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Thomas Monjalon <thomas@monjalon.net>

eal: bring forward init of interrupt handling

Next commit will make asynchronous IPC requests rely on alarm API,
which in turn relies on interrupts to work. Therefore, move the EAL
interrupt initialization before IPC initialization to avoid breaking
IPC in the next commit.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

eal/bsd: support alarm API

Implement EAL alarm API support for FreeBSD. The implementation
is largely identical to that of the Linux version, with one key
difference.

The alarm API is a little Linux-centric in that it is expecting
the alarm API to manage alarm timeouts without involvement of the
interrupt thread. This works on Linux because Linux has the
timerfd API, which allows waiting for timer events on an fd.

On FreeBSD, however, there is no timerfd, and timer events are
set up directly in kevent. There is no way to pass information from
the alarm API to the interrupt thread, so we also add a little
back-channel magic to get the soonest alarm timeout from the alarm API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

eal/bsd: add interrupt thread

Add an interrupt thread to FreeBSD. It is largely a copy-paste from
the Linuxapp interrupt thread, except for a few key differences:

* Use kevent instead of epoll
* Do not recreate the event queue on adding/removing interrupt
sources, add/remove them to/from the queue on the fly instead
* No support for UIO/VFIO handles

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>