Pavan Nikhilesh [Tue, 4 May 2021 00:27:15 +0000 (05:57 +0530)]
event/cnxk: add option to disable NPA
If the chunks are allocated from NPA then TIM can automatically free
them when traversing the list of chunks.
Add devargs to disable NPA and use software mempool to manage chunks.
Shijith Thotton [Tue, 4 May 2021 00:27:14 +0000 (05:57 +0530)]
event/cnxk: create and free timer adapter
When the application calls timer adapter create the following is used:
- Allocate a TIM LF based on number of LF's provisioned.
- Verify the config parameters supplied.
- Allocate memory required for
* Buckets based on min and max timeout supplied.
* Allocate the chunk pool based on the number of timers.
On Free:
- Free the allocated bucket and chunk memory.
- Free the TIM lf allocated.
Pavan Nikhilesh [Tue, 4 May 2021 00:27:04 +0000 (05:57 +0530)]
event/cnxk: add option to configure getwork mode
Add devargs to configure the platform specific getwork mode.
CN9K getwork mode by default is set to use dual workslot mode.
Add option to force single workslot mode.
Example:
--dev "0002:0e:00.0,single_ws=1"
CN10K supports multiple getwork prefetch modes, by default the
prefetch mode is set to none.
Add option to select getwork prefetch mode
Example:
--dev "0002:1e:00.0,gw_mode=1"
Shijith Thotton [Tue, 4 May 2021 00:27:01 +0000 (05:57 +0530)]
event/cnxk: add option to control SSO HWGRP QoS
SSO HWGRPs i.e. queue uses DRAM & SRAM buffers to hold in-flight
events. By default the buffers are assigned to the SSO HWGRPs to
satisfy minimum HW requirements. SSO is free to assign the remaining
buffers to HWGRPs based on a preconfigured threshold.
We can control the QoS of SSO HWGRP by modifying the above mentioned
thresholds. HWGRPs that have higher importance can be assigned higher
thresholds than the rest.
Shijith Thotton [Tue, 4 May 2021 00:27:00 +0000 (05:57 +0530)]
event/cnxk: add option for in-flight buffer count
The number of events for a *open system* event device is specified
as -1 as per the eventdev specification.
Since, SSO inflight events are only limited by DRAM size, the
xae_cnt devargs parameter is introduced to provide upper limit for
in-flight events.
Shijith Thotton [Tue, 4 May 2021 00:26:57 +0000 (05:56 +0530)]
event/cnxk: add platform specific device config
Add platform specific event device configuration that attaches the
requested number of SSO HWS(event ports) and HWGRP(event queues) LFs
to the RVU PF/VF.
Update the dlb documentation for v2.5. Notable differences include
the new cobined credit scheme. Also cleaned up a couple of sections,
and removed a duplicate section.
event/dlb2: update config defines as runtime options
The new devarg names and their default values
are listed below. The defaults have not changed, and
none of these parameters are accessed in the fast path.
All references to the old register map have been removed,
so it is safe to rename the new combined file that supports
both DLB v2.0 and DLB v2.5. Also fixed all places where this
file is included.
event/dlb2: use new implementation of HW types header
As support for DLB v2.5 was added, modifications were made to
dlb_hw_types_new.h, but the old file needed to be preserved during
the port in order to meet the requirement that individual patches in
a series each compile successfully. Since the DLB v2.5 support is
completely integrated, it is now safe to remove the old (original)
file, as well as the DLB2_USE_NEW_HEADERS define that was used to
control which version of the file was to be included in certain
source files.
It is now safe to rename the new file, and use it unconditionally
in all DLB source files.
event/dlb2: use new implementation of resource file
The file dlb_resource_new.c now contains all of the low level
functions required to support both DLB v2.0 and DLB v2.5, and
the original file (dlb_resource.c) was removed in the previous
commit, so rename dlb_resource_new.c to dlb_resource.c, and
update the meson build file so that the new file is built.
event/dlb2: use new implementation of resource header
A temporary version of dlb_resource.h (dlb_resource_new.h) was used
by the previous commits in this patch series. Merge the two files
now that DLB v2.5 support has been fully added to dlb_resource.c.
Update the low level HW functions that perform the sequence number
management functions. These include getting a groups number of
sequence numbers per queue, managing in-use slots, getting the
current occupancy, and setting sequence numbers for a group.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Update the low level HW functions responsible for
configuring sparse CQ mode, where each cache line
contains just one QE instead of 4.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Update the low level HW functions responsible for
finishing the queue map/unmap operation, which is an
asynchronous operation.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Update the low level hardware functions responsible for
getting the queue depth. The command arguments are also
validated.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
DLB v2.5 uses a different credit scheme than was used in DLB v2.0 .
Specifically, there is a single credit pool for both load balanced
and directed traffic, instead of a separate pool for each as is
found with DLB v2.0.
Update the low level HW functions responsible for
starting the scheduling domain. Once a domain is
started, its resources can no longer be configured,
except for QID remapping and port enable/disable.
The start domain arguments are validated, and an error
is returned if validation fails, or if the domain is
not configured or has already been started.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Update the low level HW functions responsible for
removing the linkage between a queue and a load
balanced port. Runtime checks are performed on the
port and queue to make sure the state is appropriate
for the unmap operation, and the unmap arguments
are also validated.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware.
Update the low level HW functions responsible for
mapping queues to ports. These functions also validate
the map arguments and verify that the maximum number
of queues linked to a load balanced port does not
exceed the capabilities of the hardware.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware
version, v2.0 or v2.5.
Update the low level HW functions responsible for
creating directed queues. These functions configure
the depth threshold, configure queue depth, and
validate the queue creation arguments.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware
version, v2.0 or v2.5.
Update the low level HW functions responsible for
creating directed ports. These functions create the
producer port (PP), configure the consumer queue (CQ),
configure queue depth, and validate the port creation
arguments.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware
version, v2.0 or v2.5.
Update the low level HW functions responsible for
creating load balanced ports. These functions create the
producer port (PP), configure the consumer queue (CQ), and
validate the port creation arguments.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware
version, v2.0 or v2.5.
Updated low level hardware functions related to configuring
load balanced queues. These functions create the queues,
as well as attach related resources required by load
balanced queues, such as sequence numbers.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action based on the hardware
version, v2.0 or v2.5.
Reset hardware registers, consumer queues, ports,
interrupts and software. Queues must also be drained
as part of the reset process.
The logic is very similar to what was done for v2.0,
but the new combined register map for v2.0 and v2.5
uses new register names and bit names. Additionally,
new register access macros are used so that the code
can perform the correct action, based on the hardware
version, v2.0 or v2.5.
DLB v2.5 uses a new credit scheme, where directed and load balanced
credits are unified, instead of having separate directed and load
balanced credit pools.
Add support for DLB v2.5 probe-time hardware init,
and sets up a framework for incorporating the remaining
changes required to support DLB v2.5.
DLB v2.0 and DLB v2.5 are similar in many respects, but their
register offsets and definitions are different. As a result of these,
differences, the low level hardware functions must take the device
version into consideration. This requires that the hardware version be
passed to many of the low level functions, so that the PMD can
take the appropriate action based on the device version.
To ease the transition and keep the individual patches small, three
temporary files are added in this commit. These files have "new"
in their names. The files with "new" contain changes specific to a
consolidated PMD that supports both DLB v2.0 and DLB 2.5. Their sister
files of the same name (minus "new") contain the old DLB v2.0 specific
code. The intent is to remove code from the original files as that code
is ported to the combined DLB 2.0/2.5 PMD model and added to the "new"
files in a series of commits. At end of the patch series, the old files
will be empty and the "new" files will have the logic needed
to implement a single PMD that supports both DLB v2.0 and DLB v2.5.
At that time, the original DLB v2.0 specific files will be deleted,
and the "new" files will be renamed and replace them.
This commit adds dlb v2.5 probe support, and updates
parameter parsing.
The dlb v2.5 device differs from dlb v2, in that the
number of resources (ports, queues, ...) is different,
so macros have been added to take the device version
into account.
- Remove references of FPGA.
- Do not include dlb2_mbox.h as it is not needed.
- Remove duplicate macros/defines that were
present in both dlb2_priv.h and dlb2_hw_types.h.
Update dlb2_resource.c to include dlb2_priv.h
so that it picks up the macros/defines that
have now been consolidated.
event/octeontx2: configure crypto adapter xaq pool
Configure xaq pool based on number of in-use crypto queues to avoid CPT
add work failure due to xaq buffer run out. This patch configures
OTX2_CPT_DEFAULT_CMD_QLEN number of xae entries per queue pair.
Parameter queue_pair_id of crypto adapter queue pair add/del operation
can be -1 to select all pre configured crypto queue pairs. Added support
for the same in driver. Also added a member in cpt qp structure to
indicate binding state of a queue pair to an event queue.
Dekel Peled [Tue, 4 May 2021 17:54:57 +0000 (20:54 +0300)]
common/mlx5: support general object credential
CREDENTIAL object is used for any crypto operation in wrapped mode.
This patch add support of CREDENTIAL object create operation.
Add reading of CREDENTIAL support capability.
Add function to create general object type CREDENTIAL, using DevX API.
Shiri Kuzin [Tue, 4 May 2021 17:54:56 +0000 (20:54 +0300)]
common/mlx5: share Verbs device match function
The get_ib_device_match function iterates over the list of ib devices
returned by the get_device_list glue function and returns the ib device
matching the provided address.
Since this function is in use by several drivers, in this patch we
share the function in common part.
Dekel Peled [Tue, 4 May 2021 17:54:53 +0000 (20:54 +0300)]
common/mlx5: support general object crypto login
CRYPTO_LOGIN Object is used to login to the device as crypto user
or crypto officer.
Required in order to perform any crypto related control operations.
This patch adds support of CRYPTO_LOGIN object create operation.
Add reading of CRYPTO_LOGIN support capability.
Add function to create general object type CRYPTO_LOGIN, using DevX API.
Dekel Peled [Tue, 4 May 2021 17:54:52 +0000 (20:54 +0300)]
common/mlx5: support general object KEK import
IMPORT_KEK object is used to wrap (encrypt) critical security
parameters, such as other keys and credentials, when those need
to be passed between the device and the software.
This patch add support of IMPORT_KEK object create operation.
Add reading of IMPORT_KEK support capability.
Add function to create general object type IMPORT_KEK, using DevX API.
Dekel Peled [Tue, 4 May 2021 17:54:51 +0000 (20:54 +0300)]
common/mlx5: adjust DevX mkey fields for crypto
MKEY that will be used for crypto purposes must be created with
crypto_en and remote access attributes.
This patch adds support for them in the DevX MKEY context.
Dekel Peled [Tue, 4 May 2021 17:54:48 +0000 (20:54 +0300)]
common/mlx5: optimize read of general capabilities
General object types support is indicated in bitmap general_obj_types,
which is part of HCA capabilities list.
Currently this bitmap is read multiple times, and each time a different
bit is extracted.
This patch optimizes the code, reading the bitmap once into a local
variable, and then extracting the required bits.
Dekel Peled [Tue, 4 May 2021 17:54:46 +0000 (20:54 +0300)]
common/mlx5: remove redundant spaces in PRM header
File drivers/common/mlx5/mlx5_prm.h includes structs representing
data items as defined in PRM document.
Some of these structs were copied as-is from kernel file mlx5_ifc.h.
As result the structs are not all aligned with the same spacing.
This patch removes redundant spaces and new lines from several structs,
to align all structs in mlx5_prm.h to the same format.
The function rte_pktmbuf_init() expects that the mempool private area is
large enough and was previously initialized by rte_pktmbuf_pool_init(),
which is not the case.
This causes the function rte_pktmbuf_priv_size() to return an
unpredictable value, and this value is used as a size in a memset.
Replace the mempool object initializer by my_obj_init(), which does not
have this constraint, and fits the needs for this test.
Fixes: 923ceaeac140 ("test/mempool: add unit test cases") Cc: stable@dpdk.org Reported-by: Wenwu Ma <wenwux.ma@intel.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tal Shnaiderman [Wed, 21 Apr 2021 16:09:42 +0000 (19:09 +0300)]
eal/windows: fix MinGW build
the strncasecmp macro defined in rte_os_shim.h is already
defined in MinGW-w64, as a result the compiler prints out
the warning below on function redefinition whenever compiling
a file including the header in debug mode.
Fixed by defining the macro only to the clang compiler.
Fixes: 45d62067c237 ("eal: make OS shims internal") Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Juraj Linkeš [Thu, 22 Apr 2021 09:11:51 +0000 (11:11 +0200)]
eal/arm64: fix platform register bit
REG_PLATFORM only uses bit 0 to indicate whether the value retrieved
from hardware matches PLATFORM_STR.
Fixes: 97523f822ba9 ("eal/arm: add CPU flags for ARMv8") Cc: stable@dpdk.org Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Add improved error handling to rte_ioat_completed_ops(). This patch adds
new parameters to the function to enable the user to track the completion
status of each individual operation in a batch. With this addition, the
function can help the user to determine firstly, how many operations may
have failed or been skipped and then secondly, which specific operations
did not complete successfully.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Kevin Laatz [Tue, 4 May 2021 13:14:57 +0000 (14:14 +0100)]
raw/ioat: add API to query remaining ring space
Add a new API to query remaining descriptor ring capacity. This API is
useful, for example, when an application needs to enqueue a fragmented
packet and wants to ensure that all segments of the packet will be enqueued
together.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The ring management in the idxd part of the driver is more complex than
it needs to be, tracking individual batches in a ring and having null
descriptors as padding to avoid having single-operation batches. This can
be simplified by using a regular ring-based layout, with additional
overflow at the end to ensure that the one does not need to wrap within a
batch.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Split the rte_ioat_rawdev_fns.h file into two separate headers, so that
the data structures for the original ioat devices and the newer idxd
ones can be kept separate from each other. This makes code management
and rework easier.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
raw/ioat: add bus driver for device scanning automatically
Rather than using a vdev with args, DPDK can scan and initialize the
devices automatically using a bus-type driver. This bus does not need to
worry about registering device drivers, rather it can initialize the
devices directly on probe.
The device instances (queues) to use are detected from /dev with the
additional info about them got from /sys.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Kevin Laatz [Tue, 4 May 2021 13:14:53 +0000 (14:14 +0100)]
raw/ioat: allow perform operations function to return error
Change the return type for the rte_ioat_perform_ops() function from void to
int to allow the possibility of returning an error code in future, should
it be necessary.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
raw/ioat: make workqueue name configurable in script
Add a "--name-prefix" parameter to the quick configuration script for
DSA. This allows the queues configured on a DSA instance to be made
available to only one DPDK process in a setup with multiple DPDK process
instances.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
raw/ioat: fix script for configuring small number of queues
The dpdk_idxd_cfg.py script included with the driver for convenience did
not work properly where the number of queues to be configured was
less than the number of groups or engines. This was because there would
be configured groups/engines not assigned to queues. Fix this by
limiting the engine and group counts to be no bigger than the number of
queues.
Fixes: 01863b9d2354 ("raw/ioat: include example configuration script") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
raw/ioat: expand descriptor struct to full 64 bytes
Although it's unused by the driver, add the interrupt handle field in
the descriptor to the descriptor structure for completeness, and
explicitly add the reserved padding field on the end of the structure
too. This means that when a descriptor is defined on the stack, or
initialized by the compiler, the unused/reserved space will be zeroed
appropriately.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
raw/ioat: support limiting queues for idxd PCI device
When using a full device instance via vfio, allow the user to specify a
maximum number of queues to configure rather than always using the max
number of supported queues.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Add in additional unit tests to verify that we can get completion reports
of multiple batches in a single completed_ops() call. Also verify we can
get smaller number of completions if that is requested too.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When setting RTE_MAX_LCORES to the maximum value supported by ppc
hardware (1536), the lcores_autotest may timeout after 30 seconds
because the test takes nearly 60 seconds to complete. Set max_lcores to
a lower value because the maximum value is unlikely to be seen in any
production systems and to eliminate the quick test timeout error.
Bugzilla ID: 684 Fixes: db1f2f8a9fe5 ("config: increase maximum lcores for ppc") Cc: stable@dpdk.org Signed-off-by: David Christensen <drc@linux.vnet.ibm.com> Acked-by: Luca Boccassi <bluca@debian.org>
Bruce Richardson [Mon, 26 Apr 2021 10:54:02 +0000 (11:54 +0100)]
devtools: add script to check indentation of Meson lists
This is a script to fix up minor formatting issues in meson files.
It scans for, and can optionally fix, indentation issues and missing
trailing commas in the lists in meson.build files. It also detects,
and can fix, multi-line lists where more than one entry appears on a
line.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Joyce Kong [Tue, 27 Apr 2021 16:01:40 +0000 (11:01 -0500)]
mempool: distinguish cache and pool debug counters
If cache is enabled, objects will be retrieved/put from/to cache,
subsequently from/to the common pool. Now the debug stats calculate
the objects retrieved/put from/to cache and pool together, it is
better to distinguish them.
Signed-off-by: Joyce Kong <joyce.kong@arm.com> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
stack: allow lock-free only on relevant architectures
Since commit 7911ba0473e0 ("stack: enable lock-free implementation for
aarch64"), lock-free stack is supported on arm64 but this description was
missing from the doxygen for the flag.
Currently it is impossible to detect programmatically whether lock-free
implementation of rte_stack is supported. One could check whether the
header guard for lock-free stubs is defined (_RTE_STACK_LF_STUBS_H_) but
that's an unstable implementation detail. Because of that currently all
lock-free ring creations silently succeed (as long as the stack header
is 16B long) which later leads to push and pop operations being NOPs.
The observable effect is that stack_lf_autotest fails on platforms not
supporting the lock-free. Instead it should just skip the lock-free test
altogether.
This commit adds a new errno value (ENOTSUP) that may be returned by
rte_stack_create() to indicate that a given combination of flags is not
supported on a current platform.
This is detected by checking a compile-time flag in the include logic in
rte_stack_lf.h which may be used by applications to check the lock-free
support at compile time.
Use the added RTE_STACK_LF_SUPPORTED flag to disable the lock-free stack
tests at the compile time.
Perf test doesn't fail because rte_ring_create() succeeds, however
marking this test as skipped gives a better indication of what actually
was tested.