Kevin Laatz [Wed, 20 Oct 2021 16:30:06 +0000 (16:30 +0000)]
dma/idxd: add data path job submission
Add data path functions for enqueuing and submitting operations to DSA
devices.
Documentation updates are included for dmadev library and IDXD driver docs
as appropriate.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Kevin Laatz [Wed, 20 Oct 2021 16:30:05 +0000 (16:30 +0000)]
dma/idxd: add start and stop for PCI devices
Add device start/stop functions for DSA devices bound to vfio. For devices
bound to the IDXD kernel driver, these are not required since the IDXD
kernel driver takes care of this.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Kevin Laatz [Wed, 20 Oct 2021 16:30:04 +0000 (16:30 +0000)]
dma/idxd: add configure and info
Add functions for device configuration. The info_get function is included
here since it can be useful for checking successful configuration.
Documentation is also updated to add device configuration usage info.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Kevin Laatz [Wed, 20 Oct 2021 16:30:03 +0000 (16:30 +0000)]
dma/idxd: add datapath structures
Add data structures required for the data path for IDXD devices.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Kevin Laatz [Wed, 20 Oct 2021 16:30:02 +0000 (16:30 +0000)]
dma/idxd: create dmadev instances on PCI probe
When a suitable device is found during the PCI probe, create a dmadev
instance for each HW queue. HW definitions required are also included.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Kevin Laatz [Wed, 20 Oct 2021 16:30:01 +0000 (16:30 +0000)]
dma/idxd: create dmadev instances on bus probe
When a suitable device is found during the bus scan/probe, create a dmadev
instance for each HW queue. Internal structures required for device
creation are also added.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Kevin Laatz [Wed, 20 Oct 2021 16:30:00 +0000 (16:30 +0000)]
dma/idxd: add bus device probing
Add the basic device probing for DSA devices bound to the IDXD kernel
driver. These devices can be configured via sysfs and made available to
DPDK if they are found during bus scan. Relevant documentation is included.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Kevin Laatz [Wed, 20 Oct 2021 16:29:59 +0000 (16:29 +0000)]
dma/idxd: add skeleton for VFIO based DSA device
Add the basic device probe/remove skeleton code for DSA device bound to
the vfio pci driver. Relevant documentation and MAINTAINERS update also
included.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Bruce Richardson [Wed, 20 Oct 2021 16:29:58 +0000 (16:29 +0000)]
raw/ioat: build only if dmadev not present
Only build the rawdev IDXD/IOAT drivers if the dmadev drivers are not
present.
This change requires the dependencies to be reordered in
drivers/meson.build so that rawdev can use the "RTE_DMA_* build macros to
check for the presence of the equivalent dmadev driver.
A note is also added to the documentation to inform users of this change.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:31 +0000 (14:42 -0700)]
app/dumpcap: add new packet capture application
This is a new packet capture application to replace existing pdump.
The new application works like Wireshark dumpcap program and supports
the pdump API features.
It is not complete yet some features such as filtering are not implemented.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:30 +0000 (14:42 -0700)]
pdump: support pcapng and filtering
This enhances the DPDK pdump library to support new
pcapng format and filtering via BPF.
The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.
The internal version number (not part of exposed API or ABI)
is intentionally increased to cause any attempt to try
mismatched primary/secondary process to fail.
Add new API to do allow filtering of captured packets with
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because ring was full).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:29 +0000 (14:42 -0700)]
bpf: add function to dump eBPF instructions
When debugging converted (and other) programs it is useful
to see disassembled eBPF output.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:28 +0000 (14:42 -0700)]
bpf: add function to convert classic BPF to DPDK BPF
The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs. The DPDK BPF library only implements
extended BPF (eBPF). Add an function to convert from old to
new.
The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.
The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
See https://github.com/tklauser/filter2xdp
Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:27 +0000 (14:42 -0700)]
bpf: allow self-xor operation
Some BPF programs may use XOR of a register with itself
as a way to zero register in one instruction.
The BPF filter converter generates this in the prolog
to the generated code.
The BPF validator would not allow this because the value of
register was undefined. But after this operation it always zero.
Fixes:
8021917293d0 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:34 +0000 (14:42 -0700)]
test/bpf: enable in fast tests
The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:33 +0000 (14:42 -0700)]
test/pcapng: test pcapng library
Simple unit test that created pcapng file using API.
To run this test you need to have at least one device.
For example:
DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
--no-huge -m 2048 --vdev=net_tap,iface=dummy
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:26 +0000 (14:42 -0700)]
pcapng: add new library for writing pcapng files
This is utility library for writing pcapng format files
used by Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.
See
https://github.com/pcapng/pcapng/
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Stephen Hemminger [Wed, 20 Oct 2021 21:42:25 +0000 (14:42 -0700)]
pdump: disable on Windows
The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.
The new version of pdump with filtering now has dependency
on bpf. But bpf library is not available on Windows.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Chengwen Feng [Thu, 21 Oct 2021 12:59:38 +0000 (20:59 +0800)]
dmadev: fix debug build
This patch fix compile error when enable RTE_DMADEV_DEBUG.
Fixes:
ea8cf0f8536d ("dmadev: add burst capacity API")
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Conor Walsh <conor.walsh@intel.com>
Chengwen Feng [Thu, 21 Oct 2021 12:59:37 +0000 (20:59 +0800)]
test/dma: check DMA info query
This patch add check for rte_dma_info_get() API.
Fixes:
718f7804841f ("test/dma: add basic dmadev instance tests")
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Conor Walsh <conor.walsh@intel.com>
David Marchand [Thu, 21 Oct 2021 12:59:36 +0000 (20:59 +0800)]
dmadev: hide devices array
No need to expose rte_dma_devices out of the dmadev library.
Existing helpers should be enough, and inlines make use of
rte_dma_fp_objs.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
Pravin Pathak [Thu, 14 Oct 2021 14:51:41 +0000 (14:51 +0000)]
event/dlb2: optimize credit allocations using port
This commit implements the changes required for using suggested
port type hint feature. Each port uses different credit quanta
based on port type specified using port configuration flags.
Each port has separate quanta defined in dlb2_priv.h
Producer and consumer ports will need larger quanta value to reduce number
of credit calls they make. Workers can use small quanta as they mostly
work out of locally cached credits and don't request/return credits often.
Signed-off-by: Pravin Pathak <pravin.pathak@intel.com>
Harry van Haaren [Thu, 14 Oct 2021 14:51:40 +0000 (14:51 +0000)]
app/eventdev: add event port hints for perf mode
This commit adds producer, worker and consumer port hints for the
test-eventdev application performance tests.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Harry van Haaren [Thu, 14 Oct 2021 14:51:39 +0000 (14:51 +0000)]
examples/eventdev_pipeline: use port config hints
This commit adds the per-port hints added to the eventdev API, indicating
which eventdev ports will be used for producing, forwarding, or consuming
events from the system.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Harry van Haaren [Thu, 14 Oct 2021 14:51:38 +0000 (14:51 +0000)]
eventdev: add usage hints to port configure API
This commit introduces 3 flags to the port configuration flags.
These flags allow the application to indicate what type of work
is expected to be performed by an eventdev port.
The three new flags are
- RTE_EVENT_PORT_CFG_HINT_PRODUCER (mostly RTE_EVENT_OP_NEW events)
- RTE_EVENT_PORT_CFG_HINT_CONSUMER (mostly RTE_EVENT_OP_RELEASE events)
- RTE_EVENT_PORT_CFG_HINT_WORKER (mostly RTE_EVENT_OP_FORWARD events)
These flags are only hints, and the PMDs must operate under the
assumption that any port can enqueue an event with any type of op.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Shijith Thotton [Mon, 27 Sep 2021 06:21:35 +0000 (11:51 +0530)]
examples/l2fwd-event: support event vector
Added changes to receive packets as event vector. By default this is
disabled and can be enabled using the option --event-vector. Vector
size and timeout to form the vector can be configured using options
--event-vector-size and --event-vector-tmo.
Example:
dpdk-l2fwd-event -l 0-3 -n 4 -- -p 0x03 --mode=eventdev \
--eventq-sched=ordered --event-vector --event-vector-size 16
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Shijith Thotton [Mon, 27 Sep 2021 04:57:42 +0000 (10:27 +0530)]
examples/l3fwd: support event vector
Added changes to receive packets as event vector. By default this is
disabled and can be enabled using the option --event-vector. Vector
size and timeout to form the vector can be configured using options
--event-vector-size and --event-vector-tmo.
Example:
dpdk-l3fwd -l 0-3 -n 4 -- -p 0x03 --mode=eventdev \
--eventq-sched=ordered --event-vector --event-vector-size 16
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Naga Harish K S V [Mon, 18 Oct 2021 08:25:41 +0000 (03:25 -0500)]
eventdev/eth_rx: fix WRR buffer overrun
When a poll queue is removed from a rx_adapter instance, the WRR poll
array is recomputed. The wrr array length is reduced in this case. The
next wrr position to poll is stored in wrr_pos variable of rx_adapter
instance. This wrr_pos can become invalid in some cases after wrr is
recomputed. Using this variable to get the next queue and device pair
may leed to wrr buffer overruns.
Resetting the wrr_pos to zero after recomputation of wrr array fixes
the buffer overrun issue.
Fixes:
9c38b704d280 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Rashmi Shetty [Fri, 15 Oct 2021 15:18:53 +0000 (10:18 -0500)]
app/eventdev: support burst enqueue
Introduce a new command line option prod_enq_burst_sz
to set burst size for eventdev enqueue at producer in perf_queue
test. The newly added function perf_producer_burst is called when
prod_enq_burst_sz is greater than 1.
Signed-off-by: Rashmi Shetty <rashmi.shetty@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Harry van Haaren [Thu, 14 Oct 2021 09:54:44 +0000 (09:54 +0000)]
app/eventdev: fix terminal colour after control-c exit
Before this commit, a Control^C exit of the test-eventdev application
would print the worker packet percentages, and leave the terminal with
a green colour despite the colour reset being issued after the newline.
By moving the colour reset command before the \n the issue is fixed.
Fixes:
6b1a14a83a06 ("app/eventdev: add packet distribution logs")
Cc: stable@dpdk.org
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:09 +0000 (05:06 +0530)]
eventdev: mark trace variables as internal
Mark rte_trace global variables as internal i.e. remove them
from experimental section of version map.
Some of them are used in inline APIs, mark those as global.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:08 +0000 (05:06 +0530)]
eventdev: make trace API internal
Slowpath trace APIs are only used in rte_eventdev.c so make them
as internal.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:07 +0000 (05:06 +0530)]
eventdev: promote event vector API to stable
Promote event vector configuration APIs to stable.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:06 +0000 (05:06 +0530)]
eventdev/timer: move adapters memory to hugepage
Move memory used by timer adapters to hugepage.
Allocate memory on the first adapter create or lookup to address
both primary and secondary process usecases.
This will prevent TLB misses if any and aligns to memory structure
of other subsystems.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:05 +0000 (05:06 +0530)]
eventdev/timer: rearrange struct fields
Rearrange fields in rte_event_timer data structure to remove holes.
Also, remove use of volatile from rte_event_timer.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:04 +0000 (05:06 +0530)]
eventdev: remove rte prefix for internal structs
Remove rte_ prefix from rte_eth_event_enqueue_buffer,
rte_event_eth_rx_adapter and rte_event_crypto_adapter
as they are only used in rte_event_eth_rx_adapter.c and
rte_event_crypto_adapter.c
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:03 +0000 (05:06 +0530)]
eventdev: hide timer adapter PMD file
Hide rte_event_timer_adapter_pmd.h file as it is an internal file.
Remove rte_ prefix from rte_event_timer_adapter_ops structure.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:02 +0000 (05:06 +0530)]
eventdev: hide event device related structures
Move rte_eventdev, rte_eventdev_data structures to eventdev_pmd.h.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:01 +0000 (05:06 +0530)]
eventdev: use new API for inline functions
Use new driver interface for the fastpath enqueue/dequeue inline
functions.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:36:00 +0000 (05:06 +0530)]
drivers/event: invoke probing finish function
Invoke event_dev_probing_finish() function at the end of probing,
this function sets the function pointers in the fp_ops flat array.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:35:59 +0000 (05:05 +0530)]
eventdev: move inline APIs into separate structure
Move fastpath inline function pointers from rte_eventdev into a
separate structure accessed via a flat array.
The intention is to make rte_eventdev and related structures private
to avoid future API/ABI breakages.`
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:35:58 +0000 (05:05 +0530)]
eventdev: allocate max space for internal arrays
Allocate max space for internal port, port config, queue config and
link map arrays.
Introduce new macro RTE_EVENT_MAX_PORTS_PER_DEV and set it to max
possible value.
This simplifies the port and queue reconfigure scenarios and will
also allow inline functions to refer pointer to internal port data
without extra checking of current number of configured queues.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:35:57 +0000 (05:05 +0530)]
eventdev: separate internal structures
Create rte_eventdev_core.h and move all the internal data structures
to this file. These structures are mostly used by drivers, but they
need to be in the public header file as they are accessed by datapath
inline functions for performance reasons.
The accessibility of these data structures is not changed.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Pavan Nikhilesh [Mon, 18 Oct 2021 23:35:56 +0000 (05:05 +0530)]
eventdev: make driver interface as internal
Mark all the driver specific functions as internal, remove
`rte` prefix from `struct rte_eventdev_ops`.
Remove experimental tag from internal functions.
Remove `eventdev_pmd.h` from non-internal header files.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Ganapati Kundapura [Wed, 13 Oct 2021 07:57:03 +0000 (02:57 -0500)]
eventdev/eth_rx: support telemetry
Added telemetry callbacks to get Rx adapter stats, reset stats and
to get Rx queue config information.
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Naga Harish K S V [Wed, 6 Oct 2021 07:55:46 +0000 (02:55 -0500)]
eventdev/eth_rx: add per-queue event buffer
Added per queue buffer. To configure per queue event buffer size,
application sets rte_event_eth_rx_adapter_params::use_queue_event_buf
flag as true while using rte_event_eth_rx_adapter_create_with_params().
The per queue event buffer size is populated in
rte_event_eth_rx_adapter_queue_conf::event_buf_size and passed
to rte_event_eth_rx_adapter_queue_add().
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Naga Harish K S V [Wed, 6 Oct 2021 07:55:44 +0000 (02:55 -0500)]
eventdev/eth_rx: add event buffer size configurability
Currently event buffer is static array with a default size defined
internally.
To configure event buffer size from application,
rte_event_eth_rx_adapter_create_with_params() API is added which
takes struct rte_event_eth_rx_adapter_params to configure event
buffer size in addition other params. The event buffer size is
rounded up for better buffer utilization and performance. In case
of NULL params argument, default event buffer size is used.
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Ganapati Kundapura [Thu, 16 Sep 2021 12:51:06 +0000 (07:51 -0500)]
eventdev/eth_rx: support Rx queue config get
Added rte_event_eth_rx_adapter_queue_conf_get() API to get rx queue
information - event queue identifier, flags for handling received packets,
scheduler type, event priority, polling frequency of the receive queue
and flow identifier in rte_event_eth_rx_adapter_queue_conf structure
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Ganapati Kundapura [Tue, 28 Sep 2021 16:38:48 +0000 (11:38 -0500)]
eventdev/eth_rx: use timestamp as dynamic mbuf field
Add support to register timestamp dynamic field in mbuf.
Update the timestamp in mbuf for each packet before enqueuing
to event device if the timestamp is not already set.
Adding the timestamp in Rx adapter avoids additional latency
due to the event device.
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Pavan Nikhilesh [Wed, 15 Sep 2021 13:15:20 +0000 (18:45 +0530)]
eventdev/eth_rx: simplify event vector config
Include vector configuration into the structure
``rte_event_eth_rx_adapter_queue_conf`` that is used to configure
Rx adapter ethernet device Rx queue parameters.
This simplifies event vector configuration as it avoids splitting
configuration per Rx queue.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Shijith Thotton [Fri, 3 Sep 2021 06:39:28 +0000 (12:09 +0530)]
eventdev/crypto: add cryptodev start in adapter spec
Event crypto adapter spec does not mention about cryptodev start and
stop. Cryptodev attached to the adapter should be started before calling
crypto adapter start. Added the same in spec and test application.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Shijith Thotton [Mon, 30 Aug 2021 16:14:46 +0000 (21:44 +0530)]
event/cnxk: fix max timer chunk pool cache size
Reduced max chunk pool cache size from RTE_MEMPOOL_CACHE_MAX_SIZE(512)
to 128.
If chunk pool cache is empty, it gets filled during arm. Filling 512
entries at a time will fail arm if timeout is shorter, hence
reduce the pool cache size.
Fixes:
0e792433d051 ("event/cnxk: create and free timer adapter")
Cc: stable@dpdk.org
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Shijith Thotton [Mon, 30 Aug 2021 16:06:46 +0000 (21:36 +0530)]
event/cnxk: fix SSO and TIM argument parsing
Type of kvargs value and handler function argument should match to avoid
spilling memory.
Fixes:
7ffa7379965e ("event/cnxk: add option to configure getwork mode")
Cc: stable@dpdk.org
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Ganapati Kundapura [Mon, 30 Aug 2021 13:06:25 +0000 (08:06 -0500)]
eventdev/eth_rx: make enqueue buffer circular
Rx adapter uses memove() to move unprocessed events to the beginning of
the packet enqueue buffer. The use memmove() was found to consume good
amount of CPU cycles (about 20%).
This patch removes the use of memove() while implementing a circular
buffer to avoid copying of data. With this change RX adapter is able
to fill the buffer of 16384 events.
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
David Marchand [Tue, 19 Oct 2021 11:26:02 +0000 (13:26 +0200)]
test: rely on EAL detection for core list
Cores count has a direct impact on the time needed to complete unit
tests.
Currently, the core list used for unit test is enforced to "all cores on
the system" with no way for (CI) users to adapt it.
On the other hand, EAL default behavior (when no -c/-l option gets passed)
is to start threads on as many cores available in the process cpu
affinity.
Remove logic from meson: users can then select where to run the tests by
either running meson with a custom cpu affinity (using taskset/cpuset
depending on OS) or by passing a --test-args option to meson.
Example:
$ sudo meson test -C build --suite fast-tests -t 3 --test-args "-l 0-3"
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Michael Baum [Tue, 19 Oct 2021 20:56:02 +0000 (23:56 +0300)]
common/mlx5: share MR mempool registration
Expand the use of mempool registration to MR management for other
drivers.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:56:01 +0000 (23:56 +0300)]
common/mlx5: support device DMA map and unmap
Since MR management has moved to the common area, there is no longer a
need for the DMA map and unmap function for each driver.
This patch share those functions. For most drivers it supports these
operations for the first time.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:56:00 +0000 (23:56 +0300)]
common/mlx5: share MR management
Add global shared MR cache as a field of common device structure.
Move MR management to use this global cache for all drivers.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:59 +0000 (23:55 +0300)]
common/mlx5: share MR top-half search function
Add function to search in local liniar cache and use it in the drivers
instead of their functions.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:58 +0000 (23:55 +0300)]
common/mlx5: add global MR cache create function
Add function for global shared MR cache structure initialization.
This function include:
- btree initialization.
- set callbacks for reg and dereg MR.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:57 +0000 (23:55 +0300)]
common/mlx5: add MR control initialization
Add function for MR control structure initialization.
This function include:
- btree initialization.
- dev_gen_ptr initialization.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:56 +0000 (23:55 +0300)]
net/mlx5: remove redundancy in MR file
This patch remove two redundant things from MR file:
1. mr_find_contig_memsegs_data structure which is moved to common file
before.
2. External memory mechanism - mlx5_tx_update_ext_mp function.
Since commit [1] which added support for DMA map and unmap, external
mem must be configured by the user using rte_mem_map function and no
need to handle this in pmd.
[1]
commit
989e999d9305
("net/mlx5: support PCI device DMA map and unmap")
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:55 +0000 (23:55 +0300)]
common/mlx5: share HCA capabilities handle
Add HCA attributes structure as a field of device config structure.
It query in common probing, and updates the timestamp format fields.
Each driver use HCA attributes from common device config structure,
instead of query it for itself.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:54 +0000 (23:55 +0300)]
common/mlx5: share protection domain object
Create shared Protection Domain in common area and add it and its PDN as
fields of common device structure.
Use this Protection Domain in all drivers and remove the PD and PDN
fields from their private structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:53 +0000 (23:55 +0300)]
common/mlx5: disable RoCE in device context creation
Add option to get IB device after disabling RoCE. It is relevant if
there is vDPA class in device arguments list.
Use common device context in vDPA driver and remove the ctx field from
its private structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:52 +0000 (23:55 +0300)]
common/mlx5: share device context object
Create shared context device in common area and add it as a field of
common device.
Use this context device in all drivers and remove the ctx field from
their private structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:51 +0000 (23:55 +0300)]
net/mlx5: remove redundant flag in device config
Device configure structure has flag named devx as same as SH structure
with the same meaning.
Remove the flag from the configuration structure and move all the
usages to the SH flag.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:50 +0000 (23:55 +0300)]
common/mlx5: move basic probing functions to common
Move open IBV/DevX device function to common.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:49 +0000 (23:55 +0300)]
net/mlx5: rearrange probing functions for Windows
Rearrange device detection code.
Rearrange configuration structures filling.
Remove unneeded variables.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:48 +0000 (23:55 +0300)]
common/mlx5: share memory related devargs
Add device configure structure and function to parse user device
arguments into it.
Move parsing and management of relevant device arguments to common.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:47 +0000 (23:55 +0300)]
common/mlx5: share common definitions
Create MACRO definitions file in the common driver as preparation for MR
and basic probe sharing.
Move relevant definitions from the net driver to the above file.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:46 +0000 (23:55 +0300)]
common/mlx5: share basic probing with internal drivers
Create common probing structure that includes, for now, basic probing
information detected by the common driver and share it with all the
internal drivers.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Tue, 19 Oct 2021 20:55:45 +0000 (23:55 +0300)]
net/mlx5: register memory event callback in Windows
In device initialization, the driver registers to free hugepages events.
When hugepage is released, this callback frees all its related MRs.
In Windows initialization, this callback is not registered what may
cause to use invalid memory.
This patch adds memory event callback registration in Windows
initialization.
Fixes:
980826dc6f0f ("net/mlx5: probe on Windows")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Xueming Li [Wed, 20 Oct 2021 15:47:39 +0000 (23:47 +0800)]
test: add devargs test cases
Initial version to test global devargs syntax.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
Xueming Li [Wed, 20 Oct 2021 15:47:38 +0000 (23:47 +0800)]
devargs: make bus optional
Global devargs syntax is used as device iteration filter like
"class=vdpa", a devargs without bus args is valid from parsing
perspective.
This patch makes bus args optional.
Fixes:
d2a66ad79480 ("bus: add device arguments name parsing")
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
Xueming Li [Wed, 20 Oct 2021 15:47:37 +0000 (23:47 +0800)]
devargs: support path value with global device syntax
Slash is used to split global device arguments.
To support path value which contains slash, this patch parses devargs by
locating both slash and layer name key:
bus=a,name=/some/path/class=b,k1=v1/driver=c,k2=v2
"/class=" and "/driver" are valid start of a layer.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
Olivier Matz [Wed, 29 Sep 2021 21:37:07 +0000 (23:37 +0200)]
mbuf: fix reset on mbuf free
m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:
m1 = rte_pktmbuf_alloc(mp);
rte_pktmbuf_append(m1, 500);
m2 = rte_pktmbuf_alloc(mp);
rte_pktmbuf_append(m2, 500);
rte_pktmbuf_chain(m1, m2);
m0 = rte_pktmbuf_alloc(mp);
rte_pktmbuf_append(m0, 500);
rte_pktmbuf_chain(m0, m1);
As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
- m0: next=m1, nb_seg=3
- m1: next=m2, nb_seg=2
- m2: next=NULL, nb_seg=1
Then split this chain between m1 and m2, it would result in 2 packets:
- first packet
- m0: next=m1, nb_seg=2
- m1: next=NULL, nb_seg=2
- second packet
- m2: next=NULL, nb_seg=1
Freeing the first packet will not restore nb_seg=1 in the second
segment. This is an issue because it is expected that mbufs stored
in pool have their nb_seg field set to 1.
Fixes:
8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Honnappa Nagarahalli [Fri, 15 Oct 2021 02:27:10 +0000 (21:27 -0500)]
hash: promote some functions to stable
Promote APIs to stable.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Vladimir Medvedkin [Thu, 14 Oct 2021 17:48:19 +0000 (18:48 +0100)]
test/hash: fix buffer overflow with jhash
This patch fixes buffer overflow reported by ASAN,
please reference https://bugs.dpdk.org/show_bug.cgi?id=818
Some tests for the rte_hash table use the rte_jhash_32b() as
the hash function. This hash function interprets the length
argument in units of 4 bytes.
This patch adds a wrapper function around rte_jhash_32b()
to reflect API differences regarding the length argument,
effectively dividing it by 4.
For some tests rte_jhash() is used with keys of length not
a multiple of 4 bytes. From the rte_jhash() documentation:
If input key is not aligned to four byte boundaries or a
multiple of four bytes in length, the memory region just
after may be read (but not used in the computation).
This patch increases the size of the proto field of the
flow_key struct up to uint32_t.
Bugzilla ID: 818
Fixes:
af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Honnappa Nagarahalli [Thu, 14 Oct 2021 20:55:50 +0000 (15:55 -0500)]
ring: fix name size in ring structure
Use correct define for the name array size. The change breaks ABI and
hence cannot be backported to stable branches.
Fixes:
38c9817ee1d8 ("mempool: adjust name size in related data types")
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Thomas Monjalon [Wed, 20 Oct 2021 14:55:17 +0000 (15:55 +0100)]
ethdev: replace bit shifts with macros
The macros RTE_BIT32 and RTE_BIT64 are used to replace bit shifts.
The macro UINT64C is also used to replace remaining occurrences of ULL.
The bit shifts of ETH_RSS_LEVEL_* are kept for aesthetic reason.
The API of rte_mtr and rte_tm is using enums for 64-bit variables.
As they are enums, unsigned bit cannot be used.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Andrew Rybchenko [Wed, 20 Oct 2021 13:15:40 +0000 (16:15 +0300)]
ethdev: forbid closing started device
Ethernet device must be stopped first before close in accordance
with the documentation.
Fixes:
980995f8cc56 ("ethdev: improve API comments of close and detach functions")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Gregory Etelson [Wed, 20 Oct 2021 15:14:57 +0000 (18:14 +0300)]
app/testpmd: add flex item commands
Network port hardware is shipped with fixed number of
supported network protocols. If application must work with a
protocol that is not included in the port hardware by default, it
can try to add the new protocol to port hardware.
Flex item or flex parser is port infrastructure that allows
application to add support for a custom network header and
offload flows to match the header elements.
Application must complete the following tasks to create a flow
rule that matches custom header:
1. Create flow item object in port hardware.
Application must provide custom header configuration to PMD.
PMD will use that configuration to create flex item object in
port hardware.
2. Create flex patterns to match. Flex pattern has a spec and a mask
components, like a regular flow item. Combined together, spec and mask
can target unique data sequence or a number of data sequences in the
custom header.
Flex patterns of the same flex item can have different lengths.
Flex pattern is identified by unique handler value.
3. Create a flow rule with a flex flow item that references
flow pattern.
Testpmd flex CLI commands are:
testpmd> flow flex_item create <port> <flex_id> <filename>
testpmd> set flex_pattern <pattern_id> \
spec <spec data> mask <mask data>
testpmd> set flex_pattern <pattern_id> is <spec_data>
testpmd> flow create <port> ... \
/ flex item is <flex_id> pattern is <pattern_id> / ...
The patch works with the jansson library API.
A new optional dependency on jansson library is added for
testpmd. If jansson not detected the flex item functionality
is disabled.
Jansson development files must be present:
jansson.pc, jansson.h libjansson.[a,so]
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Gregory Etelson [Wed, 20 Oct 2021 15:14:56 +0000 (18:14 +0300)]
app/testpmd: add flow command parsing routine
testpmd flow creation is constructed from these procedures:
1. receive string with flow rule description;
2. parse input string and build flow parameters: port_id value,
flow attributes, items array, actions array;
3. create a flow rule from flow rule parameters.
Flow rule creation procedures are built as a pipeline. A new
procedure starts immediately after successful predecessor completion.
Due to this we have no dedicated routines providing intermediate
results for step 1-3 above.
The patch adds `flow_parse()` function call. It parses input string
and provides a caller with parsed data. This is a preparation step
for introducing flex item command processing.
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Viacheslav Ovsiienko [Wed, 20 Oct 2021 15:14:55 +0000 (18:14 +0300)]
ethdev: introduce configurable flexible item
1. Introduction and Retrospective
Nowadays the networks are evolving fast and wide, the network
structures are getting more and more complicated, the new
application areas are emerging. To address these challenges
the new network protocols are continuously being developed,
considered by technical communities, adopted by industry and,
eventually implemented in hardware and software. The DPDK
framework follows the common trends and if we bother
to glance at the RTE Flow API header we see the multiple
new items were introduced during the last years since
the initial release.
The new protocol adoption and implementation process is
not straightforward and takes time, the new protocol passes
development, consideration, adoption, and implementation
phases. The industry tries to mitigate and address the
forthcoming network protocols, for example, many hardware
vendors are implementing flexible and configurable network
protocol parsers. As DPDK developers, could we anticipate
the near future in the same fashion and introduce the similar
flexibility in RTE Flow API?
Let's check what we already have merged in our project, and
we see the nice raw item (rte_flow_item_raw). At the first
glance, it looks superior and we can try to implement a flow
matching on the header of some relatively new tunnel protocol,
say on the GENEVE header with variable length options. And,
under further consideration, we run into the raw item
limitations:
- only fixed size network header can be represented
- the entire network header pattern of fixed format
(header field offsets are fixed) must be provided
- the search for patterns is not robust (the wrong matches
might be triggered), and actually is not supported
by existing PMDs
- no explicitly specified relations with preceding
and following items
- no tunnel hint support
As the result, implementing the support for tunnel protocols
like aforementioned GENEVE with variable extra protocol option
with flow raw item becomes very complicated and would require
multiple flows and multiple raw items chained in the same
flow (by the way, there is no support found for chained raw
items in implemented drivers).
This RFC introduces the dedicated flex item (rte_flow_item_flex)
to handle matches with existing and new network protocol headers
in a unified fashion.
2. Flex Item Life Cycle
Let's assume there are the requirements to support the new
network protocol with RTE Flows. What is given within protocol
specification:
- header format
- header length, (can be variable, depending on options)
- potential presence of extra options following or included
in the header the header
- the relations with preceding protocols. For example,
the GENEVE follows UDP, eCPRI can follow either UDP
or L2 header
- the relations with following protocols. For example,
the next layer after tunnel header can be L2 or L3
- whether the new protocol is a tunnel and the header
is a splitting point between outer and inner layers
The supposed way to operate with flex item:
- application defines the header structures according to
protocol specification
- application calls rte_flow_flex_item_create() with desired
configuration according to the protocol specification, it
creates the flex item object over specified ethernet device
and prepares PMD and underlying hardware to handle flex
item. On item creation call PMD backing the specified
ethernet device returns the opaque handle identifying
the object has been created
- application uses the rte_flow_item_flex with obtained handle
in the flows, the values/masks to match with fields in the
header are specified in the flex item per flow as for regular
items (except that pattern buffer combines all fields)
- flows with flex items match with packets in a regular fashion,
the values and masks for the new protocol header match are
taken from the flex items in the flows
- application destroys flows with flex items
- application calls rte_flow_flex_item_release() as part of
ethernet device API and destroys the flex item object in
PMD and releases the engaged hardware resources
3. Flex Item Structure
The flex item structure is intended to be used as part of the flow
pattern like regular RTE flow items and provides the mask and
value to match with fields of the protocol item was configured
for.
struct rte_flow_item_flex {
void *handle;
uint32_t length;
const uint8_t* pattern;
};
The handle is some opaque object maintained on per device basis
by underlying driver.
The protocol header fields are considered as bit fields, all
offsets and widths are expressed in bits. The pattern is the
buffer containing the bit concatenation of all the fields
presented at item configuration time, in the same order and
same amount. If byte boundary alignment is needed an application
can use a dummy type field, this is just some kind of gap filler.
The length field specifies the pattern buffer length in bytes
and is needed to allow rte_flow_copy() operations. The approach
of multiple pattern pointers and lengths (per field) was
considered and found clumsy - it seems to be much suitable for
the application to maintain the single structure within the
single pattern buffer.
4. Flex Item Configuration
The flex item configuration consists of the following parts:
- header field descriptors:
- next header
- next protocol
- sample to match
- input link descriptors
- output link descriptors
The field descriptors tell the driver and hardware what data should
be extracted from the packet and then control the packet handling
in the flow engine. Besides this, sample fields can be presented
to match with patterns in the flows. Each field is a bit pattern.
It has width, offset from the header beginning, mode of offset
calculation, and offset related parameters.
The next header field is special, no data are actually taken
from the packet, but its offset is used as a pointer to the next
header in the packet, in other words the next header offset
specifies the size of the header being parsed by flex item.
There is one more special field - next protocol, it specifies
where the next protocol identifier is contained and packet data
sampled from this field will be used to determine the next
protocol header type to continue packet parsing. The next
protocol field is like eth_type field in MAC2, or proto field
in IPv4/v6 headers.
The sample fields are used to represent the data be sampled
from the packet and then matched with established flows.
There are several methods supposed to calculate field offset
in runtime depending on configuration and packet content:
- FIELD_MODE_FIXED - fixed offset. The bit offset from
header beginning is permanent and defined by field_base
configuration parameter.
- FIELD_MODE_OFFSET - the field bit offset is extracted
from other header field (indirect offset field). The
resulting field offset to match is calculated from as:
field_base + (*offset_base & offset_mask) << offset_shift
This mode is useful to sample some extra options following
the main header with field containing main header length.
Also, this mode can be used to calculate offset to the
next protocol header, for example - IPv4 header contains
the 4-bit field with IPv4 header length expressed in dwords.
One more example - this mode would allow us to skip GENEVE
header variable length options.
- FIELD_MODE_BITMASK - the field bit offset is extracted
from other header field (indirect offset field), the latter
is considered as bitmask containing some number of one bits,
the resulting field offset to match is calculated as:
field_base + bitcount(*offset_base & offset_mask) << offset_shift
This mode would be useful to skip the GTP header and its
extra options with specified flags.
- FIELD_MODE_DUMMY - dummy field, optionally used for byte
boundary alignment in pattern. Pattern mask and data are
ignored in the match. All configuration parameters besides
field size and offset are ignored.
Note: "*" - means the indirect field offset is calculated
and actual data are extracted from the packet by this
offset (like data are fetched by pointer *p from memory).
The offset mode list can be extended by vendors according to
hardware supported options.
The input link configuration section tells the driver after
what protocols and at what conditions the flex item can follow.
Input link specified the preceding header pattern, for example
for GENEVE it can be UDP item specifying match on destination
port with value 6081. The flex item can follow multiple header
types and multiple input links should be specified. At flow
creation time the item with one of the input link types should
precede the flex item and driver will select the correct flex
item settings, depending on the actual flow pattern.
The output link configuration section tells the driver how
to continue packet parsing after the flex item protocol.
If multiple protocols can follow the flex item header the
flex item should contain the field with the next protocol
identifier and the parsing will be continued depending
on the data contained in this field in the actual packet.
The flex item fields can participate in RSS hash calculation,
the dedicated flag is present in the field description to specify
what fields should be provided for hashing.
5. Flex Item Chaining
If there are multiple protocols supposed to be supported with
flex items in chained fashion - two or more flex items within
the same flow and these ones might be neighbors in the pattern,
it means the flex items are mutual referencing. In this case,
the item that occurred first should be created with empty
output link list or with the list including existing items,
and then the second flex item should be created referencing
the first flex item as input arc, drivers should adjust
the item configuration.
Also, the hardware resources used by flex items to handle
the packet can be limited. If there are multiple flex items
that are supposed to be used within the same flow it would
be nice to provide some hint for the driver that these two
or more flex items are intended for simultaneous usage.
The fields of items should be assigned with hint indices
and these indices from two or more flex items supposed
to be provided within the same flow should be the same
as well. In other words, the field hint index specifies
the group of fields that can be matched simultaneously
within a single flow. If hint indices are specified,
the driver will try to engage not overlapping hardware
resources and provide independent handling of the field
groups with unique indices. If the hint index is zero
the driver assigns resources on its own.
6. Example of New Protocol Handling
Let's suppose we have the requirements to handle the new tunnel
protocol that follows UDP header with destination port 0xFADE
and is followed by MAC header. Let the new protocol header format
be like this:
struct new_protocol_header {
rte_be32 header_length; /* length in dwords, including options */
rte_be32 specific0; /* some protocol data, no intention */
rte_be32 specific1; /* to match in flows on these fields */
rte_be32 crucial; /* data of interest, match is needed */
rte_be32 options[0]; /* optional protocol data, variable length */
};
The supposed flex item configuration:
struct rte_flow_item_flex_field field0 = {
.field_mode = FIELD_MODE_DUMMY, /* Affects match pattern only */
.field_size = 96, /* three dwords from the beginning */
};
struct rte_flow_item_flex_field field1 = {
.field_mode = FIELD_MODE_FIXED,
.field_size = 32, /* Field size is one dword */
.field_base = 96, /* Skip three dwords from the beginning */
};
struct rte_flow_item_udp spec0 = {
.hdr = {
.dst_port = RTE_BE16(0xFADE),
}
};
struct rte_flow_item_udp mask0 = {
.hdr = {
.dst_port = RTE_BE16(0xFFFF),
}
};
struct rte_flow_item_flex_link link0 = {
.item = {
.type = RTE_FLOW_ITEM_TYPE_UDP,
.spec = &spec0,
.mask = &mask0,
};
struct rte_flow_item_flex_conf conf = {
.next_header = {
.tunnel = FLEX_TUNNEL_MODE_SINGLE,
.field_mode = FIELD_MODE_OFFSET,
.field_base = 0,
.offset_base = 0,
.offset_mask = 0xFFFFFFFF,
.offset_shift = 2 /* Expressed in dwords, shift left by 2 */
},
.sample = {
&field0,
&field1,
},
.nb_samples = 2,
.input_link[0] = &link0,
.nb_inputs = 1
};
Let's suppose we have created the flex item successfully, and PMD
returned the handle 0x123456789A. We can use the following item
pattern to match the crucial field in the packet with value 0x00112233:
struct new_protocol_header spec_pattern =
{
.crucial = RTE_BE32(0x00112233),
};
struct new_protocol_header mask_pattern =
{
.crucial = RTE_BE32(0xFFFFFFFF),
};
struct rte_flow_item_flex spec_flex = {
.handle = 0x123456789A
.length = sizeiof(struct new_protocol_header),
.pattern = &spec_pattern,
};
struct rte_flow_item_flex mask_flex = {
.length = sizeof(struct new_protocol_header),
.pattern = &mask_pattern,
};
struct rte_flow_item item_to_match = {
.type = RTE_FLOW_ITEM_TYPE_FLEX,
.spec = &spec_flex,
.mask = &mask_flex,
};
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Gregory Etelson [Wed, 20 Oct 2021 15:14:54 +0000 (18:14 +0300)]
ethdev: support flow elements with variable length
Flow API provides RAW item type for packet patterns of variable
length. The RAW item structure has fixed size members that describe the
variable pattern length and methods to process it.
There is the new Flow items with variable lengths coming - flex
item. In order to handle this item (and potentially other new ones
with variable pattern length) in flow copy and conversion routines
the helper function is introduced.
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Dapeng Yu [Tue, 19 Oct 2021 09:44:49 +0000 (17:44 +0800)]
net/ice: fix sideband queue initialization
Sideband queue need to be initialized when device is initialized.
Otherwise the calling to function "ice_init_ctrlq" may fail.
This patch fixes it.
Fixes:
97f4f78bbd9f ("net/ice/base: add functions for device clock control")
Cc: stable@dpdk.org
Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Satheesh Paul [Tue, 5 Oct 2021 02:55:27 +0000 (08:25 +0530)]
common/cnxk: improve MCAM entries management
This patch removes the MCAM preallocation scheme. The free
entry cache is removed and for every flow created, an MCAM
allocation request is made to the kernel. Each priority level
has a list of MCAM entries. For every flow rule added, the
MCAM entry obtained from kernel is checked if it is at the
correct user specified priority. If not, the existing rules
are moved across MCAM entries so that the user specified
priority is maintained.
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Kiran Kumar K <kirankumark@marvell.com>
Gowrishankar Muthukrishnan [Tue, 19 Oct 2021 11:27:27 +0000 (16:57 +0530)]
net/cnxk: support telemetry
Add telemetry endpoints to ethdev.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Gowrishankar Muthukrishnan [Tue, 19 Oct 2021 11:27:26 +0000 (16:57 +0530)]
mempool/cnxk: support telemetry
Adding telemetry endpoints to cnxk mempool driver.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Gowrishankar Muthukrishnan [Tue, 19 Oct 2021 11:27:25 +0000 (16:57 +0530)]
common/cnxk: support telemetry for NIX
Add telemetry endpoints to NIX.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Gowrishankar Muthukrishnan [Tue, 19 Oct 2021 11:27:24 +0000 (16:57 +0530)]
common/cnxk: support telemetry for network pool allocator
Add telemetry endpoints to NPA.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:12 +0000 (12:36 +0530)]
net/cnxk: support meter action to flow destroy
Meters are configured per flow using rte_flow_create API.
Patch adds support for destroy operation for meter action
applied on the flow.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:11 +0000 (12:36 +0530)]
net/cnxk: support meter action to flow create
Meters are configured per flow using rte_flow_create API.
Implement support for meter action applied on the flow.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:10 +0000 (12:36 +0530)]
net/cnxk: support to read/update meter stats
Implement API to read and update stats corresponding to
given meter instance for CNXK platform.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:09 +0000 (12:36 +0530)]
net/cnxk: support to update precolor DSCP table
Implement API to update DSCP table for pre-coloring for
incoming packet per nixlf for CNXK platform.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:08 +0000 (12:36 +0530)]
net/cnxk: support to enable/disable meter
Implement API to enable or disable meter instance for
CNXK platform.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:07 +0000 (12:36 +0530)]
net/cnxk: support to delete meter
Implement API to delete meter instance for CNXK platform.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:06 +0000 (12:36 +0530)]
net/cnxk: support to create meter
Implement API to create meter instance for CNXK platform.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Sunil Kumar Kori [Tue, 12 Oct 2021 07:06:05 +0000 (12:36 +0530)]
net/cnxk: support to delete meter policy
Implement API to delete meter policy for CNXK platform.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>