Ivan Ilchenko [Fri, 22 Oct 2021 13:17:54 +0000 (16:17 +0300)]
net/virtio: fix link update in speed feature
Link update callback reports speed/duplex based on data
filled on device initialization. This is wrong in case of
VIRTIO_NET_F_SPEED_DUPLEX is negotiated since link could
be down at this time. Fix this function to actually
update the HW data in this case with respect to the fact
that specifying speed via devarg is a highest priority.
Fixes: 1357b4b36246 ("net/virtio: support Virtio link speed feature") Cc: stable@dpdk.org Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Miao Li [Mon, 25 Oct 2021 14:47:25 +0000 (14:47 +0000)]
examples/l3fwd-power: support virtio/vhost
In l3fwd-power, there is default port configuration which requires
RSS and IPv4/UDP/TCP checksum. Once device does not support these,
the l3fwd-power will exit and report an error.
This patch updates the port configuration based on device capabilities
after getting the device information to support devices like virtio
and vhost.
Signed-off-by: Miao Li <miao.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: David Hunt <david.hunt@intel.com>
Miao Li [Mon, 25 Oct 2021 14:47:24 +0000 (14:47 +0000)]
power: support missing Rx queue info
Since some vdevs like virtio and vhost do not support rxq_info_get and
queue state inquiry, the error return value -ENOTSUP need to be ignored
when queue_stopped cannot get rx queue information and rx queue state.
This patch changes the return value of queue_stopped when
rte_eth_rx_queue_info_get return -ENOTSUP to support vdevs which cannot
provide rx queue information and rx queue state enable power management.
Fixes: 209fd585456c ("power: make ethdev power management thread unsafe") Cc: stable@dpdk.org Signed-off-by: Miao Li <miao.li@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Miao Li [Mon, 25 Oct 2021 14:47:23 +0000 (14:47 +0000)]
net/vhost: support power monitor
According to current semantics of power monitor, this commit adds a
callback function to decide whether aborts the sleep by checking
current value against the expected value and vhost_get_monitor_addr
to provide address to monitor. When no packet come in, the value of
address will not be changed and the running core will sleep. Once
packets arrive, the value of address will be changed and the running
core will wakeup.
Signed-off-by: Miao Li <miao.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: David Hunt <david.hunt@intel.com>
Miao Li [Mon, 25 Oct 2021 14:47:22 +0000 (14:47 +0000)]
vhost: add power monitor API
This commit defines rte_vhost_power_monitor_cond which is used to pass
some information to vhost driver. The information is including the
address to monitor, the expected value, the mask to extract value read
from 'addr', the value size of monitor address, the match flag used to
distinguish the value used to match something or not match something.
Vhost driver can use these information to fill rte_power_monitor_cond.
Signed-off-by: Miao Li <miao.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: David Hunt <david.hunt@intel.com>
Miao Li [Mon, 25 Oct 2021 14:47:21 +0000 (14:47 +0000)]
net/virtio: support power monitor
According to current semantics of power monitor, this commit adds a
callback function to decide whether aborts the sleep by checking
current value against the expected value and virtio_get_monitor_addr
to provide address to monitor. When no packet come in, the value of
address will not be changed and the running core will sleep. Once
packets arrive, the value of address will be changed and the running
core will wakeup.
Signed-off-by: Miao Li <miao.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: David Hunt <david.hunt@intel.com>
Xuan Ding [Wed, 27 Oct 2021 10:00:26 +0000 (10:00 +0000)]
vhost: remove async DMA map status
Async DMA map status flag was added to prevent the unnecessary unmap
when DMA devices bound to kernel driver. This brings maintenance cost
for a lot of code. This patch removes the DMA map status by using
rte_errno instead.
This patch relies on the following patch to fix a partial
unmap check in vfio unmapping API.
[1] https://www.mail-archive.com/dev@dpdk.org/msg226464.html
Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Maxime Coquelin [Wed, 27 Oct 2021 14:22:13 +0000 (16:22 +0200)]
app/testpmd: add missing flow types in port info
This patch adds missing IPv6-Ex and GTPU flow types to port
info command. It also add the same definitions to
str2flowtype(), used to configure flow director.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
Maxime Coquelin [Wed, 27 Oct 2021 14:22:11 +0000 (16:22 +0200)]
app/testpmd: fix RSS type display
This patch fixes the display of the RSS hash types
configured in the port, which displayed "all" even
if only a single type was configured
Fixes: 3c90743dd3b9 ("app/testpmd: support more types for flow RSS") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Maxime Coquelin [Wed, 27 Oct 2021 14:22:10 +0000 (16:22 +0200)]
app/testpmd: fix RSS key length
port_rss_hash_key_update() initializes rss_conf with the
RSS key configuration provided by the user, but it calls
rte_eth_dev_rss_hash_conf_get() before calling
rte_eth_dev_rss_hash_update(), which overrides the parsed
RSS config.
While the RSS key value is set again after, this is not
the case of the key length. It could cause out of bounds
access if the key length parsed is smaller than the one
read from rte_eth_dev_rss_hash_conf_get().
This patch restores the key length before the
rte_eth_dev_rss_hash_update() call to ensure the RSS key
value/length pair is consistent.
Fixes: 8205e241b2b0 ("app/testpmd: add missing type to RSS hash commands") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Maxime Coquelin [Wed, 27 Oct 2021 14:22:09 +0000 (16:22 +0200)]
net/virtio: support RSS
Provide the capability to update the hash key, hash types
and RETA table on the fly (without needing to stop/start
the device). However, the key length and the number of RETA
entries are fixed to 40B and 128 entries respectively. This
is done in order to simplify the design, but may be
revisited later as the Virtio spec provides this
flexibility.
Note that only VIRTIO_NET_F_RSS support is implemented,
VIRTIO_NET_F_HASH_REPORT, which would enable reporting the
packet RSS hash calculated by the device into mbuf.rss, is
not yet supported.
Regarding the default RSS configuration, it has been
chosen to use the default Intel ixgbe key as default key,
and default RETA is a simple modulo between the hash and
the number of Rx queues.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Ajit Khaparde [Wed, 27 Oct 2021 17:26:20 +0000 (10:26 -0700)]
doc: remove obsolete option from bnxt guide
host-based-truflow devarg is not used anymore to enable host based
flow table management functionality TruFlow. Instead this feature is
now driven by a capability indicated by the firmware.
TruFlow is not in tech preview anymore. Update the doc accordingly.
Fixes: da3731e2ea00 ("net/bnxt: check FW capability to support TRUFLOW") Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Radu Nicolau [Thu, 28 Oct 2021 16:04:59 +0000 (17:04 +0100)]
net/iavf: add watchdog for VF FLR
Add watchdog to iAVF PMD which support monitoring the VFLR register. If
the device is not already in reset then if a VF reset in progress is
detected then notify user through callback and set into reset state.
If the device is already in reset then poll for completion of reset.
The watchdog is disabled by default, to enable it set
IAVF_DEV_WATCHDOG_PERIOD to a non zero value (microseconds)
Radu Nicolau [Thu, 28 Oct 2021 16:04:58 +0000 (17:04 +0100)]
net/iavf: support xstats for inline IPsec crypto
Add per queue counters for maintaining statistics for inline IPsec
crypto offload, which can be retrieved through the
rte_security_session_stats_get() with more detailed errors through the
rte_ethdev xstats.
Radu Nicolau [Thu, 28 Oct 2021 16:04:57 +0000 (17:04 +0100)]
net/iavf: support IPsec inline crypto
Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as supporting the offload for
ESP over UDP, and in conjunction with TSO for UDP and TCP flows.
Implement support for rte_security packet metadata
Add definition for IPsec descriptors, extend support for offload
in data and context descriptor to support
Add support to virtual channel mailbox for IPsec Crypto request
operations. IPsec Crypto requests receive an initial acknowledgment
from physical function driver of receipt of request and then an
asynchronous response with success/failure of request including any
response data.
Add enhanced descriptor debugging
Refactor of scalar tx burst function to support integration of offload
Radu Nicolau [Thu, 28 Oct 2021 16:04:55 +0000 (17:04 +0100)]
net/iavf: rework Tx path
Rework the Tx path and Tx descriptor usage in order to
allow for better use of offload flags and to facilitate enabling of
inline crypto offload feature.
David Marchand [Thu, 28 Oct 2021 10:14:28 +0000 (12:14 +0200)]
net: fix build with sparse on L2TPv2 bitfields
An external project that wants to do additional checks on fields
endianness can remap rte_beXX types to instrumented types and use
sparse.
The current code breaks OVS build with sparse:
../../lib/ofp-packet.c: note: in included file (through
.../ovs/dpdk-dir/build/include/rte_flow.h, ../../lib/netdev-dpdk.h,
../../lib/dp-packet.h):
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:92:37:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:93:37:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:94:40:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:95:37:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:96:40:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:97:37:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:98:37:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:99:40:
error: invalid bitfield specifier for type restricted ovs_be16.
.../ovs/dpdk-dir/build/include/rte_l2tpv2.h:100:39:
error: invalid bitfield specifier for type restricted ovs_be16.
make[3]: *** [lib/ofp-packet.lo] Error 1
Use simple uint16_t types for bitfields in L2TPv2 struct.
Fixes: 3a929df1f286 ("ethdev: support L2TPv2 and PPP procotol") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Sun, 24 Oct 2021 16:42:37 +0000 (19:42 +0300)]
app/testpmd: fix MTU configuration before device start
There is no point to do rte_eth_dev_mtu_set() before configure since
set MTU value is overwritten on configure anyway. So, setting of MTU
before configure is rejected now on ethdev level.
If testpmd is going to do configure (e.g. just after testpmd start
with disabled devices start up or any configuration changes in stopped
state which require reconfigure), just save requested MTU in device
config to be applied on reconfigure.
Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length") Fixes: b26bee10ee37 ("ethdev: forbid MTU set before device configure") Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Jie Wang [Wed, 27 Oct 2021 02:01:52 +0000 (10:01 +0800)]
app/testpmd: fix L2TPv2 message type
In "msg_type |= 0xc800", wider "51200" has high-order bits (0xc800)
that don't affect the narrower left-hand side.
This patch fixes coverity issue by changing the definition type of
"msg_type" from uint8_t to uint16_t.
Coverity issue: 373651 Fixes: 748530f0354e ("app/testpmd: support L2TPv2 and PPP protocol pattern") Signed-off-by: Jie Wang <jie1x.wang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Kalesh AP [Thu, 28 Oct 2021 02:29:44 +0000 (07:59 +0530)]
net/bnxt: fix flow RSS failure handling
With commit 239695f754cb ("net/bnxt: enhance RSS action support"),
when bnxt_hwrm_vnic_rss_cfg() call fails, driver was not setting
flow error using "rte_flow_error_set".
Ajit Khaparde [Tue, 26 Oct 2021 05:14:55 +0000 (22:14 -0700)]
net/bnxt: refactor Rx ring cleanup for representors
Rx ring for representors does not use aggregation rings for Rx.
Instead they use simple software buffers for handling Rx packets.
So there is no need to use the same cleanup routine as done by
the non-representor code path.
Ajit Khaparde [Tue, 26 Oct 2021 05:14:32 +0000 (22:14 -0700)]
net/bnxt: fix RSS action parser
Minor fixes are needed in the RTE_FLOW RSS action parser.
1. Update the comment in the parser to indicate RSS level 1 implies RSS
on outer header.
2. RSS action will not be supported if level is > 1.
3. RSS action will not be supported if user or application specifies
MARK or COUNT action.
4. If RSS types is not specified i.e., is 0, the best effort RSS should
use IPv4 and IPv6 headers. Currently we are considering only IPv4.
Kalesh AP [Tue, 26 Oct 2021 05:14:30 +0000 (10:44 +0530)]
net/bnxt: fix RSS behavior on Thor
Move the Rx queue state update before bnxt_setup_one_vnic()
is called. For Thor, rxq->rx_started and eth_dev->data->rx_queue_state[]
needs to be set for all queues before bnxt_hwrm_vnic_cfg() or
bnxt_vnic_rss_configure() are called.
Gregory Etelson [Tue, 26 Oct 2021 09:25:43 +0000 (12:25 +0300)]
net/mlx5: fix integrity item validation and translation
Integrity item validation and translation must verify that integrity
item bits match L3 and L4 items in flow rule pattern.
For cases when integrity item was positioned before L3 header, such
verification must be split into two stages.
The first stage detects integrity flow item and makes initializations
for the second stage.
The second stage is activated after PMD completes processing of all
flow items in rule pattern. PMD accumulates information about flow
items in flow pattern. When all pattern flow items were processed,
PMD can apply that data to complete integrity item validation
and translation.
Gregory Etelson [Tue, 26 Oct 2021 09:25:42 +0000 (12:25 +0300)]
net/mlx5: fix integrity match on inner and outer headers
MLX5 PMD can match on integrity bits for inner and outer headers in
a single flow.
That means a single flow rule can reference both inner and outer
integrity bits. That is implemented by adding 2 flow integrity items
to a rule - one item for outer integrity bits and other for
inner integrity bits.
Integrity item `level` parameter specifies what part is being
targeted.
Current PMD treated integrity items for outer and inner headers as
the same.
The patch separates PMD verifications for inner and outer integrity
items.
Haifei Luo [Tue, 26 Oct 2021 03:56:32 +0000 (06:56 +0300)]
net/mlx5: enhance flow dump
Multiple rules could use the same encap_decap/modify_hdr/counter action.
The flow dump data could be duplicated.
To avoid redundancy, flow dump value is based on the actions' pointer
instead of previous rules' pointer.
For counter, the data is stored in cmng of priv->sh.
For encap_decap/modify_hdr, the data stored in encaps_decaps/modify_cmds.
Traverse the fields and get action's pointer and information.
Formats are same for information in the dump except "id" stands for
actions' pointer:
Counter: rec_type,id,hits,bytes
Modify_hdr: rec_type,id,actions_number,actions
Encap_decap: rec_type,id,buf
Signed-off-by: Haifei Luo <haifeil@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Jiawei Wang [Wed, 27 Oct 2021 10:35:10 +0000 (13:35 +0300)]
net/mlx5: optimize device spawn time with representors
During the device spawn process, mlx5 PMD queried the available flow
priorities by calling mlx5_flow_discover_priorities, queried
if the DR drop action was supported on the root table by calling
the mlx5_flow_discover_dr_action_support routine, and queried the
availability of metadata register C by calling mlx5_flow_discover_mreg_c
These functions created the test flows to get the supported fields, and
at the end destroyed the test flows. The test flows in the first two
functions was created on the root table.
If the device was spawned with multiple representors, these test flows
were created and destroyed on each representor as well. The above
operations took a significant amount of init time during the device
spawn.
This patch optimizes the device discover functions, if there is
the device with multiple representors (VF/SF) being spawned,
the priority and drop action and metadata register support check can be
done only ones and check results can be shared for all representors.
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Sean Zhang [Mon, 11 Oct 2021 08:46:03 +0000 (11:46 +0300)]
common/mlx5: optimize debug log
Remove debug log inside of mlx5_list_init to avoid flooding debug
messages when creating hash list with large actual size.
Fixes: 9c373c524bae ("common/mlx5: move list utility from net driver") Cc: stable@dpdk.org Signed-off-by: Sean Zhang <xiazhang@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Rongwei Liu [Tue, 26 Oct 2021 08:48:30 +0000 (11:48 +0300)]
net/mlx5: support socket direct mode bonding
In socket direct mode, it's possible to bind any two (maybe four
in future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore,
Kernel driver uses "system_image_guid" to identify if devices can
be bound together or not. Sysfs "phys_switch_id" is used to get
"system_image_guid" of each network interface.
OFED 5.4+ is required to support "phys_switch_id".
Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Dapeng Yu [Tue, 26 Oct 2021 01:55:42 +0000 (09:55 +0800)]
net/ice: fix function pointer in multi-process
This patch uses the index value to call the function, instead of the
function pointer assignment to save the selection of Receive Flex
Descriptor profile ID.
Otherwise the secondary process will run with wrong function address
from primary process.
Dapeng Yu [Tue, 26 Oct 2021 09:53:07 +0000 (17:53 +0800)]
net/ice: workaround DCF reset failure
After DCF is reset by PF, the DCF device un-initialization cannot
function normally, ignore the failure does not help since the kernel
does not clean up resource.
The patch workaround the issue by triggering an additional DCF enable/
disable cycle when a passive reset is detected.
David Marchand [Wed, 27 Oct 2021 12:01:44 +0000 (14:01 +0200)]
ethdev: warn once when using port not ready
Warning continuously is a pain when developping or if a unit test
is/gets broken.
It could also be a problem if application behaves badly only in some
corner cases and a DoS results of those logs being continuously displayed.
Let's warn once per port and per rx/tx.
Getting such a log is scary, but let's make it more eye catching by
dumping a backtrace with it.
Tested by introducing a bug in testpmd:
static int
eth_dev_start_mp(uint16_t port_id)
{
- if (is_proc_primary())
+ if (!is_proc_primary())
return rte_eth_dev_start(port_id);
lcore 0 called rx_pkt_burst for not ready port 0
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
lcore 0 called rx_pkt_burst for not ready port 1
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=2
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Ferruh Yigit [Wed, 27 Oct 2021 09:14:29 +0000 (10:14 +0100)]
net/memif: fix driver init with default MTU
Driver is using 'ETH_FRAME_LEN' Linux defined value as max frame length,
which doesn't include FCS (4 bytes CRC). But ethdev by default uses
frame size with FCS when application doesn't define any explicit value.
As a result device configuration fails because device is tried to be
configured with a frame size length that is bigger than what device
reported as supported. Device reports as max supported frame size is
1514 but configured value is 1518.
Instead use DPDK macro, 'RTE_ETHER_MAX_LEN', that includes FCS in the
driver to report the max supported frame size, this matches to the
initial intention.
Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length") Reported-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: David Christensen <drc@linux.vnet.ibm.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Ferruh Yigit [Wed, 27 Oct 2021 09:14:28 +0000 (10:14 +0100)]
net/af_packet: fix driver init with default MTU
Driver is using 'ETH_FRAME_LEN' Linux defined value as max frame length,
which doesn't include FCS (4 bytes CRC). But ethdev by default uses
frame size with FCS when application doesn't define any explicit value.
As a result device configuration fails because device is tried to be
configured with a frame size length that is bigger than what device
reported as supported. Device reports as max supported frame size is
1514 but configured value is 1518.
Instead use DPDK macro, 'RTE_ETHER_MAX_LEN', that includes FCS in the
driver to report the max supported frame size, this matches to the
initial intention.
Fixes: 1bb4a528c41f ("ethdev: fix max Rx packet length") Reported-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Olivier Matz [Fri, 29 Oct 2021 09:53:10 +0000 (11:53 +0200)]
mem: fix dynamic hugepage mapping in container
Since its introduction in 2018, the SIGBUS handler was never registered,
and all related functions were unused.
A SIGBUS can be received by the application when accessing to hugepages
even if mmap() was successful, This happens especially when running
inside containers when there is not enough hugepages. In this case, we
need to recover. A similar scheme can be found in eal_memory.c.
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com>
When using rte_malloc() from a thread which is not bound to a numa
socket (the typical case is a control thread, but it can also happen
on a dataplane thread if its cpu affinity is on cores attached to
several sockets), the used heap is the one from numa socket 0, which
may not have available memory.
Fix this by selecting the first socket which has available memory.
Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.
David Hunt [Wed, 3 Nov 2021 14:32:28 +0000 (14:32 +0000)]
eal: suggest using --lcores option
If the user requests to use an lcore above 128 using -l,
the eal will exit with "EAL: invalid core list syntax" and
very little else useful information.
This patch adds some extra information suggesting to use --lcores
so that physical cores above RTE_MAX_LCORE (default 128) can be
used. This is achieved by using the --lcores option by mapping
the logical cores in the application to physical cores.
For example, if "-l 12-16,130,132" is used, we see the following
additional output on the command line:
EAL: lcore 132 >= RTE_MAX_LCORE (128)
EAL: lcore 133 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133
The same is added to -c option parsing.
For example, if "-c 0x300000000000000000000000000000000" is
used, we see the following additional output on the command line:
EAL: lcore 128 >= RTE_MAX_LCORE (128)
EAL: lcore 129 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@128,1@129
Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Sean Zhang [Fri, 29 Oct 2021 05:52:51 +0000 (08:52 +0300)]
app/flow-perf: add destination ports parameter
Add optional destination ports parameter for port-id action.
The parameter is not must, and the value is 1 by default as before
if the parameter not provided.
This command means the rules created on both representor 0 of PF 0
and PF 1, the destination port for the first representor is PF 0,
and the destination port for the other one is PF 1.
Signed-off-by: Sean Zhang <xiazhang@nvidia.com> Reviewed-by: Wisam Jaddo <wisamm@nvidia.com>
David Marchand [Fri, 22 Oct 2021 06:55:28 +0000 (08:55 +0200)]
eal: promote non-EAL lcore API as stable
This API has been around for more than a year (and is in LTS 20.11).
It did not receive negative feedback and will be used in a next OVS
release.
Mark it stable.
Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
rte_bpf_convert() implementation depends on libpcap.
Right now it is defined only when this library is installed and
RTE_PORT_PCAP is defined.
Fix that by providing for such case stub rte_bpf_convert()
implementation that will always return an error.
To draw user attention, if proper implementation is disabled,
warning will be thrown at meson configure stage.
Also move stub for another function (rte_bpf_elf_load) into
the same place (bpf_stub.c).
Fixes: 2eccf6afbea9 ("bpf: add function to convert classic BPF to DPDK BPF") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Implement PIE based congestion management based on rfc8033.
The Proportional Integral Controller Enhanced (PIE) algorithm works
by proactively dropping packets randomly.
PIE is implemented as more advanced queue management is required to
address the bufferbloat problem and provide desirable quality of
service to users.
Tests for PIE code added to test application.
Added PIE related information to documentation.
David Marchand [Wed, 3 Nov 2021 11:16:15 +0000 (12:16 +0100)]
bus/pci: fix use after free on unplug
rte_pci_unmap_device() needs intr_handle objects to unregister
callbacks.
Bugzilla ID: 845 Fixes: d61138d4f0e2 ("drivers: remove direct access to interrupt handle") Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
David Marchand [Tue, 2 Nov 2021 07:52:58 +0000 (08:52 +0100)]
eal/linux: fix device hotplug
The device event interrupt handler was always freed.
Bugzilla ID: 845 Fixes: c2bd9367e18f ("lib: remove direct access to interrupt handle") Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
David Marchand [Tue, 2 Nov 2021 18:40:20 +0000 (19:40 +0100)]
eal/linux: fix uevent message parsing
Caught with ASan:
==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address
0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578
READ of size 1 at 0x7f0daa2fc0d0 thread T1
#0 0x7f0daeefacb1 (/lib64/libasan.so.5+0xbacb1)
#1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167
#2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248
#3 0x1169b91 in eal_intr_process_interrupts
../lib/eal/linux/eal_interrupts.c:1026
#4 0x116a3a2 in eal_intr_handle_interrupts
../lib/eal/linux/eal_interrupts.c:1100
#5 0x116a7f0 in eal_intr_thread_main
../lib/eal/linux/eal_interrupts.c:1172
#6 0x112640a in ctrl_thread_init
../lib/eal/common/eal_common_thread.c:202
#7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159)
#8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72)
Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192
in frame
#0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226
This frame has 2 object(s):
[32, 48) 'uevent'
[96, 4192) 'buf' <== Memory access at offset 4192 overflows this
variable
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
Thread T1 created by T0 here:
#0 0x7f0daee92ea3 in __interceptor_pthread_create
(/lib64/libasan.so.5+0x52ea3)
#1 0x1126542 in rte_ctrl_thread_create
../lib/eal/common/eal_common_thread.c:228
#2 0x116a8b5 in rte_eal_intr_init
../lib/eal/linux/eal_interrupts.c:1200
#3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044
#4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105
#5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802)
Bugzilla ID: 792 Fixes: 0d0f478d0483 ("eal/linux: add uevent parse and process") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jim Harris [Fri, 29 Oct 2021 17:14:10 +0000 (17:14 +0000)]
eal/linux: remove unused variable for socket memory
clang-13 rightfully complains that the total_mem variable in
eal_parse_socket_arg is set but not used, since the final
accumulated total_mem result isn't used anywhere.
So just remove the total_mem variable.
Fixes: 0a703f0f36c1 ("eal/linux: fix parsing zero socket memory and limits") Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>
Olivier Matz [Fri, 29 Oct 2021 12:15:44 +0000 (14:15 +0200)]
test/mbuf: fix access to freed memory
Seen by ASan.
In the external buffer mbuf test, we check that the buffer is freed
by checking that its refcount is 0.
This is not a valid condition, because it accesses to an already
freed area.
Fix this by setting a boolean flag in the callback when rte_free()
is actually called, and check this flag instead.
Bugzilla ID: 867 Fixes: 7b295dceea07 ("test/mbuf: add unit test cases") Cc: stable@dpdk.org Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: David Marchand <david.marchand@redhat.com>
test/thash: add performance tests for Toeplitz hash
This patch adds performance tests for the following Toeplitz hash
function implementations:
Scalar:
- rte_softrss()
- rte_softrss_be()
Vector using gfni:
- rte_thash_gfni()
- rte_thash_gfni_bulk()
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
config/ppc: fix native build with GCC 4.8.5 on RHEL 7
The POWER meson.build file incorrectly checks if the detected CPU is
"greater than" POWER8 when it should actually test for "greater than or
equal to" POWER8. Fixed the comparison operator.
Bugzilla ID: 875 Fixes: 750196880843 ("config/ppc: select instruction set for IBM Power10") Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Ady Agbarih [Fri, 22 Oct 2021 15:45:55 +0000 (15:45 +0000)]
regex/mlx5: move RXP to CrSpace
Add patch for programming the regex database through ROF file,
using the firmware instead of manually through the software.
No need to setup the DB anymore, the regex-daemon is responsible
for that always.
In the new flow the regex driver only has to program ROF rules
by using set params DevX cmd, requires ROF mkey creation.
The rules file has to be read into 4KB aligned memory.
Signed-off-by: Ady Agbarih <adypodoman@gmail.com> Acked-by: Ori Kam <orika@nvidia.com>
Ady Agbarih [Fri, 22 Oct 2021 15:45:53 +0000 (15:45 +0000)]
common/mlx5: update regex DevX commands
This patch modifies the SET_REGEXP_PARAMS DevX command as follows:
Remove DB setup DevX command. The command is no longer needed
in DPDK, it will always be invoked by the regex-daemon.
Add new DevX command, for programming ROF rules for a specific engine.
The command takes as an input an mkey of the ROF.
It also introduces a new field_select bit.
Signed-off-by: Ady Agbarih <adypodoman@gmail.com> Acked-by: Ori Kam <orika@nvidia.com>
Ori Kam [Fri, 22 Oct 2021 15:45:52 +0000 (15:45 +0000)]
regex/mlx5: add cleanup on stop
When stopping the device we should release all
data allocated.
After rte_regexdev_configure(), the QPs are pre-allocated,
and will be configured only in rte_regexdev_queue_pair_setup().
That's why the QP jobs array initialization is checked
before attempting to destroy the QP.
Signed-off-by: Ori Kam <orika@nvidia.com> Signed-off-by: Ady Agbarih <adypodoman@gmail.com>
Ady Agbarih [Fri, 22 Oct 2021 15:45:51 +0000 (15:45 +0000)]
common/mlx5: update PRM definitions for regex
Update PRM hca capabilities definitions as follows:
regexp_version field added - specifies whether BF2 or BF3
regexp field removed
regexp_params field moved
regexp_log_crspace_size field removed
regexp_mmo added - specifies if using regex mmo wqe is supported
Allow regex only if both regexp_params and regexp_mmo are set,
instead of checking regexp_mmo only.
Check version through the new capability field regexp_version instead
of reading crspace register.
Signed-off-by: Ady Agbarih <adypodoman@gmail.com> Acked-by: Ori Kam <orika@nvidia.com>
Dmitry Kozlyuk [Tue, 2 Nov 2021 10:08:17 +0000 (12:08 +0200)]
test/mempool: fix no-huge mode
Amount of locked memory for regular users is limited,
it is usually 64 KB by default.
Hitting this limit in rte_mempool_populate_anon()
resulted in not populating the mempool, and a test case failure:
EAL: Test assert test_mempool_events line 585 failed: Failed to populate mempool empty1: Success
test failed at test_mempool():1019
Test Failed
Decrease the amount of mapped anonymous memory to fit the limit.
While there, make all function-local constants lowercase.
Dmitry Kozlyuk [Tue, 2 Nov 2021 10:08:16 +0000 (12:08 +0200)]
test/mempool: fix test on FreeBSD
FreeBSD EAL does not implement rte_mem_virt2iova() causing an error:
EAL: Test assert test_mempool_flag_non_io_unset_when_populated_with_valid_iova
line 781 failed: Cannot get IOVA
test failed at test_mempool():1030
Test Failed
Change unit test to use rte_memzone_reserve() to allocate memory,
which allows to obtain IOVA directly.
Dmitry Kozlyuk [Tue, 2 Nov 2021 10:08:15 +0000 (12:08 +0200)]
eal/freebsd: fix IOVA mode selection
FreeBSD EAL selected IOVA mode PA even in --no-huge mode
where PA are not available. Memory zones were created with IOVA
equal to RTE_BAD_IOVA with no indication this field is not usable.
Change IOVA mode detection:
1. Always allow to force --iova-mode=va.
2. In --no-huge mode, disallow forcing --iova-mode=pa, and select VA.
3. Otherwise select IOVA mode according to bus requests, default to PA.
In case contigmem is inaccessible, memory initialization will fail
with a message indicating the cause.
Fixes: c2361bab70c5 ("eal: compute IOVA mode based on PA availability") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Feifei Wang [Mon, 1 Nov 2021 06:00:06 +0000 (14:00 +0800)]
bpf: use wait until scheme for Rx/Tx iteration
Instead of polling for cbi->use to be updated, use wait until scheme.
Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Feifei Wang [Mon, 1 Nov 2021 06:00:03 +0000 (14:00 +0800)]
eal: add a new helper for wait until scheme
Add a new generic helper which is a macro for wait until scheme.
Furthermore, to prevent compilation warning in arm:
----------------------------------------------
'warning: implicit declaration of function ...'
----------------------------------------------
Delete 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and '__WFE'.
And add ‘__RTE_ARM’ for these macros to fix the namespace.
This is because original macros are undefine at the end of the file.
If the new macro calls them in other files, they will be seen as
'not defined'.
Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Meson 0.60 switched the format of uninstalled static libraries
to thin archives, that is, they contain only paths to object files,
not the files themselves. Files cannot be extracted in this case,
resulting in build errors:
ar: `x' cannot be used on thin archives.
Handle thin archives when invoking pmdinfogen by directly using the
files referenced in the archive, when they already exist, and extracting
them if not.
Bugzilla ID: 836 Fixes: e6e9730c7066 ("buildtools: support object file extraction for Windows") Cc: stable@dpdk.org Reported-by: Michal Berger <michallinuxstuff@gmail.com> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
rte_pdump_init() always allocates new memzone for pdump_stats.
Though rte_pdump_uninit() never frees it.
So the following combination will always fail:
rte_pdump_init(); rte_pdump_uninit(); rte_pdump_init();
The issue was caught by pdump_autotest UT.
While first test run successful, any consecutive runs
of this test-case will fail.
Fix the issue by calling rte_memzone_free() for statistics memzone.
Fixes: 10f726efe26c ("pdump: support pcapng and filtering") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Zhihong Peng [Wed, 20 Oct 2021 07:46:41 +0000 (15:46 +0800)]
mem: instrument allocator for ASan
This patch adds necessary hooks in the memory allocator for ASan.
This feature is currently available in DPDK only on Linux x86_64.
If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be
defined and RTE_MALLOC_ASAN must be set accordingly in meson.
Zhihong Peng [Wed, 20 Oct 2021 07:46:40 +0000 (15:46 +0800)]
build: enable AddressSanitizer
AddressSanitizer [1] a.k.a. ASan is a widely-used debugging tool to
detect memory access errors.
It helps to detect issues like use-after-free, various kinds of buffer
overruns in C/C++ programs, and other similar errors, as well as
printing out detailed debug information whenever an error is detected.
ASan is integrated with gcc and clang and can be enabled via a meson
option: -Db_sanitize=address
See the documentation for details (especially regarding clang).
Enabling ASan has an impact on performance since additional checks are
added to generated binaries.
Enabling ASan with Windows is currently not supported in DPDK.
David Marchand [Fri, 29 Oct 2021 07:38:19 +0000 (09:38 +0200)]
bus/pci: resize interrupt event list only for MSIX
Resizing event list only makes sense in MSIX case.
Besides, event list has always been RTE_MAX_RXTX_INTR_VEC_ID large.
Let's restore this assumption for code that might rely on this property
and only enlarge the event list when necessary.
Anatoly Burakov [Wed, 27 Oct 2021 15:37:14 +0000 (15:37 +0000)]
doc: clarify SRIOV activation with built-in VFIO
Currently, the documentation only contains instructions for enabling
SRIOV support for VFIO compiled as a module, but doesn't have any
instructions on how to do the same for cases where VFIO is built-in.
Add these instructions.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Anatoly Burakov [Tue, 26 Oct 2021 13:26:44 +0000 (13:26 +0000)]
vfio: fix partial unmap
Partial unmap support was introduced in commit c13ca4e81cac
("vfio: fix DMA mapping granularity for IOVA as VA"), and with it
was added a check that dereferenced the IOMMU type to determine whether
partial ummapping is supported for currently configured IOMMU type. In
certain circumstances (such as when VFIO is supported, but no devices
were bound to the VFIO driver), the IOMMU type pointer can be NULL.
However, dereferencing of IOMMU type was guarded by access to the user
maps list - that is, we were always checking the user map list first,
and then, if we found a memory region that encloses the one we're trying
to unmap, we would have performed the IOMMU type check.
This ensured that the IOMMU type check will not cause any NULL pointer
dereferences, because in order for an IOMMU type check to have been
performed, there necessarily must have been at least one memory region
that was previously mapped successfully, and that implies having a
defined IOMMU type.
When commit 56259f7fc010 ("vfio: allow partially unmapping adjacent
memory") was introduced, the IOMMU type check was moved to
before we were traversing the user mem maps list, thereby introducing a
potential NULL dereference, because the IOMMU type access was no longer
guarded by the user mem maps list traversal.
Fix the issue by moving the IOMMU type check to after the user mem maps
traversal, thereby ensuring that by the time the check happens, the
IOMMU type is always valid.
Fixes: 56259f7fc010 ("vfio: allow partially unmapping adjacent memory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Xuan Ding <xuan.ding@intel.com>
Kevin Laatz [Tue, 26 Oct 2021 14:20:45 +0000 (14:20 +0000)]
dma/idxd: fix truncated error code in status check
When checking if the DMA device is active, the result of the operand will
always be zero since the err_code is truncated to 8 bits which makes
checking the 31st bit impossible.
This is fixed by changing the type of err_code to uint32_t so that it is
not truncated.
Coverity issue: 373657 Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe") Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Kevin Laatz [Tue, 26 Oct 2021 13:14:32 +0000 (13:14 +0000)]
examples/dma: rename ioat application example
Since the APIs have been updated from rawdev to dmadev, the application
should also be renamed to match. This patch also includes the documentation
updates for the renaming.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>