David Marchand [Thu, 25 Jul 2019 19:24:17 +0000 (21:24 +0200)]
net/pcap: fix Rx with small buffers
If the packet pool contains only buffers smaller than the default headroom,
the driver computes an invalid buffer size (a negative value cast
to a uint16_t).
Rely on the mbuf API to check how much space is available in the mbuf.
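The wrap-around described above can be reproduced without DPDK. This is a minimal sketch, assuming a 128-byte headroom for illustration; `struct fake_mbuf`, `rx_buf_size_unchecked()` and `fake_tailroom()` are hypothetical stand-ins and not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define HEADROOM 128u /* assumed default headroom, for illustration only */

/* Buggy pattern: for data_room < HEADROOM the result wraps around. */
static uint16_t rx_buf_size_unchecked(uint16_t data_room)
{
    return (uint16_t)(data_room - HEADROOM);
}

/* Hypothetical, reduced view of an mbuf: only the length bookkeeping. */
struct fake_mbuf {
    uint16_t buf_len;  /* total size of the data buffer */
    uint16_t data_off; /* offset of the data start (the headroom) */
    uint16_t data_len; /* bytes of data already stored */
};

/* Safe pattern: ask the buffer how much room is left after the data. */
static uint16_t fake_tailroom(const struct fake_mbuf *m)
{
    assert((unsigned int)m->data_off + m->data_len <= m->buf_len);
    return (uint16_t)(m->buf_len - m->data_off - m->data_len);
}
```

With a 64-byte buffer the unchecked computation yields a huge bogus size, while the tailroom query correctly reports that no space is left.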
We were trying to fill in more Rx extended stats than the size allocated
for stats, causing a segfault. Fix this by adding an explicit check.
Rearrange the code so that xstats_get returns statistic values in the same
order as the names returned by xstats_get_names.
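One common way to get both properties (a size check before writing, and values ordered like the names) is to drive both callbacks from a single table. The sketch below is an assumption about the general technique, not the driver's code; `struct hw_stats`, the stat names and `xstats_get()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware counters. */
struct hw_stats {
    uint64_t rx_ucast;
    uint64_t rx_mcast;
    uint64_t rx_drop;
};

/* One table provides both the name and the location of every stat. */
struct xstat_desc {
    const char *name;
    size_t offset;
};

static const struct xstat_desc xstat_tbl[] = {
    { "rx_ucast_pkts", offsetof(struct hw_stats, rx_ucast) },
    { "rx_mcast_pkts", offsetof(struct hw_stats, rx_mcast) },
    { "rx_drop_pkts",  offsetof(struct hw_stats, rx_drop) },
};

#define NB_XSTATS ((unsigned int)(sizeof(xstat_tbl) / sizeof(xstat_tbl[0])))

/* Returns the number of stats; writes nothing if the array is too small. */
static unsigned int
xstats_get(const struct hw_stats *hw, uint64_t *values, unsigned int n)
{
    unsigned int i;

    assert(hw != NULL);
    if (values == NULL || n < NB_XSTATS)
        return NB_XSTATS; /* explicit check: never write past the array */
    for (i = 0; i < NB_XSTATS; i++)
        values[i] = *(const uint64_t *)
            ((const char *)hw + xstat_tbl[i].offset);
    return NB_XSTATS;
}
```

A names callback that walks the same table in the same order would return names matching these values index by index.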
This commit enables the creation of a dedicated completion
ring for asynchronous event handling instead of handling these
events on a receive completion ring.
For the stingray platform and other platforms needing tighter
control of resource utilization, we retain the ability to
process async events on a receive completion ring.
For Thor-based adapters, we use a dedicated NQ (notification
queue) ring for async events (async events can't currently
be received on a completion ring due to a firmware limitation).
Rename "def_cp_ring" to "async_cp_ring" to better reflect its
purpose (async event notifications) and to avoid confusion with
VNIC default receive completion rings.
Allow rxq 0 to be stopped when not being used for async events.
Xiaolong Ye [Wed, 24 Jul 2019 11:30:41 +0000 (19:30 +0800)]
net/i40e: replace license text with SPDX tag
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
net/mlx5: fix doorbell release on Rx queue release
Function mlx5_rxq_release() calls mlx5_release_dbr() to release the
doorbell allocated for this Rx queue.
This call is relevant only for Rx queue objects created using
DevX API.
This patch adds the required check, to call mlx5_release_dbr()
only when relevant.
It also updates mlx5_release_dbr() to use the input offset correctly.
This patch fixes an X722 VF problem where received packets do not have a
hash value:
1) The packet classifier type update should support the X722 VF, not only
the X722 PF.
2) The MAC type is not yet valid for the X722 VF when the packet classifier
type is set, so move that step to after the MAC type is set correctly.
Tao Zhu [Wed, 24 Jul 2019 08:32:54 +0000 (16:32 +0800)]
net/i40e: fix request queue in VF
When the VF is configured with more queues than the PF reserved for it,
the VF sends a request queue command through the admin queue. When the PF
receives this command, it may reset the VF and send a notification
before resetting. If this notification is read by the timed alarm task,
the request queue task will miss the notification. This patch prevents
the two tasks from running simultaneously.
Fixes: ee653bd80044 ("net/i40e: determine number of queues per VF at run time")
Cc: stable@dpdk.org
Signed-off-by: Tao Zhu <taox.zhu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
There was an issue with ice_and_bitmap and ice_or_bitmap when
dealing with bit array sizes that are not even multiples of 32,
where some of the relevant bits in the highest 32 bits were being
cleared. This patch fixes those problems.
Fixes: c9e37832c95f ("net/ice/base: rework on bit ops")
Cc: stable@dpdk.org
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Clean up hardware register macros in ice_auto_generator.h.
Fixes: 51c7f09f3f81 ("net/ice/base: add registers for Intel E800 Series NIC")
Cc: stable@dpdk.org
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Use __func__ instead of function name in ice_debug calls.
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
We don't free s_rule if ice_aq_sw_rules() returns a non-zero status. If
it returned a zero status, s_rule would be freed right after, so this
implies it should be freed within the scope of the function regardless.
Fixes: c7dd15931183 ("net/ice/base: add virtual switch code")
Cc: stable@dpdk.org
Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
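The s_rule leak above follows a familiar pattern: a buffer that is freed on the success path must also be freed on the error path. A minimal sketch of the corrected shape, using hypothetical helpers (`rule_alloc()`, `send_rule()`, `add_rule()`) and an allocation counter that exists only to make the behaviour observable:

```c
#include <assert.h>
#include <stdlib.h>

static int live_rules; /* allocation counter used only by this sketch */

static void *rule_alloc(size_t len)
{
    void *p = malloc(len);

    if (p != NULL)
        live_rules++;
    return p;
}

static void rule_free(void *p)
{
    if (p != NULL) {
        free(p);
        live_rules--;
    }
}

/* Hypothetical stand-in for the admin-queue call; fails on request. */
static int send_rule(const void *rule, int force_error)
{
    assert(rule != NULL);
    return force_error ? -1 : 0;
}

static int add_rule(int force_error)
{
    void *s_rule = rule_alloc(64);
    int status;

    if (s_rule == NULL)
        return -1;
    status = send_rule(s_rule, force_error);
    /* Free on both paths: the buffer is not needed after the call. */
    rule_free(s_rule);
    return status;
}
```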
net/ice/base: fix inner TCP and UDP support for GRE
The dummy packets for GRE were set up for IP, but not inner
TCP or UDP. There are some applications that want to be
able to parse on those inner L4 headers so add them to
the dummy packets.
Also, the GRE dummy packet was formatted differently from
the other dummy packets so change the formatting to match
all the other dummy packets.
Fixes: 839c0a4b77e6 ("net/ice/base: enable additional switch rules")
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Qiming Yang <qiming.yang@intel.com>
Unit hang may occur if multiple descriptors are available in the rings
during reset or close. This state can be detected through bit 8 of
the configuration status register. If the bit is set and there are pending
descriptors in one of the rings, we must flush them before reset or
close.
Fixes: 805803445a02 ("e1000: support EM devices (also known as e1000/e1000e)")
Cc: stable@dpdk.org
Signed-off-by: Xiao Zhang <xiao.zhang@intel.com>
Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
Improve logic:
* get the list of valid devices based on driver id, which eliminates
the unnecessary driver id match check in the device loop
* loop until the first device supporting the asymmetric feature is found,
unlike the previous logic, which broke on the first device
Adam Dybkowski [Wed, 24 Jul 2019 08:42:46 +0000 (10:42 +0200)]
compress/zlib: fix error handling
Add missing return after setting the error status in case of
invalid flush_flag in the operation.
The issue was found by the Coverity scan as the fin_flush variable,
not initialized in such case, was used later in the flow.
Coverity issue: 340859
Fixes: c7b436ec95fd ("compress/zlib: support burst enqueue/dequeue")
Cc: stable@dpdk.org
Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
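The defect class reported above can be shown in a few lines: without the early return, the invalid-flag path falls through and reads a variable that was never assigned. The enum values, `struct op` and `process_op()` below are hypothetical and only mirror the shape of the fix.

```c
#include <assert.h>
#include <stddef.h>

enum flush_flag { FLUSH_NONE, FLUSH_FULL, FLUSH_FINAL };
enum op_status { OP_SUCCESS, OP_INVALID_ARGS };

struct op {
    enum flush_flag flush_flag;
    enum op_status status;
    int fin_flush; /* stands in for the later use of fin_flush */
};

static void process_op(struct op *op)
{
    int fin_flush;

    assert(op != NULL);
    switch (op->flush_flag) {
    case FLUSH_FULL:
    case FLUSH_FINAL:
        fin_flush = 1;
        break;
    case FLUSH_NONE:
        fin_flush = 0;
        break;
    default:
        op->status = OP_INVALID_ARGS;
        /* The return matters: fin_flush is not set on this path. */
        return;
    }
    op->fin_flush = fin_flush;
    op->status = OP_SUCCESS;
}
```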
Adam Dybkowski [Tue, 23 Jul 2019 09:53:28 +0000 (11:53 +0200)]
app/compress-perf: prevent output buffer overflow
This patch fixes a memory overwrite past the end of
the output buffer by calculating its size as the number of all
segments multiplied by the output segment size.
Additionally, buffer overflow errors returned by the PMD are
detected and shown, ending the test.
Also, the output buffer size multiplier was increased from 105%
to 110% to allow running the tests on incompressible files that
expand to over 107% of the original size during compression.
The changes were made in the verification part of the flow and
they don't affect the benchmark results.
Fixes: 424dd6c8c1 ("app/compress-perf: add weak functions for multicore test")
Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
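The sizing rule above (110% of the input, with the capacity taken as the number of segments times the segment size) works out as follows. The helper names and the round-up behaviour are assumptions made for this sketch, not the application's code.

```c
#include <assert.h>
#include <stddef.h>

/* Expected worst-case output: 110% of the input size, rounded up. */
static size_t out_data_size(size_t in_size)
{
    return (in_size * 110 + 99) / 100;
}

/* Number of segments needed to hold `data` bytes. */
static size_t segs_needed(size_t data, size_t seg_size)
{
    assert(seg_size > 0);
    return (data + seg_size - 1) / seg_size;
}

/* Real capacity of the output buffer: all segments times segment size. */
static size_t out_buf_capacity(size_t in_size, size_t seg_size)
{
    return segs_needed(out_data_size(in_size), seg_size) * seg_size;
}
```

For a 1000-byte input and 256-byte segments this gives 1100 bytes of expected output, 5 segments, and a 1280-byte capacity, so writes bounded by the capacity cannot run past the last segment.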
Artur Trybula [Wed, 24 Jul 2019 13:55:15 +0000 (15:55 +0200)]
app/compress-perf: improve results report
This patch adds extra features to the compress performance
test. Some important parameters (memory allocation,
number of ops, number of segments) are calculated and
printed out.
Information about threads, cores, devices and queue-pairs
is also printed.
Signed-off-by: Artur Trybula <arturx.trybula@intel.com>
Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Bruce Richardson [Tue, 23 Jul 2019 16:26:48 +0000 (17:26 +0100)]
raw/ioat: fix include quotes
Some builds with clang report an error because '<>' rather than '""' were
used for including the ioat spec header file.
Target: x86_64-native-bsdapp-clang
error: 'rte_ioat_spec.h' file not found with <angled> include; use "quotes" instead
#include <rte_ioat_spec.h>
^~~~~~~~~~~~~~~~~
"rte_ioat_spec.h"
1 error generated.
Since this file should always be in the same directory as the main header,
we can safely change the include line to fix this error.
Fixes: abff4333ec20 ("raw/ioat: create device on probe and destroy on release")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When using IOVA as VA mode, there is no need to map segments
page by page. This normally isn't a problem, but it becomes one
when attempting to use DPDK in no-huge mode, where VFIO subsystem
simply runs out of space to store mappings.
Fix this for x86 by triggering different callbacks based on whether
IOVA as VA mode is enabled.
Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Andrius Sirvys <andrius.sirvys@intel.com>
Andrew Rybchenko [Tue, 23 Jul 2019 12:11:21 +0000 (13:11 +0100)]
ethdev: avoid getting uninitialized info for bad port
rte_eth_dev_info_get() returns void, so the caller does not know whether the
function did its job or not. Changing the return value to int would be
API/ABI breakage which requires deprecation process and cannot be
backported to stable branches. For now, make sure that device info is
initialized even in the case of invalid port ID.
Fixes: a30268e9a2d0 ("ethdev: reset whole dev info structure before filling")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
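The idea of the ethdev fix above is to zero the output structure before validating the port ID, so a void-returning getter never leaves the caller with uninitialized data. A reduced sketch with a hypothetical `struct dev_info` and port table (not the ethdev implementation):

```c
#include <assert.h>
#include <string.h>

#define MAX_PORTS 2u

/* Hypothetical, reduced device info. */
struct dev_info {
    unsigned int max_rx_queues;
    unsigned int max_tx_queues;
};

static const struct dev_info ports[MAX_PORTS] = { { 8, 8 }, { 4, 4 } };

static void dev_info_get(unsigned int port_id, struct dev_info *info)
{
    assert(info != NULL);
    /* Zero the output first so it is defined even for a bad port ID. */
    memset(info, 0, sizeof(*info));
    if (port_id >= MAX_PORTS)
        return;
    *info = ports[port_id];
}
```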
NVGRE has a GRE header with c_rsvd0_ver value 0x2000 and protocol
value 0x6558.
These should be matched when item_nvgre is provided.
This patch adds a validation function for the NVGRE item.
It also updates the translate function for the NVGRE item to add the
required values if they were not specified.
Original work by Xiaoyu Min <jackmin@mellanox.com>
Fixes: fc2c498ccb94 ("net/mlx5: add Direct Verbs translate items")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
LRO message is contained in the MPRQ strides.
While the LRO message size cannot be bigger than 65280 according to the
PRM, the strides which contain it may be bigger than the maximum buffer
size allowed in dpdk mbuf - 0xFFFF.
Adjust the maximum LRO message size to avoid buffer length overflow.
To support LRO, where a packet can consume all of the stride memory, the
external mbuf shared information can no longer be placed at the end of
the stride, because the HW may write packet data to all of the stride
memory.
Move the shared information memory from the stride to the control
memory of the external mbuf.
Function mlx5_rxq_obj_new(), previously called mlx5_rxq_ibv_new(),
supports creating Rx queue objects using verbs.
This patch expands the relevant functions, to support creating
verbs or DevX Rx queue objects:
Function mlx5_rxq_obj_new() updated to create RQ object using DevX.
Function mlx5_ind_table_obj_new() updated to create RQT object using DevX.
Function mlx5_hrxq_new() updated to create TIR object using DevX.
New utility functions added to perform specific operations:
mlx5_devx_rq_new(), mlx5_devx_wq_attr_fill(),
mlx5_devx_create_rq_attr_fill().
net/mlx5: store protection domain number on create
Function mlx5_alloc_shared_ibctx() allocates Protection Domain using
verbs API, as part of shared IB device context.
This patch adds reading and storing of pdn value from the created PD
object, using DV API.
The pdn value is required when creating WQ using DevX API.
This patch also updates function flow_dv_create_counter_stat_mem_mng()
which uses the pdn value as well.
Prepare for introducing use of DevX TIR object.
Hash Rx queue is currently created using verbs QP only.
The next patches will add the option to create it with a TIR object
using DevX.
This patch renames hrxq_ibv to hrxq wherever relevant, and adds
the DevX items to relevant structs.
Prepare for introducing the DevX RQT object.
Rx indirection table object is currently created using verbs only.
The next patches will add the option to create an RQT object using
DevX.
This patch renames ind_table_ibv to ind_table_obj wherever relevant,
and adds the DevX items to relevant structs.
Prepare for introducing the DevX RxQ object.
RxQ object is currently created using verbs only.
The next patches will add the option to create RxQ object using DevX.
This patch renames rxq_ibv to rxq_obj wherever relevant, and adds the
DevX items to relevant structs.
When using DevX API, memory for door-bell records should be allocated
by PMD and registered using DevX API.
This patch implements the utility functions to support it:
- Add struct mlx5_devx_dbr_page, containing door-bells page data.
- Add list of struct mlx5_devx_dbr_page door-bell pages to device
private data.
- Implement function mlx5_alloc_dbr_page() to allocate page for
door-bell records, and register it using DevX API.
- Implement function mlx5_get_dbr() to acquire a door-bell record
from the door-bells page, allocating a new page if needed.
- Implement function mlx5_release_dbr() to release a door-bell
record that is no longer needed, freeing the containing page if
it becomes empty.
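The allocate/get/release scheme in the list above can be sketched as a small page-based allocator with a per-page usage bitmap. This is an illustration of the technique only: the record size, the number of records per page, and the names `dbr_page`, `dbr_get()` and `dbr_release()` are assumptions, and DevX registration is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DBR_SIZE     8u  /* assumed record size in bytes */
#define DBR_PER_PAGE 64u /* assumed records per page (one 64-bit bitmap) */

struct dbr_page {
    struct dbr_page *next;
    uint64_t used;      /* bit i set: record i is taken */
    unsigned int count; /* number of records in use */
    uint8_t mem[DBR_SIZE * DBR_PER_PAGE];
};

static struct dbr_page *pages; /* list head; per device in a real PMD */

/* Returns the byte offset of a record inside *page_out, or -1. */
static int dbr_get(struct dbr_page **page_out)
{
    struct dbr_page *p;
    unsigned int i;

    assert(page_out != NULL);
    for (p = pages; p != NULL; p = p->next)
        if (p->count < DBR_PER_PAGE)
            break;
    if (p == NULL) { /* no page with room: allocate a new one */
        p = calloc(1, sizeof(*p));
        if (p == NULL)
            return -1;
        p->next = pages;
        pages = p;
    }
    for (i = 0; i < DBR_PER_PAGE; i++)
        if ((p->used & (UINT64_C(1) << i)) == 0)
            break;
    p->used |= UINT64_C(1) << i;
    p->count++;
    *page_out = p;
    return (int)(i * DBR_SIZE);
}

/* Releases the record at `offset`; frees the page once it is empty. */
static void dbr_release(struct dbr_page *page, int offset)
{
    struct dbr_page **pp;
    unsigned int i;

    assert(page != NULL && offset >= 0);
    i = (unsigned int)offset / DBR_SIZE;
    assert(i < DBR_PER_PAGE && (page->used & (UINT64_C(1) << i)) != 0);
    page->used &= ~(UINT64_C(1) << i);
    if (--page->count != 0)
        return;
    for (pp = &pages; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == page) {
            *pp = page->next;
            break;
        }
    }
    free(page);
}
```

Note that release works from the offset handed out by get, which is why the offset must be interpreted consistently on both sides.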
Update function mlx5_txq_ibv_new(), query and store the TIS
transport domain value.
It is required later on Rx side when creating matching TIR.
Add field in mlx5 data structure to store Transport Domain ID.
Use DevX API to read device LRO capabilities.
Check if LRO is supported and can be enabled.
Check if MPRQ is supported and can be used.
Enable MPRQ for LRO use if not enabled by user.
Added note for mlx5_mprq_enabled(), to emphasize that LRO
enables MPRQ.
Disable CQE compression and CRC stripping if LRO is enabled.
Add compile option HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR, and matching
dest_tir flag in device configuration structure.
Add glue function pointer dv_create_flow_action_dest_devx_tir, and
function mlx5_glue_dv_create_flow_action_dest_devx_tir(),
to invoke API mlx5dv_dr_action_create_dest_devx_tir();
Add command-line argument to set LRO session timeout.
Add LRO settings struct in PMD configuration struct.
Add support of LRO offload in port configuration.
Add macros and function to check if LRO is supported and enabled.
MAC address parsing was causing a failure [1].
This patch partially reverts
commit b5ddce8959b2 ("app/testpmd: use new ethernet address parser").
[1]
testpmd> flow validate 0 priority 2 ingress group 0 pattern eth dst
is 98:03:9B:5C:D9:00 / end actions queue index 0 / end
Bad arguments
Fixes: b5ddce8959b2 ("app/testpmd: use new ethernet address parser")
Reported-by: Raslan Darawsheh <rasland@mellanox.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Raslan Darawsheh <rasland@mellanox.com>
Xiaolong Ye [Mon, 22 Jul 2019 12:06:37 +0000 (20:06 +0800)]
net/i40e: fix ethernet flow rule
i40e FDIR does not allow creating a flow with an empty spec and mask for
the ethertype pattern. Without this patch, the flow below would be created
successfully, which is unexpected.
> flow create 0 ingress pattern eth / end actions drop / end
Fixes: 7d83c152a207 ("net/i40e: parse flow director filter")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
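A rule such as `pattern eth / end` carries neither a spec nor a mask for the eth item. One way to express the rejection described above is sketched below; `struct flow_item` and `eth_item_validate()` are hypothetical, reduced stand-ins and do not reproduce the driver's parser.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced flow item: only the spec and mask pointers. */
struct flow_item {
    const void *spec;
    const void *mask;
};

/* Reject an eth item that matches nothing specific. */
static int eth_item_validate(const struct flow_item *item)
{
    assert(item != NULL);
    if (item->spec == NULL && item->mask == NULL)
        return -EINVAL; /* empty spec and mask are not supported */
    return 0;
}
```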
Krzysztof Kanas [Mon, 22 Jul 2019 14:58:51 +0000 (16:58 +0200)]
net/octeontx2: fix driver reconfiguration
When configure returned an error, e.g. because of unsupported offloads
(outer IP and SCTP), the driver released the Rx and Tx queues. A later
correct configuration then could not start the device, because the queues
were already released while the driver still considered itself configured.
Secondly, if the driver returns an error from configuration, librte_ethdev
releases the Rx and Tx queues without changing the driver's configured
state.
Fix that by 'releasing' the configuration and changing the driver state when
an error is returned from otx2_nix_configure.
Fixes: 548b5839a32b ("net/octeontx2: add device configure operation")
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Reviewed-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
This patch resets frc and ctrl in sg tx fd to avoid corruption.
Fixes: 774e9ea91992 ("net/dpaa2: add support for multi seg buffers")
Cc: stable@dpdk.org
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
The associated device index is retrieved via a Netlink request to the
underlying InfiniBand device driver. This network device index
is permanent throughout the lifetime of the device. We do not
spawn rte_eth_dev ports without an associated network device, and
if the network device is being unbound we get the remove notification
message and the rte_eth_dev port is also detached. So, we may store
the ifindex in the mlx5_device_spawn() routine at rte_eth_dev port
creation and initialization time and then use the cached value
instead of issuing an actual Netlink request.
This patch fills the tx_desc_lim.nb_seg_max and
tx_desc_lim.nb_mtu_seg_max fields of the rte_eth_dev_info
structure to report the maximal number of packet
segments. The requested inline data configuration is
taken into account in a conservative way.
This patch adds the implementation of tx_burst routine template.
The template supports all Tx offloads and multiple optimized
tx_burst routines can be generated by compiler from this one.
Mellanox NICs support a wide set of Tx offloads. The supported
offloads are reported by the mlx5 PMD in the rte_eth_dev_info
tx_offload_capa field.
An application may choose any combination of supported offloads
and configure the device appropriately. Some Tx offloads may not
be requested by the application, or even all of them may be omitted.
Most Tx offloads require some code branches in the tx_burst routine
to support them. If a Tx offload is not requested, the tx_burst routine
code may be significantly simplified and consume fewer CPU cycles.
For example, if the application does not engage the TSO offload, this code
can be omitted; if multi-segment packets are not expected, tx_burst
may assume single-mbuf packets only, etc.
Currently, the mlx5 PMD implements multiple tx_burst subroutines
for the most common combinations of requested Tx offloads; each branch
has its own dedicated implementation. Such code is not easy to update,
support and develop, because every change has to be made in multiple
places. Also, many frequently requested
offload combinations are not supported yet. That leads to selecting a
tx_burst routine that does not match completely, which harms performance.
This patch introduces a new approach for the tx_burst code. It proposes
a unified template for the tx_burst routine, which supports
all the Tx offloads and takes a compile-time parameter
describing the set of supported offloads. Based on
this template, the compiler is able to generate multiple tx_burst
routines highly optimized for the statically specified set of
Tx offloads.
Next, at runtime, during Tx queue configuration, the best matching optimized
implementation of tx_burst is chosen.
This patch intentionally omits the template internal implementation
and only introduces the template itself to highlight the approach of
multiple specially tuned tx_burst routines.
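The template approach can be illustrated in plain C: one inline routine takes the offload set as a constant argument, a macro stamps out specialized variants, and a lookup picks the leanest variant that covers the requested offloads. Everything below (the flag values, `struct pkt`, the descriptor accounting and all function names) is hypothetical and only sketches the idea; it is not the PMD's template.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offload bits, not the PMD's real flag values. */
#define TXOFF_MULTI (1u << 0) /* multi-segment packets */
#define TXOFF_TSO   (1u << 1) /* TCP segmentation offload */

struct pkt {
    unsigned int nb_segs; /* number of segments in the packet */
    int tso;              /* nonzero if the packet requests TSO */
};

/*
 * Template routine. `olx` is a constant at every call site below, so an
 * optimizing compiler can inline the template and drop the branches for
 * offloads a given variant does not support. Returns descriptors used.
 */
static inline unsigned int
tx_burst_tmpl(const struct pkt *pkts, unsigned int n, const unsigned int olx)
{
    unsigned int i, descs = 0;

    assert(pkts != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* Without multi-segment support each packet is one segment. */
        descs += (olx & TXOFF_MULTI) ? pkts[i].nb_segs : 1;
        /* Placeholder cost: one extra descriptor for a TSO header. */
        if ((olx & TXOFF_TSO) && pkts[i].tso)
            descs += 1;
    }
    return descs;
}

/* Generate one specialized routine per statically chosen offload set. */
#define TX_BURST_DECL(name, olx) \
static unsigned int \
tx_burst_##name(const struct pkt *pkts, unsigned int n) \
{ \
    return tx_burst_tmpl(pkts, n, (olx)); \
}

TX_BURST_DECL(none, 0)
TX_BURST_DECL(multi, TXOFF_MULTI)
TX_BURST_DECL(full, TXOFF_MULTI | TXOFF_TSO)

typedef unsigned int (*tx_burst_t)(const struct pkt *, unsigned int);

/* Ordered from the leanest variant to the most capable one. */
static const struct {
    unsigned int olx;
    tx_burst_t fn;
} variants[] = {
    { 0, tx_burst_none },
    { TXOFF_MULTI, tx_burst_multi },
    { TXOFF_MULTI | TXOFF_TSO, tx_burst_full },
};

/* At queue setup: pick the first variant covering all requested offloads. */
static tx_burst_t tx_burst_select(unsigned int requested)
{
    size_t i;

    for (i = 0; i < sizeof(variants) / sizeof(variants[0]); i++)
        if ((requested & ~variants[i].olx) == 0)
            return variants[i].fn;
    return NULL;
}
```

Ordering the table from leanest to most capable means a request is never served by a heavier routine than necessary, which is the performance point made in the message above.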
This patch extends the NIC attributes query via DevX.
The appropriate interface structures are borrowed from
kernel driver headers and DevX calls are added to
mlx5_devx_cmd_query_hca_attr() routine.
This patch updates Tx datapath definitions, mostly hardware related.
The Tx descriptor structures are redefined with required fields,
size definitions are renamed to reflect their meanings more
appropriately. This is a preparation step before introducing
the new Tx datapath implementation.
This patch introduces new mlx5 PMD devarg options:
- txq_inline_min - specifies minimal amount of data to be inlined into
WQE during Tx operations. NICs may require this minimal data amount
to operate correctly. The exact value may depend on NIC operation
mode, requested offloads, etc.
- txq_inline_max - specifies the maximal packet length to be completely
inlined into WQE Ethernet Segment for ordinary SEND method. If the packet
is larger than the specified value, the packet data won't be copied by the
driver at all; the data buffer is addressed with a pointer. If the packet
length is less than or equal to it, all packet data will be copied into WQE.
- txq_inline_mpw - specifies the maximal packet length to be completely
inlined into WQE for Enhanced MPW method.
This patch removes the existing Tx datapath code
as preparation step before introducing the new
implementation. The following entities are being
removed:
The following devargs are deprecated and ignored:
- "txq_inline" is going to be converted to "txq_inline_max"
for compatibility
- "tx_vec_en"
- "txqs_max_vec"
- "txq_mpw_hdr_dseg_en"
- "txq_max_inline_len" is going to be converted
to "txq_inline_mpw" for compatibility
The deprecated devarg keys are recognized by PMD
and ignored/converted to the new ones in order not
to block device probing.
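The handling of deprecated keys described above amounts to a small translation table. The sketch below uses the key names from the list; the table layout and `devarg_translate()` are hypothetical and not the PMD's parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Deprecated key and its replacement; NULL means recognized but ignored. */
struct devarg_map {
    const char *old_key;
    const char *new_key;
};

static const struct devarg_map deprecated_keys[] = {
    { "txq_inline",          "txq_inline_max" },
    { "txq_max_inline_len",  "txq_inline_mpw" },
    { "tx_vec_en",           NULL },
    { "txqs_max_vec",        NULL },
    { "txq_mpw_hdr_dseg_en", NULL },
};

/*
 * Returns the key to use: the replacement for a converted key, NULL for
 * an ignored key, or the input itself if the key is not deprecated.
 */
static const char *devarg_translate(const char *key)
{
    size_t i;

    assert(key != NULL);
    for (i = 0; i < sizeof(deprecated_keys) / sizeof(deprecated_keys[0]); i++)
        if (strcmp(key, deprecated_keys[i].old_key) == 0)
            return deprecated_keys[i].new_key;
    return key;
}
```

Because every deprecated key is still recognized, an old command line keeps probing successfully instead of failing on an unknown argument.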
In functions flow_dv_translate() and flow_dv_validate(), the flow
items are scanned and each item is marked in item_flags bitmap.
The code handling some of the items was ported from another project,
where items are marked in a slightly different manner.
This patch fixes the setting of items in bitmap, adapting it to the
required manner.
Fixes: d53aa89aea91 ("net/mlx5: support matching on ICMP/ICMP6")
Fixes: 5865955ad994 ("net/mlx5: match GRE key and present bits")
Fixes: 2e4c987aad91 ("net/mlx5: validate Direct Rule E-Switch")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
net/e1000: fix buffer overrun while i219 processing DMA
Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into an unrecoverable Tx hang under very stressful UDP traffic
and multiple reconnections of the Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.
Please refer to the I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-connection-i218-family-documentation.html
net/bnxt: disable vector mode Tx with VLAN offload
The vector mode transmit path does not currently support VLAN tag
insertion, so we need to disable vector transmit when transmit
VLAN insertion offload is enabled.