To schedule packet sending on mbuf timestamps, the Tx
queue must be attached to the same UAR as the Clock Queue.
The UAR is a special hardware resource mapped to host
memory that provides the doorbell registers. Assigning a UAR
to a queue being created is possible via the DevX API only.
A dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in real time. The Clock Queue should never stop,
otherwise the clock synchronization might be broken and packet
send scheduling would fail. The Rearm Queue uses cross-channel
SEND_EN/WAIT operations to provide the requests to the
Clock Queue in a robust way.
This is a preparation step before moving the Tx queue creation
to the DevX approach. Some features require a shared UAR
for Tx queues and scheduling completion queues; the patch
manages the shared UAR.
net/mlx5: fix UAR lock sharing for multiport devices
The master and representors might be created over multiport
Infiniband devices, and the UAR resources allocated for sibling
ports might belong to the same underlying Infiniband device.
Hardware requires that a write to the UAR be performed
as an atomic 64-bit write; on 32-bit systems this is two sequential
writes protected by a lock. Because the same UAR may be shared
between sibling devices, the locks must be moved to the shared
context.
Fixes: f048f3d479a6 ("net/mlx5: switch to the shared IB device context") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
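The locking requirement described above can be pictured with a minimal sketch. It is not the driver's code: struct shared_ctx, uar_write64() and the order of the two halves are assumptions made for illustration, and a C11 atomic_flag stands in for the real lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared context: one lock per underlying IB device, so
 * every sibling port that maps the same UAR takes the same lock. */
struct shared_ctx {
	atomic_flag uar_lock;         /* simple spinlock, illustrative only */
	volatile uint32_t uar_reg[2]; /* stands in for a 64-bit doorbell */
};

static void shared_ctx_init(struct shared_ctx *sh)
{
	assert(sh != NULL);
	atomic_flag_clear(&sh->uar_lock);
	sh->uar_reg[0] = 0;
	sh->uar_reg[1] = 0;
}

/* Emulate the 32-bit path: two sequential 32-bit writes that must not
 * interleave with writes issued by a sibling port. */
static void uar_write64(struct shared_ctx *sh, uint64_t val)
{
	assert(sh != NULL);
	while (atomic_flag_test_and_set_explicit(&sh->uar_lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	sh->uar_reg[0] = (uint32_t)val;         /* low half (order assumed) */
	sh->uar_reg[1] = (uint32_t)(val >> 32); /* high half */
	atomic_flag_clear_explicit(&sh->uar_lock, memory_order_release);
}

/* Read back both halves; only used to inspect the emulated register. */
static uint64_t uar_read64(const struct shared_ctx *sh)
{
	return ((uint64_t)sh->uar_reg[1] << 32) | sh->uar_reg[0];
}
```

If the lock lived in each port's private data instead, two sibling ports could interleave their half-writes to the same register, which is the situation the fix avoids.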
tx_pp - enables accurate packet send scheduling on mbuf timestamps
in the PMD. On device start, if the "rte_dynflag_timestamp"
dynamic flag is registered and a non-zero value is specified
for this devarg, the driver initializes all necessary internal
infrastructure to provide packet scheduling. The parameter
value specifies the scheduling granularity in nanoseconds.
tx_skew - adjusts the packet send scheduling on
timestamps and represents the average delay between the beginning
of transmit descriptor processing by the hardware and the
appearance of actual packet data on the wire. The value should
be provided in nanoseconds and is valid only if the tx_pp parameter
is specified. The default value is zero.
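One way to picture how the two parameters could interact is the sketch below. It is an assumption, not the driver's algorithm: sched_start_ns() is a hypothetical helper that subtracts the skew from the requested wire time and aligns the result down to the granularity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the PMD: estimate when descriptor
 * processing should begin so that data appears on the wire at wire_ts_ns. */
static uint64_t sched_start_ns(uint64_t wire_ts_ns, uint64_t tx_pp_ns,
			       uint64_t tx_skew_ns)
{
	uint64_t start;

	assert(tx_pp_ns != 0); /* tx_skew is only valid when tx_pp is set */
	/* Compensate for the average descriptor-to-wire delay. */
	start = wire_ts_ns > tx_skew_ns ? wire_ts_ns - tx_skew_ns : 0;
	/* Align down to the scheduling granularity. */
	return start - (start % tx_pp_ns);
}
```

With a granularity of 500 ns and a skew of 20 ns, a requested wire time of 1020 ns maps to a start time of 1000 ns under these assumptions.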
Thomas Monjalon [Mon, 13 Jul 2020 15:37:10 +0000 (17:37 +0200)]
common/mlx5: fix link with ibverbs glue dlopen option
In case the ibverbs glue is a separate library to dlopen,
the PMD library must allocate a glue structure to be filled by dlopen.
The glue management was in mlx5_common.c and moved to mlx5_common_os.c,
but the variable allocation was not removed from the original file.
The consequence was a link failure, if ibverbs dlopen option is enabled,
because of the redefinition of the variable (with GCC 10):
multiple definition of 'mlx5_glue'
The original definition is removed to keep only the one moved
in the Linux sub-directory.
Fixes: 79aa430721b1 ("common/mlx5: split common file under Linux directory") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Matan Azrad <matan@mellanox.com>
Jeff Guo [Thu, 16 Jul 2020 09:24:38 +0000 (17:24 +0800)]
net/e1000: fix crash on Tx done clean up
The Tx mbuf is not set for some advanced descriptors. If the mbuf
is not checked before rte_pktmbuf_free_seg() is called during
Tx done cleanup, a segfault results. Add a NULL pointer check
to fix it.
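A minimal sketch of the guarded cleanup loop follows. The types and fake_free_seg() are hypothetical stand-ins for the driver's software ring entry and for rte_pktmbuf_free_seg(); only the NULL check itself reflects the fix described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; names and layout are illustrative only. */
struct fake_mbuf { int freed; };
struct sw_ring_entry { struct fake_mbuf *mbuf; };

static void fake_free_seg(struct fake_mbuf *m)
{
	m->freed = 1; /* would dereference NULL without the check below */
}

/* Walk n ring entries and free only those that carry an mbuf.
 * Returns the number of segments freed. */
static unsigned int tx_done_cleanup(struct sw_ring_entry *ring,
				    unsigned int n)
{
	unsigned int freed = 0;

	assert(ring != NULL);
	for (unsigned int i = 0; i < n; i++) {
		if (ring[i].mbuf == NULL) /* descriptor without an mbuf */
			continue;
		fake_free_seg(ring[i].mbuf);
		ring[i].mbuf = NULL;
		freed++;
	}
	return freed;
}
```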
Jeff Guo [Mon, 20 Jul 2020 04:00:20 +0000 (12:00 +0800)]
net/iavf: fix GTPU L4 hash
When the configured pattern involves GTPU inner L3 and L4, the L4
protocol header should be configured for each L4 protocol, even if
the configured input set covers only L3 and not L4.
Fixes: 215a247b5f33 ("net/iavf: refactor hash flow") Fixes: 642f20195015 ("net/iavf: support RSS for IPv4 IPv6 mix of GTP") Signed-off-by: Jeff Guo <jia.guo@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Jeff Guo [Tue, 21 Jul 2020 02:32:46 +0000 (10:32 +0800)]
net/ice: fix GTPU L4 hash
When the configured pattern involves GTPU inner L3 and L4, the L4
protocol header should be configured for each L4 protocol, even if
the configured input set covers only L3 and not L4.
Fixes: 0b952714e9c1 ("net/ice: refactor PF hash flow") Fixes: de32fa2ba27b ("net/ice: support RSS for IPv6 prefix") Signed-off-by: Jeff Guo <jia.guo@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
The rte_eth_dev_set_vlan_offload function checks the VLAN Rx offload
capability. The i350/i210/i211 NICs have the VLAN extend feature, but
DEV_RX_OFFLOAD_VLAN_EXTEND is not set in the capability, which
causes the setting to fail. Add this capability in the
igb_get_rx_port_offloads_capa function.
Fixes: ef990fb56e55 ("net/e1000: convert to new Rx offloads API") Cc: stable@dpdk.org Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
The rte_eth_dev_set_vlan_offload function checks the VLAN Rx offload
capability. The i40e VF has the VLAN filter feature, but
DEV_RX_OFFLOAD_VLAN_FILTER is not set in the capability, which
causes the setting to fail. Add this capability in the
i40e_vf_representor_dev_infos_get function.
Fixes: e0cb96204b71 ("net/i40e: add support for representor ports") Cc: stable@dpdk.org Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Acked-by: Jeff Guo <jia.guo@intel.com>
Andrew Rybchenko [Mon, 13 Jul 2020 14:22:34 +0000 (15:22 +0100)]
net: check fragmented headers in non-debug as well
Pseudo-header checksum calculation requires contiguous headers.
There are no formal requirements on the data location and mbuf
structure that the application may use.
Since
commit dfc6b2fd8da3 ("mbuf: remove Intel offload checks from generic API")
fragmented header checks are done inside
rte_net_intel_cksum_flags_prepare() only in RTE_LIBRTE_ETHDEV_DEBUG
builds, because they were moved from rte_validate_tx_offload(), which
is called under debug only.
Do the corresponding check in non-debug builds as well
to avoid bad accesses and incorrect checksum calculation, and to
return an appropriate error from Tx prepare.
Make the no-offloads check more precise and do it in non-debug builds
as well, to avoid the contiguous headers check and a Tx prepare failure
when the check is not actually required.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
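The two checks can be sketched as follows. This is a simplified illustration, not the library code: struct pkt is a hypothetical reduction of an mbuf to the fields the check needs, and the -ENOTSUP return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an mbuf. */
struct pkt {
	uint16_t data_len;  /* bytes available in the first segment */
	uint16_t l2_len;
	uint16_t l3_len;
	uint16_t l4_len;
	int cksum_offload;  /* non-zero if a checksum offload is requested */
};

static int tx_prepare_check(const struct pkt *p)
{
	assert(p != NULL);
	/* No offload requested: skip the contiguity check entirely. */
	if (!p->cksum_offload)
		return 0;
	/* Pseudo-header checksum needs all headers in the first segment. */
	if ((uint32_t)p->l2_len + p->l3_len + p->l4_len > p->data_len)
		return -ENOTSUP; /* error code assumed for illustration */
	return 0;
}
```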
Weifeng Li [Sat, 18 Jul 2020 04:35:38 +0000 (12:35 +0800)]
net/bonding: change state machine to defaulted
A DPDK bonding 802.3ad network is as follows:
+----------+                        +-----------+
|dpdk lacp |bond1.1 <------> bond2.1|switch lacp|
|          |bond1.2 <------> bond2.2|           |
+----------+                        +-----------+
If one direction of a fiber fails during normal operation, like
this:
bond1.2 -----> bond2.2 ok
bond1.2 <--x-- bond2.2 error: bond1.2 receives no LACPDU. Some packets
                       from the switch to DPDK will choose bond2.2
                       and be lost.
The DPDK LACP state machine transitions to the expired state if no LACPDU
is received before the current_while_timer expires. But if no LACPDU is
received before the current_while_timer expires again, the DPDK LACP state
machine does not change, so bond2.2 cannot be made inactive based on the
received LACPDU.
According to IEEE 802.3ad, if no LACPDU is received before the
current_while_timer expires again, the state machine should transition
from expired to defaulted. Bond2.2 will then become inactive based on the
LACPDU carrying the defaulted state.
This patch adds a state machine transition from expired to defaulted when no
LACPDU is received before the current_while_timer expires again,
according to IEEE 802.3ad:
If no LACPDU is received before the current_while timer expires again,
the state machine transits to the DEFAULTED state. The record Default
function overwrites the current operational parameters for the Partner
with administratively configured values. This allows configuration of
aggregations and individual links when no protocol partner is present,
while still permitting an active partner to override default settings.
The update_Default_Selected function sets the Selected variable FALSE
if the Link Aggregation Group has changed. Since all operational
parameters are now set to locally administered values there can be no
disagreement as to the Link Aggregation Group, so the Matched variable
is set TRUE.
The relevant description is in the chapter 43.4.12 of the link below:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=850426
Signed-off-by: Weifeng Li <liweifeng96@126.com> Acked-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
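The receive-state transitions discussed above can be sketched as a small state function. The enum and function names are hypothetical and the sketch covers only the timer-expiry and LACPDU-received events, not the full 802.3ad receive machine.

```c
#include <assert.h>

enum rx_state { RX_CURRENT, RX_EXPIRED, RX_DEFAULTED };

/* Called when the current_while timer expires without a LACPDU. */
static enum rx_state on_current_while_expired(enum rx_state s)
{
	assert(s == RX_CURRENT || s == RX_EXPIRED || s == RX_DEFAULTED);
	switch (s) {
	case RX_CURRENT:
		return RX_EXPIRED;
	case RX_EXPIRED:
		return RX_DEFAULTED; /* the transition added by the patch */
	case RX_DEFAULTED:
		break;               /* stays defaulted until a LACPDU arrives */
	}
	return RX_DEFAULTED;
}

/* Any received LACPDU brings the machine back to the current state. */
static enum rx_state on_lacpdu_received(enum rx_state s)
{
	(void)s;
	return RX_CURRENT;
}
```

Before the patch, the second expiry left the machine in the expired state, which corresponds to returning RX_EXPIRED from the RX_EXPIRED case.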
Dongyang Pan [Sat, 4 Jul 2020 01:15:26 +0000 (09:15 +0800)]
net/bonding: delete redundant code
The function valid_bonded_port_id() already calls
rte_eth_dev_is_valid_port(), so delete the redundant check.
Fixes: 588ae95e7983 ("net/bonding: fix port ID check") Cc: stable@dpdk.org Signed-off-by: Dongyang Pan <197020236@qq.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
The protocol headers are implicitly matched based on the proto
field data. For instance, if the ether type is set to 0x800 in the
ether header, then the ipv4 protocol header is assumed to be present
for template matching even if the ipv4 header is not present in the
given flow pattern.
OVS-DPDK is accumulating the flow counters that are returned as part of
the flow_query API and it is being issued at least 3 times every second.
So there is no need to accumulate the counts internally in the driver.
Farah Smith [Fri, 17 Jul 2020 14:14:39 +0000 (19:44 +0530)]
net/bnxt: initialize table scope parameter
Initialize table scope resource manager parameter.
Clear out rm_is_allocated parms before calling as base_index was added
and used incorrectly in this instance.
Add support for new resource manager to manage CFA resources.
TCAM is split into high and low regions now and CFA resource types
are being updated accordingly.
Signed-off-by: Peter Spreadborough <peter.spreadborough@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Farah Smith <farah.smith@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Shougang Wang [Wed, 15 Jul 2020 08:08:10 +0000 (08:08 +0000)]
net/i40e: fix filter pctype
The i40e_filter_pctype TCP_SYN_NO_ACK, UNICAST_IPV4_UDP and
MULTICAST_IPV4_UDP for x722 were missing when translating RSS type to
i40e_filter_pctype. This patch fixes it.
Fixes: da7018ec29d4 ("net/i40e: fix queue region in RSS flow") Cc: stable@dpdk.org Signed-off-by: Shougang Wang <shougangx.wang@intel.com> Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Simei Su [Fri, 17 Jul 2020 03:27:36 +0000 (11:27 +0800)]
net/ice: fix GTPU RSS
Because of incomplete protocol header fields, GTPU_INNER_IPV4_UDP
and GTPU_INNER_IPV4_TCP profile aren't included in inner ipv4 group.
This patch complements header fields for GTPU/GTPU_EH ipv4 rss config.
Besides, after configuring L4 port, GTPU and GTPU_EH packets don't do
hash for UDP/TCP/SCTP. This patch also enables L4 hash for GTPU and GTPU
extension packets.
Fixes: d117de460035 ("net/ice: fix GTPU/PPPoE packets with no hash value") Signed-off-by: Simei Su <simei.su@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Chenmin Sun [Fri, 17 Jul 2020 17:36:58 +0000 (01:36 +0800)]
net/i40e: optimize flow director update rate
This patch optimizes the fdir update rate for the i40e PF by tracking
whether an fdir rule is inserted into the guaranteed space
or the shared space.
For flows that are inserted into the guaranteed space, we assume
that the insertion will always succeed, as the hardware only reports
the "no enough space left" error. In this case, the software can
return success directly, with no need to retrieve the result from
the hardware. When destroying a flow, we also assume the operation
will succeed, as the software has checked that the flow is indeed in
the hardware.
See the fdir programming status descriptor format in the datasheet
for more details.
Signed-off-by: Chenmin Sun <chenmin.sun@intel.com> Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
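The decision of whether to wait for the hardware status can be sketched like this. struct fdir_space and fdir_reserve() are hypothetical, and the assumption that the guaranteed space is consumed before the shared space is made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software view of the remaining filter space. */
struct fdir_space {
	uint32_t guaranteed_free;
	uint32_t shared_free;
};

/* Reserve one filter slot. On success, *wait_status tells the caller
 * whether the hardware programming status still has to be polled.
 * Returns false when software believes no space is left. */
static bool fdir_reserve(struct fdir_space *sp, bool *wait_status)
{
	assert(sp != NULL && wait_status != NULL);
	if (sp->guaranteed_free > 0) {
		sp->guaranteed_free--;
		*wait_status = false; /* cannot fail for lack of space */
		return true;
	}
	if (sp->shared_free > 0) {
		sp->shared_free--;
		*wait_status = true;  /* shared space may already be taken */
		return true;
	}
	return false;
}
```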
Chenmin Sun [Fri, 17 Jul 2020 17:36:57 +0000 (01:36 +0800)]
net/i40e: optimize TPID fetching
This patch moves the fetching of the device TPID to where it is really
needed, rather than fetching it every time the functions are entered.
This is because the operation costs too many cycles and is used only
when matching the Ethernet header.
Signed-off-by: Chenmin Sun <chenmin.sun@intel.com> Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
Chenmin Sun [Fri, 17 Jul 2020 17:36:56 +0000 (01:36 +0800)]
net/i40e: optimize flow director memory management
This patch allocates memory pools for flow management to avoid
calling rte_zmalloc/rte_free every time.
This patch also improves the hash table operation. When adding/removing
a flow, the software directly adds/deletes it in the hash table.
If any error occurs, it then rolls back the operation it has just done.
Signed-off-by: Chenmin Sun <chenmin.sun@intel.com> Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
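A minimal sketch of the two ideas, a preallocated pool and insert-then-roll-back, is shown below. Everything here is hypothetical: the pool is a fixed array, the hash table is reduced to an in_table flag, and the hw_ok argument models the outcome of programming the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

struct flow {
	int id;
	bool in_use;   /* entry is taken from the pool */
	bool in_table; /* stands in for hash table membership */
};

struct flow_pool { struct flow entries[POOL_SIZE]; };

/* Take a free entry from the preallocated pool; no heap allocation. */
static struct flow *pool_get(struct flow_pool *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (!p->entries[i].in_use) {
			p->entries[i].in_use = true;
			return &p->entries[i];
		}
	}
	return NULL;
}

/* Undo the table insertion and return the entry to the pool. */
static void pool_put(struct flow *f)
{
	f->in_table = false;
	f->in_use = false;
}

/* Insert into the table first, then roll back if hardware fails. */
static struct flow *flow_add(struct flow_pool *p, int id, bool hw_ok)
{
	struct flow *f;

	assert(p != NULL);
	f = pool_get(p);
	if (f == NULL)
		return NULL;
	f->id = id;
	f->in_table = true;
	if (!hw_ok) {
		pool_put(f); /* roll back the operation just done */
		return NULL;
	}
	return f;
}
```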
Chenmin Sun [Fri, 17 Jul 2020 17:36:55 +0000 (01:36 +0800)]
net/i40e: support flow director space tracking
This patch introduces FDIR flow management for guaranteed/shared
space tracking.
The fdir space is reported by
i40e_hw_capabilities.fd_filters_guaranteed and fd_filters_best_effort.
The fdir space is managed by hardware and is now tracked in software.
The management algorithm is controlled by GLQF_CTL.INVALPRIO.
For implementation details, see the datasheet and the
description of struct i40e_fdir_info.fdir_invalprio.
This patch changes the global register GLQF_CTL. Therefore, when the devarg
``support-multi-driver`` is set, the patch does not take effect, to
avoid affecting the normal behavior of other i40e drivers, e.g., the Linux
kernel driver.
Signed-off-by: Chenmin Sun <chenmin.sun@intel.com> Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
Farah Smith [Wed, 15 Jul 2020 13:50:32 +0000 (19:20 +0530)]
net/bnxt: support two table scopes
Need to remap the table scope ids allocated from HCAPI RM from high
to low value because for legacy devices a table scope is a set of base
addresses. The PCIe addresses must map to a PCIe PF which exists in
the hardware.
Signed-off-by: Farah Smith <farah.smith@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
- The mapping of kernel pages for EEM sysmem operation takes
a significant amount of time. This change gives a build option
to delay the sysmem mapping until the first write to EEM.
Signed-off-by: Peter Spreadborough <peter.spreadborough@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Louise Kilheeney [Fri, 10 Jul 2020 10:10:47 +0000 (11:10 +0100)]
add python2 deprecation notice
Prepare for python2 removal in 20.11.
Signed-off-by: Louise Kilheeney <louise.kilheeney@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Nicolas Chautru <nicolas.chautru@intel.com>
Yunjian Wang [Fri, 17 Jul 2020 10:50:17 +0000 (18:50 +0800)]
bus/fslmc: fix memory leak in secondary process
In fslmc_process_mcp(), memory is allocated for 'dev_name' but not
released before returning in a secondary process. It has not been used
since commit a69f79300262 ("bus/fslmc: support multi VFIO group"),
so it can be removed.
Fixes: e55d0494ab98 ("bus/fslmc: support secondary process") Cc: stable@dpdk.org Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Ruifeng Wang [Thu, 16 Jul 2020 15:49:19 +0000 (23:49 +0800)]
lpm: report error when defer queue overflows
Coverity complains about unchecked return value of rte_rcu_qsbr_dq_enqueue.
By default, defer queue size is big enough to hold all tbl8 groups. When
enqueue fails, return error to the user to indicate system issue.
Coverity issue: 360832 Fixes: 8a9f8564e9f9 ("lpm: implement RCU rule reclamation") Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
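The pattern of propagating an enqueue failure can be sketched as follows. dq_enqueue() is a hypothetical stand-in for rte_rcu_qsbr_dq_enqueue(), the queue is a tiny fixed array, and the -ENOSPC return value is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DQ_SIZE 2

/* Hypothetical defer queue with room for DQ_SIZE pending groups. */
struct defer_queue {
	uint32_t items[DQ_SIZE];
	unsigned int count;
};

/* Stand-in for the real enqueue: 0 on success, non-zero when full. */
static int dq_enqueue(struct defer_queue *dq, uint32_t group)
{
	if (dq->count == DQ_SIZE)
		return 1;
	dq->items[dq->count++] = group;
	return 0;
}

/* Defer the release of a tbl8 group and report a full queue to the
 * caller instead of silently ignoring the return value. */
static int tbl8_free_deferred(struct defer_queue *dq, uint32_t group)
{
	assert(dq != NULL);
	if (dq_enqueue(dq, group) != 0)
		return -ENOSPC; /* error code assumed for illustration */
	return 0;
}
```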
Two new sync modes were introduced into rte_ring:
relaxed tail sync (RTS) and head/tail sync (HTS).
This change provides user with ability to select these
modes for ring based mempool via mempool ops API.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Gage Eads <gage.eads@intel.com>
Ori Kam [Mon, 20 Jul 2020 06:26:17 +0000 (06:26 +0000)]
regex/mlx5: add empty start/stop/close
Add the start, stop and close functions.
In the current implementation they are empty functions
and exist only so that, when called
from the rte level, they return a success code.
Phil Yang [Fri, 17 Jul 2020 04:36:51 +0000 (12:36 +0800)]
doc: announce removal of mbuf legacy refcnt field
refcnt_atomic member in structures rte_mbuf and rte_mbuf_ext_shared_info
will be removed in 20.11 release.
Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Signed-off-by: Phil Yang <phil.yang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: David Marchand <david.marchand@redhat.com>
Blockcipher test cases return either TEST_SUCCESS
or TEST_FAILED as status, but a test may also be
unsupported by the PMD, which counts as a success case
for the PMD. Hence the overall status is set to failed
only when status == TEST_FAILED.
Anoob Joseph [Fri, 17 Jul 2020 12:31:10 +0000 (18:01 +0530)]
examples/ipsec-secgw: enable flow based distribution
RTE_FLOW API allows hardware parsing and steering of packets to specific
queues which helps in distributing ingress traffic across various cores.
Adding 'flow' rules allows user to specify the distribution required.
Signed-off-by: Anoob Joseph <anoobj@marvell.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
If a VF requests the PF to allocate more queue pairs, the PF
frees the queue pairs already allocated and resets the VF. So the
VF should stop working until the whole process is done. This patch
modifies the queue pair request process. To improve efficiency and
eliminate code redundancy, the promiscuous ops were also updated.
Fixes: c48eb308ed13 ("net/i40e: support VF request more queues") Cc: stable@dpdk.org Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com> Acked-by: Jeff Guo <jia.guo@intel.com>
Simei Su [Thu, 16 Jul 2020 03:24:54 +0000 (11:24 +0800)]
net/ice: fix RSS type
An RSS rule with only SRC/DST_ONLY or the IPV6 prefix RSS type
should return failure. Besides, when an RSS rule uses the symmetric
hash function, the RSS type should not carry SRC/DST_ONLY.
This patch adds an invalid RSS type check for the two cases.
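The two invalid cases can be expressed as a small validation function. The flag values and names below are hypothetical and do not match the real ETH_RSS_* definitions; only the two rules come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define RSS_IPV4        (UINT64_C(1) << 0)
#define RSS_IPV6        (UINT64_C(1) << 1)
#define RSS_L3_SRC_ONLY (UINT64_C(1) << 8)
#define RSS_L3_DST_ONLY (UINT64_C(1) << 9)
#define RSS_L4_SRC_ONLY (UINT64_C(1) << 10)
#define RSS_L4_DST_ONLY (UINT64_C(1) << 11)
#define RSS_IPV6_PREFIX (UINT64_C(1) << 12)

static int rss_type_check(uint64_t types, bool symmetric)
{
	const uint64_t only = RSS_L3_SRC_ONLY | RSS_L3_DST_ONLY |
			      RSS_L4_SRC_ONLY | RSS_L4_DST_ONLY;
	const uint64_t modifiers = only | RSS_IPV6_PREFIX;

	/* Case 1: modifier bits with no protocol type to apply them to. */
	if (types != 0 && (types & ~modifiers) == 0)
		return -EINVAL;
	/* Case 2: symmetric hash cannot be limited to source or dest. */
	if (symmetric && (types & only) != 0)
		return -EINVAL;
	return 0;
}
```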
When the function ice_get_tun_type_for_recipe() gets the tunnel type,
for ICE_NON_TUN we need to include GTP-C and some GTP-U ptypes
with no payload, as they do not have a tunnel packet as payload.
Fixes: 418d2563d10b ("net/ice/base: get tunnel type for recipe") Signed-off-by: Wei Zhao <wei.zhao1@intel.com> Tested-by: Nannan Lu <nannan.lu@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
The current bonding PMD calls the mac_address_slaves_update function
to modify the MAC addresses of all slave devices. In the
mac_address_slaves_update function, the rte_eth_dev_default_mac_addr_set
API function is called in a for loop to set the MAC address of each
slave device in turn.
When one port is being reset, calling rte_eth_dev_default_mac_addr_set
fails because the firmware does not respond to commands from the driver,
and the loop exits, so the other slave devices cannot have their
MAC addresses updated.
This patch fixes the issue by not exiting the loop when a call to
rte_eth_dev_default_mac_addr_set fails.
Fixes: 2efb58cbab6e ("bond: new link bonding library") Cc: stable@dpdk.org Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Chunsong Feng <fengchunsong@huawei.com> Signed-off-by: Xuan Li <lixuan47@hisilicon.com>
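The loop behavior described above can be sketched as follows. fake_set_mac() is a hypothetical stand-in for rte_eth_dev_default_mac_addr_set(), and returning the first error after visiting every slave is an assumption about how the result might be reported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real API: port 1 models a port under
 * reset whose firmware does not answer, so the call always fails. */
static int calls;

static int fake_set_mac(int port_id)
{
	calls++;
	return port_id == 1 ? -1 : 0;
}

/* Update every slave; remember the first error but keep iterating. */
static int update_all_slaves(const int *ports, size_t n,
			     int (*set_mac)(int))
{
	int first_err = 0;

	assert(ports != NULL && set_mac != NULL);
	for (size_t i = 0; i < n; i++) {
		int rc = set_mac(ports[i]);

		/* No break here: later slaves still need the update. */
		if (rc != 0 && first_err == 0)
			first_err = rc;
	}
	return first_err;
}
```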
net/bonding: fix MAC address when switching active port
Currently, on an active-backup bond device, when the link status of
the primary port changes from up to down, one slave port becomes the
primary port, but the new primary port's MAC address is not changed to
the bond device's MAC address. As a result, packets whose destination
MAC address is the bond device's MAC address can no longer be
received.
The current bonding PMD calls the mac_address_slaves_update function
to modify the MAC addresses of all slave devices: the primary port uses
the bond device's MAC address, and the other slave devices use their
respective MAC addresses. We found one error: primary_port is used
instead of current_primary_port in the mac_address_slaves_update function.
In addition, the current bonding PMD sets the slave devices'
MAC addresses according to the variable named current_primary_port. This
variable changes in the following scenarios:
1. Slave devices are added to the bond; the first slave port is regarded
as the current_primary_port. If the order of adding the
slave devices changes, the value of current_primary_port
will be different.
2. The upper application specifies primary_port by calling the
rte_eth_bond_primary_set API function.
3. The primary slave device is deleted.
4. The link status of the primary port changes from up to down.
We have tested the above 4 cases and found that
the new primary port's MAC address is not changed to the bond device's
MAC address in cases 3 and 4. When current_primary_port
changes, the new primary port's MAC address should change at the same
time, so the mac_address_slaves_update function must also be called to
update MAC addresses in cases 3 and 4.
Bugzilla ID: 256 Fixes: 2efb58cbab6e ("bond: new link bonding library") Cc: stable@dpdk.org Signed-off-by: Chunsong Feng <fengchunsong@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
When TRUFLOW is not enabled ulp_ctx is not allocated.
In non-vector Tx datapath we are accessing this invalid pointer
resulting in a segfault. Check if TRUFLOW is enabled before
accessing ulp_ctx to avoid this.
Fixes: 1e46b3962620 ("net/bnxt: fill cfa action in Tx descriptor") Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Because rte_vdev_init() can return multiple non-zero
values, when rte_vdev_init() returns non-zero in the
rte_eth_bond_create() function, it should return the actual error code
rather than -ENOMEM.
Fixes: 68451eb6698c ("net/bonding: call through EAL on create/free") Cc: stable@dpdk.org Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
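The change amounts to propagating the callee's error unchanged. In the sketch below both functions are hypothetical stand-ins: fake_vdev_init() simply returns the code it is given, and bond_create() returns 0 on success where the real function would return a port ID.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for rte_vdev_init(); returns the given code. */
static int fake_vdev_init(int simulated_rc)
{
	return simulated_rc;
}

static int bond_create(int simulated_rc)
{
	int ret = fake_vdev_init(simulated_rc);

	if (ret != 0)
		return ret; /* propagate as is, do not collapse to -ENOMEM */
	return 0;           /* placeholder for the port ID on success */
}
```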
Chenxu Di [Tue, 14 Jul 2020 01:37:21 +0000 (01:37 +0000)]
app/testpmd: fix output format in flow query
This patch fixes the erroneous line break in the output format of flow query.
Fixes: bdb1d61690f7 ("app/testpmd: support RSS config in flow query") Signed-off-by: Chenxu Di <chenxux.di@intel.com> Tested-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Lijun Ou [Tue, 14 Jul 2020 06:16:09 +0000 (14:16 +0800)]
net/hns3: fix RSS configuration on empty RSS type
According to the definition of the RSS types action attribute in
testpmd, the driver does not disable RSS but instead requests
unspecified "best-effort" settings when the upper application calls
the rte_flow_create API function to create a flow with empty RSS types.
As a result, use the default RSS types when the RSS types are empty.
Fixes: c37ca66f2b27 ("net/hns3: support RSS") Cc: stable@dpdk.org Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
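The fallback can be sketched in a few lines. The flag values and the contents of DEFAULT_RSS_TYPES are hypothetical; the driver's actual default set is not given in the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values and default set, for illustration only. */
#define RSS_TYPE_IPV4     (UINT64_C(1) << 0)
#define RSS_TYPE_IPV6     (UINT64_C(1) << 1)
#define DEFAULT_RSS_TYPES (RSS_TYPE_IPV4 | RSS_TYPE_IPV6)

/* An empty type set means "best effort", not "disable RSS". */
static uint64_t effective_rss_types(uint64_t requested)
{
	return requested != 0 ? requested : DEFAULT_RSS_TYPES;
}
```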