This patch extends the NIC attributes query via DevX.
The appropriate interface structures are borrowed from
kernel driver headers and DevX calls are added to
mlx5_devx_cmd_query_hca_attr() routine.
This patch updates Tx datapath definitions, mostly hardware related.
The Tx descriptor structures are redefined with required fields,
size definitions are renamed to reflect the meanings in more
appropriate way. This is a preparation step before introducing
the new Tx datapath implementation.
This patch introduces new mlx5 PMD devarg options:
- txq_inline_min - specifies minimal amount of data to be inlined into
WQE during Tx operations. NICs may require this minimal data amount
to operate correctly. The exact value may depend on NIC operation
mode, requested offloads, etc.
- txq_inline_max - specifies the maximal packet length to be completely
inlined into WQE Ethernet Segment for ordinary SEND method. If packet
is larger the specified value, the packet data won't be copied by the
driver at all, data buffer is addressed with a pointer. If packet
length is less or equal all packet data will be copied into WQE.
- txq_inline_mpw - specifies the maximal packet length to be completely
inlined into WQE for Enhanced MPW method.
This patch removes the existing Tx datapath code
as preparation step before introducing the new
implementation. The following entities are being
removed:
The following devargs are deprecated and ignored:
- "txq_inline" is going to be converted to "txq_inline_max"
for compatibility issue
- "tx_vec_en"
- "txqs_max_vec"
- "txq_mpw_hdr_dseg_en"
- "txq_max_inline_len" is going to be converted
to "txq_inline_mpw" for compatibility issue
The deprecated devarg keys are recognized by PMD
and ignored/converted to the new ones in order not
to block device probing.
In functions flow_dv_translate() and flow_dv_validate(), the flow
items are scanned and each item is marked in item_flags bitmap.
The code handling some of the items was ported from another project,
where items are marked in a slightly different manner.
This patch fixes the setting of items in bitmap, adapting it to the
required manner.
Fixes: d53aa89aea91 ("net/mlx5: support matching on ICMP/ICMP6") Fixes: 5865955ad994 ("net/mlx5: match GRE key and present bits") Fixes: 2e4c987aad91 ("net/mlx5: validate Direct Rule E-Switch") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Xiaoyu Min <jackmin@mellanox.com>
net/e1000: fix buffer overrun while i219 processing DMA
IntelĀ® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into unrecovered Tx hang under very stressfully UDP traffic
and multiple reconnection of Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.
Please, refer to I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/
ethernet-connection-i218-family-documentation.html
net/bnxt: disable vector mode Tx with VLAN offload
The vector mode transmit path does not currently support VLAN tag
insertion, so we need to disable vector transmit when transmit
VLAN insertion offload is enabled.
We were adding the VLAN filters to all the VNICs of the function.
Also, we were adding these VLANs to all the existing MAC only filters.
This was resulting in fewer VLANs getting added. By default we should
allocate MAC+VLAN filter only to the default VNIC of the function using
the default mac address.
Similar logic was followed in the VLAN deletion code. This patch fixes
it. Use inner VLAN fields instead of outer VLAN during filter deletion
to be in sync with VLAN addition code.
Fixes: 246c5cc5f05e ("net/bnxt: use correct flags during VLAN configuration") Cc: stable@dpdk.org Signed-off-by: Santoshkumar Karanappa Rastapur <santosh.rastapur@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
The current implementation erroneously passes the address of the
beginning of RSS table for each 64-entry context instead of the
address of the appropriate suitable for the context. This results
in only the first 64 receive queues being used. Fix by passing the
correct address for each context.
BCM57500-based adapters use a variable number of RSS contexts
depending upon the number of receive rings in use. The current
implementation is erroneously using the maximum possible number
of RSS contexts instead of the actual number allocated when
setting up RSS tables in the adapter. Fix by using the actual
number of allocated contexts.
In bnxt_hwrm_vnic_rss_cfg_thor, we were exiting if hash_type is 0.
This was preventing RSS getting disabled. Fixing it by removing the
check for hash_type while configuring RSS.
Kalesh AP [Thu, 18 Jul 2019 03:36:05 +0000 (09:06 +0530)]
net/bnxt: fix error checking of FW commands
HWRM_CHECK_RESULT() checks the return value of HWRM command and returns
in case the command fails. There is no need of return value check after
HWRM_CHECK_RESULT().
Fixes: 49947a13ba9e ("net/bnxt: support Tx loopback, set VF MAC and queues drop") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Lance Richardson <lance.richardson@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
rte_intr_callback_unregister() can fail if the handler happens to
be active at the time of the call. Add logic to retry a reasonable
number of times to help ensure that the callback is unregistered
on uninit.
Fixes: 7bc8e9a227cc ("net/bnxt: support async link notification") Cc: stable@dpdk.org Signed-off-by: Lance Richardson <lance.richardson@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Kalesh AP [Thu, 18 Jul 2019 03:36:01 +0000 (09:06 +0530)]
net/bnxt: reset filters before registering interrupts
If interrupt registration fails during device init, driver invokes
uninit which in turn causes error messages while trying to free
vnic filters. Fix this by moving filter initialization call before
interrupt registration.
Fixes: 1b533790f44e ("net/bnxt: avoid invalid vnic id in set L2 Rx mask") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Kalesh AP [Thu, 18 Jul 2019 03:36:00 +0000 (09:06 +0530)]
net/bnxt: fix device init error path
1. bnxt_dev_init() invokes bnxt_dev_uninit() on failure. So there is
no need to do individual function cleanups in failure path.
2. rearrange the check for primary process to remove an unwanted goto.
3. fix to invoke bnxt_hwrm_func_buf_unrgtr() in bnxt_dev_uninit() when
it is needed.
Fixes: b7778e8a1c00 ("net/bnxt: refactor to properly allocate resources for PF/VF") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Kalesh AP [Thu, 18 Jul 2019 03:35:59 +0000 (09:05 +0530)]
net/bnxt: fix setting primary MAC address
1. Default filter is tied to VNIC 0 at index 0. After finding the filter
with mac_index 0 and set the new MAC address, looping through
remaining filters is unnecessary.
2. Added a check for NULL MAC address.
3. bnxt_hwrm_set_l2_filter() clears the existing filter configuration
first before applying new filter settings. Hence there is no need to
invoke bnxt_hwrm_clear_l2_filter() explicitly in
bnxt_set_default_mac_addr_op().
Fixes: d69851df12b2 ("net/bnxt: support multicast filter and set MAC addr") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Kalesh AP [Thu, 18 Jul 2019 03:35:55 +0000 (09:05 +0530)]
net/bnxt: fix error handling in port start
1. during port start, if bnxt_init_chip() return error
bnxt_dev_start_op() invokes bnxt_shutdown_nic() which in turn calls
bnxt_free_all_hwrm_resources() to free up resources. Hence remove the
bnxt_free_all_hwrm_resources() from bnxt_init_chip() failure path.
2. fix to check the return value of rte_intr_enable() as this call
can fail.
3. set bp->dev_stopped to 0 only when port start succeeds.
4. handle failure cases in bnxt_init_chip() routine to do proper
cleanup and return correct error value.
Fixes: b7778e8a1c00 ("net/bnxt: refactor to properly allocate resources for PF/VF") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
net: be more restrictive with ethernet address format
The current ether_unformat_addr code was based off of
BSD ether_aton. That version changed what was allowed
by the cmdline ether address parser.
For example, it allows dropping leading zeros.
Change the code to be more restrictive and only allow the fully
expanded standard formats.
Bugzilla ID: 324 Fixes: 596d31092d32 ("net: add function to convert string to ethernet address") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
When NVM API version is 1.7 or above adminq operation to set TPID is
set as supported. This cause using adminq instead of registers.
For SFP X722 FW4.16, reported NVM API version is 1.8, and this cause
adminq operation to set as supported but it is not supported on FW4.16
Additional check added for SFP X722 to not enable adminq operation.
Fixes: 73cd7d6dc8e1 ("net/i40e: use set switch AQ instead of register setting") Cc: stable@dpdk.org Signed-off-by: Xiao Zhang <xiao.zhang@intel.com> Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Ying A Wang [Thu, 18 Jul 2019 01:38:42 +0000 (09:38 +0800)]
net/ice: fix flow action validation
Action is a list. We should check each element of the action
rather than the first one.
This patch fixes this issue.
Fixes: d76116a4678f ("net/ice: add generic flow API") Signed-off-by: Ying A Wang <ying.a.wang@intel.com> Acked-by: Qiming Yang <qiming.yang@intel.com> Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
Some configuration options can not be tested properly with testpmd
because it automatically starts all ports. This makes it harder
to test driver handling of configuration options:
(for example rx_deferred_start).
Add new command line flag --disable-device-start which skips
the device start. The port can then be started manually later.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Qingmin Liu [Wed, 17 Jul 2019 10:41:38 +0000 (16:11 +0530)]
net/bnxt: fix RxQ count if ntuple filtering is disabled
If ntuple filtering is disabled, FW will return max_vnics=1.
Due to this only single Rxq is created.
Change to max_rx_rings = RTE_MIN(bp->max_rx_rings, bp->max_stat_ctx) to
fix it.
Fixes: 6d8109bcb398 ("net/bnxt: check VF resources if resource manager is enabled") Cc: stable@dpdk.org Signed-off-by: Qingmin Liu <qingmin.liu@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Kalesh AP [Wed, 17 Jul 2019 10:41:36 +0000 (16:11 +0530)]
net/bnxt: check invalid VNIC in cleanup path
The cleanup/rollback operation post rte_eth_dev_start failure might end
up invoking an HWRM cmd even on an invalid vNIC resulting in error
messages being logged needlessly.
Fix to check for the same before issuing the HWRM cmd.
Kalesh AP [Wed, 17 Jul 2019 10:41:35 +0000 (16:11 +0530)]
net/bnxt: fix enabling/disabling interrupts
1. Disable interrupts in dev_stop_op()
2. Enable interrupts in dev_start_op()
3. Clean queue intr-vector mapping in dev_stop_op() and thus
fix a possible memory leak.
Avoid null pointer dereference when allocating an insulated
completion ring by basing nq ring allocation on whether an
nq ring was requested instead of whether the device supports
nq rings.
net/bnxt: fix doorbell register offset for Tx ring
For Tx-ring # 104 and higher, the doorbell register was incorrectly
configured due to which FW was not able to receive the notification
of packet to transmit.
With this fix, user can run traffic upto 256 rings.
Initialize the state of the completion valid indicator
when a completion ring is freed, otherwise completions may
not be processed when a new ring is allocated.
Fixes: 5735eb241947 ("net/bnxt: support Tx batching") Cc: stable@dpdk.org Signed-off-by: Lance Richardson <lance.richardson@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Kalesh AP [Wed, 17 Jul 2019 10:41:27 +0000 (16:11 +0530)]
net/bnxt: fix VF probe when MAC address is zero
VF driver should not fail probe if the host PF driver has not assigned
any MAC address for the VF. It should generate a random MAC address and
configure the MAC and then continue probing the device.
Fixes: be160484a48d ("net/bnxt: check if MAC address is all zeros") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Kalesh AP [Wed, 17 Jul 2019 10:41:25 +0000 (16:11 +0530)]
net/bnxt: fix extended port counter statistics
1. refactor stats allocation code to new routine
2. check for extended statistics support depends on "hwrm_spec_code"
which is set in bnxt_hwrm_ver_get called later. Hence we were never
querying extended port stats as flags field was not updated. Fixed
this by moving the stats allocation after the call to
bnxt_hwrm_ver_get.
3. we were incorrectly passing the host address used for port
statistics to PORT_QSTATS_EXT command. Fixed this by passing the
correct extended stats address.
Fixes: f55e12f33416 ("net/bnxt: support extended port counters") Cc: stable@dpdk.org Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
With the latest published interface of
rte_eal_hotplug_[add,remove](), and rte_eth_dev_close(),
rte_eth_dev_close() would cleanup all the data structures of
port's eth dev leaving the device common resource intact
if RTE_ETH_DEV_CLOSE_REMOVE is set in dev flags.
So a new command "detach device" (~hotplug remove) to work,
with device identifier like "port attach" is added
to be able to detach closed devices.
Also to display currently probed devices, another command
"show device info <identifier>|all" is also added as a
part of this change.
Xiaoyu Min [Wed, 17 Jul 2019 12:27:08 +0000 (20:27 +0800)]
app/testpmd: support raw encap/decap actions
This patch intend to support
action_raw_encap/decap [1] in a generic and convenient way.
Two new commands - set raw_encap, set raw_decap are introduced just
like the other commands for encap/decap, i.e. set vxlan.
These two commands have corresponding global buffers
which can be used by PMD as the input buffer for raw encap/decap.
The commands use the rte_flow pattern syntax to help user build the
raw buffer in a convenient way.
A common way to use it:
- encap matched egress packet with VxLAN tunnel:
testpmd> set raw_encap eth src is 10:11:22:33:44:55 / vlan tci is 1
inner_type is 0x0800 / ipv4 / udp dst is 4789 / vxlan vni
is 2 / end_set
testpmd> flow create 0 egress pattern eth / ipv4 / end actions
raw_encap / end
- decap l2 header and encap GRE tunnel on matched egress packet:
testpmd> set raw_decap eth / end_set
testpmd> set raw_encap eth dst is 10:22:33:44:55:66 / ipv4 / gre
protocol is 0x0800 / end_set
testpmd> flow create 0 egress pattern eth / ipv4 / end actions
raw_decap / raw_encap / end
- decap VxLAN tunnel and encap l2 header on matched ingress packet:
testpmd> set raw_encap eth src is 10:11:22:33:44:55 type is 0x0800 /
end_set
testpmd> set raw_decap eth / ipv4 / udp / vxlan / end_set
testpmd> flow create 0 ingress pattern eth / ipv4 / udp dst is 250 /
vxlan vni is 0x1234 / ipv4 / end actions raw_decap /
raw_encap / queue index 1 / mark id 0x1234 / end
Xiaoyu Min [Wed, 17 Jul 2019 12:27:07 +0000 (20:27 +0800)]
app/testpmd: move VXLAN/NVGRE help in filters section
The help string of set vxlan*, set nvgre* are in "config" section.
But they actually do not alter NIC or testpmd's configuration and
they will be used by "flow" command later.
Put them in "filters" section along with "flow" command seems more
reasonable.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Previously in the PCAP PMD queues has to be defined as RxQ and TxQ
pairs, even if the need is only Rx or only Tx:
"--vdev net_pcap0,tx_pcap=tx.pcap,rx_pcap=rx.pcap"
Following commit enabled only providing Rx queue, and if Tx queue is
not provided PMD drops the Tx packets automatically:
Commit a3f5252e5cbd ("net/pcap: enable infinitely Rx a pcap file")
"--vdev net_pcap0,rx_pcap=rx.pcap"
This commit enables same thing for Rx queue, user no more have to
provide a Rx queue (rx_iface or rx_pcap), for this case a dummy Rx
burst function is used which doesn't return any packet at all:
"--vdev net_pcap0,tx_pcap=tx.pcap"
This makes only saving packets to a pcap file use case easy.
When both Rx and Tx queues are missing PMD will return an error.
(Single interface is still supported: "--vdev net_pcap0,iface=eth0")
David Harton [Fri, 12 Jul 2019 17:35:43 +0000 (13:35 -0400)]
net/ena: fix admin CQ polling for 32-bit
Recent modifications to admin command queue polling logic
did not support 32-bit applications. Updated the driver to
work for 32 or 64 bit applications
Fixes: 3adcba9a8987 ("net/ena: update HAL to the newer version") Cc: stable@dpdk.org Signed-off-by: David Harton <dharton@cisco.com> Acked-by: Michal Krawczyk <mk@semihalf.com>
This patch updates "show port info [port_id]" command to display
the tx_desc_lim.nb_seg_max and tx_desc_lim.nb_mtu_seg_max fields
of rte_eth_dev_info structure.
In case the asynchronous devx commands are not supported in RDMA core
fallback to use a basic counter management.
Here, the PMD counters cashe is redundant and the host thread doesn't
update it. hence, each counter operation will go to the FW and the
acceleration reduces.
All the DV counters are cashed in the PMD memory and are contained in
pools which are contained in containers according to the counters
allocation type - batch or single.
Currently, the flow counter query is done synchronously in pool
resolution means that on the user request a FW command is triggered to
read all the counters in the pool.
A new feature of devX to asynchronously read batch of flow counters
allows to accelerate the user query operation.
Using the DPDK host thread, the PMD periodically triggers asynchronous
query in pool resolution for all the counter pools and an interrupt is
triggered by the FW when the values are updated.
In the interrupt handler the pool counter values raw data is replaced
using a double buffer algorithm (very fast).
In the user query, the PMD just returns the last query values from the
PMD cache - no system-calls and FW commands are triggered from the user
control thread on query operation!
More synchronization is added with the host thread:
Container resize uses double buffer algorithm.
Pools growing in container uses atomic operation.
Pool query buffer replace uses a spinlock.
Pool minimum devX counter ID uses atomic operation.
The DevX interface exposes a new feature to the PMD that can allocate a
batch of counters by one FW command. It can improve the flow
transaction rate (with count action).
Add a new counter pools mechanism to manage HW counters in the PMD.
So, for each flow with counter creation the PMD will try to find a free
counter in the PMD pools container and only if there is no a free
counter, it will allocate a new DevX batch counters.
Currently we cannot support batch counter for a group 0 flow, so
create a 2 container types, one which allocates counters one by
one and one which allocates X counters by the batch feature.
The allocated counters objects are never released back to the HW
assuming the flows maximum number will be close to the actual value of
the flows number.
Later, it can be updated, and dynamic release mechanism can be added.
The counters are contained in pools, each pool with 512 counters.
The pools are contained in counter containers according to the
allocation resolution type - single or batch.
The cache memory of the counters statistics is saved as raw data per
pool.
All the raw data memory is allocated for all the container in one
memory allocation and is managed by counter_stats_mem_mng structure
which registers all the raw memory to the HW.
Each pool points to one raw data structure.
The query operation is in pool resolution which updates all the pool
counter raw data by one operation.
Michal Krawczyk [Tue, 16 Jul 2019 11:13:32 +0000 (13:13 +0200)]
net/ena: update version to 2.0.1
In 2.0.1 ENA, there were patches for:
* assigning NUMA node to the IO queue
commit 4217cb0b7d2c ("net/ena: fix assigning NUMA node to IO queue")
* statistics counters (Rx checksum errors and per-queue number of the
Tx packets)
commit ef74b5f7b69b ("net/ena: fix Rx checksum errors statistics")
commit 5673e285a633 ("net/ena: fix Tx statistics")
* SMP support
commit 117ba4a60488 ("net/ena: get device info statically")
* setting Rx checksum support
commit ef538c1a7f56 ("net/ena: fix checksum feature flag")
The iavf driver crashes when forwarding packets with TSO
enabled. The reason is that the tx context descriptor
configuration is not transferred to tx-ring. This step is
added in this patch.
Because of the commit mentioned below the default case was changed and
this broke single_iface support. This patch adds a check to fix
single_iface support.
In the eth_pcap_tx() and eth_pcap_tx_dumper() functions mbufs were freed
without incrementing num_tx.
This may lead application also try to free or use invalid mbuf.
net/bnxt: create ring group array only when needed
Fix an overrun of the ring group array with BCM5750X-based
adapters by ensuring that the ring group array is not allocated
or accessed for adapters that do not support ring groups.
The conditional used to determine whether freeing RSS
contexts for thor vs. non-thor controller was reversed.
Fix this, also reset number of active RSS contexts to
zero after release in the thor case.
John Daley [Tue, 16 Jul 2019 05:37:20 +0000 (22:37 -0700)]
net/enic: remove PMD log type references
Don't use RTE_LOGTYPE_PMD as it is too general.
Also, just use 1 log type for all of enic PMD (pmd.net.enic)
Signed-off-by: John Daley <johndale@cisco.com> Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
net/octeontx2: support flow API flags based extraction
Adding support for flags based extraction in octeontx2 Flow.
Patch supports extracting data greater than 32 bytes using lflags.
When flags based extraction is enabled, lower 4 bits will be
considered (16 flags) for indexing the flags, and will be used
for extraction.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
The rte_vdev_driver is declared twice.
The first one is not necessary.
Fixes: 740feaf349b1 ("ethdev: remove driver name from device private data") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The rte_vdev_drivers are declared twice.
The first one is not necessary.
Fixes: 740feaf349b1 ("ethdev: remove driver name from device private data") Fixes: 204d026a3922 ("net/tap: support tun") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Keith Wiles <keith.wiles@intel.com>