Matan Azrad [Tue, 19 Dec 2017 17:14:29 +0000 (17:14 +0000)]
net/failsafe: improve Rx sub-devices iteration
Connecting the sub-devices to each other in a cyclic linked list helps the
Rx burst functions iterate over them, because there is no need to
check for wraparound of the sub-devices ring.
Create the aforementioned linked-list and change the Rx burst functions
iteration accordingly.
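A minimal sketch of the idea, not the driver's actual code: the struct, its fields and both function names below are hypothetical stand-ins. It shows how a cyclic "next" pointer lets the polling loop advance without any index wraparound test.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fail-safe sub-device; not the driver's struct. */
struct sub_device {
	struct sub_device *next; /* next sub-device; the last points to the first */
	int pending;             /* packets waiting on this sub-device (toy model) */
};

/* Link an array of n sub-devices into a cyclic list. */
static void sub_ring_link(struct sub_device *subs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		subs[i].next = &subs[(i + 1) % n];
}

/*
 * Poll sub-devices starting at *cursor until one yields packets or a full
 * lap has been completed. Advancing is just "sdev = sdev->next".
 */
static int sub_ring_rx(struct sub_device **cursor)
{
	struct sub_device *first = *cursor;
	struct sub_device *sdev = first;
	int nb_rx;

	do {
		nb_rx = sdev->pending;
		sdev->pending = 0;
		sdev = sdev->next;
	} while (nb_rx == 0 && sdev != first);

	*cursor = sdev; /* the next call resumes after the last polled sub-device */
	return nb_rx;
}
```

The cursor is kept by the caller (per Rx queue in this toy model), so consecutive bursts start from different sub-devices.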
Matan Azrad [Tue, 19 Dec 2017 17:14:28 +0000 (17:14 +0000)]
net/failsafe: mitigate data plane atomic operations
Fail-safe uses atomic operations to protect the sub-device close operation,
which is called by the host thread at removal time, while the burst functions
of the removed sub-device may still be running in application threads.
"Set" atomic operations are slightly more efficient than "add"
or "sub" atomic operations because "set" does not need to read the value;
in fact, it needs no special atomic mechanism on x86 platforms.
Replace "add 1" and "sub 1" atomic operations by "set 1" and "set 0"
atomic operations.
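As an illustration only, the flag-based protection could look like the sketch below. It uses C11 <stdatomic.h> rather than DPDK's own atomic helpers, the struct and function names are hypothetical, and it assumes a single thread bursts on a given queue (otherwise a 0/1 flag cannot replace a counter). The memory-ordering needs of the real driver are not modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-queue guard; the driver uses DPDK's atomic helpers. */
struct rxq_guard {
	atomic_int in_burst; /* 1 while a burst function runs on this queue */
};

/* Data plane: plain stores instead of read-modify-write add/sub. */
static inline void burst_enter(struct rxq_guard *g)
{
	atomic_store(&g->in_burst, 1);
}

static inline void burst_leave(struct rxq_guard *g)
{
	atomic_store(&g->in_burst, 0);
}

/* Control plane: close the sub-device only when no burst is in progress. */
static inline bool safe_to_close(struct rxq_guard *g)
{
	return atomic_load(&g->in_burst) == 0;
}
```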
Matan Azrad [Tue, 19 Dec 2017 17:14:27 +0000 (17:14 +0000)]
net/failsafe: fix Rx safe check compiler hint
The failsafe_rx_burst function is used when there are no sub-devices or at
least one of them has been removed. When all the sub-devices are present,
the failsafe_rx_burst_fast function is used instead.
So it is expected that some of the sub-devices will be unsafe for
Rx burst during failsafe_rx_burst execution.
Remove the unlikely compiler hint from the fs_rx_unsafe call.
Sharmila Podury [Thu, 11 Jan 2018 19:12:44 +0000 (11:12 -0800)]
net/bonding: add ethdev ops function for MTU set
Set the MTU for the bonding device by calling .mtu_set for all
the slaves. Set the MTU only if all slaves support .mtu_set,
and there is no error returned from any slave.
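The rule described above can be sketched as a two-pass loop. This is a toy model under stated assumptions: the struct, the callback field and bond_mtu_set() are hypothetical and do not reproduce the bonding PMD or the ethdev API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slave model: mtu_set is NULL when the op is unsupported. */
struct slave {
	int (*mtu_set)(struct slave *s, uint16_t mtu);
	uint16_t mtu;
};

/* Example callback that simply records the requested MTU. */
static int slave_mtu_store(struct slave *s, uint16_t mtu)
{
	s->mtu = mtu;
	return 0;
}

static int bond_mtu_set(struct slave *slaves, size_t n, uint16_t mtu)
{
	/* Pass 1: refuse before touching anything if a slave lacks mtu_set. */
	for (size_t i = 0; i < n; i++)
		if (slaves[i].mtu_set == NULL)
			return -ENOTSUP;

	/* Pass 2: apply to every slave and report the first error. */
	for (size_t i = 0; i < n; i++) {
		int ret = slaves[i].mtu_set(&slaves[i], mtu);

		if (ret < 0)
			return ret;
	}
	return 0;
}
```

Checking support first means an unsupported slave never leaves the others with a changed MTU; a failure in the second pass is not rolled back in this sketch.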
Hyong Youb Kim [Wed, 10 Jan 2018 09:17:08 +0000 (01:17 -0800)]
net/enic: refill only the address of the RQ descriptor
Once the RQ descriptors are initialized (enic_alloc_rx_queue_mbufs),
their length_type does not change during normal RX
operations. rx_pkt_burst only needs to reset their address field for
newly allocated mbufs.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hyong Youb Kim [Wed, 10 Jan 2018 09:17:07 +0000 (01:17 -0800)]
net/enic: remove a couple unnecessary statements
No need to zero ol_flags as it is overwritten at the end of the
function. No need to check for EOP as the caller (enic_recv_pkts) has
already checked it.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hyong Youb Kim [Wed, 10 Jan 2018 09:17:04 +0000 (01:17 -0800)]
net/enic: fix L4 Rx ptype comparison
For non-UDP/TCP packets, enic may wrongly set PKT_RX_L4_CKSUM_BAD in
ol_flags. The comparison that checks if a packet is UDP or TCP assumes
that RTE_PTYPE_L4 values are bit flags, but they are not. For example,
the following evaluates to true because NONFRAG is 0x600 and UDP is
0x200, and causes the current code to think the packet is UDP.
!!(RTE_PTYPE_L4_NONFRAG & RTE_PTYPE_L4_UDP)
So, fix this by comparing the packet type against UDP and TCP
individually.
Fixes: 453d15059b58 ("net/enic: use new Rx checksum flags") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
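The difference between the two comparisons can be shown with a small stand-alone sketch. The NONFRAG and UDP values are the ones quoted in the message above; the TCP value and the L4 mask are assumptions taken to mirror rte_mbuf_ptype.h, and the macro and function names are local to this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the L4 packet type values (enumerated, not bit flags). */
#define PTYPE_L4_TCP     0x00000100u /* assumed to match RTE_PTYPE_L4_TCP */
#define PTYPE_L4_UDP     0x00000200u
#define PTYPE_L4_NONFRAG 0x00000600u
#define PTYPE_L4_MASK    0x00000f00u /* assumed to match RTE_PTYPE_L4_MASK */

/* Buggy form: treats the values as flags, so NONFRAG looks like UDP. */
static bool is_udp_or_tcp_buggy(uint32_t ptype)
{
	return (ptype & (PTYPE_L4_UDP | PTYPE_L4_TCP)) != 0;
}

/* Fixed form: isolate the L4 field, then compare against each value. */
static bool is_udp_or_tcp(uint32_t ptype)
{
	uint32_t l4 = ptype & PTYPE_L4_MASK;

	return l4 == PTYPE_L4_UDP || l4 == PTYPE_L4_TCP;
}
```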
For enic, the required changes are mechanical. Use the new 'offloads'
field in rxmode instead of the bit fields. And, no changes required
with respect to txq_flags, as enic does not use it at all.
Per-queue RX offload capabilities are not set, as all offloads are
per-port at the moment.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Thomas Monjalon [Thu, 4 Jan 2018 16:01:11 +0000 (17:01 +0100)]
ethdev: add notifications for probing and removal
When a PMD finishes probing, it creates the new port by calling
the function rte_eth_dev_allocate().
A notification of the new port is sent there to the upper layer.
When a PMD finishes removal of a port, it calls the function
rte_eth_dev_release_port().
A notification of the destroyed port is sent there to the upper layer.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Thomas Monjalon [Thu, 4 Jan 2018 16:01:08 +0000 (17:01 +0100)]
ethdev: remove useless parameter in callback process
The pointer to the user parameter of the callback registration is
automatically passed to the callback function.
There is no point in allowing a caller to change this user parameter.
That's why this parameter is always set to NULL by PMDs and set only
in the ethdev layer before calling the callback function.
The history is that the user parameter was initially used
by the callback implementation to pass some information
between the application and the driver: c1ceaf3ad056 ("ethdev: add an argument to internal callback function")
Then a new parameter has been added to leave the user parameter
to its standard usage of context given at registration: d6af1a13d7a1 ("ethdev: add return values to callback process API")
The NULL parameter in the internal callback processing function
is now removed. It makes clear that the callback parameter is user
managed and opaque from a DPDK point of view.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Thomas Monjalon [Fri, 5 Jan 2018 17:38:55 +0000 (18:38 +0100)]
ethdev: fix link autonegotiation value
There are 3 kinds of link data in ethdev:
- capabilities (rte_eth_dev_info)
- configuration (rte_eth_conf)
- status (rte_eth_link)
A bit-field is used for capabilities (rte_eth_dev_info.speed_capa) and
configuration (rte_eth_conf.link_speeds).
Bits are defined in ETH_LINK_SPEED_*.
Some numerical (ETH_SPEED_NUM_*) and boolean (ETH_LINK_*) values
are used for the link status (rte_eth_link.*).
There was a mistake in the comment of rte_eth_link.link_autoneg,
suggesting ETH_LINK_SPEED_[AUTONEG/FIXED] which are 0/1,
instead of ETH_LINK_[AUTONEG/FIXED] which are 1/0.
The drivers are fixed to use ETH_LINK_[AUTONEG/FIXED].
Fixes: 82113036e4e5 ("ethdev: redesign link speed config") Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
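A short sketch of why mixing the two families of constants inverts the result. The numeric values are the ones stated above (0/1 for the speed bit-field, 1/0 for the status); the macro names are local copies and link_autoneg_status() is a hypothetical helper, not an ethdev function.

```c
#include <stdint.h>

/* Configuration bit-field values (ETH_LINK_SPEED_*), as stated above. */
#define LINK_SPEED_AUTONEG 0u
#define LINK_SPEED_FIXED   1u

/* Status values (ETH_LINK_*), as stated above. */
#define LINK_FIXED   0u
#define LINK_AUTONEG 1u

/* Derive the status field from the configured link_speeds bit-field. */
static unsigned int link_autoneg_status(uint32_t link_speeds)
{
	return (link_speeds & LINK_SPEED_FIXED) ? LINK_FIXED : LINK_AUTONEG;
}
```

Copying the configuration bit straight into the status field would report a fixed link as autonegotiated and the reverse, which is the confusion the corrected comment avoids.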
Olivier Matz [Wed, 3 Jan 2018 10:32:25 +0000 (11:32 +0100)]
net/bnxt: fix headroom initialization
When allocating a new mbuf for Rx, the value of m->data_off should
be reset to its default value (RTE_PKTMBUF_HEADROOM), instead of reusing
the previous undefined value, which could cause the packet to have
too small or too large a headroom.
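A toy illustration of the reset, assuming a simplified mbuf with only two fields. PKT_HEADROOM stands in for RTE_PKTMBUF_HEADROOM and its value here is an assumption; the struct and function are hypothetical.

```c
#include <stdint.h>

#define PKT_HEADROOM 128u /* stand-in for RTE_PKTMBUF_HEADROOM; value assumed */

/* Simplified mbuf: only the fields needed for this example. */
struct toy_mbuf {
	uint16_t data_off; /* offset of the packet data from the buffer start */
	uint16_t buf_len;  /* total buffer length */
};

/* Call on every mbuf placed on the Rx ring, whether fresh or recycled. */
static void rx_mbuf_reset_headroom(struct toy_mbuf *m)
{
	/* Never let the headroom exceed the buffer itself. */
	m->data_off = (uint16_t)(PKT_HEADROOM < m->buf_len ?
				 PKT_HEADROOM : m->buf_len);
}
```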
Chas Williams [Thu, 28 Dec 2017 02:12:31 +0000 (21:12 -0500)]
net/bonding: fix setting slave MAC addresses
Use rte_eth_dev_default_mac_addr_set() to change a slave MAC address.
mac_address_set() only updates the software copy and does nothing to
update the hardware.
Signed-off-by: Chas Williams <chas3@att.com> Acked-by: Declan Doherty <declan.doherty@intel.com>
Ajit Khaparde [Mon, 8 Jan 2018 20:24:37 +0000 (12:24 -0800)]
net/bnxt: check on-chip resources
Check for availability of on-chip resources, such as queue count,
number of stat contexts and number of ring groups, before inheriting and
initializing as per application requirements.
Also, before creating a Tx or Rx queue, make sure there are
enough resources to complete the request.
Somnath Kotur [Mon, 8 Jan 2018 20:24:36 +0000 (12:24 -0800)]
net/bnxt: free the aggregation ring
bnxt_free_all_hwrm_rings() was freeing all the Rx Rings including
zero-ing out the memory for the Aggregation rings, but was not issuing
the FW cmd to destroy the AGG ring(s) from HW. This would manifest in
problems when port stop/port start would be issued as there would be a
HW ring leak every time port stop was issued.
Fixes: daef48efe5e5 ("net/bnxt: support set MTU") Cc: stable@dpdk.org Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Somnath Kotur [Mon, 8 Jan 2018 20:24:35 +0000 (12:24 -0800)]
net/bnxt: fix duplicate pattern for 5tuple filter
When the user re-issues the same 5-tuple filter pattern command with a
different destination queue, it would be flagged as an existing match.
However, when deletion of this filter was attempted, it would crash
as the 'vnic' from which the filter was being removed would be
different. Fix by updating the filter in the scenario where there
is a pattern match and only the destination queue varies.
If the attribute/pattern for a flow is the same, with only the 'action',
i.e. the destination queue index, changing, allow it by cleaning up
the older ntuple filter and updating the existing flow with
the new filter rule having the new destination queue ID.
Also, clear the L2 filter during flow_destroy after destroying
the ntuple filter, otherwise the flow record is not completely purged
from the HW.
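The update-on-match behaviour can be sketched with a small software table. Everything here is hypothetical (struct layout, names, error codes) and no hardware filter programming is modeled; it only shows the three outcomes: new pattern, exact duplicate, and same pattern with a new destination queue.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple pattern and flow record. */
struct tuple5 {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto;
};

struct flow {
	struct tuple5 pattern;
	uint16_t queue; /* destination queue index (the 'action') */
	int used;
};

static int tuple5_equal(const struct tuple5 *a, const struct tuple5 *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

static int flow_add(struct flow *tbl, size_t n, const struct tuple5 *p,
		    uint16_t queue)
{
	struct flow *free_slot = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].used) {
			if (free_slot == NULL)
				free_slot = &tbl[i];
			continue;
		}
		if (!tuple5_equal(&tbl[i].pattern, p))
			continue;
		if (tbl[i].queue == queue)
			return -EEXIST;   /* exact duplicate */
		tbl[i].queue = queue;     /* same pattern: update in place */
		return 0;
	}
	if (free_slot == NULL)
		return -ENOSPC;
	free_slot->pattern = *p;
	free_slot->queue = queue;
	free_slot->used = 1;
	return 0;
}
```

Because the existing record is updated rather than duplicated, a later delete finds exactly one flow whose destination matches what is installed.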
Ajit Khaparde [Mon, 8 Jan 2018 20:24:33 +0000 (12:24 -0800)]
net/bnxt: remove addition of a temporary filter
filter1, which is used only to get the L2 filter FW ID and is not used
later, was unnecessarily being inserted into a list and was not being
freed after its use was done.
Fix it by not doing the list insertion and by releasing it back to the free
filter pool.
Ajit Khaparde [Mon, 8 Jan 2018 20:24:32 +0000 (12:24 -0800)]
net/bnxt: fix check for ether type
As per the documentation, matching ether_types 0x0800 (IPv4) and
0x86DD (IPv6) is invalid when supporting ethertype_filters.
But we were wrongly doing that. This patch fixes it.
Ajit Khaparde [Mon, 8 Jan 2018 20:24:31 +0000 (12:24 -0800)]
net/bnxt: check initialization before accessing stats
Maintain state of PMD initialization and check it before checking stats.
In certain cases, we might end up accessing stats before the required
HWRM commands are processed by FW.
Ajit Khaparde [Mon, 8 Jan 2018 20:24:30 +0000 (12:24 -0800)]
net/bnxt: add check for multi host PF per port
Certain SKUs of NIC can support features like NPAR, Multi Host PFs per
port. We need to check for such features in order to restrict certain
HWRM commands from being sent to the FW.
For the single PF per port model, allow commands like hwrm_port_phy_cfg
from the PF driver. In NPAR and MH environments with multiple PFs per
port, we should not allow HWRM commands like hwrm_port_phy_cfg to be
sent to the FW.
Ajit Khaparde [Mon, 8 Jan 2018 20:24:29 +0000 (12:24 -0800)]
net/bnxt: return proper error code
If the FW fails bnxt_hwrm_func_reset() with an error status,
instead of returning -1, return a more standard value of -EIO.
Similarly sometimes the status returned by certain FW commands
may not be generic. Return a more standard value of -EIO in
that case as well.
Ajit Khaparde [Mon, 8 Jan 2018 20:24:25 +0000 (12:24 -0800)]
net/bnxt: check return values in init
We are not checking for return values of functions like
bnxt_hwrm_queue_qportcfg and bnxt_hwrm_func_qcfg in bnxt_dev_init
thereby preventing a cleanup in case of a HWRM command failure.
This patch fixes that.
Hemant Agrawal [Wed, 10 Jan 2018 10:46:39 +0000 (16:16 +0530)]
bus/dpaa: support static queues
DPAA hardware supports two kinds of queues:
1. Pull mode queue - where one needs to regularly pull the packets.
2. Push mode queue - where the hw pushes the packet to the queue. These are
high performance queues, but limited in number.
This patch adds the driver support for push mode queues.
Signed-off-by: Sunil Kumar Kori <sunil.kori@nxp.com> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Hemant Agrawal [Wed, 10 Jan 2018 10:46:28 +0000 (16:16 +0530)]
net/dpaa: add frame count based tail drop with CGR
Replace the byte-based tail queue congestion support
with frame-count-based congestion groups.
A frame count maps easily to the number of Rx descriptors for a queue.
net/sfc: make Tx free threshold check datapath specific
EFX_TXQ_LIMIT is libefx-specific and it should not be used
for other Tx datapath implementations (e.g. EF10 native).
EF10 native Tx datapath has its own understanding of the maximum
TxQ fill level imposed by EvQ clear strategy and space reserved
for Tx error and flush events.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andy Moreton <amoreton@solarflare.com>
net/sfc: make refill threshold check Rx datapath specific
EFX_RXQ_LIMIT is libefx-specific and it should not be used
for other Rx datapath implementations (e.g. EF10 native).
EF10 native Rx datapath has its own understanding of the maximum
RxQ fill level imposed by EvQ clear strategy and space reserved
for Rx error and flush events.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Shahaf Shuler [Wed, 10 Jan 2018 09:16:58 +0000 (11:16 +0200)]
net/mlx5: add device configuration structure
Move device configuration and feature capabilities to their own structure.
This structure is filled by mlx5_pci_probe(); outside of this function
it should be treated as *read only*.
This configuration struct will be used for the Tx/Rx queue setup to
select the Tx/Rx queue parameters based on the user configuration and
device capabilities.
In addition it will be used by the burst selection function to decide
on the best pkt burst to be used.
Shahaf Shuler [Wed, 10 Jan 2018 09:16:57 +0000 (11:16 +0200)]
net/mlx5: change pkt burst select function prototype
Change the function prototype to return the function pointer of the
selected Tx/Rx burst function instead of assigning it directly to the
device context.
This change makes it possible to use those select functions to query the
burst function that will be selected according to the device configuration.
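A rough sketch of the prototype change: the selector returns the chosen function pointer instead of storing it, so callers may either install it or merely inspect it. The typedef, the config field and all function names are hypothetical and are not the mlx5 PMD's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical burst function type and read-only device configuration. */
typedef uint16_t (*burst_fn_t)(void *queue, void **pkts, uint16_t n);

struct dev_config {
	bool rx_vec_ok; /* vectorized Rx usable with this configuration */
};

static uint16_t rx_burst_scalar(void *queue, void **pkts, uint16_t n)
{
	(void)queue; (void)pkts; (void)n;
	return 0; /* placeholder body */
}

static uint16_t rx_burst_vec(void *queue, void **pkts, uint16_t n)
{
	(void)queue; (void)pkts; (void)n;
	return 0; /* placeholder body */
}

/* Returns the choice; it does not write into any device context. */
static burst_fn_t select_rx_burst(const struct dev_config *cfg)
{
	return cfg->rx_vec_ok ? rx_burst_vec : rx_burst_scalar;
}
```

A caller that only wants to know the outcome can compare the returned pointer against a known burst function without changing the device state.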
Yongseok Koh [Tue, 9 Jan 2018 17:38:50 +0000 (09:38 -0800)]
net/mlx5: fix overwriting bit-fields in SW Rx queue
Bit-fields in mlx5_rxq_data can be changed on the fly by the control plane,
e.g. rxq->mark. However, vectorized Rx uses a bit-field to mark pending
errors. Even if only one bit is written, the whole integer is written,
and this can cause a synchronization issue: two entities write to
the same block without locking. As the pending_err bit is used only
internally by the datapath, it can be replaced with a local variable.
Moti Haimovsky [Thu, 4 Jan 2018 16:12:03 +0000 (18:12 +0200)]
net/mlx4: verify Tx max sges
The maximum number of Tx scatter-gather entries is a property of the device
and is queried at init. This value has not changed in a while and
most probably will not change in the future. Therefore, and
in order to enhance Tx performance, the Tx max-sge value is hardcoded
in mlx4 PRM code.
This patch adds a verification that the above assumption still holds
and that the hardcoded value is still supported by the mlx4 hardware.
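Such a verification amounts to comparing the compile-time constant with the value the device reports at init. The sketch below is an assumption-laden illustration: the constant's value, its name and the function are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define TX_MAX_SGE_HARDCODED 4u /* hypothetical hardcoded value */

/* device_max_sge is the limit queried from the device at init. */
static int tx_max_sge_verify(uint32_t device_max_sge)
{
	/* The hardcoded value must not exceed what the device supports. */
	return device_max_sge >= TX_MAX_SGE_HARDCODED ? 0 : -EINVAL;
}
```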
Add description of raw flow type mode for flow_director_filter
command in testpmd. Modify description of flow type parameter
for functions set_hash_global_config, set_hash_input_set and
set_fdir_input_set.
Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Olivier Matz [Wed, 3 Jan 2018 14:29:23 +0000 (15:29 +0100)]
net/i40e: fix VSI MAC filter on primary address change
When the primary MAC address was changed, the MAC filters in the VSI were
not updated with the new MAC address, so incoming packets with this
destination address were dropped by the hardware filters.
This patch removes the VSI mac filter for the previous mac address and
adds a new one for new mac address.
Add a new Rx function using AVX2 instructions for higher
performance. For now, this functionality is limited to platforms
with Intel Xeon Scalable Processor(SP). The function to be used
is selected at runtime, not just at compile-time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Add a new Tx function using AVX2 instructions for higher
performance. For now, this functionality is limited to platforms
with Intel Xeon Scalable Processor(SP). The function to be used
is selected at runtime, not just at compile-time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>