- Remove an additional wrapper function ecore_mcp_nvm_command and instead
use ecore_mcp_nvm_wr_cmd, ecore_mcp_nvm_rd_cmd or ecore_mcp_cmd APIs
directly as appropriate.
- Remove struct ecore_mcp_nvm_params
- Add new NVM command ECORE_EXT_PHY_FW_UPGRADE and fix the expected
management FW responses in ecore_mcp_nvm_write()
- Fail the NVM write process on any failing partial write
Revise the locking scheme for access to the management FW (MFW)
mailbox:
- add a new linked list called cmd_list to ecore_mcp_info that tracks all
the mailbox commands sent to management FW and ones waiting for
response.
- add a spinlock called cmd_lock to ecore_mcp_info. It serializes the
access to this cmd_list and makes sure that the mbox is not a pending
one before a new mbox request is sent. It protects the access to the
mailbox commands list and the sending of the commands.
- add ecore_mcp_cmd_add|del|get_elem() APIs for new access scheme
- remove ecore_mcp_mb_lock() and ecore_mcp_mb_unlock()
- add a spinlock called link_lock to ecore_mcp_info, used for
syncing SW link-changes and link-changes originating from attention
context. This locking scheme prevents possible race conditions that may
occur, such as during link status reporting.
- Surround OSAL_{MUTEX,SPIN_LOCK}_{ALLOC,DEALLOC} with
'#ifdef CONFIG_ECORE_LOCK_ALLOC'. If memory has to be allocated for
lock primitives, compile the driver with the CONFIG_ECORE_LOCK_ALLOC flag.
There's a possible race in multiple VF scenarios for base driver users
that use the optional APIs ecore_iov_pf_get_and_clear_pending_events,
ecore_iov_pf_add_pending_events. If the client doesn't synchronize the two
calls, it's possible for the PF to clear a VF pending message indication
without ever getting it [as 'get & clear' isn't atomic], leading to VF
timeout on the command.
The solution is to switch into a per-VF indication rather than having a
bitfield for the various VFs with pending events. As part of the solution,
the setting/clearing of the indications is done internally by base driver.
As a result, ecore_iov_pf_add_pending_events is no longer needed and
ecore_iov_pf_get_and_clear_pending_events loses the 'and_clear' from its
name as it's now a proper getter.
A VF would be considered 'pending' [i.e., get_pending_events() should
have '1' for it in its bitfield] beginning with the PF's base driver
recognizing a message sent by that VF [in SP_DPC] and ending only when
that VF message is processed.
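The per-VF indication described above can be sketched roughly as follows.
This is only an illustration, not the base driver code: MAX_VFS, struct
vf_state and the three function names are hypothetical, and locking is
left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VFS 64   /* hypothetical limit so the result fits one uint64_t */

/* Hypothetical per-VF state; the real driver structure is different. */
struct vf_state {
	bool pending;   /* set when a VF message is seen, cleared once handled */
};

static struct vf_state vfs[MAX_VFS];

/* Called internally when a message from vf_id is recognized. */
static void vf_msg_received(unsigned int vf_id)
{
	assert(vf_id < MAX_VFS);
	vfs[vf_id].pending = true;
}

/* Called internally once the message from vf_id has been processed. */
static void vf_msg_processed(unsigned int vf_id)
{
	assert(vf_id < MAX_VFS);
	vfs[vf_id].pending = false;
}

/* Pure getter: builds the bitfield without clearing any indication. */
static uint64_t get_pending_events(void)
{
	uint64_t events = 0;
	unsigned int i;

	for (i = 0; i < MAX_VFS; i++)
		if (vfs[i].pending)
			events |= UINT64_C(1) << i;
	return events;
}
```

Because the getter never clears anything, calling it twice in a row
returns the same bitfield, so a client cannot lose an indication between
a 'get' and a 'clear'.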
Use the ptt[PF translation table] handler that is passed rather than using
main ptt from the HW function.
In ecore_hw_get_resc()'s error flow, release the MFW generic resource lock
only if needed.
Change the verbosity level of GRC timeout from DP_INFO() to DP_NOTICE().
Reduce verbosity of print in ecore_hw_bar_size().
- Base driver EEE (Energy efficient ethernet) support.
- Provide supported-speed mask to driver through shared memory.
- Read/use eee-supported capabilities value from the shared memory.
- Update qed_fill_link() to advertise the EEE capabilities.
- Add support to retain/clear data for crash dump by introducing the mdump
GET_RETAIN/CLR_RETAIN sub commands, new APIs
ecore_mcp_mdump_get_retain() and ecore_mcp_mdump_clr_retain()
- Avoid checking for mdump logs and data in case of an emulator
- Fix the "deadbeaf" value returned when the pcie status command read
fails (prevent false detection)
- Add an option to override the default force load behavior.
- PMD will set the override force load parameter to
ECORE_OVERRIDE_FORCE_LOAD_ALWAYS.
- Modify the printout when a force load is required to include the loaded
value
- No need for 'default' when switching over enums and covering all the
values.
Add SmartAN feature that automatically detects peer switch capabilities,
which relieves users from fumbling with adapter and switch configuration.
Add new cmd DRV_MSG_CODE_GET_MFW_FEATURE_SUPPORT. Add new SmartLinQ config
method using NVM cfg options 239.
net/qede/base: interchangeably use SB between PF and VF
Status Block reallocation - allow a PF and its child VF to change SB
between them using new base driver APIs.
The changes that are inside base driver flows are:
New APIs ecore_int_igu_reset_cam() and ecore_int_igu_reset_cam_default()
added to reset IGU CAM.
a. During hw_prepare(), driver would re-initialize the IGU CAM.
b. During hw_stop(), driver would initialize the IGU CAM to default.
Use igu_sb_id instead of sb_idx [protocol index] to allow setting of
the timer-resolution in CAU[coalescing algorithm unit] for all SBs;
with sb_idx, only SBs 0-11 would be able to change their timer-resolution.
Allow opening Multiple Tx queues on a single qzone for VFs.
This is supported by Rx/Tx TLVs now having an additional extended TLV that
passes the `qid_usage_idx', a unique number per each queue-cid that was
opened for a given queue-zone.
Fix to overcome TX timeout issue due to more than 16 CIDs by adding an
additional VF legacy mode. This will detach the CIDs from the original
only-existing legacy mode suited for older releases.
Following this change, only VFs that would publish VFPF_ACQUIRE_CAP_QIDS
would have the new CIDs scheme applied. I.e., the new 'legacy' mode is
actually whether this capability is published or not.
Changed the logic to clear doorbells for legacy and non-legacy VFs, so
the PF is cleaning the doorbells for both cases.
Change the order by which we allocate the resources to align with
management FW by first allocating the VF l2 queues and only
afterwards use what's left for the PF.
net/qede/base: update management FW supported features
- Add transceivers temperature monitoring/reporting feature
- Add new mbox command DRV_MSG_CODE_FEATURE_SUPPORT to exchange info
between drivers and management FW regarding features supported
- Add EEE to Link Flap Avoidance check, etc.
rx/tx_queue_setup functions are shared between PF and VF
drivers. So the var 'pf' should not be assigned at the beginning.
This patch fixes the issue, and also corrects the return err code.
Yong Wang [Tue, 12 Sep 2017 12:44:00 +0000 (08:44 -0400)]
net/igb: fix memcpy length
The size of "flex_filter.filter_info.mask" and "filter->mask" are 16
bytes, but the length of memcpy--"RTE_ALIGN(filter->len, sizeof(char))
/ sizeof(char)" may reach 128 bytes which may cause array access out
of bound.
Fix it by replacing "sizeof(char)" by "CHAR_BIT".
Fixes: 231d43909a31 ("igb: migrate flex filter to new API")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
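The length arithmetic behind the igb memcpy fix can be sketched as below.
This is a stand-alone illustration, not the driver code: ALIGN_CEIL is a
local stand-in for RTE_ALIGN, and flex_mask_len() and copy_flex_mask() are
hypothetical helpers. It assumes, as the 128-byte and 16-byte figures
suggest, that each mask bit covers one byte of the filter.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLEX_MASK_BYTES 16   /* size of both mask arrays, per the text */

/* Round v up to the next multiple of mul (local stand-in for RTE_ALIGN). */
#define ALIGN_CEIL(v, mul) ((((v) + (mul) - 1) / (mul)) * (mul))

/* Mask bytes needed when each mask bit covers one filter byte. */
static size_t flex_mask_len(size_t filter_len)
{
	return ALIGN_CEIL(filter_len, CHAR_BIT) / CHAR_BIT;
}

/* Copy only the mask bytes that describe filter_len bytes of pattern. */
static void copy_flex_mask(uint8_t dst[FLEX_MASK_BYTES],
			   const uint8_t src[FLEX_MASK_BYTES],
			   size_t filter_len)
{
	size_t n = flex_mask_len(filter_len);

	assert(n <= FLEX_MASK_BYTES);
	memcpy(dst, src, n);
}
```

With sizeof(char) as the divisor the computed length equals filter_len
itself, so a 128-byte filter would copy 128 bytes out of a 16-byte array;
with CHAR_BIT the same filter yields exactly 16.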
This commit fixes two bugs related to tap devices. The first bug occurs
when executing the following flow rule in testpmd, assuming the tap device
has 4 rx and tx queue pairs:
"flow create 0 ingress pattern eth / end actions queue index 5 / end"
This command reports success and prints "Flow rule #0 created"
although it should have failed, as queue index 5 does not exist.
The second bug occurs when executing in testpmd "port start all" following
a port configuration. Assuming 1 pair of rx and tx queues, an error is
reported: "Fail to start port 0"
Before this commit a fixed max number (16) of rx and tx queue pairs were
created on startup where the file descriptors (fds) of rx and tx pairs were
identical. As a result in the first bug queue index 5 existed because the
tap device was created with 16 rx and tx queue pairs regardless of the
configured number of queues. In the second bug when tap device was started
tx fd was closed before opening it and executing ioctl() on it. However
closing the sole fd of the device caused ioctl to fail with "No such
device".
This commit creates the configured number of rx and tx queue pairs (up to
max 16) and assigns a unique fd to each queue. It was written to solve the
first bug and was found as the right fix for the second bug as well.
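The effect of the tap fix on the first bug can be sketched as a simple
bounds check. The structure and function names below are hypothetical and
the error code is an arbitrary choice; the sketch only shows that a queue
index is now validated against the configured queue count rather than the
fixed maximum of 16.

```c
#include <assert.h>
#include <errno.h>

#define TAP_MAX_QUEUES 16   /* upper bound quoted in the text */

/* Hypothetical device state; not the PMD's real private structure. */
struct tap_dev {
	unsigned int nb_queues;      /* configured rx/tx queue pairs */
	int rx_fd[TAP_MAX_QUEUES];   /* one fd per rx queue */
	int tx_fd[TAP_MAX_QUEUES];   /* one fd per tx queue */
};

/* Record the configured queue count, rejecting values above the maximum. */
static int tap_configure(struct tap_dev *dev, unsigned int nb_queues)
{
	if (nb_queues == 0 || nb_queues > TAP_MAX_QUEUES)
		return -EINVAL;
	dev->nb_queues = nb_queues;
	return 0;
}

/* A queue action is valid only if it targets a configured queue. */
static int tap_check_queue_index(const struct tap_dev *dev, unsigned int idx)
{
	return idx < dev->nb_queues ? 0 : -EINVAL;
}
```

With 4 configured queue pairs, index 5 is rejected here, whereas a check
against the fixed maximum of 16 would have accepted it.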
David Harton [Thu, 14 Sep 2017 12:50:41 +0000 (08:50 -0400)]
net/ixgbe: eliminate duplicate filterlist symbols
Some compilers generate warnings for duplicate symbols for the
set of filter lists currently defined in ixgbe_ethdev.h.
This commit moves the definition and declaration to the source
file that actually uses them and provides a function to
initialize the values akin to its flush function.
Signed-off-by: David Harton <dharton@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Radu Nicolau <radu.nicolau@intel.com>
VFs rely on the config BAR for getting the MAC address; a
random one is created if a valid address is not found.
A PF port has a fixed MAC which is currently acquired using the NSPU
interface. Some NFP firmwares require the MAC to be written back
to the config BAR for proper MAC filtering.
Qi Zhang [Sun, 20 Aug 2017 20:05:35 +0000 (04:05 +0800)]
net/i40e: fix packet count for PF
Previously, PF statistics used the VSI register for the packet count
but the port's register for packet bytes. This causes inconsistent
PF statistics when some VF is active, since the statistics then
cover the VF's packet bytes but not its packet count.
This patch goes back to the port register for the PF packet count, but
still excludes the main VSI's discarded packet count.
Just like the previous fix, it's still not perfect (since the RX packet
number is over-counted when there are VF discarded packets), but it seems
to make the overall result better.
Mellanox NICs have a limitation on the number of mbuf segments a multi
segment mbuf can have. The max number depends on the Tx offloads
requested.
The current code does not enforce this limitation, which might cause
malformed work requests to be written to the device.
This commit adds verification for the number of mbuf segments posted
to the device. In case of overflow the packet will not be sent.
In addition, update the NIC documentation with the limitation.
Considering the device limitation is 63 data segments in a work request,
the maximum number of segments in an mbuf was calculated taking TSO as
the worst case:
max_nb_segs = 63 - (control_segment + ethernet segment +
TSO headers inline + inline segment +
extra inline to align to cacheline)
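The verification described above amounts to a subtraction and a
comparison. In the sketch below only the limit of 63 comes from the text;
the helper names are hypothetical, and overhead_segs stands for the sum of
the bracketed terms, whose actual values are not given here.

```c
#include <assert.h>
#include <stdbool.h>

#define DEV_MAX_DATA_SEGS 63   /* device limit quoted in the text */

/*
 * overhead_segs: hypothetical total of the segments consumed by the
 * control segment, ethernet segment, inlined TSO headers, inline segment
 * and cacheline alignment.
 */
static unsigned int max_mbuf_segs(unsigned int overhead_segs)
{
	assert(overhead_segs <= DEV_MAX_DATA_SEGS);
	return DEV_MAX_DATA_SEGS - overhead_segs;
}

/* A packet may be posted only if its segment count fits the budget. */
static bool can_post(unsigned int nb_segs, unsigned int overhead_segs)
{
	return nb_segs <= max_mbuf_segs(overhead_segs);
}
```

A packet for which can_post() is false would be dropped instead of being
written to the device as a malformed work request.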
The current Tx error counter counts, according to its description,
the total number of packets not sent when the TX ring is full. It is
reported to the application as part of the oerrors field.
The drop due to a full ring is not the statistic that should be set in the
oerrors field. Such a number can be counted by the application using the
return value of the Tx burst function.
The number that should be set there is the number of packets the device
could not transmit in any way, even when it has resources.
Therefore, change this counter to count the total number of packets that
failed to be transmitted.
Yongseok Koh [Thu, 31 Aug 2017 16:27:06 +0000 (09:27 -0700)]
net/mlx5: fix calculating TSO inline size
Tx descriptor for TSO embeds packet header to be replicated. If Tx
inline is enabled, there could be additional packet data inlined with
4B inline header ahead. And between the header and additional inlined
packet data, there may be padding to make the inline part aligned to
MLX5_WQE_DWORD_SIZE. In calculating the total size of inlined data,
the sizes of the inline header and the padding are missing.
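A simplified model of the corrected size calculation is sketched below.
It is not the PMD code: the function names are hypothetical, the value 16
for the alignment is an assumption about MLX5_WQE_DWORD_SIZE, and the
model assumes the TSO header starts on an aligned boundary.

```c
#include <assert.h>
#include <stddef.h>

#define WQE_DWORD_SIZE 16    /* assumed value of MLX5_WQE_DWORD_SIZE */
#define INLINE_HDR_SIZE 4    /* the 4B inline header from the text */

/* Round len up to a multiple of WQE_DWORD_SIZE. */
static size_t align_dword(size_t len)
{
	return (len + WQE_DWORD_SIZE - 1) / WQE_DWORD_SIZE * WQE_DWORD_SIZE;
}

/*
 * Total inlined bytes: the TSO header, then (only when extra packet data
 * is inlined) padding up to the alignment boundary, the inline header and
 * the inlined data itself.
 */
static size_t tso_inline_size(size_t tso_hdr_len, size_t extra_inline_len)
{
	size_t total = tso_hdr_len;

	assert(tso_hdr_len > 0);   /* a TSO packet always carries a header */
	if (extra_inline_len > 0) {
		total = align_dword(total);
		total += INLINE_HDR_SIZE + extra_inline_len;
	}
	return total;
}
```

For a 54-byte header followed by 100 inlined bytes, this model gives
64 + 4 + 100 = 168 bytes, where a calculation that ignores the padding and
the inline header would give 154.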
David Harton [Wed, 13 Sep 2017 03:21:10 +0000 (23:21 -0400)]
net/i40e: fix i40evf MAC filter table
The i40e maintains a single MAC filter table for both
unicast and multicast addresses. The i40e_validate_mac_addr
function was preventing multicast addresses from being added
to the table via i40evf_add_mac_addr. Fixed the issue by
adjusting the check in i40evf_add_mac_addr.
Fixes: 4861cde46116 ("i40e: new poll mode driver")
Fixes: 97ac72aa71a9 ("i40e: support setting VF MAC address")
Cc: stable@dpdk.org
Signed-off-by: David Harton <dharton@cisco.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
The previous stats code returned only the current TX sub
device stats.
This enhancement extends it to return the sum of all sub
devices stats with history of removed sub-devices.
Dedicated stats accumulator saves the stat history of all
sub device remove events.
Each failsafe sub device contains the last stats asked by
the user and updates the accumulator at removal time.
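The accumulator scheme can be sketched as follows. The structures and
function names are hypothetical and only two counters are modelled; the
point is that a removed sub-device's last snapshot is folded into the
accumulator, so the reported sum does not shrink on removal.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SUBS 4   /* hypothetical number of sub-device slots */

struct stats {
	uint64_t ipackets;
	uint64_t opackets;
};

struct sub_dev {
	int present;
	struct stats last;   /* last stats snapshot taken from this sub-device */
};

struct failsafe {
	struct stats accumulator;   /* history of removed sub-devices */
	struct sub_dev subs[MAX_SUBS];
};

static void stats_add(struct stats *to, const struct stats *from)
{
	to->ipackets += from->ipackets;
	to->opackets += from->opackets;
}

/* On removal, fold the sub-device's last snapshot into the accumulator. */
static void sub_removed(struct failsafe *fs, unsigned int idx)
{
	assert(idx < MAX_SUBS && fs->subs[idx].present);
	stats_add(&fs->accumulator, &fs->subs[idx].last);
	fs->subs[idx].present = 0;
	fs->subs[idx].last = (struct stats){ 0, 0 };
}

/* Report the history plus the snapshots of all sub-devices still present. */
static struct stats stats_get(const struct failsafe *fs)
{
	struct stats total = fs->accumulator;
	unsigned int i;

	for (i = 0; i < MAX_SUBS; i++)
		if (fs->subs[i].present)
			stats_add(&total, &fs->subs[i].last);
	return total;
}
```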
I would like to implement an ultimate snapshot at removal time.
The stats_get API needs to be changed to return an error in
case it is too late to retrieve statistics.
This way, failsafe can get a stats snapshot in the removal interrupt
callback for each PMD which can give stats after a removal event.
Extend the LSC event handling to support the device removal as well.
The mlx5 event handling has been made capable of receiving and
signaling several event types at once.
This support includes the following:
1. Removal event detection according to the user configuration.
2. Calling to all registered mlx5 removal callbacks.
3. Capabilities extension to include removal interrupt handling.
Link status is sometimes inconsistent during a LSC event.
When it occurs, the PMD refrains from immediately notifying
the application; instead, an alarm is scheduled to check
link status later and notify the application once it has settled.
In the previous code, the alarm callback calls the interrupt
handler to recheck link status, which may cause unnecessary
interrupt event checks.
This patch separates the link status update from the interrupt event
handler to avoid the unnecessary check, and prepares the interrupt
handler to support more interrupts in the future.
A comment was added in the new function to explain the reason for the
inconsistent link status.
David Harton [Fri, 25 Aug 2017 15:22:11 +0000 (11:22 -0400)]
net/vmxnet3: replenish ring buffers in Rx
vmxnet3 Rx processing should replenish ring buffers after new buffers
are available to prevent the interface from getting stuck in a state
where no new work is processed.
Signed-off-by: David Harton <dharton@cisco.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Since interrupt handler is the only function relying on it, merging them
simplifies the code as there is no need for an API to return collected
events.
Link status is sometimes inconsistent during a LSC event. When it occurs,
the PMD refrains from immediately notifying the application; instead, an
alarm is scheduled to check link status later and notify the application
once it has settled.
The problem is that subsequent link status checks are only performed if
additional LSC events occur in the meantime, which is not always the case.
Worse, since support for removal events was added, rescheduled link status
checks may consume them as well without notifying the application. With the
right timing, a link loss occurring just before a device removal event may
hide it from the application.
Fixes: 6dd7b7056d7f ("net/mlx4: support device removal event")
Fixes: 2d449f7c52de ("net/mlx4: fix assertion failure on link update")
Cc: stable@dpdk.org
Reported-by: Matan Azrad <matan@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When LSC or RMV events are received by the PMD but are not requested by the
application, a misleading debugging message implying the PMD does not
support them is shown.
doc: create different features files for NFP drivers
The NFP PMD now implements PF and VF drivers. Although the driver
functionality is the same for now, except for initialization, it
will change with future PF additions.
A new feature is required for describing the firmware upload
capability coming with the NFP PF now, so the PF file will be
updated soon in another patch.
SRIOV is not supported by the PF yet, and it is wrong to include it
as a VF driver feature, so none of the files have such a feature.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
PMD has to configure the hardware port: link up when port started and
link down when port stopped. This is not required for VFs but it is
for PF ports.
A minor refactoring in PMD stop and close functions is done because the
Link down needs to happen just when device is stopped.
net/nfp: add NSP support for HW link configuration
Adding a new NSPU command for being able to read and write the ethernet
port table from/to the NFP. This will allow the PMD to put the Link up
or down when a port is started or stopped. Until now, this was performed
by the firmware independently of PMD functionality.
The ethernet port table has also some other useful information that will
be used in further commits.
Usually NSPU is used at device probe time, where code execution is
sequential. However, reading and writing the NFP eth table can be done at
different times and from different cores, which means concurrent access
could happen. A spinlock is added to the global nspu object for
protecting the NFP and avoiding concurrent access.
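The locking pattern can be sketched with a C11 atomic_flag spinlock. The
PMD itself would presumably use the DPDK spinlock primitives; struct nspu,
eth_table_rev and all function names below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global nspu object. */
struct nspu {
	atomic_flag lock;
	int eth_table_rev;   /* placeholder for the shared eth table state */
};

static struct nspu g_nspu = { ATOMIC_FLAG_INIT, 0 };

/* Returns true if the lock was free and is now held by the caller. */
static bool nspu_trylock(struct nspu *n)
{
	return !atomic_flag_test_and_set_explicit(&n->lock,
						  memory_order_acquire);
}

static void nspu_lock(struct nspu *n)
{
	while (!nspu_trylock(n))
		;   /* busy-wait until the current holder releases the lock */
}

static void nspu_unlock(struct nspu *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/* Every eth table access takes the lock, so cores cannot interleave. */
static int nspu_eth_table_write(struct nspu *n)
{
	int rev;

	assert(n != NULL);
	nspu_lock(n);
	rev = ++n->eth_table_rev;
	nspu_unlock(n);
	return rev;
}
```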
A NFP PF PCI device can have several physical ports, up to 8. Because
DPDK core creates one eth_dev per PCI device, nfp pf probe function
is used. Number of PF ports is obtained from firmware symbol using
NSPU API. Inside PF probe function an eth_dev per port is created and
nfp_net_init invoked for each port.
There are some limitations regarding multiport: rx interrupts and
device hotplug are not supported.
Interrupts are handled with the help of the VFIO or UIO drivers. Those
drivers just know about PCI devices, so it is not possible, without
changing how DPDK handles interrupts, to manage interrupts assigned to
different PF ports.
About hotplug, the problem is this functionality is based on a PCI
device, and although device plugin is possible, which would add as
many ports as supported by firmware, unplug is based on device name
linked to an eth_dev, and device name has a suffix now (_portX, with X
being the port index) which DPDK core is not aware of. While rx
interrupts with multiport could be likely solved with some layer of
indirection, hotplug would require changes to DPDK core.
net/nfp: allocate ethernet device from PF probe function
NFP can support several physical ports per PF device. Depending on
firmware info, one or more eth_dev objects will need to be created.
This patch adds the call to create just one eth_dev for now, with future
commits supporting the multiport option. Once the eth_dev has been
created, the probe function invokes PMD initialization with the new eth_dev.
net/nfp: support PF devices inside PMD initialization
nfp_net_init is where a DPDK port related to an eth_dev is initialized.
NFP VF vNICs use VF PCI BARs as they come after SRIOV is enabled. But for
NFP PF vNIC just a subset of PF PCI BARs are used.
This patch adds support for mapping the right PCI BAR subsets for the PF
vNIC. It uses the NSPU API functions introduced previously for configuring
NFP expansion bars.
NFP vNICs use a subset of PCI device BARs. vNIC rx/tx bars point to
NFP hardware queues unit. Unlike vNIC config bar, the NFP address is
always the same so the NFP expansion bar configuration always uses
the same hardcoded physical address.
This patch adds an NSPU API function for getting vNIC rx/tx bars
mapped through an expansion bar using that specific physical address.
The PMD will use the PCI bar offset returned for mapping the vNIC
rx/tx bars.
NFP vNICs use a subset of PCI device BARs. vNIC config bar depends on
firmware symbol defining how to map it through a NFP expansion bar.
This patch adds an NSPU API function for getting a vNIC config bar
mapped through an expansion bar, given a firmware symbol. The PMD will
use the PCI bar offset returned for accessing the vNIC bar.
PMD will use this function for uploading the firmware. First, a
symbol resolution is done for finding out if there is a firmware
already there. If not, a NFP reset is called before using NSPU
fw upload code.
Firmware has symbols helping to configure things like number of
PF ports, vNIC BARs addresses inside NFP memories, or ethernet
link state. Different firmware apps have different things to map
and likely different internal NFP addresses to use.
Host drivers can use the NSPU interface for getting symbol data
regarding different hardware configurations. Once the driver has
the information about a specific object, a mapping is required
configuring an NFP expansion bar creating a device PCI bar window.
NSPU interface declares a buffer controlled by the NFP NSP service
processor. It is possible to send commands to the NSP using the NSPU
and this buffer for data related to the command. A command can imply
buffer read, buffer write, both or none.
Initial command for resetting the firmware is added as well which
does not require the buffer at all.
Commands will allow firmware upload, symbol resolution and ethernet
link configuration. Future commands will allow specific offloads like
flow offloads and eBPF offload.
Configuring the NFP PMD for using the PF requires access through the
NSPU interface for device configuration. This patch adds a specific probe
function for the PF which uses the NSPU interface. Only basic NSPU access
is done for now: reading the NSPU ABI version.
Working with the PF requires access to the NFP for basic configuration.
NSP is the NFP Service Processor helping with hardware and firmware
configuration. NSPU is the NSP user space interface for working with the
NSP.
Configuration through NSPU allows creating PCI BAR windows for accessing
different NFP hardware units, including the BAR window for the NSPU
interface access itself. NFP expansion bar registers are used for creating
those PCI BAR windows. NSPU uses a specific expansion bar which is
reprogrammed for accessing/doing different things.
Other expansion bars will be configured later for configuring the PF vNIC
bars, a subset of PF PCI BARs.
In function t4_wr_mbox_meat_timeout(), dynamic memory is stored
in the 'temp' variable and is not freed when the function returns,
which is a possible memory leak.
Fixes: 3bd122eef2cc ("cxgbe/base: add hardware API for Chelsio T5 series adapters")
Cc: stable@dpdk.org
Signed-off-by: Congwen Zhang <zhang.congwen@zte.com.cn>
Acked-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Xueming Li [Mon, 4 Sep 2017 11:43:51 +0000 (19:43 +0800)]
net/mlx5: fix tunnel offload detection
The PMD got a random tunnel_en value on ConnectX-4 Lx NICs, depending on
the compiler optimization level. The variable was not initialized and
detection logic was absent.
Allocation and management of Tx/Rx queue arrays is done by wrappers at the
ethdev level. The resulting information is copied to the private structure
while configuring the device, where it is managed separately by the PMD.
This is redundant and consumes space in the private structure.
Relying more on ethdev also means there is no need to protect the PMD
against burst function calls while closing the device anymore.
Considering the remaining functionality, the only difference between
isolated and non-isolated mode is that a default MAC flow rule is present
with the latter.
The restriction on enabling isolated mode before creating any queues can
therefore be lifted.
Link status (LSC) and removal (RMV) interrupts share a common handler and
are toggled simultaneously from common install/uninstall functions.
Four additional wrapper functions (two for each interrupt type) are
currently necessary because the PMD maintains an internal configuration
state for interrupts (priv->intr_conf).
This complexity can be avoided entirely since the PMD does not disable
interrupts configuration parameters in case of error anymore.
With this commit, only two functions are necessary to toggle interrupts
(including Rx) during start/stop cycles.
The naming scheme for these functions is overly verbose and not accurate
enough, with too many "handler" functions that are difficult to
differentiate (e.g. mlx4_dev_link_status_handler(),
mlx4_dev_interrupt_handler() and priv_dev_status_handler()).
This commit renames them and removes the unnecessary dev argument which can
be retrieved through the private structure where needed. Documentation is
updated accordingly.
File descriptors used for interrupts processing must be made non-blocking.
Doing so as soon as they are opened instead of waiting until they are
needed is more efficient as it avoids performing redundant system calls and
running through their associated error-handling code later on.
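Making a descriptor non-blocking right after it is opened can be sketched
with the standard POSIX fcntl() calls. The helper name
fd_set_non_blocking() is hypothetical and not taken from the PMD.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Add O_NONBLOCK to the file status flags of fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int fd_set_non_blocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return -1;
	return 0;
}
```

Calling this once at open time means later code can read from the
descriptor without first checking or changing its flags.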