git.droids-corp.org - dpdk.git/log

app/testpmd: fix crash

Testpmd tries to calculate mbuf size based on "max Rx packet size" and
"max MTU segment number".
When driver set a "nb_mtu_seg_max" to zero, it causes division by zero
segmentation fault in testpmd.

If the PMD set "nb_mtu_seg_max" to zero, testpmd shouldn't try to
calculate the mbuf size.

Fixes: 33f9630fc23d ("app/testpmd: create mbuf based on max supported segments")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

bus/pci: fix TOCTOU for sysfs access

Using access followed by open causes a static analysis warning
about Time of check versus Time of use. Also, access() and
open() have different UID permission checks.

This is not a serious problem; but easy to fix by using errno instead.

Coverity issue: 300870
Fixes: 4a928ef9f611 ("bus/pci: enable write combining during mapping")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>

net/bnxt: fix build with some compilers

The driver was defining its own version of roundup which was
conflicting with another version defined elsewhere.

Change the local definition of roundup to avoid compilation errors.

Fixes: f8168ca0e690 ("net/bnxt: support thor controller")
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/mlx5: fix master device Netlink socket sharing

There is the patch [1] that uses master device Netlink socket
to retrieve master device link settings. This is not thread safe
because this resource may be in use by other call to the master
device itself. Using the same Netlink socket concurrently from
the multiple threads causes Netlink requests malfunction and
must be eliminated. The patch replaces master Netlink socket
with the socket from representor device.

[1] http://patches.dpdk.org/patch/53120/

Fixes: 0333b2f584d9 ("net/mlx5: inherit master link settings for representors")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: recover secondary process Tx errors

The SQ errors recovery mechanism in the PMD invokes a Verbs
functions to modify the RQ states in order to reset the SQ and to
reactivate it.

These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured
by secondary processes queues.

Using the DPDK IPC mechanism the secondary process can request Verbs
queues state modifications to be done synchronically by the primary
process.

Add support for secondary process Tx errors recovery.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: recover secondary process Rx errors

The RQ errors recovery mechanism in the PMD invokes a Verbs functions to
modify the RQ states in order to reset the RQ and to reactivate it.

These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured by
secondary processes queues.

Using the DPDK IPC mechanism the secondary process can request Verbs
queues state modifications to be done synchronically by the primary
process.

Add support for secondary process Rx errors recovery.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: handle Tx completion with error

When WQEs are posted to the HW to send packets, the PMD may get a
completion report with error from the HW, aka error CQE which is
associated to a bad WQE.

The error reason may be bad address, wrong lkey, bad sizes, etc.
that can wrongly be configured by the PMD or by the user.

Checking all the optional mistakes to prevent error CQEs doesn't make
sense due to performance impacts and huge complexity.

The error CQEs change the SQ state to error state what causes all the
next posted WQEs to be completed with CQE flush error forever.

Currently, the PMD doesn't handle Tx error CQEs and even may crashed
when one of them appears.

Extend the Tx data-path to detect these error CQEs, to report them by
the statistics error counters, to recover the SQ by moving the state
to ready again and adjusting the management variables appropriately.

Sometimes the error CQE root cause is very hard to debug and even may
be related to some corner cases which are not reproducible easily, hence
a dump file with debug information will be created for the first number
of error CQEs, this number can be configured by the PMD probe
parameters.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: extend Rx completion with error handling

When WQEs are posted to the HW to receive packets, the PMD may receive
a completion report with error from the HW, aka error CQE which is
associated to a bad WQE.

The error reason may be bad address, wrong lkey, small buffer size,
etc. that can wrongly be configured by the PMD or by the user.

Checking all the optional mistakes to prevent error CQEs doesn't make
sense due to performance impacts, moreover, some error CQEs can be
triggered because of the packets coming from the wire when the DPDK
application has no any control.

Most of the error CQE types change the RQ state to error state what
causes all the next received packets to be dropped by the HW and to be
completed with CQE flush error forever.

The current solution detects these error CQEs and even reports the
errors to the user by the statistics error counters but without
recovery, so if the RQ inserted to the error state it never moves to
ready state again and all the next packets ever will be dropped.

Extend the error CQEs handling for recovery by moving the state to
ready again, and rearranging all the RQ WQEs and the management
variables appropriately.

Sometimes the error CQE root cause is very hard to debug and even may
be related to some corner cases which are not reproducible easily,
hence a dump file with debug information will be created for the first
number of error CQEs, this number can be configured by the PMD probe
parameters.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: separate Rx queue initialization

Move the RQ WQEs initialization code to separate function as an
arrangement to CQE error recovering for code reuse.

CC: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: mitigate Rx doorbell memory barrier

The RQ WQEs must be written in the memory before the HW gets the RQ
doorbell, hence a memory barrier should be triggered after the WQEs
writing and before the doorbell writing.

The current code used rte_wmb barrier which ensures that all the memory
stores were done while it is enough to use rte_cio_wmb barrier for the
local memory stores because the WQEs are in local memory.

CC: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: fix device arguments error detection

When bad device arguments are added to the DPDK command line, the PMD
ignores all the command line arguments specified by the user and uses
the default values instead.

This behavior doesn't make sense because the user intention is to force
some device parameters and expects to get an error in case of
problematic issues with the arguments.

Stop probing and report an error in case of problematic command line
arguments.

Fixes: e72dd09b614e ("net/mlx5: add support for configuration through kvargs")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: add log file procedure for debug data

Add a global function in the PMD which dumps debug information to
specific file.

The data can be printed in hexadecimal format or as regular string.

The number of debug files per PMD entity should be limited by a new PMD
probe parameter called max_dump_files_num.

The files will be created in the /var/log directory or in the current
directory.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: remove Rx queues indexes correlation

There is a full correlation between the CQE indexes to the WQE indexes
in the vectorized Rx queues management.

When the RQ is inserted to the reset state, the correlation may break
because the HW starts the RQ polling from index 0 while the CQ polling
continues regularly.

As an arrangement to CQE errors handling, when the RQ can be reset,
the correlation dependence should be removed from all the Rx queues
index managements.

Remove the aforementioned dependence from the vectorized Rx burst
functions.

Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/ixgbevf: add full link status check option

To get the VF's link status by calling 'rte_eth_link_get_nowait()', the
VF not only check PF's physical link status, but also check the mailbox
running status. And mailbox checking will generate mailbox interrupt in
PF, it will be worse if many VFs are running in the system, the PF will
have to handle many interrrupts.

Normally, checking the PF's physical link status is enough for nowait.
For different scenarios, adding an 'pflink_fullchk' option to control
whether to check the link fully or not.

Fixes: 91546fb62e67 ("net/ixgbevf: fix link state")
Cc: stable@dpdk.org
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Scott Daniels <daniels@research.att.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/bnxt: fix icc build

Address build errors reported by intel compiler while compiling
on Windows. Instead of typeof() using the actual type in ALLOW_FUNC

Cc: stable@dpdk.org
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: fix interrupt vector initialization

Initialize the vector array when it is valid, thereby
preventing a case were it may be accessed when
the array is unallocated

Fixes: 1fe427fd08ee ("net/bnxt: support enable/disable interrupt")
Cc: stable@dpdk.org
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: fix xstats

If the HWRM_PORT_QSTATS_EXT fails to initialize
fw_rx_port_stats_ext_size or fw_tx_port_stats_ext_size,
the driver can end up passing junk statistics to the application.

Instead of relying on the application to initialize the xstats
buffer before calling the xstats_get dev_op, memset xstats
with zeros to avoid returning or displaying incorrect statistics.

Also fixed the buffer starting offset.

Fixes: f55e12f33416 ("net/bnxt: support extended port counters")
Cc: stable@dpdk.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: use configured MTU during load

The MTU value of a port can be (re)configured out-of-band.
FW will be returning this configured MTU as part of func_qcfg cmd.
Driver to use this value during load time.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: check for null completion ring doorbell

It is observed that sometimes during init, the bnxt_int_handler() gets
invoked while the cpr->cp_db.doorbell is not yet initialized. Check for
the same and return.

Fixes: f7ecea911ec5 ("net/bnxt: fix interrupt handler")
Cc: stable@dpdk.org
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: support redirecting tunnel packets to VF

Add code to redirect GRE, NVGRE and VXLAN tunnel packets
to the specified VF.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/enic: remove flow locks

There is no requirement for thread safety in the flow PMD code and no
need for the locks.

Fixes: 6ced137607d0 ("net/enic: flow API for NICs with advanced filters enabled")
Cc: stable@dpdk.org
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>

net/enic: remove flow count action support

The firmware in 1400 series VIC adapters which would support COUNT
flow action was postponed and reworked. The capability will be
re-added in a future release when the firmware is available.

This reverts the following commits.
commit 86df6c4e2fce ("net/enic: support flow counter action")
commit 1b4ce87dc5e6 ("net/enic: fix counter action")

Cc: stable@dpdk.org
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>

net/enic: report speed capabilities

Available link speeds are based on VIC adapter model, which is encoded
in PCI subsystem device ID.

Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>

net/enic: set min and max MTU

These values correspond to those used in the MTU handler (enic_set_mtu).

Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>

net/sfc: support Rx interrupts for ef10 datapath

Similar to support for efx datapath, Rx interrupt disabling
just avoids rearming the next time.

Signed-off-by: Georgiy Levashov <georgiy.levashov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>

net/sfc: support Rx interrupts for efx datapath

When Rx interrupts are disabled, we simply disable rearm when
the interrupt fires the next time. So, the next packet will
trigger interrupt (if it is not happened yet after previous Rx
burst processing).

Signed-off-by: Georgiy Levashov <georgiy.levashov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>

net/vmxnet3: fix Tx prepare to set positive rte_errno

Fixes: baf3bbae5556 ("net/vmxnet3: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yong Wang <yongwang@vmware.com>

net/qede: fix Tx prepare to set positive rte_errno

Fixes: 29540be7efce ("net/qede: support LRO/TSO offloads")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/ixgbe: fix Tx prepare to set positive rte_errno

Fixes: 7829b8d52be0 ("net/ixgbe: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/ice: fix Tx prepare to set positive rte_errno

Fixes: 17c7d0f9d6a4 ("net/ice: support basic Rx/Tx")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/iavf: fix Tx prepare to set positive rte_errno

Fixes: a2b29a7733ef ("net/avf: enable basic Rx Tx")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/i40e: fix Tx prepare to set positive rte_errno

Fixes: 3f33e643e5c6 ("net/i40e: add Tx preparation")
Fixes: bfeed0262b0c ("net/i40e: check illegal packets")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/fm10k: fix Tx prepare to set positive rte_errno

Fixes: 9b134aa39716 ("net/fm10k: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>

net/enic: fix Tx prepare to set positive rte_errno

Fixes: 1e81dbb5321b ("net/enic: add Tx prepare handler")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/e1000: fix Tx prepare to set positive rte_errno

Fixes: 2b76648872c9 ("net/e1000: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/atlantic: fix Tx prepare to set positive rte_errno

Fixes: 2b1472d7150c ("net/atlantic: implement Tx path")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: fix Tx prepare documentation to use positive errno

Tx prepare documentation was incorrectly suggesting to use
negative rte_errno.

Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
Fixes: 439a90b5f2a7 ("ethdev: reorder inline functions")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/bnx2x: fix supported max Rx/Tx descriptor count

Driver does not provide limit on number Rx and Tx descriptors per queue,
this may result in application configuring 64k descriptors (default set
by rte_eth_dev_info_get()) and further result in issues in PMD and HW
flows due to unsupported number.

Fixes: 540a211084a7 ("bnx2x: driver core")
Cc: stable@dpdk.org
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Acked-by: Rasesh Mody <rmody@marvell.com>

net/bnx2x: fix link state

Don't call bnx2x_link_status_update() from bnx2x_link_update().
Actual use case of bnx2x_link_status_update() is to update the link
status in shared memory between driver and MFW, and not to get the
link status from HW.

So ideally, bnx2x_link_status_update() should be called when there
is an actual link event or change in link status.

Calling bnx2x_link_status_update() from bnx2x_link_update() may
corrupt the data of link status in shared memory and result
in inconsistent state of link.

Fixes: 540a211084a7 ("bnx2x: driver core")
Cc: stable@dpdk.org
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Acked-by: Rasesh Mody <rmody@marvell.com>

net/bnx2x: fix memory leak

bnx2x_free_hsi_mem() does not free DMA memory.
Fix it here.

Fixes: 540a211084a7 ("bnx2x: driver core")
Cc: stable@dpdk.org
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Acked-by: Rasesh Mody <rmody@marvell.com>

net/bnx2x: fix interrupt flood

PMD sets up and clears the slow path interrupt status block in dev_start
and dev_stop flow and slow path interrupt status block DMA memory for
device is allocated in dev_configure flow.

This situation creates a state where, after dev_stop is called, and if
there is a slow path interrupt from device, PMD sees the old value of
status block consumer in dev_start flow, since DMA memory for status
block belongs to old configuration and dev_start will result in
new slow path interrupt status block configuration.
And since PMD fails to ack new slow path interrupt with correct status
block consumer value, device continues to trigger interrupt causing an
interrupt flood.

Fix is to create and destroy status block DMA memory in dev_start and
dev_stop flow instead of dev_configure and dev_close flow.

Fixes: 540a211084a7 ("bnx2x: driver core")
Cc: stable@dpdk.org
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Acked-by: Rasesh Mody <rmody@marvell.com>

net/bnx2x: fix packet drop

Patch "8bd31421c593 ("net/bnx2x: fix ramrod timeout")"
introduced a regression where sc->scan_fp flags is
set for unexpectedly long time. So the slow path completion
handler flow is run unnecessarily which walks over receive
descriptor ring of fast path and drops the data packets while looking
for slow path completion descriptor out of fast path ring.

This issue is seen under heavy traffic with link events happening
in background.

Fixes: 8bd31421c593 ("net/bnx2x: fix ramrod timeout")
Cc: stable@dpdk.org
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Acked-by: Rasesh Mody <rmody@marvell.com>

net/ena: fix assigning NUMA node to IO queue

Previous solution was using memzones in invalid way in hope to assign
IO queue to the appropriate NUMA zone.

The right way is to use socket_id from the rx/tx queue setup function
and then pass it to the IO queue.

Fixes: 3d3edc265fc8 ("net/ena: make coherent memory allocation NUMA-aware")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

app/testpmd: create mbuf based on max supported segments

Configuring buffer size based following parameters:
- max-pkt-len
- max supported segments per MTU

Buffer size are configured as given below:
- If platform supports infinite segments per packet then default
buffer size is used.
- If platform supports nb_mtu_seg_max segments then buffer size
is configured as (max-pkt-len / nb_mtu_seg_max) + headroom

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: add default value for max segment

rte_eth_dev_info structure exposes, nb_seg_max & nb_mtu_seg_max
to provide maximum number of supported segments for a given platform.

Defining UINT16_MAX as default value of above mentioned variables to
expose support of infinite/maximum segments.

Based on above values, application can decide best size for buffers
while creating mbuf pool.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

net/af_xdp: remove unused struct member

Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

net/af_xdp: support multi-queue

This patch adds two parameters `start_queue` and `queue_count` to
specify the range of netdev queues used by AF_XDP pmd.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

net/af_xdp: enable zero copy by external mbuf

Implement zero copy of af_xdp pmd through mbuf's external memory
mechanism to achieve high performance.

This patch also provides a new parameter "pmd_zero_copy" for user, so they
can choose to enable zero copy of af_xdp pmd or not.

To be clear, "zero copy" here is different from the "zero copy mode" of
AF_XDP, it is about zero copy between af_xdp umem and mbuf used in dpdk
application.

Suggested-by: Vipin Varghese <vipin.varghese@intel.com>
Suggested-by: Tummala Sivaprasad <sivaprasad.tummala@intel.com>
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>

net/failsafe: fix reported device info

The failsafe driver device info had several issues in the
info it reported in dev_info_get:
  - it cleared dev_info->device set in rte_eth_dev_info_get
  - many fields (for example max_rx_queue) should be the minimum
    of all sub devices
  - it reported tx capa for the active transmit device, but
    the device may change.

There was enough messed up that ended up reworking the info_get
handler. There is no need to save current values or have a
template for defaults.

Fixes: 4e31ee26ed51 ("net/failsafe: report actual device capabilities")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

net/i40e: fix uninitialized value

Initializes mark_spec pointer to NULL.

Coverity issue: 341075
Fixes: 0bbcfc706a2b ("net/i40e: support MARK and RSS flow action")
Signed-off-by: Mesut Ali Ergin <mesut.a.ergin@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>

net/bnxt: enable RSS for thor-based controllers

Make changes needed to support rss for thor-based controllers.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: enable completion coalescing for thor

Enable completion coalescing for Thor-based adapters.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: support thor controller

This commit adds support to the bnxt PMD for devices
based on the BCM57508 "thor" Ethernet controller.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: refactor ring allocation

Reduce code duplication and prepare for supporting hardware with
different ring allocation requirements by refactoring ring
allocation code.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: refactor doorbell handling

Reduce code duplication and prepare for newer controllers that
use different doorbell protocols by refactoring doorbell handling
code.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: support extended HWRM request sizes

Enable support for extended request sizes.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: reset function earlier in initialization

Move function reset to beginnng of initialization sequence.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: use consistent values for VNIC RSS rule

Use consistent values for vnic->rss_rule. No functional change,
these all equate to uint16_t 0xffff.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix variable width in endian conversion

Use 32-bit conversion width when converting to 32-bit values.

Fixes: 6371b91fb66d ("net/bnxt: add ring alloc/free")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix ring type macro name

Use consistent macro names for ring type values. (There is no
functional change, the "alloc" and "free" values are identical.)

Fixes: 6371b91fb66d ("net/bnxt: add ring alloc/free")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: fix endianness in ring macros

Descriptor fields in CP ring are in little-endian form, convert
to CPU endian before performing arithmetic operations.

Also use more general comparison when checking for ring
index wrap.

Fixes: f2a768d4d186 ("net/bnxt: add completion ring")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/sfc: advertise offload capabilities by Tx datapaths

Tx datapath feature bits were useful on migration from the old offload API
to the new one. However, right now it just adds indirection which
complicates code reading and understanding. Also addition of a new
offloads requires addition of a new feature bits and makes patches longer
and harder to understand. So, remove feature bits which correspond to Tx
offloads and simply advertise device and per-queue offloads directly.
Generic code could still mask some offloads if running HW or FW does not
support it.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>

net/sfc: advertise offload capabilities by Rx datapaths

Rx datapath feature bits were useful on migration from the old offload API
to the new one. However, right now it just adds indirection which
complicates code reading and understanding. Also addition of a new
offloads requires addition of a new feature bits and makes patches longer
and harder to understand. So, remove feature bits which correspond to Rx
offloads and simply advertise device and per-queue offloads directly.
Generic code could still mask some offloads if running HW or FW does not
support it.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>

ethdev: add a check on mempool during RxQ setup

We currently have no check on the mempool pointer passed to
rte_eth_rx_queue_setup.
Let's avoid a plain crash when dereferencing it.

Suggested-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

net/ena: fix Rx checksum errors statistics

Rx checksum flags and input errors shouldn't be updated on Tx, as it
would work only for packets forwarding.

The ierrors statistic should be updated on Rx, right after checking
Rx checksum flags if the Rx checksum offload is enabled.

Fixes: 1173fca25af9 ("ena: add polling-mode driver")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>

net/ena: fix Tx statistics

Instead of counting number of used NIC Tx bufs just count number
of Tx packets.

Fixes: 45b6d86184fc ("net/ena: add per-queue software counters stats")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>

net/memif: introduce memory interface PMD

Shared memory packet interface (memif) PMD allows for DPDK and any other
client using memif (DPDK, VPP, libmemif) to communicate using shared
memory. The created device transmits packets in a raw format. It can be
used with Ethernet mode, IP mode, or Punt/Inject. At this moment, only
Ethernet mode is supported in DPDK memif implementation. Memif is Linux
only.

Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/virtio: remove useless pointer checks

This patch removes useless checks on 'prev' pointer, as it
is always set before with a valid value.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio: fix segment length in mergeable packed Rx

Head segment data_len field is wrongly summed with the length
of all the segments of the chain, whereas it should be the
length of the first segment only.

Fixes: a76290c8f1cf ("net/virtio: implement Rx path for packed queues")
Cc: stable@dpdk.org
Reported-by: Yaroslav Brustinov <ybrustin@cisco.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio: fix mergeable Rx with segmented packet

After having dequeued a burst of descriptors, there may be a
need to dequeue a few more if the last packet was segmented
and not complete. When it happens, the extra segments were
not properly attached to the mbuf chain, and so were lost.

Also, head segment data_len field is wrongly summed with
the length of all the segments of the chain.

This patch fixes both the mbuf chaining and head segment's
data_len field

Fixes: bcac5aa207f8 ("net/virtio: improve batching in mergeable path")
Cc: stable@dpdk.org
Reported-by: Yaroslav Brustinov <ybrustin@cisco.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio: fix in-order Rx with segmented packet

After having dequeued a burst of descriptors, there may be a
need to dequeue a few more if the last packet was segmented
and not complete. When it happens, the extra segments were
not properly attached to the mbuf chain, and so were lost.

Also, head segment data_len field is wrongly summed with
the length of all the segments of the chain.

This patch fixes both the mbuf chaining and head segment's
data_len field.

Fixes: e5f456a98d3c ("net/virtio: support in-order Rx and Tx")
Cc: stable@dpdk.org
Reported-by: Yaroslav Brustinov <ybrustin@cisco.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/vhost: release port upon close

Set RTE_ETH_DEV_CLOSE_REMOVE upon probe so all the private
resources for the port can be freed by rte_eth_dev_close().

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

eal/x86: force inlining of all memcpy and mov helpers

Some helpers in the header file are forced inlined other are
only inlined, this patch forces inline for all.

It will avoid it to be embedded as functions when called multiple
times in the same object file. For example, when we added packed
ring support in vhost-user library, rte_memcpy_generic got no
more inlined.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

vhost: simplify descriptor buffer prefetching

Now that we have a single function to map the descriptors
buffers, let's prefetch them there as it is the earliest
place we can do it.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

vhost: do not inline unlikely fragmented buffers code

Handling of fragmented virtio-net header and indirect descriptors
tables was implemented to fix CVE-2018-1059. It should never
happen with healthy guests and so is already considered as
unlikely code path.

This patch moves these bits into non-inline dedicated functions
to reduce the I-cache pressure.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

vhost: do not inline packed and split functions

At runtime either packed Tx/Rx functions will always be called,
or split Tx/Rx functions will always be called.

This patch removes the forced inlining in order to reduce
the I-cache pressure.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

vhost: un-inline dirty pages logging functions

In order to reduce the I-cache pressure, this patch removes
the inlining of the dirty pages logging functions, that we
can consider as cold path.

Indeed, these functions are only called while doing live
migration, so not called most of the time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

net/virtio: remove useless check on mempool

This .rx_queue_setup devop is called after ethdev already dereferenced
the mempool pointer.
No need to check and we can remove this rte_exit.

Fixes: 48cec290a3d2 ("net/virtio: move queue configure code to proper place")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/i40e: eliminate weak symbols in data path

Use of weak symbols can hide makefile errors especially when
custom makefiles are used. Removing the use of weak symbols
to avoid a stub function being linked in production code.

Signed-off-by: David Harton <dharton@cisco.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

net/af_xdp: fix remove path

When users call rte_eth_dev_close() and rte_dev_remove(), the af_xdp
pmd return -1 (EPERM) due to eth_dev == NULL.

Since the af_xdp pmd driver advertises RTE_ETH_DEV_CLOSE_REMOVE, all
the resources are freed on rte_eth_dev_close(). rte_dev_remove() tries
to detach device and subsequently calls rte_pmd_af_xdp_remove() that
tries to free already freed resources and fails.
Fix it by return success.

Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
Cc: stable@dpdk.org
Reported-at: https://patchwork.ozlabs.org/patch/1106528/
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>

net/mlx5: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/ixgbe: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/i40e: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/enic: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/ena: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/e1000: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/cxgbe: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/bonding: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/bnxt: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Lance Richardson <lance.richardson@broadcom.com>

net/axgbe: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/ark: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/atlantic: remove unnecessary cast

The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

net/bnxt: update HWRM API to version 1.10.0.74

Update HWRM API to version 1.10.0.74

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: update HWRM API to version 1.10.0.48

Update HWRM version to 1.10.0.48

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: update HWRM API to version 1.10.0.19

Update HWRM API to version 1.10.0.19

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>

net/bnxt: fix RSS RETA indirection table ops

We are trying to update the indirection table for all the VNICs.
We should update the table only for the default vnic0.

Fix the reta update function to only update table entries that are
selected by the update mask. Translate queue number to firmware
group ID when updating an entry.

Fix reta query op to only return table entries as identfied by the
provided mask. Translate firmware group IDs to queue numbers.

Removed extraneous code from bnxt_reta_query_op().

Fixes: d819382543f3 ("net/bnxt: add RSS redirection table operations")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Rahul Gupta <rahul.gupta@broadcom.com>

net/bnxt: implement SSE vector mode

Introduce SSE vector mode support for the bnxt pmd.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

net/bnxt: compute and store scattered Rx status

In preparation for a bnxt vector-mode driver, compute and store
scattered_rx status for the device when started.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/bnxt: move Tx bd checking to header file

To allow sharing of tx_bds_in_hw() and bnxt_tx_avail() between
vector-mode and non-vector transmit functions, move these functions
into bnxt_txr.h.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/bnxt: update release notes

Update release doc briefly describing updates to bnxt PMD done during
19.05 release, including transmit optimization changes in the commits
identified by the "Fixes:" tags below.

Fixes: 5ef3592c97b9 ("net/bnxt: support bulk free of Tx mbufs")
Fixes: 220de9869bc3 ("net/bnxt: optimize Tx batching")
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>