git.droids-corp.org - dpdk.git/log

dma/idxd: fix non-AVX builds with old compilers

When building without AVX2 support using an older compiler e.g. gcc 4.8
on Centos/RHEL 7, we get build errors due to the use of AVX2 intrinsics.
This is because the compiler does not support
"__attribute__((target(AVX2)))" function attribute. Disable build of
this driver such edge cases.

Generic builds using recent compilers, and all builds with a minimum
baseline of AVX2 are unaffected by this change.

Fixes: aa802b10237c ("dma/idxd: fix AVX2 in non-datapath functions")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Yu Jiang <yux.jiang@intel.com>

raw/ioat: fix build when ioat dmadev enabled

The build of the raw/ioat driver only occurs when the equivalent dmadev
drivers are disabled. Complications occur when the ioat dmadev is being
built but not the idxd. In this case, only the idxd part of raw/ioat
gets built, but the definition of the logtype is in the ioat part,
causing build errors.

.../raw_ioat_idxd_bus.c.o: In function `idxd_vdev_mmap_wq':
idxd_bus.c:(.text+0x116): undefined reference to `ioat_pmd_logtype'

Fix this by moving the logtype definition to the common C file, and
renaming it to avoid conflicts with a similarly named value in the
dma/ioat driver.

Fixes: ff06fa2cf3ba ("raw/ioat: probe idxd PCI")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

raw/ioat: fix build missing errno include

The inline functions in rte_idxd_rawdev_fns.h make use of rte_errno, but
the header with its definition is not included by that file leading to
build errors.

Fixes: f82c87eb14a4 ("raw/ioat: move idxd functions to separate file")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

remove unnecessary null checks

Found by nullfree.cocci.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[David: for lpm parts:]
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
[David: for vdpa/mlx5 parts:]
Acked-by: Matan Azrad <matan@nvidia.com>
[David: for dma/dpaa2, raw/ifpga, vdpa/mlx5:]
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
[David: reran cocci.sh and updated common/mlx5 and cryptodev asym test]
Signed-off-by: David Marchand <david.marchand@redhat.com>

cocci/nullfree: add more functions

There are more functions in DPDK which have the semantics
as free() when passed NULL pointer. Also, put the checks
in alphabetical order.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

lib: document free functions

Make sure all functions which use the convention that XXX_free(NULL)
is a nop are all documented.

The wording is chosen to match the documentation of free(3).
"If ptr is NULL, no operation is performed."

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
[David: squashed with other series updates, unified wording]

remove passive voice in function description

Remove extraneous phrase "This API is used to" and use
active instead of passive voice when describing a function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
[David: for raw/ioat and dmadev parts:]
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Conor Walsh <conor.walsh@intel.com>

doc: fix flow integrity hardware support in mlx5 guide

Current MLX5 PMD documentation says that entire `ConnectX-6` family
supports flow integrity feature.

Flow integrity offload is not supported on vanilla `ConnectX-6`.
It is available on `ConnectX-6 Dx`, `ConnectX-6 Lx` and
`BlueField 2`.

Fixes: 79f8952783d0 ("net/mlx5: support integrity flow item")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: fix stack buffer overflow in drop action

The mlx5_drop_action_create function use mlx5_malloc for allocating
'hrxq', but don't allocate for 'rss_key'. This is wrong and it can
cause buffer overflow.

Detected with address sanitizer:
0 (/usr/lib64/libasan.so.4+0x7b8e2)
1 in mlx5_devx_tir_attr_set ../drivers/net/mlx5/mlx5_devx.c:765
2 in mlx5_devx_hrxq_new ../drivers/net/mlx5/mlx5_devx.c:800
3 in mlx5_devx_drop_action_create ../drivers/net/mlx5/mlx5_devx.c:1051
4 in mlx5_drop_action_create ../drivers/net/mlx5/mlx5_rxq.c:2846
5 in mlx5_dev_spawn ../drivers/net/mlx5/linux/mlx5_os.c:1743
6 in mlx5_os_pci_probe_pf ../drivers/net/mlx5/linux/mlx5_os.c:2501
7 in mlx5_os_pci_probe ../drivers/net/mlx5/linux/mlx5_os.c:2647
8 in mlx5_os_net_probe ../drivers/net/mlx5/linux/mlx5_os.c:2722
9 in drivers_probe ../drivers/common/mlx5/mlx5_common.c:657
10 in mlx5_common_dev_probe ../drivers/common/mlx5/mlx5_common.c:711
11 in mlx5_common_pci_probe ../drivers/common/mlx5/mlx5_common_pci.c:150
12 in rte_pci_probe_one_driver ../drivers/bus/pci/pci_common.c:269
13 in pci_probe_all_drivers ../drivers/bus/pci/pci_common.c:353
14 in pci_probe ../drivers/bus/pci/pci_common.c:380
15 in rte_bus_probe ../lib/eal/common/eal_common_bus.c:72
16 in rte_eal_init ../lib/eal/linux/eal.c:1286
17 in main ../app/test-pmd/testpmd.c:4112

Fixes: 0c762e81da9b ("net/mlx5: share Rx queue drop action code")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: fix metering on E-Switch Manager

When meter is used by E-Switch Manager port, there's an error that
cannot get correct port ID.

This patch fixes this by using specific parsing process to get port
ID for E-Switch Manager.

Fixes: 3c481324baf3 ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: add limitation for E-Switch Manager match

For BF with old FW which doesn't expose the E-Switch Manager vport ID,
E-Switch Manager port matching works correctly only when BF is in
embedded CPU mode.

This patch adds the limitation description.

Fixes: a564038699f9 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix RSS expansion for patterns with ICMP item

MLX5 PMD RSS expansion implementation added L4 UDP or TCP
headers after ICMP.
For example:
ETH / IPv4 / ICMP expanded into ETH / IPv4 / ICMP / {UDP | TCP}
ETH / IPv6 / ICMPv6 expanded into ETH / IPv6 / ICMPv6 / {UDP | TCP}

The patch updates PMD expansion scheme to handle ICMP and ICMPv6 types
as non-expandable for RSS.

Fixes: c7870bfe09dc ("ethdev: move RSS expansion code to mlx5 driver")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

app/testpmd: add host shaper command

Add command line options to support host shaper configure.
- Command syntax:
mlx5 set port <port_id> host_shaper avail_thresh_triggered <0|1> rate
<rate_num>

- Example commands:
To enable avail_thresh_triggered on port 1 and disable current host
shaper:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 1 rate 0

To disable avail_thresh_triggered and current host shaper on port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 0

The rate unit is 100Mbps.
To disable avail_thresh_triggered and configure a shaper of 5Gbps on
port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 50

Add sample code to handle rxq available descriptor threshold event, it
delays a while so that rxq empties, then disables host shaper and
rearms available descriptor threshold event.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: add API to configure host port shaper

Host port shaper can be configured with QSHR (QoS Shaper Host Register).
Add check in build files to enable this function or not.

The host shaper configuration affects all the ethdev ports belonging to the
same host port.

Host shaper can configure shaper rate and lwm-triggered for a host port.
The shaper limits the rate of traffic from host port to wire port.
If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically
when one of the host port's Rx queues receives available descriptor
threshold event.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: support Rx descriptor threshold event

Add mlx5 specific available descriptor threshold configuration
and query handler.
In mlx5 PMD, available descriptor threshold is also called
LWM (limit watermark).
While the Rx queue fullness reaches the LWM limit, the driver catches
an HW event and invokes the user callback.
The query handler finds the next Rx queue with pending LWM event
if any, starting from the given Rx queue index.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: handle Rx descriptor LWM event

When LWM meets RQ WQE, the kernel driver raises an event to SW.
Use devx event_channel to catch this and to notify the user.
Allocate this channel per shared device.
The channel has a cookie that informs the specific event port and queue.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: share interrupt management

There are many duplicate code of creating and initializing rte_intr_handle.
Add a new mlx5_os API to do this, replace all PMD related code with this
API.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: support descriptor LWM for Rx queue

Add LWM (Limit WaterMark) field to Rxq object which indicates the percentage
of Rx queue size used by HW to raise descriptor event to the user.
Allow LWM setting in modify_rq command.
Allow the LWM configuration dynamically by adding RDY2RDY state change.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: fix build with clang 14

Use fgets instead of fscanf to resolve the following warning
reported by clang 14.0.0 in Fedora 37 (Rawhide):

drivers/net/mlx5/linux/mlx5_ethdev_os.c:1137:52: error:
  'fscanf' may overflow; destination buffer in argument 3 has size 16,
  but the corresponding specifier may require size 17
  [-Werror,-Wfortify-source]
  ret = fscanf(file, "%" RTE_STR(IF_NAMESIZE) "s", port_name);

Fixes: 63d1db710fbc ("net/mlx5: fix unlimited parsing of switch info")
Cc: stable@dpdk.org
Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: update log for DevX object creation failure

Application can fetch syndrome value after FW operation failure
starting from Mellanox OFED-5.6.
The patch updates log data after devx_obj_create error.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

common/mlx5: update log for DevX general command failure

Application can fetch syndrome value after FW operation failure
starting from Mellanox OFED-5.6.
The patch updates log data issued after devx_general_cmd error.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

net/mlx5: support field modification in meter rules

This patch introduces MODIFY_FIELD action support in meter. User can
create meter policy with MODIFY_FIELD action in green/yellow action.

For example:

testpmd> add port meter policy 0 21 g_actions modify_field op set
dst_type ipv4_ecn src_type value src_value 3 width 2 / ...

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support modifying ECN field

This patch is to support modify ECN field in IPv4/IPv6 header.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

common/mlx5: check ECN modification capability

Flag outer_ip_ecn in header modify capabilities properties layout is
added in order to check if the firmware supports modification of ecn
field.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support represented port item in flow rules

Add support for represented_port item in pattern. And if the spec and mask
both are NULL, translate function will not add source vport to matcher.

For example, testpmd starts with PF, VF-rep0 and VF-rep1, below command
will redirect packets from VF0 and VF1 to wire:
testpmd> flow create 0 ingress transfer group 0 pattern eth /
represented_port / end actions represented_port ethdev_id is 0 / end

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

eal/linux: allocate worker lcore stacks in hugepages

Add support for using hugepages for worker lcore stack memory. The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.

EAL option '--huge-worker-stack[=stack-size-in-kbytes]' is added to allow
the feature to be enabled at runtime. If the size is not specified,
the system pthread stack size will be used.

Signed-off-by: Don Wallwork <donw@xsightlabs.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>

ip_frag: fix build with GCC 12

GCC 12 raises warnings on usage of rte_memcpy with IPv4 options handling
in fragments for both the ip_frag library and unit tests.

For example in the library:
In function ‘_mm256_storeu_si256’,
    inlined from ‘rte_mov32’ at
        ../lib/eal/x86/include/rte_memcpy.h:347:2,
    inlined from ‘rte_mov128’ at
        ../lib/eal/x86/include/rte_memcpy.h:369:2,
    inlined from ‘rte_memcpy_generic’
        at ../lib/eal/x86/include/rte_memcpy.h:445:4,
    inlined from ‘rte_memcpy’
        at ../lib/eal/x86/include/rte_memcpy.h:851:10,
    inlined from ‘__create_ipopt_frag_hdr’
        at ../lib/ip_frag/rte_ipv4_fragmentation.c:68:4,
    inlined from ‘rte_ipv4_fragment_packet’
        at ../lib/ip_frag/rte_ipv4_fragmentation.c:242:16:
/usr/lib/gcc/x86_64-redhat-linux/12/include/avxintrin.h:935:8: error:
    array subscript ‘__m256i_u[1]’ is partly outside array bounds of
    ‘uint8_t[60]’ {aka ‘unsigned char[60]’} [-Werror=array-bounds]
  935 |   *__P = __A;
      |   ~~~~~^~~~~
../lib/ip_frag/rte_ipv4_fragmentation.c: In function
    ‘rte_ipv4_fragment_packet’:
../lib/ip_frag/rte_ipv4_fragmentation.c:122:17: note: at offset [52, 60]
    into object ‘ipopt_frag_hdr’ of size 60
  122 |         uint8_t ipopt_frag_hdr[IPV4_HDR_MAX_LEN];
      |                 ^~~~~~~~~~~~~~

To resolve the compilation warning, replace the rte_memcpy with memcpy.

Fixes: b50a14a853aa ("ip_frag: add IPv4 options fragment")
Signed-off-by: Huichao Cai <chcchc88@163.com>

net/qede: fix build with GCC 12

The x86 version of rte_memcpy can cause warnings. The driver does
not need to use rte_memcpy for everything. Standard memcpy is
just as fast and safer; the compiler and static analysis tools
treat memcpy specially.

Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

net/ice/base: fix build with GCC 12

GCC 12 with -O2 flag would raise the following warning:
../drivers/net/ice/base/ice_switch.c:7220:61: error: writing 1 byte into a
region of size 0 [-Werror=stringop-overflow=]
7220 | buf[recps].content.lkup_indx[i + 1] = entry->fv_idx[i];
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~

This patch changed the type of fv_idx in struct ice_recp_grp_entry to
align with its callers which are also u8 type.

Fixes: 04b8ec1ea807 ("net/ice/base: add protocol structures and defines")
Cc: stable@dpdk.org
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/iavf: add basic NEON Rx

This patch adds the basic NEON Rx path to the iavf driver. It does not
include scatter or flex varieties.

Tested on N1SDP platform with Intel XL710 NIC and 40G connection.
Tested with a single core and testpmd rxonly mode. Saw no significant
performance difference between scalar and Arm vPMD paths using this test
in iavf and saw the same results when comparing scalar and Arm vPMD
path in i40e.

Signed-off-by: Kathleen Capella <kathleen.capella@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>

net/i40e: add outer VLAN processing

Outer VLAN processing is supported after firmware v8.4, kernel driver
also change the default behavior to support this feature. To align with
kernel driver, add support for outer VLAN processing in DPDK.

But it is forbidden for firmware to change the Inner/Outer VLAN
configuration while there are MAC/VLAN filters in the switch table.
Therefore, we need to clear the MAC table before setting config,
and then restore the MAC table after setting.

This will not impact on an old firmware.

Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Signed-off-by: Kevin Liu <kevinx.liu@intel.com>
Acked-by: Yuying Zhang <yuying.zhang@intel.com>

net/ice: add DDP runtime configuration dump

Dump DDP runtime configure into a binary (package) file from ice PF port.

Add command line:
    ddp dump <port_id> <config_path>

Parameters:
    <port_id>       the PF Port ID
    <config_path>   dumped runtime configure file, if not a absolute path,
                    it will be dumped to testpmd running directory.

For example:
testpmd> ddp dump 0 current.pkg

If you want to dump ice VF DDP runtime configure, you need bind other
unused PF port of the NIC first, and then dump the PF's runtime configure
as target output.

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/ice: fix race condition in Rx timestamp

In multi-cores cases for Rx timestamp offload, to avoid phc time being
frequently overwritten, move related variables from ice_adapter to
ice_rx_queue structure, and each queue will handle timestamp calculation
by itself.

Fixes: 953e74e6b73a ("net/ice: enable Rx timestamp on flex descriptor")
Fixes: 5543827fc6df ("net/ice: improve performance of Rx timestamp offload")
Cc: stable@dpdk.org
Signed-off-by: Simei Su <simei.su@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/qede: fix build with GCC 13

Reproduced with "gcc (GCC) 13.0.0 20220616 (experimental)"

Build error:
In file included from ../drivers/net/qede/qede_debug.c:9:
../drivers/net/qede/qede_debug.c: In function ‘qed_grc_dump_addr_range’:
../drivers/net/qede/base/ecore.h:95:17:
warning: overflow in conversion from ‘int’ to ‘u8’
{aka ‘unsigned char’} changes value from ‘(int)vf_id << 8 | 128’
to ‘128’ [-Woverflow]
   95 |                 ((_value & _name##_MASK) << _name##_SHIFT)
      |                 ^
../drivers/net/qede/qede_debug.c:1907:31:
note: in expansion of macro ‘FIELD_VALUE’
1907 |         fid = FIELD_VALUE(PXP_PRETEND_CONCRETE_FID_VFVALID, 1)
      |               ^~~~~~~~~~~

To prevent overflow converting 'fib' to uint16_t,
while updating it also updated 'vf_id' to 16 bit too.

Fixes: ec55c118792b ("net/qede: add infrastructure for debug data collection")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
Acked-by: Devendra Singh Rawat <dsinghrawat@marvell.com>

net/cnxk: add SDP VF device IDs

Add SDP VF device ID in the table for probe matching.

Signed-off-by: Radha Mohan Chintakuntla <radhac@marvell.com>

net/cnxk: resize CQ for Rx security for errata

Resize CQ for Rx security offload in case of HW errata.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

net/cnxk: fix PFC class disabling

Disabling a specific PFC class on a SQ is resulting in disabling PFC
on the entire port.

Fixes: 9544713564f5 ("net/cnxk: support priority flow control")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

net/cnxk: remove restriction on VF for PFC config

Currently PFC configuration is not allowed on VFs.
Patch enables PFC configuration on VFs

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>

net/cnxk: add SDP link status

Add SDP link status reporting

Signed-off-by: Satananda Burla <sburla@marvell.com>

common/cnxk: fix mbox structs to avoid unaligned access

Fix mbox structs to avoid unaligned access as mbox
memory is from BAR space.

Fixes: 503b82de2cbf ("common/cnxk: add mbox request and response definitions")
Fixes: e746aec161cc ("common/cnxk: fix SQ flush sequence")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: enhance CPT parsing header dump

Enhance CPT parse header dump to dump fragment info
and swap pointers before printing.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: support same TC value across multiple queues

User may want to configure same TC value across multiple queues, but
for that all queues should have a common TL3 where this TC value will
get configured.

Changed the pfc_tc_cq_map/pfc_tc_sq_map array indexing to qid and store
TC values in the array. As multiple queues may have same TC value.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

common/cnxk: add PFC support for VF

Current PFC implementation does not support VFs.
This patch enables PFC on VFs too.

Also fix the config of aura.bp to be based on number
of buffers(aura.limit) and corresponding shift
value(aura.shift).

Fixes: cb4bfd6e7bdf ("event/cnxk: support Rx adapter")
Cc: stable@dpdk.org
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>

common/cnxk: avoid CPT backpressure due to errata

Avoid enabling CPT backpressure due to errata where
backpressure would block requests from even other
CPT LF's. Also allow CQ size >=16K.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: use computed value for WQE skip

Use computed value for WQE skip instead of a hard-coded value.
WQE skip needs to be number of 128B lines to accommodate rte_mbuf.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

common/cnxk: add include for macro definition

The header file "roc_io.h" uses the "__plt_always_inline" macro but
don't include "roc_platform.h" to get the definition of it. This
inclusion is not necessary for compilation, but the lack of it can
confuse some indexers - such as those in eclipse, which reports the
lines:

"static __plt_always_inline uint64_t"

as possible definitions of a variable called "uint64_t". This confusion
leads to uint64_t being flagged as an unknown type in all other parts of
the project being indexed, e.g. across all of DPDK code.

Adding in the include of roc_platform.h makes it clear to the indexer
that those lines are part of a function definition, and that allows
eclipse to correctly recognise uint64_t as a type from stdint.h

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: add ROC API to free MCAM entry

Add ROC API to free the given MCAM entry. If the MCAM
entry has flow counter associated, this API will clear
and free the flow counter.

Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: support CNF10KB SoC

Support for CNF10KB SoC by adding its PCI device ID.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

common/cnxk: fix CN103XX subsystem device ID

Fix the subsystem device ID for CN103XX.

Fixes: dd462f68f04a ("common/cnxk: support CN103XX platform")
Cc: stable@dpdk.org
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>

common/cnxk: update extra stats for inline device

Inline device's NIX RX and RQ stats are updated
on ethdev extra stats

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>

mempool/cnxk: support optional wait when counting

When counting the batch allocated pointers in cnxk mempool driver,
currently it always waits for in-flight batch operations to finish.
Add a provision to make this waiting optional.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>

common/cnxk: handle ROC model init failure

Return with error on fail to initialize ROC model.

Fixes: 014a9e222bac ("common/cnxk: add model init and IO handling API")
Cc: stable@dpdk.org
Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

common/cnxk: print NIX inline outbound CPT LF registers

This add the support to dump NIX inline outbound CPT LF
registers.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>

common/cnxk: fix decrypt packet count register update

Corrects the CPT decrypt packet counter register.

Fixes: b1a22e5d4f ("common/cnxk: add CPT diagnostics")
Cc: stable@dpdk.org
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>

doc: add platform option in cnxk native build

Update cnxk platform documentation to use
-Dplatform meson option for native builds.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

common/cnxk: support dumping flow MCAM entry data

When dumping flow data, read hardware MCAM entry corresponding
to the flow and print that data also.

Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Kiran Kumar K <kirankumark@marvell.com>

net/cnxk: fix crash in IPsec telemetry

Calling this telemetry callback with no argument caused a crash.

Fixes: 41cc645c214f ("net/cnxk: add inline IPsec telemetry for CN9K")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>

common/cnxk: add macros to platform layer

Added new platform layer macros for pointer operations,
bitwise operations, spinlock and 32 bit read and write.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>

common/cnxk: fix channel number setting in MCAM entries

Adding changes to accommodate the following requirements
while masking the channel number.
1. For CN10K device, channel number should not be masked
   for first pass rules with RTE_FLOW_ACTION_TYPE_SECURITY
   action. And channel number should be masked for all
   other flow rules.
2. For CN9K device channel number should not be masked.

Fixes: 4968b362b639 ("common/cnxk: support CPT second pass flow rules")
Cc: stable@dpdk.org
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Kiran Kumar K <kirankumark@marvell.com>

net/thunderx: populate max and min MTU values

Populate maximum and minimum MTU values while retrieving
device information.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/thunderx: support attaching from secondary

Adding support for device hotplugging - attach and detach from
secondary

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support allmulticast

Implement allmulticast operations for octeontx driver:
rte_eth_allmulticast_enable()/rte_eth_allmulticast_disable().

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support xstats

Adding support for xstats eth operations.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/thunderx: support setting link attributes

Adding support to configure link attributes like speed,
duplex, negotiation.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/thunderx: reset Rx DMAC control register

During initialization, reset RX DMAC control register by
sending mbox message NIC_MBOX_MSG_RESET_XCAST to PF.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/thunderx: support polling of link state change

Moving the logic of link polling to VF from PF. Now VF
is supposed to poll for the link status, rather PF alerting
VF about any link change.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

net/octeontx: handle port reconfiguration

Adding support for port reconfiguration as user may require to
do so on a running system.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: support setting link attributes

Adding support to configure link attributes like speed,
duplex, negotiation.

Signed-off-by: Harman Kalra <hkalra@marvell.com>

net/octeontx: fix port close

Segmentation fault has been observed while closing the ethernet
port. Reason for the segfault is, eth port close also shuts down
event device while other ethernet port is still using the event
device.

Fixes: da6c687471a3 ("net/octeontx: add start and stop support")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>

ci: enable C++ check for Arm and PPC

The crossbuild-essential-<arch> packages contain all necessary
dependencies to cross-compile binaries for a given architecture
including C and C++ compilers. Therefore use those instead of listing
packages directly. This way C++ compiler is also installed and C++
include checks will be checked in CI for ARM and PowerPC.

Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

config: fix C++ cross compiler for Arm and PPC

Through some mixup all cross-files for ARM and PowerPC platforms were
using C Preprocessor (cpp) instead of GCC (g++).
This caused meson to fail detecting the C++ compiler presence and
therefore disabling some targets (i.e. C++ include file checks).

Fixes: e53a5299d219 ("build: support vendor specific ARM cross builds")
Cc: stable@dpdk.org
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

malloc: fix allocation of almost hugepage size

If called to allocate memory of size is between multiple of hugepage
size minus malloc_header_len and hugepage size, rte_malloc fails.

This fix replaces malloc_elem_trailer_len with malloc_elem_overhead in
try_expand_heap() to include malloc_elem_header_len when calculating
n_seg.

Bugzilla ID: 800
Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Fidaullah Noonari <fidaullah.noonari@emumba.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

eal/unix: make stack dump signal safe

rte_dump_stack() needs to be usable in situations when a bug is
encountered and from signal handlers (such as SEGV).

Glibc backtrace_symbols() calls malloc which makes it
dangerous in a signal handler that is handling errors that maybe
due to memory corruption. Additionally, rte_log() is unsafe because
syslog() is not signal safe; printf() is also documented as
not being safe.

This version formats message and uses writev for each line in a manner
similar to what glibc version of backtrace_symbols_fd() does. The
FreeBSD version of backtrace_symbols_fd() is not signal safe.

Sample output:

0: ./build/app/dpdk-testpmd (rte_dump_stack+0x2b) [560a6e9c002b]
1: ./build/app/dpdk-testpmd (main+0xad) [560a6decd5ad]
2: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xcd) [7fd43d3e27fd]
3: ./build/app/dpdk-testpmd (_start+0x2a) [560a6e83628a]

Bugzilla ID: 929

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vhost/crypto: fix descriptor processing

copy_data was returning a pointer to an increased (off by one) descriptor.
Subsequent calls to copy_data in the library were then failing.
Fix this by incrementing the descriptor only if there is some left data
to copy.

Fixes: 4414bb67010d ("vhost/crypto: fix build with GCC 12")
Cc: stable@dpdk.org
Reported-by: Jakub Poczatek <jakub.poczatek@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jakub Poczatek <jakub.poczatek@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>

net/virtio: unmap PCI device in secondary process

In multi-process, the secondary process will remap PCI during
initialization, but the mapping is not removed in the uninit path,
the device is not closed, and the device busy error will be reported
when the device is hotplugged.

This patch unmaps PCI device at secondary process uninitialization
based on virtio_rempa_pci.

Fixes: 36a7a2e7a53f ("net/virtio: move PCI device init in dedicated file")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

vhost/crypto: fix build with GCC 12

GCC 12 raises the following warning:

In file included from ../lib/mempool/rte_mempool.h:46,
                 from ../lib/mbuf/rte_mbuf.h:38,
                 from ../lib/vhost/vhost_crypto.c:7:
../lib/vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_fetch_requests’:
../lib/eal/x86/include/rte_memcpy.h:371:9: warning: array subscript 1 is
     outside array bounds of ‘struct virtio_crypto_op_data_req[1]’
     [-Warray-bounds]
  371 | rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/vhost/vhost_crypto.c:1178:42: note: while referencing ‘req’
1178 |         struct virtio_crypto_op_data_req req;
      |                                          ^~~

Split this function and separate the per descriptor copy.
This makes the code clearer, and the compiler happier.

Note: logs for errors have been moved to callers to avoid duplicates.

Fixes: 3c79609fda7c ("vhost/crypto: handle virtually non-contiguous buffers")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: prepare virtqueue resource creation

Split the virtqs virt-queue resource between
the configuration threads.
Also need pre-created virt-queue resource
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq sub-resources creation

pre-created virt-queue sub-resource in device probe stage
and then modify virtqueue in device config stage.
Steer table also need to support dummy virt-queue.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add device close task

Split the virtqs device close tasks after
stopping virt-queue between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq live migration log task

Split the virtqs LM log between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add virtq creation task

The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the virtqs creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add VM memory registration task

The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtqs configurations.

This optimization accelerrate the LM process and
reduce its time by 5%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add task ring for multi-thread management

The configuration threads tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task to
a thread and enqueue it to the thread ring.
The thread polls its ring and dequeue tasks.
That’s why the ring should be in multi-producer
and single consumer mode.
Anatomic counter manages the tasks completion notification.
The threads report errors to the caller by
a dedicated error counter per task.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: add multi-thread management for configuration

The LM process includes a lot of objects creations and
destructions in the source and the destination servers.
As much as LM time increases, the packet drop of the VM increases.
To improve LM time need to parallel the configurations for mlx5 FW.
Add internal multi-thread management in the driver for it.

A new devarg defines the number of threads and their CPU.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath events thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: optimize datapath-control synchronization

The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group the critical sections with
the other ones that should be synchronized.

Replace the global lock with the following locks:

1.virtq locks(per virtq) synchronize datapath polling and
parallel configurations on the same virtq.
2.A doorbell lock synchronizes doorbell update,
which is shared for all the virtqs in the device.
3.A steering lock for the shared steering objects updates.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: pre-create virtq at probing time

dev_config operation is called in LM progress.
LM time is very critical because all
the VM packets are dropped directly at that time.

Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage using the new ability
to modify virtq.

This optimization accelerates the LM process and
reduces its time by 70%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: extend virtq modifiable fields

A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields:
1.address fields: desc_addr/used_addr/available_addr
2.hw_available_index
3.hw_used_index
4.virtio_q_type
5.version type
6.queue mkey
7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8.event mode: event_mode/event_qpn_or_msix

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: reuse event queues

To speed up queue creation time, event QP and CQ will create only once.
Each virtq creation will reuse same event QP and CQ.

Because FW will set event QP to error state during virtq destroy,
need modify event QP to RESET state, then modify QP to RTS state as
usual. This can save about 1.5ms for each virtq creation.

After SW QP reset, QP pi/ci all become 0 while CQ pi/ci keep as
previous. Add new variable qp_ci to save SW QP ci. Move QP pi
independently with CQ ci.

Add new function mlx5_vdpa_drain_cq to drain CQ CQE after virtq
release.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

common/mlx5: add DevX API to move queues to reset state

Support set QP to RESET state.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: support pre-creation of virtq resource

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resource in vDPA device probe stage.

In VM live migration scenario, this can reduce 0.8ms for each queue
creation, thus reduce LM network downtime.

To create queue resource(umem/counter) in advance, we need to know
virtio queue depth and max number of queue VM will use.

Introduce two new devargs: queues(max queue pair number) and queue_size
(queue depth). Two args must be both provided, if only one argument
provided, the argument will be ignored and no pre-creation.

The queues and queue_size must also be identical to vhost configuration
driver later receive. Otherwise either the pre-create resource is wasted
or missing or the resource need destroy and recreate(in case queue_size
mismatch).

Pre-create umem/counter will keep alive until vDPA device removal.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/mlx5: fix maximum number of virtqs

The driver wrongly takes the capability value for
the number of virtq pairs instead of just the number of virtqs.

Adjust all the usages of it to be the number of virtqs.

Fixes: c2eb33aaf967 ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix log message for async dequeue

Since the commit 02798b073520 ("vhost: improve virtio-net layer logs"),
vhost logs contain the socket path as a prefix.
Async dequeue path was copied from the sync dequeue path but a log
was incorrect.

Fixes: 84d5204310d7 ("vhost: support async dequeue for split ring")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: fix statistics update in async dequeue

This patch adds missing per-virtqueue statistics in async dequeue path.

Fixes: 84d5204310d7 ("vhost: support async dequeue for split ring")
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vhost: rename number of available entries

This patchs renames the local variables free_entries to
avail_entries in the dequeue path.

Indeed, this variable represents the number of new packets
available in the Virtio transmit queue, so these entries
are actually used, not free.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vdpa/mlx5: workaround VAR offset within page

vDPA driver first uses kernel driver to allocate doorbell (VAR) area for
each device. Then uses var->mmap_off and var->length to mmap uverbs device
file as doorbell userspace virtual address.

Current kernel driver provides var->mmap_off equal to page start of VAR.
It's fine with x86 4K page server, because VAR physical address is only 4K
aligned thus locate in 4K page start.

But with aarch64 64K page server, the actual VAR physical address has
offset within page (not located in 64K page start).
So the vDPA driver needs to add this within page offset
(caps.doorbell_bar_offset) to get the right VAR virtual address.

Fixes: 62c813706e4 ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vhost: support async packed ring dequeue

This patch implements packed ring dequeue data path
for asynchronous vhost.

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

vdpa/ifc/base: fix null pointer dereference

Fix null pointer dereference reported in coverity scan.

Coverity issue: 378882
Fixes: 5d75517beffe ("vdpa/ifc/base: access block device registers")
Signed-off-by: Andy Pei <andy.pei@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

examples/vhost: support clear in-flight for async dequeue

This patch allows vring_state_changed() to clear in-flight
dequeue packets. It also clears the in-flight packets in
a thread-safe way in destroy_device().

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

vhost: support clear in-flight packets for async dequeue

rte_vhost_clear_queue_thread_unsafe() supports to clear
in-flight packets for async enqueue only. But after
supporting async dequeue, this API should support async dequeue too.

This patch also adds the thread-safe version of this API,
the difference between the two API is that thread safety uses lock.

These APIs maybe used to clean up packets in the async channel
to prevent packet loss when the device state changes or
when the device is destroyed.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>

net/vhost: perform SW checksum in Tx path

Virtio specification supports guest checksum offloading
for L4, which is enabled with VIRTIO_NET_F_GUEST_CSUM
feature negotiation. However, the Vhost PMD does not
advertise Tx checksum offload capabilities.

Advertising these offload capabilities at the ethdev level
is not enough, because we could still end-up with the
application enabling these offloads while the guest not
negotiating it.

This patch advertises the Tx checksum offload capabilities,
and introduces a compatibility layer to cover the case
VIRTIO_NET_F_GUEST_CSUM has not been negotiated but the
application does configure the Tx checksum offloads. This
function performs the L4 Tx checksum in SW for UDP and TCP.
Compared to Rx SW checksum, the Tx SW checksum function
needs to compute the pseudo-header checksum, as we cannot
know whether it was done before.

This patch does not advertise SCTP checksum offloading
capability for now, but it could be handled later if the
need arises.

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Cheng Jiang <cheng1.jiang@intel.com>