dpdk.git
4 years agonet/ice/base: check number of chained recipes
Wei Zhao [Fri, 10 Apr 2020 00:41:55 +0000 (08:41 +0800)]
net/ice/base: check number of chained recipes

When we add a long switch rule, we need to check the number of
final recipes; if it is larger than ICE_MAX_CHAIN_RECIPE, we should
refuse the rule.
For example:

"flow create 0 ingress pattern eth / ipv6
src is CDCD:910A:2222:5498:8475:1111:3900:1536
dst is CDCD:910A:2222:5498:8475:1111:3900:2022
tc is 3 / udp dst is 45 / end actions queue index 2 / end"

This rule consumes 6 recipes; if it is not refused, it will cause
the following code to overwrite lkup_indx and mask out of bounds.

LIST_FOR_EACH_ENTRY(entry, &rm->rg_list, ice_recp_grp_entry,
		    l_entry) {
	last_chain_entry->fv_idx[i] = entry->chain_idx;
	buf[recps].content.lkup_indx[i] = entry->chain_idx;
	buf[recps].content.mask[i++] = CPU_TO_LE16(0xFFFF);
	..........
}
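
A minimal sketch of the kind of guard this patch adds, assuming the
base-code names rm->n_grp_count, ICE_MAX_CHAIN_RECIPE and
ICE_ERR_MAX_LIMIT; the exact placement in the recipe-creation path may
differ:

/* Refuse the rule if the number of chained recipes would exceed what
 * the hardware recipe chain can hold. */
if (rm->n_grp_count > ICE_MAX_CHAIN_RECIPE)
	return ICE_ERR_MAX_LIMIT;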

Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Nannan Lu <nannan.lu@intel.com>
4 years agonet/i40e: restrict pointer aliasing for NEON
Gavin Hu [Mon, 13 Apr 2020 16:40:25 +0000 (00:40 +0800)]
net/i40e: restrict pointer aliasing for NEON

Restrict pointer aliasing to optimize the code generated.

The patch showed ~3% performance uplift on Arm N1SDP platform, and no
degradation on ThunderX2. The test case is RFC2544 zero-loss L2
forwarding running testpmd.

[1] https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Restricted-Pointers.html
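
A generic illustration (not the exact i40e NEON code) of what restricted
pointers enable: when the compiler knows the pointers cannot alias, it is
free to vectorize and reorder the loads and stores more aggressively.

#include <stdint.h>

/* With __restrict, a store through 'dst' cannot modify '*src', so the
 * compiler does not need to reload the source data after every store. */
static void
copy_u32(uint32_t *__restrict dst, const uint32_t *__restrict src, int n)
{
	int i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}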

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
4 years agonet/i40e: relax barrier in Tx for NEON
Gavin Hu [Mon, 13 Apr 2020 16:40:24 +0000 (00:40 +0800)]
net/i40e: relax barrier in Tx for NEON

To keep ordering of mixed accesses, 'DMB OSH' is sufficient.
'DSB' inside the I40E_PCI_REG_WRITE is overkill.[1]

This patch fixes it by using just-sufficient barriers in the
normal PMD and vPMD.

It showed 7% performance uplift on ThunderX2 and 4% on Arm N1SDP.
The test case is the RFC2544 zero-loss test running testpmd.

[1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9qf0Kpn89EMdGDajepKoZQ@mail.gmail.com
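
A sketch of the idea, not the exact patch: order the descriptor writes
against the tail update with rte_io_wmb() (a 'DMB OSHST' on Arm,
consistent with the 'DMB OSH' noted above) and then issue the doorbell
with a relaxed MMIO write; the relaxed register-write macro and variable
names below are assumptions.

/* Descriptors have been filled above this point. */
rte_io_wmb();	/* enough to order memory vs. the doorbell write */
I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);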

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
4 years agonet/igc: support flow API
Alvin Zhang [Wed, 15 Apr 2020 08:48:10 +0000 (16:48 +0800)]
net/igc: support flow API

Below types of flows are supported:
ether-type filter, 2-tuple filter, SYN filter, RSS.
Update docs too.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: support MAC loopback mode
Alvin Zhang [Wed, 15 Apr 2020 08:48:09 +0000 (16:48 +0800)]
net/igc: support MAC loopback mode

Enable mac-loopback mode.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: support VLAN
Alvin Zhang [Wed, 15 Apr 2020 08:48:08 +0000 (16:48 +0800)]
net/igc: support VLAN

Below ops were added:
vlan_filter_set
vlan_offload_set
vlan_tpid_set
vlan_strip_queue_set

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: support RSS
Alvin Zhang [Wed, 15 Apr 2020 08:48:07 +0000 (16:48 +0800)]
net/igc: support RSS

Below ops are added:
reta_update
reta_query
rss_hash_update
rss_hash_conf_get

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: support flow control
Alvin Zhang [Wed, 15 Apr 2020 08:48:06 +0000 (16:48 +0800)]
net/igc: support flow control

Update feature list too.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: enable Rx queue interrupts
Alvin Zhang [Wed, 15 Apr 2020 08:48:05 +0000 (16:48 +0800)]
net/igc: enable Rx queue interrupts

Setup NIC to generate MSI-X interrupts.
Set the IVAR register to map interrupt causes to vectors.
Implement interrupt enable/disable functions.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: enable statistics
Alvin Zhang [Wed, 15 Apr 2020 08:48:04 +0000 (16:48 +0800)]
net/igc: enable statistics

Enable basic statistics, extended statistics and per-queue statistics.

Below ops are added:
stats_get
xstats_get
xstats_get_by_id
xstats_get_names_by_id
xstats_get_names
stats_reset
xstats_reset
queue_stats_mapping_set

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: support Rx and Tx
Alvin Zhang [Wed, 15 Apr 2020 08:48:03 +0000 (16:48 +0800)]
net/igc: support Rx and Tx

Below ops are added too:
mac_addr_add
mac_addr_remove
mac_addr_set
set_mc_addr_list
mtu_set
promiscuous_enable
promiscuous_disable
allmulticast_enable
allmulticast_disable
rx_queue_setup
rx_queue_release
rx_queue_count
rx_descriptor_done
rx_descriptor_status
tx_descriptor_status
tx_queue_setup
tx_queue_release
tx_done_cleanup
rxq_info_get
txq_info_get
dev_supported_ptypes_get

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: implement device base operations
Alvin Zhang [Wed, 15 Apr 2020 08:48:02 +0000 (16:48 +0800)]
net/igc: implement device base operations

Below ops are implemented:
dev_configure
dev_start
dev_stop
dev_close
dev_reset
dev_set_link_up
dev_set_link_down
link_update
fw_version_get
dev_led_on
dev_led_off

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: support device initialization
Alvin Zhang [Wed, 15 Apr 2020 08:48:01 +0000 (16:48 +0800)]
net/igc: support device initialization

Update base code, add readme.
Add OS-specific functions and definitions.
Add device initialization code.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/igc: add skeleton
Alvin Zhang [Wed, 15 Apr 2020 08:48:00 +0000 (16:48 +0800)]
net/igc: add skeleton

Implement device detection and loading.
Add igc driver guide docs.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/octeontx2: disable unnecessary error interrupts
Nithin Dabilpuram [Mon, 13 Apr 2020 13:45:40 +0000 (19:15 +0530)]
net/octeontx2: disable unnecessary error interrupts

Disable CQ_DISABLED error interrupt in NIX_LF_ERR_INT
to fix spurious interrupts in event dev mode. Also skip
configuring RSS when the RQ count is '0' because
RSS table initialization is done incorrectly due to a
divide-by-zero error, which leads to an RQ_OOR error
in NIX_LF_ERR_INT.

Fixes: 83ce2880e22e ("net/octeontx2: support RSS")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
4 years agonet/bnx2x: handle guest VLAN for SR-IOV
Souvik Dey [Mon, 13 Apr 2020 23:09:30 +0000 (16:09 -0700)]
net/bnx2x: handle guest VLAN for SR-IOV

In the bnx2xvf PMD, Tx packets can carry a VLAN id in 2 ways:
1. Setting the mbuf ol_flags=PKT_TX_VLAN_PKT and passing the
VLAN id in mbuf->vlan_tci.
2. The Tx packet itself has the VLAN id included in the packet data.
The first case works as expected, but the second case, where the
VLAN id is included in the Tx packet itself, was found not to work
as expected. To handle that, the start_bd bitfield and the
vlan_or_ethertype must be set properly instead of setting just the
ethertype in the VF case.
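
A sketch of the two cases above (illustrative only, not the exact
bnx2xvf Tx code; the start_bd and vlan_or_ethertype names come from the
description above):

if (mbuf->ol_flags & PKT_TX_VLAN_PKT) {
	/* Case 1: VLAN id supplied out of band in the mbuf. */
	tx_start_bd->vlan_or_ethertype =
		rte_cpu_to_le_16(mbuf->vlan_tci);
} else {
	/* Case 2: VLAN tag already inside the packet data; the VF must
	 * still program vlan_or_ethertype and the start_bd bitfield
	 * instead of writing only the ethertype. */
}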

Signed-off-by: Souvik Dey <sodey@rbbn.com>
Acked-by: Rasesh Mody <rmody@marvell.com>
4 years agonet/bnx2x: add multicast MAC address filtering
Souvik Dey [Mon, 13 Apr 2020 23:09:02 +0000 (16:09 -0700)]
net/bnx2x: add multicast MAC address filtering

Add support for the set_mc_addr_list device operation in the bnx2xvf PMD.

The configured addresses are stored in the device private area, so
they can be flushed before adding new ones.
Without this, IPv6 multicast packets were not properly forwarded to the
guest VF.

Signed-off-by: Souvik Dey <sodey@rbbn.com>
Acked-by: Rasesh Mody <rmody@marvell.com>
4 years agonet/mlx5: use open/read/close for ib stats query
Mohsin Shaikh [Thu, 9 Apr 2020 20:37:06 +0000 (04:37 +0800)]
net/mlx5: use open/read/close for ib stats query

fgets(3)/fread(3)/fscanf(3) etc. use mmap(2)/munmap(2), which leads
to TLB shootdown interrupts on all DPDK app cores, including Rx cores.
This can cause packet drops. Use read(2)/write(2) instead.

Bugzilla ID: 440
Cc: stable@dpdk.org
Signed-off-by: Mohsin Shaikh <mohsinshaikh@niometrics.com>
Reviewed-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
4 years agonet/mlx5: fix index when creating flow
Bing Zhao [Thu, 9 Apr 2020 14:38:41 +0000 (22:38 +0800)]
net/mlx5: fix index when creating flow

When creating a flow, usually the creating routine is called in
serial. No parallel execution is supported right now. The same
function will be called only once for a single flow creation.

But there is a special case that the creating routine will be called
nested. If the xmeta feature is enabled and there is FLAG / MARK in
the actions list, some metadata reg copy flow needs to be created
before the original flow is applied to the hardware.
In the flow non-cached mode, resources only for flow creation will
not be saved anymore. The memory space is pre-allocated and reused
for each flow. A global index for each device is used to indicate
the memory address of the resources. If the function is called in
nested mode, the index will be reset and everything will get
corrupted.

To solve this, a nested index is introduced to save the position for
the original flow creation. Currently, only one level nested call
of the flow creating routine is supported.

Fixes: e7bfa3596a0a ("net/mlx5: separate the flow handle resource")

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
4 years agonet/mlx4: fix build with -fno-common
Thomas Monjalon [Wed, 8 Apr 2020 00:09:00 +0000 (02:09 +0200)]
net/mlx4: fix build with -fno-common

Variables of the same name have their storage merged together
if compiled with -fcommon. This is the default.
This default behaviour allows declaring a variable in a header file and
sharing the variable in every .o binary thanks to merging at link time.

In the case of dlopen linking of the glue library, the pointer mlx4_glue
is referencing the glue functions struct and is set after calling
dlopen.

If compiling with -fno-common (the default in GCC 10), the variables must
be declared as extern to avoid multiple re-definitions.
In case the glue layer is split into a glue library, the variable mlx4_glue
needs to have its own storage for the rest of the PMD.

Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agocommon/mlx5: fix build with -fno-common
Thomas Monjalon [Wed, 8 Apr 2020 00:08:59 +0000 (02:08 +0200)]
common/mlx5: fix build with -fno-common

Variables of the same name have their storage merged together
if compiled with -fcommon. This is the default.
This default behaviour allows declaring a variable in a header file and
sharing the variable in every .o binary thanks to merging at link time.

In the case of dlopen linking of the glue library, the pointer mlx5_glue
is referencing the glue functions struct and is set after calling
dlopen.

If compiling with -fno-common (the default in GCC 10), the variable must
be declared as extern to avoid multiple re-definitions.
In case the glue layer is split into a glue library, the variable mlx5_glue
needs to have its own storage for the rest of the PMD.

Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agocommon/mlx5: split glue initialization
Thomas Monjalon [Wed, 8 Apr 2020 00:08:58 +0000 (02:08 +0200)]
common/mlx5: split glue initialization

The function mlx5_glue_init was doing three things:
- initialize logs
- load glue library if in dlopen mode
- initialize glue layer
They are split into three functions for clarity.

The config option RTE_IBVERBS_LINK_DLOPEN is not used anymore
outside of make and meson files. It is replaced with MLX5_GLUE,
which is defined in the same condition and is already used with dlopen.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agocommon/iavf: update version
Qi Zhang [Sun, 12 Apr 2020 12:50:29 +0000 (20:50 +0800)]
common/iavf: update version

Update base code release version in readme

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agocommon/iavf: support flow director in virtual channel
Qi Zhang [Mon, 13 Apr 2020 09:32:56 +0000 (17:32 +0800)]
common/iavf: support flow director in virtual channel

Add new ops and structures to allow a VF to add/delete/validate/
query flow director rules.

ADD and VALIDATE FDIR share one ops: VIRTCHNL_OP_ADD_FDIR_FILTER.
VF sends this request to PF by filling out the related field in
virtchnl_fdir_add. If the rule is created successfully, PF
will return flow id and program status to VF. If the rule is
validated successfully, the PF will only return program status
to VF.

DELETE FDIR uses ops: VIRTCHNL_OP_DEL_FDIR_FILTER.
VF sends this request to PF by filling out the related field in
virtchnl_fdir_del. If the rule is deleted successfully, PF
will return program status to VF.

Query FDIR uses ops: VIRTCHNL_OP_QUERY_FDIR_FILTER.
VF sends this request to PF by filling out the related field in
virtchnl_fdir_query. If the request is successfully done by PF,
PF will return program status and query info to VF.

Signed-off-by: Simei Su <simei.su@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agocommon/iavf: support advanced RSS input set change
Qi Zhang [Mon, 13 Apr 2020 09:32:55 +0000 (17:32 +0800)]
common/iavf: support advanced RSS input set change

Add new ops and a new VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF flag to allow a
VF to add or delete a specific RSS configuration via virtchnl.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agocommon/iavf: add virtual channel protocol header
Qi Zhang [Mon, 13 Apr 2020 09:32:54 +0000 (17:32 +0800)]
common/iavf: add virtual channel protocol header

To support advanced AVF's FDIR and RSS feature, we need to figure out
what kind of data structure should be passed from VF to PF to describe
an FDIR rule or RSS config rule. The common part of the requirement is
we need a data structure to represent the input set selection of a rule's
hash key.

An input set selection is a group of fields selected from one or more
network protocol layers that identify a specific flow.
For example, select the dst IP address from an IPv4 header combined with
the dst port from the TCP header as the input set for an IPv4/TCP flow.

The patch adds a new data structure virtchnl_proto_hdrs to abstract
a network protocol headers group which is composed of layers of network
protocol header(virtchnl_proto_hdr).

A protocol header contains a 32-bit mask (field_selector) to describe
which fields are selected as input sets, as well as a header type
(enum virtchnl_proto_hdr_type). Each bit is mapped to a field in
enum virtchnl_proto_hdr_field guided by its header type.

+------------+-----------+------------------------------+
|            | Proto Hdr | Header Type A                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|Proto Hdrs  | Proto Hdr | Header Type B                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|            | Proto Hdr | Header Type C                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|            |    ....                                  |
+-------------------------------------------------------+

All fields in enum virtchnl_proto_hdr_field are grouped by header type,
and the value of the first field of a header type is always 32-aligned.

enum proto_hdr_type {
	header_type_A = 0,
	header_type_B = 1,
	....
};

enum proto_hdr_field {
	/* header type A */
	header_A_field_0 = 0,
	header_A_field_1 = 1,
	header_A_field_2 = 2,
	header_A_field_3 = 3,

	/* header type B */
	header_B_field_0 = 32, /* = header_type_B << 5 */
	header_B_field_1 = 33,
	header_B_field_2 = 34,
	header_B_field_3 = 35,
	....
};

So we have:
proto_hdr_type = proto_hdr_field / 32
bit offset = proto_hdr_field % 32
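
The same relation, written as hypothetical C helpers (these macros are
not part of virtchnl.h, they only restate the formulas above):

/* Which header type a field belongs to, and which bit it sets in the
 * 32-bit field_selector of that header. */
#define PROTO_HDR_TYPE_OF(field)	((field) / 32)
#define PROTO_HDR_BIT_OF(field)		((field) % 32)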

To simplify operations on the protocol headers, a couple of helper macros
are added. For example, to select src IP and dst port as the input set
for an IPv4/UDP flow,

we have:
struct virtchnl_proto_hdr hdr[2];

VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[0], IPV4)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[0], IPV4, SRC)

VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[1], UDP)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[1], UDP, DST)

A protocol header also contains a byte array; this field should only
be used by an FDIR rule and should be ignored by RSS. For an FDIR rule,
the byte array is used to store the protocol header of a training
packet. The byte array must be in network byte order.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agocommon/iavf: support virtual channel for Flex RXD
Qi Zhang [Mon, 13 Apr 2020 09:32:53 +0000 (17:32 +0800)]
common/iavf: support virtual channel for Flex RXD

Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid
in struct virtchnl_rxq_info to support AVF Flex RXD
extension.

Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agocommon/iavf: add DDP package query in virtual channel
Qi Zhang [Mon, 13 Apr 2020 09:32:52 +0000 (17:32 +0800)]
common/iavf: add DDP package query in virtual channel

Add VIRTCHNL_OP_DCF_GET_PKG_INFO to query DDP package identification.

Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agocommon/iavf: add packet type aborted code
Qi Zhang [Mon, 13 Apr 2020 09:32:51 +0000 (17:32 +0800)]
common/iavf: add packet type aborted code

Add the IAVF_RX_PTYPE_PARSER_ABORTED definition, so the iavf driver will
know the opcode for parser-aborted packets.
Without this definition the driver would have to rely on magic numbers.

Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agonet/ice/base: update version
Qi Zhang [Sun, 12 Apr 2020 12:47:57 +0000 (20:47 +0800)]
net/ice/base: update version

Update base code version in readme.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
4 years agonet/hns3: fix VLAN filter when setting promisucous mode
Chengchang Tang [Fri, 10 Apr 2020 11:09:30 +0000 (19:09 +0800)]
net/hns3: fix VLAN filter when setting promiscuous mode

Currently, when an upper-level application calls the API function named
rte_eth_dev_set_vlan_offload to configure the hardware VLAN filter
offload and calls the rte_eth_promiscuous_enable API to enable
promiscuous mode on a hns3 PF device, the driver can't receive
packets with a VLAN tag which has not been added by calling the API
function named rte_eth_dev_vlan_filter.

This patch fixes it by disabling the VLAN filter when setting
promiscuous mode and enabling the VLAN filter again after
promiscuous mode is disabled.
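
For reference, the application-level call sequence described above, using
standard ethdev APIs (a minimal sketch; the VLAN offload mask name
follows the DPDK release of that time):

#include <rte_ethdev.h>

/* Enable hardware VLAN filtering, then promiscuous mode; with the fix,
 * VLAN-tagged packets are received in promiscuous mode even if their
 * VLAN id was never added with rte_eth_dev_vlan_filter(). */
static int
enable_promisc_with_vlan_filter(uint16_t port_id)
{
	int ret;

	ret = rte_eth_dev_set_vlan_offload(port_id, ETH_VLAN_FILTER_OFFLOAD);
	if (ret != 0)
		return ret;

	return rte_eth_promiscuous_enable(port_id);
}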

Fixes: 19a3ca4c99cf ("net/hns3: add start/stop and configure operations")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
4 years agonet/hns3: fix default VLAN filter configuration for PF
Chengchang Tang [Fri, 10 Apr 2020 11:09:29 +0000 (19:09 +0800)]
net/hns3: fix default VLAN filter configuration for PF

Currently, the VLAN filter is enabled by default during initialization
and can't be turned off on a hns3 PF device. If upper applications
don't call the rte_eth_dev_vlan_filter API function to set a VLAN on the
hns3 PF device, the hns3 PF PMD driver can't receive packets with a
VLAN tag. This leads to compatibility issues: the behavior of the hns3
network engine differs from that of other NICs.

This patch disables the VLAN filter during initialization and allows the
upper level applications to enable or disable the VLAN filter.

Fixes: 411d23b9eafb ("net/hns3: support VLAN")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
4 years agonet/hns3: fix RSS key length
Lijun Ou [Fri, 10 Apr 2020 11:09:28 +0000 (19:09 +0800)]
net/hns3: fix RSS key length

When an upper application calls the rte_eth_dev_rss_hash_conf_get API
function to get the RSS key parameters, the function should return the
RSS key length supported by the device. Otherwise, an error will occur
when the upper application needs to use the RSS key length supported
by this specific hardware to validate and configure its own key.

For example, consider the following scenario:
users want to use their own RSS key, but the length of the key is
bigger than the one supported by the hardware.

In that case, users first need to get the RSS key length supported by
the hardware through the above API, and then compare the obtained RSS
key length with the length of their own RSS key.

If the driver does not return the actual value, an error may occur when
the user calls the rte_eth_dev_rss_hash_update API function to configure
their own key into the hardware.

Besides, this patch fixes the problem of stepping on memory when the RSS
key array configured by users is shorter than the RSS key length
supported by the driver.
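
A sketch of the usage pattern described above, using generic ethdev
calls (the ETH_RSS_IP hash-type choice is only an example):

#include <errno.h>
#include <rte_ethdev.h>

/* Validate a user-supplied RSS key against the key size reported by the
 * device before programming it into the hardware. */
static int
set_rss_key_checked(uint16_t port_id, uint8_t *key, uint8_t key_len)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_rss_conf rss_conf = {
		.rss_key = key,
		.rss_key_len = key_len,
		.rss_hf = ETH_RSS_IP,	/* example hash types */
	};
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	if (key_len != dev_info.hash_key_size)
		return -EINVAL;	/* key length not supported by the HW */

	return rte_eth_dev_rss_hash_update(port_id, &rss_conf);
}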

Fixes: c37ca66f2b27 ("net/hns3: support RSS")
Cc: stable@dpdk.org
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
4 years agonet/hns3: add RSS hash offload to capabilities
Lijun Ou [Fri, 10 Apr 2020 11:09:27 +0000 (19:09 +0800)]
net/hns3: add RSS hash offload to capabilities

Currently, when an upper application calls the rte_eth_dev_info_get API
function to query the Rx offload capabilities of the hns3 network
engine, the RSS hash offload capability is missing.

This patch fixes it by adding the related capability in the
'.dev_infos_get' ops implementation functions named hns3_dev_infos_get
and hns3vf_dev_infos_get of the hns3 PF/VF PMD driver.

Fixes: c37ca66f2b27 ("net/hns3: support RSS")
Cc: stable@dpdk.org
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
4 years agonet/hns3: clear residual flow rules on init
Chengwen Feng [Fri, 10 Apr 2020 11:09:26 +0000 (19:09 +0800)]
net/hns3: clear residual flow rules on init

This patch fixes the issue that flow director rules were not cleared
during initialization, which led to stale flow director rules remaining
after the upper application (such as testpmd) restarted.

Fixes: fcba820d9b9e ("net/hns3: support flow director")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
4 years agonet/hns3: fix Rx interrupt after reset
Chengwen Feng [Fri, 10 Apr 2020 11:09:25 +0000 (19:09 +0800)]
net/hns3: fix Rx interrupt after reset

Currently, the Rx interrupt cannot work normally after a reset (such as
FLR, global reset and IMP reset) when running the l3fwd-power application
based on the hns3 network engine.

The root cause is that the hardware configuration of the Rx interrupt
is not recovered after the reset.

This patch fixes it with the following modification.
1. The internal static function named hns3(vf)_init_ring_with_vector is
   moved from hns3_init_pf to hns3(vf)_init_hardware because
   hns3(vf)_init_hardware is called both in the initialization and the
   RESET_STAGE_DEV_INIT stage of the reset process.
2. The internal static function named hns3(vf)_restore_rx_interrupt is
   added to hns3(vf)_restore_conf; it is used to recover the hardware
   configuration of the interrupt vectors of the Rx queues in the
   RESET_STAGE_DEV_INIT stage of the reset process.
3. The internal static functions named hns3_dev_all_rx_queue_intr_enable
   and hns3_enable_all_queues are added to hns3(vf)_dev_start (which is
   called during initialization), so after calling the rte_eth_dev_start
   API successfully, the driver is ready to work.
4. The functions named hns3_dev_all_rx_queue_intr_enable and
   hns3_enable_all_queues are also added to hns3(vf)_start_service (which
   is called in the RESET_STAGE_DEV_INIT stage of the reset process), so
   after start_service, the driver is ready to work.

Note:
1. Because FLR clears the queue's interrupt-enable bit in the hardware
   configuration, hns3_dev_all_rx_queue_intr_enable is called to enable
   interrupts before enabling the queues.
2. After the initialization is finished, the queues are enabled to work
   by calling the internal function named hns3_enable_all_queues.

Fixes: 02a7b55657b2 ("net/hns3: support Rx interrupt")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
4 years agonet/hns3: fix adding multicast MAC address
Chengchang Tang [Fri, 10 Apr 2020 11:09:24 +0000 (19:09 +0800)]
net/hns3: fix adding multicast MAC address

Currently, when an upper application calls the rte_eth_dev_mac_addr_add
API function to add an MC MAC address on a hns3 PF/VF device, it fails.

In the hns3 network engine, UC and MC MAC addresses are added with
different firmware commands. We need to determine whether the input
address is a UC or an MC address in order to call the proper command in
the '.mac_addr_add' and '.mac_addr_remove' ops implementation functions
of the hns3 PF and VF drivers, as below:
  hns3_add_mac_addr
  hns3vf_add_uc_mac_addr
  hns3_remove_mac_addr
  hns3vf_remove_mac_addr

By the way, it is recommended to call the rte_eth_dev_set_mc_addr_list
API function to set MC MAC addresses, because using the
rte_eth_dev_mac_addr_add API function for MC MAC addresses may affect
the number of UC MAC addresses that can be used.
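
A minimal sketch of the recommended API (the multicast addresses below
are only examples):

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_ether.h>

/* Program the whole multicast list in one call; addresses set this way
 * do not consume the UC MAC address entries. */
static int
set_example_mc_addrs(uint16_t port_id)
{
	static struct rte_ether_addr mc_list[] = {
		{ .addr_bytes = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 } },
		{ .addr_bytes = { 0x33, 0x33, 0x00, 0x00, 0x00, 0x01 } },
	};

	return rte_eth_dev_set_mc_addr_list(port_id, mc_list,
					    RTE_DIM(mc_list));
}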

Fixes: 7d7f9f80bbfb ("net/hns3: support MAC address related operations")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
4 years agonet/hns3: replace interrupt vector zero with common macro
Wei Hu (Xavier) [Fri, 10 Apr 2020 11:09:23 +0000 (19:09 +0800)]
net/hns3: replace interrupt vector zero with common macro

This patch replaces the magic number 0 with the macro
RTE_INTR_VEC_ZERO_OFFSET provided by the DPDK framework.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
4 years agonet/hns3: simplify process of some return values
Lijun Ou [Fri, 10 Apr 2020 11:09:22 +0000 (19:09 +0800)]
net/hns3: simplify process of some return values

Currently, the return value processing of some functions can be
combined, so that some code can be simplified.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
4 years agovhost: prefix vDPA enum value for PCI address type
Maxime Coquelin [Thu, 26 Mar 2020 14:14:22 +0000 (15:14 +0100)]
vhost: prefix vDPA enum value for PCI address type

In order to avoid potential conflicts, rename the PCI_ADDR
enum value to VDPA_ADDR_PCI in vdpa_addr_type_enum.

All symbols referencing this enum are experimental, so it
does not break API policy.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agovhost: make IOTLB cache name unique among processes
Itsuro Oda [Wed, 11 Mar 2020 23:19:18 +0000 (08:19 +0900)]
vhost: make IOTLB cache name unique among processes

Currently, the IOTLB cache name is composed of the vid and the virtqueue
index, for example "iotlb_cache_0_0". Because the vid is assigned
per process, the IOTLB cache name is not unique among multiple processes.
For example, if a secondary process uses a vhost
(ex. eth_vhost0,iface=/tmp/sock0) and another secondary process
uses a vhost (ex. eth_vhost1,iface=/tmp/sock1), the IOTLB cache
names of both vhosts ("iotlb_cache_0_0") are the same and as a result
the IOTLB cache is broken.

This patch makes the IOTLB cache name unique among multiple processes
by adding the process id to the IOTLB cache name.

The prefix of the name is shortened to "iotlb_" since the maximum
length of a pool name is 25 bytes (RTE_MEMPOOL_NAMESIZE is 26).
Note that the name is at most 25 characters at the moment.
Here:
* pid_t == int: max 10 digits.
* vid < MAX_VHOST_DEVICE (1024): max 4 digits.
* vq_index < VHOST_MAX_VRING (256): max 3 digits.
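
A sketch of the resulting naming scheme (the real code lives in the vhost
library's IOTLB implementation; the exact format string is an assumption):

char pool_name[RTE_MEMPOOL_NAMESIZE];

/* getpid() makes the name unique across secondary processes that happen
 * to share the same vid and virtqueue index. */
snprintf(pool_name, sizeof(pool_name), "iotlb_%d_%d_%d",
	 getpid(), vid, vq_index);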

Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")
Cc: stable@dpdk.org
Signed-off-by: Itsuro Oda <oda@valinux.co.jp>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agovhost: remove unused variable
Xiaolong Ye [Sat, 7 Mar 2020 13:22:35 +0000 (21:22 +0800)]
vhost: remove unused variable

VHOST_FEATURES has been removed in previous refactoring.

Fixes: 0917f9d1f059 ("vhost: use new APIs to handle features")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agonet/virtio: fix outdated comment
Xiaolong Ye [Sat, 7 Mar 2020 13:22:34 +0000 (21:22 +0800)]
net/virtio: fix outdated comment

Fix a comment that is no longer correct as the code has evolved.

Fixes: 9470427c88e1 ("net/virtio: do not store PCI device pointer at shared memory")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agonet/vhost: fix potential memory leak on close
Itsuro Oda [Thu, 5 Mar 2020 02:54:50 +0000 (11:54 +0900)]
net/vhost: fix potential memory leak on close

If a vhost device is closed before eth_dev_configure is done
to the device, internal resources allocated to the device
would not be freed. This patch fixes it.

Fixes: 3d01b759d267 ("net/vhost: delay driver setup")
Cc: stable@dpdk.org
Signed-off-by: Itsuro Oda <oda@valinux.co.jp>
Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
4 years agonet/vhost: enable promiscuous and multicast by default
Xiaolong Ye [Wed, 26 Feb 2020 13:45:34 +0000 (21:45 +0800)]
net/vhost: enable promiscuous and multicast by default

With this patch, the promiscuous and multicast fields are initialized as
enabled for the vhost PMD by default; this allows the devices to be used
when running applications that attempt to enable promiscuous or
multicast mode.
Similar things have been done for other virtual PMDs by commit f165210321c4
("drivers/net: enable promiscuous and multicast by default").

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agonet/vhost: add options for linear and external buffer
Sivaprasad Tummala [Wed, 26 Feb 2020 10:00:34 +0000 (10:00 +0000)]
net/vhost: add options for linear and external buffer

Added vhost PMD arguments 'linear-buffer' and 'ext-buffer'
to configure the 'RTE_VHOST_USER_LINEARBUF_SUPPORT' and
'RTE_VHOST_USER_EXTBUF_SUPPORT' flags in the vhost library.
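
An example of enabling both options when creating the port from code
(equivalent to passing the same string with --vdev on the command line;
the iface path and queue count are illustrative):

#include <rte_bus_vdev.h>

/* Create a vhost port with linear and external buffer support enabled. */
rte_vdev_init("eth_vhost0",
	      "iface=/tmp/sock0,queues=1,linear-buffer=1,ext-buffer=1");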

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agovhost: fix packed ring zero-copy
Marvin Liu [Mon, 24 Feb 2020 15:14:19 +0000 (23:14 +0800)]
vhost: fix packed ring zero-copy

Available buffer ID should be stored in the zmbuf in the packed-ring
dequeue path. There's no guarantee that local queue avail index is
equal to buffer ID.

Fixes: d1eafb532268 ("vhost: add packed ring zcopy batch and single dequeue")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reported-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agoexamples/vhost_blk: use common macro for minimum
Thomas Monjalon [Wed, 19 Feb 2020 10:39:22 +0000 (11:39 +0100)]
examples/vhost_blk: use common macro for minimum

The macro RTE_MIN can be used in vhost-blk example.

This change implies fixing the sign of used_len as size_t
as defined in vhost_strcpy_pad().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agovhost/crypto: add missing user protocol flag
Fan Zhang [Wed, 29 Jan 2020 10:19:37 +0000 (10:19 +0000)]
vhost/crypto: add missing user protocol flag

This patch fixes the problem of the missing
"VHOST_USER_PROTOCOL_F_CONFIG" flag in vhost crypto during initialization.
Newer QEMU versions require this feature to be enabled.

Fixes: 939066d96563 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
4 years agonet/hinic: adds Tx queue xstats members
Xiaoyun Wang [Fri, 10 Apr 2020 09:21:47 +0000 (17:21 +0800)]
net/hinic: adds Tx queue xstats members

Because some apps may pass illegal parameters, the driver adds
checks on illegal parameters and DFX statistics, which include the
sge_len0 and mbuf_null txq xstats members.

Signed-off-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>
4 years agonet/hinic/base: optimize log style
Xiaoyun Wang [Fri, 10 Apr 2020 09:21:46 +0000 (17:21 +0800)]
net/hinic/base: optimize log style

The patch adds space between descriptors and variables in log files.

Signed-off-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>
4 years agonet/hinic/base: fix PF firmware hot-active problem
Xiaoyun Wang [Fri, 10 Apr 2020 09:21:45 +0000 (17:21 +0800)]
net/hinic/base: fix PF firmware hot-active problem

When the FW is hot-active, which means the FW is updated without needing
to reboot the OS, the FW returns HINIC_DEV_BUSY_ACTIVE_FW to the PF driver
because the firmware is being reinitialized. At that point the cmdq
initialization that relies on the FW channel will fail, so the driver
should reinitialize the cmdq when the port starts.

Fixes: 0194313b2df6 ("net/hinic/base: fix port start during FW hot update")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>
4 years agonet/null: add argument for no Rx
Ferruh Yigit [Mon, 2 Mar 2020 17:36:45 +0000 (17:36 +0000)]
net/null: add argument for no Rx

Add a new device argument 'no-rx', which prevents the PMD from receiving
packets.

This is useful for testing when a PMD is needed only to send packets to.
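
An example of creating such a port from code (equivalent to
--vdev net_null0,no-rx=1 on the command line):

#include <rte_bus_vdev.h>

/* Create a null port that accepts Tx packets but never returns Rx packets. */
rte_vdev_init("net_null0", "no-rx=1");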

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/null: group device arguments
Ferruh Yigit [Mon, 2 Mar 2020 17:36:44 +0000 (17:36 +0000)]
net/null: group device arguments

Group device argument to the struct, to increase code readability.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/null: prefer unsigned int
Ferruh Yigit [Mon, 2 Mar 2020 17:36:43 +0000 (17:36 +0000)]
net/null: prefer unsigned int

Prefer the 'unsigned int' type keyword over plain 'unsigned'; this also
silences the checkpatch warnings.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/null: remove redundant check
Ferruh Yigit [Mon, 2 Mar 2020 17:36:42 +0000 (17:36 +0000)]
net/null: remove redundant check

There is no need to check whether the argument exists or not;
`rte_kvargs_process` returns success if the argument is not provided at
all.

Fixes: c743e50c475f ("null: new poll mode driver")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agodoc: add net null PMD guide
Ferruh Yigit [Mon, 2 Mar 2020 17:36:41 +0000 (17:36 +0000)]
doc: add net null PMD guide

Net null PMD was missing documentation, adding it.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/null: fix secondary burst function selection
Ferruh Yigit [Mon, 2 Mar 2020 17:36:40 +0000 (17:36 +0000)]
net/null: fix secondary burst function selection

The secondary process uses the primary process device, but while setting
the Rx/Tx functions it uses the device arguments from the secondary
process instead of the primary ones.

This may cause the primary and secondary processes to use different Rx/Tx
functions unintentionally.

Fixes: bccc77a6a74a ("net/null: fix multi-process Rx and Tx")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agonet/i40e/base: update version
Jiaqi Min [Wed, 8 Apr 2020 10:05:23 +0000 (10:05 +0000)]
net/i40e/base: update version

Update base code release version in readme.

Signed-off-by: Jiaqi Min <jiaqix.min@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agonet/i40e/base: add constants for PTP pins
Jiaqi Min [Wed, 8 Apr 2020 10:05:22 +0000 (10:05 +0000)]
net/i40e/base: add constants for PTP pins

Introduce constants for handling PTP pins used for external
clock source.

Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Jiaqi Min <jiaqix.min@intel.com>
Acked-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agonet/i40e/base: introduce device ID for V710-TL 5G
Jiaqi Min [Wed, 8 Apr 2020 10:05:21 +0000 (10:05 +0000)]
net/i40e/base: introduce device ID for V710-TL 5G

This change adds a new device ID and handles it in the same way as the
X710-T*L head of the family. The new device ID is for the new V710-T*L
adapter supporting speeds up to 5G.

Signed-off-by: Zalfresso-Jundzillo <marekx.zalfresso-jundzillo@intel.com>
Signed-off-by: Jiaqi Min <jiaqix.min@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agonet/i40e/base: update X722/X710 FW API version to 1.10
Jiaqi Min [Wed, 8 Apr 2020 10:05:20 +0000 (10:05 +0000)]
net/i40e/base: update X722/X710 FW API version to 1.10

update X722/X710 FW API version to 1.10.

Signed-off-by: Piotr Azarewicz <piotr.azarewicz@intel.com>
Signed-off-by: Jiaqi Min <jiaqix.min@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
4 years agonet/iavf: support more flow patterns
Xiao Zhang [Fri, 3 Apr 2020 05:42:42 +0000 (13:42 +0800)]
net/iavf: support more flow patterns

Add pattern support for AH/ESP/L2TPV3OIP/PFCP.
Added patterns are as follows:
/* GTPU */
    eth/ipv4/udp/gtpu
    eth/ipv4/udp/gtpu/gtp_psc
/* ESP */
    eth/ipv4/esp  eth/ipv4/udp/esp
    eth/ipv6/esp  eth/ipv6/udp/esp
/* AH */
    eth/ipv4/ah  eth/ipv6/ah
/* L2TPV3 */
    eth/ipv4/l2tpv3oip  eth/ipv6/l2tpv3oip
/* PFCP */
    eth/ipv4/udp/pfcp  eth/ipv6/udp/pfcp

Signed-off-by: Xiao Zhang <xiao.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
4 years agonet/iavf: support generic flow API
Qiming Yang [Fri, 3 Apr 2020 05:42:41 +0000 (13:42 +0800)]
net/iavf: support generic flow API

This patch adds iavf_flow_create, iavf_flow_destroy,
iavf_flow_flush and iavf_flow_validate support;
these are used to handle all the generic filters.

This patch supports basic L2, L3, L4 and GTPU patterns.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
4 years agonet/pfe: fix double free of MAC address
Yunjian Wang [Thu, 9 Apr 2020 01:59:00 +0000 (09:59 +0800)]
net/pfe: fix double free of MAC address

The freeing of 'mac_addrs' has been moved to rte_eth_dev_release_port(),
so freeing 'mac_addrs' like this in pfe_eth_exit() is unnecessary and
will cause a double free.

Fixes: 67fc3ff97c39 ("net/pfe: introduce basic functions")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
4 years agoapp/testpmd: fix PPPoE flow command
Xiao Zhang [Tue, 31 Mar 2020 13:29:40 +0000 (21:29 +0800)]
app/testpmd: fix PPPoE flow command

The command line to create an RTE flow for a specific proto_id of PPPoES
does not work.

It was:
testpmd> flow create 0 ingress pattern proto_id
 proto_id [TOKEN]: match PPPoE session protocol identifier
testpmd> flow create 0 ingress pattern proto_id proto_id
 proto_id [TOKEN]: match PPPoE session protocol identifier
testpmd> flow create 0 ingress pattern proto_id proto_id proto_id
 proto_id [TOKEN]: match PPPoE session protocol identifier

The proto_id could not be set with the previous implementation.

This patch fixes this issue and changes the command line to:
testpmd> flow create 0 pattern pppoe_proto_id is xxxx

Fixes: 226c6e60c35b ("ethdev: add PPPoE to flow API")
Cc: stable@dpdk.org
Signed-off-by: Xiao Zhang <xiao.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
4 years agonet/mlx5: reorganize fallback counter management
Suanming Mou [Tue, 7 Apr 2020 03:59:47 +0000 (11:59 +0800)]
net/mlx5: reorganize fallback counter management

Currently, the fallback counter is also allocated from the pool, so the
dedicated fallback function code becomes somewhat duplicated.

Reorganize the fallback counter code so that it reuses the normal
counter code.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/mlx5: split flow counter struct
Suanming Mou [Tue, 7 Apr 2020 03:59:46 +0000 (11:59 +0800)]
net/mlx5: split flow counter struct

Currently, the counter struct holds both the members used by batch
counters and those used by non-batch counters. The members which are only
used by non-batch counters cost 16 bytes of extra memory for batch
counters. As there are normally only a limited number of non-batch
counters, mixing the non-batch counter and batch counter members becomes
quite expensive for batch counters. If 1 million batch counters are
created, 16 MB of memory that will never be used by the batch counters is
allocated.

Splitting the mlx5_flow_counter struct for batch and non-batch counters
helps to save this memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/mlx5: optimize flow counter handle type
Suanming Mou [Tue, 7 Apr 2020 03:59:45 +0000 (11:59 +0800)]
net/mlx5: optimize flow counter handle type

Currently, DV and verbs counters are both changed to be indexed. It means
that while creating a flow with a counter, the flow can save the index
value to address the counter.

Saving the 4-byte index value in rte_flow instead of the 8-byte
pointer helps to save memory with millions of flows.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/mlx5: change Direct Verbs counter to indexed
Suanming Mou [Tue, 7 Apr 2020 03:59:44 +0000 (11:59 +0800)]
net/mlx5: change Direct Verbs counter to indexed

This part of the counter optimization changes the DV counter to be
indexed, as has already been done for verbs. In this case, all the mlx5
flow counters can be addressed by index.

The counter index is composed of the pool index and the counter offset in
the pool counter array. The 0x800000 dcs ID offset between batch and
non-batch counters is used to avoid mixing up the indexes. As batch
counter dcs IDs start from 0x800000 and non-batch counter dcs IDs start
from 0, the 0x800000 offset is added to the batch counter index to
indicate that the index belongs to a batch counter.

The counter member in the rte_flow struct is changed to an index instead
of a pointer. It saves 4 bytes of memory for every rte_flow. With
millions of rte_flow, it will save MBytes of memory.
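
A hypothetical illustration of the index layout described above (the
pool size constant and helper name are not the real PMD symbols):

#include <stdbool.h>
#include <stdint.h>

#define CNT_BATCH_OFFSET	0x800000	/* marks batch-counter indexes */
#define CNT_POOL_SIZE		512		/* counters per pool, example only */

/* Compose a counter index from the pool index and the offset of the
 * counter inside the pool's counter array. */
static inline uint32_t
cnt_index(uint32_t pool_idx, uint32_t offset, bool batch)
{
	return (batch ? CNT_BATCH_OFFSET : 0) +
	       pool_idx * CNT_POOL_SIZE + offset;
}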

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agocommon/mlx5: add batch counter id offset
Suanming Mou [Tue, 7 Apr 2020 03:59:43 +0000 (11:59 +0800)]
common/mlx5: add batch counter id offset

This commit is a part of the DV counter optimization.

The batch counter dcs id starts from 0x800000 and the non-batch counter
dcs id starts from 0. Currently, the counter is changed to be indexed by
the pool index and the offset of the counter in the pool counters_raw
array, which means the counter index is now the same for batch and
non-batch counters. Adding the 0x800000 batch counter offset to the batch
counter index helps indicate whether the counter index is from the batch
or the non-batch container pool.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/mlx5: change verbs counter allocator to indexed
Suanming Mou [Tue, 7 Apr 2020 03:59:42 +0000 (11:59 +0800)]
net/mlx5: change verbs counter allocator to indexed

This is part of the counter optimization which saves the counter index
instead of the counter pointer in rte_flow.

Placing the verbs counter into the container pool helps the counter to be
indexed correctly, independently of the raw counter.

The counter pointer in rte_flow will be changed to an index value after
the DV counter is also changed to be indexed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/mlx5: optimize counter release query generation
Suanming Mou [Tue, 7 Apr 2020 03:59:41 +0000 (11:59 +0800)]
net/mlx5: optimize counter release query generation

Query generation was introduced to avoid a counter being reallocated
before the counter statistics are fully updated. Counters released
between the query trigger and the query handler may miss the packets
that arrived in that gap period. In this case, the user can only
reallocate the counter once the pool query_gen is greater than the
counter query_gen + 1, which indicates that a new round of query has
finished and the statistics are fully updated.

Splitting the pool query_gen into start_query_gen and end_query_gen helps
to better identify the counters released in the gap period. It also
allows the counters released before the query trigger or after the query
handler to be reallocated more efficiently.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/mlx5: fix counter container usage
Suanming Mou [Tue, 7 Apr 2020 03:59:40 +0000 (11:59 +0800)]
net/mlx5: fix counter container usage

As the non-batch counter pool allocates only one counter each time, after
the newly allocated counter is popped out, the pool will be empty and
moved to the end of the pool list in the container.

Currently, a new non-batch counter allocation may happen together with a
new counter pool allocation, which means the new counter comes from a new
pool. While the new pool is allocated, the container resize and switch
happen. In this case, after the pool becomes empty, it should be added to
the pool list of the new container it belongs to.

Update the container pointer accordingly with the pool allocation to
avoid adding the pool to the incorrect container.

Fixes: 5382d28c2110 ("net/mlx5: accelerate DV flow counter transactions")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
4 years agonet/ena: update driver version to v2.1.0
Michal Krawczyk [Wed, 8 Apr 2020 08:29:21 +0000 (10:29 +0200)]
net/ena: update driver version to v2.1.0

The v2.1.0 release refactors the Tx and Rx paths, includes a few bug
fixes and also adds new features which are going to be available with
the newest hardware.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agodoc: add notes on ENA usage on metal instances
Michal Krawczyk [Wed, 8 Apr 2020 08:29:20 +0000 (10:29 +0200)]
doc: add notes on ENA usage on metal instances

As AWS metal instances support IOMMU, the usage of igb_uio or
vfio-pci can lead to problems (when to use which module), especially
since vfio-pci doesn't support SMMU on arm64.

To clear up how to use those modules in various setup
conditions (with or without IOMMU) on metal instances, a more detailed
explanation was added.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: reuse zero length Rx descriptor
Michal Krawczyk [Wed, 8 Apr 2020 08:29:19 +0000 (10:29 +0200)]
net/ena: reuse zero length Rx descriptor

Some ENA devices can pass a descriptor with length 0 to the driver. To
avoid an extra allocation, the descriptor can be reused by simply putting
it back to the device.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: refactor Tx
Michal Krawczyk [Wed, 8 Apr 2020 08:29:18 +0000 (10:29 +0200)]
net/ena: refactor Tx

The original Tx function was very long and contained both the cleanup
and the sending sections. Because of that it had a lot of local
variables, deep indentation and was hard to read.

This function was split into 2 sections:
  * Sending - which is responsible for preparing the mbuf, mapping it
    to the device descriptors and finally, sending packet to the HW
  * Cleanup - which is releasing packets sent by the HW. Loop which was
    releasing packets was reworked a bit, to make intention more visible
    and aligned with other parts of the driver.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: use macros for ring index operations
Michal Krawczyk [Wed, 8 Apr 2020 08:29:17 +0000 (10:29 +0200)]
net/ena: use macros for ring index operations

To improve code readability, an abstraction was added for operating on
IO ring indexes.

The driver was defining a local variable for the ring mask in each
function that needed to operate on the ring indexes. Now it is stored in
the ring, as this value won't change unless the size of the ring changes,
and macros for advancing the indexes using the mask have been added.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: limit refill threshold by fixed value
Michal Krawczyk [Wed, 8 Apr 2020 08:29:16 +0000 (10:29 +0200)]
net/ena: limit refill threshold by fixed value

The divider used for both the Tx and Rx cleanup/refill thresholds can
cause too big a delay for really big rings - for example, if an 8k Rx
ring is used, the refill won't trigger until the threshold of 1024 is
reached. It also causes the driver to try to allocate that many
descriptors at once.

Limiting it by a fixed value - 256 in that case - limits the maximum
time spent in the repopulate function.
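
A sketch of the capped threshold described above (the define names are
illustrative, not the exact driver symbols; the divider of 8 matches the
8k-ring/1024-threshold example):

#include <rte_common.h>

#define REFILL_THRESH_DIVIDER	8
#define REFILL_THRESH_MAX	256

/* The threshold grows with the ring size but is capped at a fixed value. */
static inline unsigned int
refill_threshold(unsigned int ring_size)
{
	return RTE_MIN(ring_size / REFILL_THRESH_DIVIDER,
		       (unsigned int)REFILL_THRESH_MAX);
}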

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: rework getting number of available descriptors
Michal Krawczyk [Wed, 8 Apr 2020 08:29:15 +0000 (10:29 +0200)]
net/ena: rework getting number of available descriptors

The ena_com API should be preferred for getting the number of
used/available descriptors unless an extra calculation needs to be
performed.

Some helper variables were added for storing values that are later
reused. Moreover, to limit the number of sent/received packets to
the number of available descriptors, RTE_MIN is used instead of an if
statement, which was doing a similar thing but was less descriptive.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: refactor Rx
Michal Krawczyk [Wed, 8 Apr 2020 08:29:14 +0000 (10:29 +0200)]
net/ena: refactor Rx

* Split the main Rx function into multiple ones - the body of the main
  one was very big and contained 2 nested loops, which were
  making the code hard to read
* Rework how the Rx mbuf chains are being created - instead of having
  a while loop with a conditional check for the first segment, handle
  this segment outside the loop and, if more fragments exist,
  process them inside.
* Initialize the Rx mbuf using a simple function - it's the common thing
  for the 1st and the next segments.
* Create a structure for the Rx buffer to align it with the Tx path, other
  ENA drivers, and to make the variable name more descriptive - on DPDK,
  the Rx buffer must hold only the mbuf, so initially an array of mbufs
  was used as the buffers. However, it was misleading, as it was named
  "rx_buffer_info". To make it clearer, a structure holding the mbuf
  pointer was added and now there is a possibility to expand it in the
  future without reworking the driver.
* Remove redundant variables and conditional checks.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: disable meta caching
Michal Krawczyk [Wed, 8 Apr 2020 08:29:13 +0000 (10:29 +0200)]
net/ena: disable meta caching

In the LLQ (Low-latency queue) mode, the device can indicate that meta
data descriptor caching is disabled. In that case the driver should send
a valid meta descriptor with every Tx packet.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: add Tx drops statistic
Michal Krawczyk [Wed, 8 Apr 2020 08:29:12 +0000 (10:29 +0200)]
net/ena: add Tx drops statistic

The ENA device can report in the AENQ handler the number of Tx packets
that were dropped and not sent.

This statistic shows a global value for the device and, because
rte_eth_stats is missing a field that could hold this value (it
isn't a Tx error), it is presented as an extended statistic.

As the current design of extended statistics prevents tx_drops from
being an atomic variable and both tx_drops and rx_drops are only updated
from the AENQ handler, both were made non-atomic for alignment.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: remove memory barriers before doorbells
Michal Krawczyk [Wed, 8 Apr 2020 08:29:11 +0000 (10:29 +0200)]
net/ena: remove memory barriers before doorbells

The doorbell code already issues the doorbell by using rte_write, which
contains the needed write barrier. Because of that, there is no need for
an explicit memory barrier before calling the function.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: support large LLQ headers
Michal Krawczyk [Wed, 8 Apr 2020 08:29:10 +0000 (10:29 +0200)]
net/ena: support large LLQ headers

The default LLQ (Low-latency queue) maximum header size is 96 bytes and
can be too small for some types of packets - like IPv6 packets with
multiple extension headers. This can be fixed by using large LLQ headers.

If the device supports larger LLQ headers, the user can activate them by
using the device argument 'large_llq_hdr' with value '1'.

If the device doesn't support this feature, the default value (96B)
will be used.
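
An example of passing the new device argument when probing the device
from code (the PCI address is illustrative; on the EAL command line the
same string is given with the device option):

#include <rte_dev.h>

/* Probe the ENA device with large LLQ headers enabled. */
rte_dev_probe("0000:00:06.0,large_llq_hdr=1");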

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: refactor getting IO queues capabilities
Michal Krawczyk [Wed, 8 Apr 2020 08:29:09 +0000 (10:29 +0200)]
net/ena: refactor getting IO queues capabilities

The values read from the device describe its maximum capabilities.
Because of that, the names of the fields storing those values, the
functions and the temporary variables should be more descriptive in order
to improve the self-documentation of the code.

In connection with this, the way of getting the maximum queue size could
be simplified - no hardcoded values are needed, as the device is going to
send its capabilities anyway.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena: set IO ring size to valid value
Michal Krawczyk [Wed, 8 Apr 2020 08:29:08 +0000 (10:29 +0200)]
net/ena: set IO ring size to valid value

IO rings were configured with the maximum allowed size for the Tx/Rx
rings. However, the application could decide to create smaller rings.

This patch uses the value stored in the ring instead of the value from
the adapter, which indicates the maximum allowed value.

Fixes: df238f84c0a2 ("net/ena: recreate HW IO rings on start and stop")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: update generation date and commit
Michal Krawczyk [Wed, 8 Apr 2020 08:29:07 +0000 (10:29 +0200)]
net/ena/base: update generation date and commit

The current ena_com version was generated on 25.09.2019.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: fix indentation of multiple defines
Michal Krawczyk [Wed, 8 Apr 2020 08:29:06 +0000 (10:29 +0200)]
net/ena/base: fix indentation of multiple defines

As the alignment of the defines wasn't valid, it was removed altogether,
so instead of using multiple spaces or tabs, a single space after the
define name is used.

Fixes: 99ecfbf845b3 ("ena: import communication layer")

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: fix types for printing timestamps
Michal Krawczyk [Wed, 8 Apr 2020 08:29:05 +0000 (10:29 +0200)]
net/ena/base: fix types for printing timestamps

Because ena_com is being used by multiple platforms which are using
different C versions, PRIu64 cannot be used directly and must be defined
in the platform file.

Fixes: b2b02edeb0d6 ("net/ena/base: upgrade HAL for new HW features")

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: use 48-bit memory addresses
Michal Krawczyk [Wed, 8 Apr 2020 08:29:04 +0000 (10:29 +0200)]
net/ena/base: use 48-bit memory addresses

The ENA device uses 48-bit memory addresses for IO. Because of that, the
upper limit had to be updated.

From the driver perspective, it's just a cosmetic change to make the
definition of the structure 'ena_common_mem_addr' more descriptive; the
address value was verified for the valid range anyway in the
function 'ena_com_mem_addr_set()'.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: add error logs when preparing Tx
Michal Krawczyk [Wed, 8 Apr 2020 08:29:03 +0000 (10:29 +0200)]
net/ena/base: add error logs when preparing Tx

To make the debugging easier, the error logs were added in the Tx path.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: fix indentation in CQ polling
Michal Krawczyk [Wed, 8 Apr 2020 08:29:02 +0000 (10:29 +0200)]
net/ena/base: fix indentation in CQ polling

Spaces instead of tabs were used for the indentation.

Fixes: 3adcba9a8987 ("net/ena: update HAL to the newer version")

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: fix documentation of functions
Michal Krawczyk [Wed, 8 Apr 2020 08:29:01 +0000 (10:29 +0200)]
net/ena/base: fix documentation of functions

The documentation format was aligned and a few typos were fixed.

Fixes: 99ecfbf845b3 ("ena: import communication layer")

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: add accelerated LLQ mode
Michal Krawczyk [Wed, 8 Apr 2020 08:29:00 +0000 (10:29 +0200)]
net/ena/base: add accelerated LLQ mode

In order to use the accelerated LLQ (Low-latency queue) mode, the driver
must limit the Tx burst and be aware that the device has the meta
caching disabled. In that situation, the meta descriptor must be valid
on each Tx packet.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: remove extra properties strings
Michal Krawczyk [Wed, 8 Apr 2020 08:28:59 +0000 (10:28 +0200)]
net/ena/base: remove extra properties strings

This buffer was never used by the ENA PMD. It could be used for
debugging, but its presence is redundant now.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: rework interrupt moderation
Michal Krawczyk [Wed, 8 Apr 2020 08:28:58 +0000 (10:28 +0200)]
net/ena/base: rework interrupt moderation

This feature allows for adaptive interrupt moderation. It's not used by
the DPDK PMD, but is a part of the newest HAL version.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
4 years agonet/ena/base: remove conversion of indirection table
Michal Krawczyk [Wed, 8 Apr 2020 08:28:57 +0000 (10:28 +0200)]
net/ena/base: remove conversion of indirection table

After the indirection table is saved in the device, there is no
need to convert it back, as it's already saved in the host_rss_ind_tbl
array.

As a result, the call to the ena_com_ind_tbl_convert_from_device() is
not needed.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
4 years agonet/ena/base: fix testing for supported hash function
Michal Krawczyk [Wed, 8 Apr 2020 08:28:56 +0000 (10:28 +0200)]
net/ena/base: fix testing for supported hash function

There was a bug in ena_com_fill_hash_function(), which was causing the
bit to be shifted left one position too far.

To fix that, the ENA_FFS macro is used (returning the location of
the first bit set), the hash_function value is decremented by 1 if any
hash function is supported by the device, and the BIT macro is used for
shifting for better verbosity.

Fixes: 99ecfbf845b3 ("ena: import communication layer")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>