Lijun Ou [Mon, 2 Nov 2020 14:38:19 +0000 (22:38 +0800)]
net/hns3: cleanup includes
Some header files are already included by others. Also,
some header files are not self-contained, which
triggers build warnings. Remove the unnecessary
includes and move the remaining ones to the correct
location.
Hongbo Zheng [Mon, 2 Nov 2020 14:38:17 +0000 (22:38 +0800)]
net/hns3: check quantity limiter support before using it
If hardware does not support QL (quantity limiter), int_ql_max
is 0. Software should confirm that ql_value is less than int_ql_max
before writing the QL register. This patch adds a check of the
int_ql_max value from firmware and deletes the unused variable
coalesce_mode.
Huisong Li [Mon, 2 Nov 2020 14:38:16 +0000 (22:38 +0800)]
net/hns3: fix configurations of port-level scheduling rate
The port-level scheduling rate that the hns3 PF driver configures to
hardware is obtained from firmware, which determines the
bandwidth capability of the port. For a network engine that
supports multiple rates, such as 10G and 25G, the rate in firmware
is generally configured with the maximum value. This may cause
the following issues:
1) When a 10G optical module is used on the network engine, scheduling
rate of this port will also be configured to hardware with 25G.
However, the MAC rate of this port is 10G. In this case, it is
unreasonable that the port scheduling rate is different from the MAC
rate.
2) If default speed in firmware is not the maximum value, the 25G port
may not reach the capability of the port.
Therefore, update the port-level scheduling rate whenever the
MAC link speed is updated.
Fixes: 59fad0f32135 ("net/hns3: support link update operation") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
Chengchang Tang [Mon, 2 Nov 2020 14:38:14 +0000 (22:38 +0800)]
net/hns3: fix Tx checksum with fixed header length
Currently, the header lengths of all layers are fixed, which
leads to a checksum error when a header length changes.
This patch fixes the problem by using the header lengths in the mbuf
instead of fixed header lengths to perform the Tx checksum offload.
Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
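As a rough illustration of deriving offsets from per-packet lengths instead of constants, here is a minimal sketch. The structure `pkt_hdr_lens` and both helpers are hypothetical stand-ins for the length fields an mbuf carries; they are not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet lengths carried in an mbuf. */
struct pkt_hdr_lens {
	uint16_t l2_len; /* Ethernet header, including any VLAN tags */
	uint16_t l3_len; /* IP header, including options or extensions */
	uint16_t l4_len; /* TCP/UDP header */
};

/* Offset of the L4 header from the start of the frame. */
static uint32_t
l4_offset(const struct pkt_hdr_lens *h)
{
	assert(h != NULL);
	return (uint32_t)h->l2_len + h->l3_len;
}

/* Offset of the payload from the start of the frame. */
static uint32_t
payload_offset(const struct pkt_hdr_lens *h)
{
	return l4_offset(h) + h->l4_len;
}
```

With a VLAN tag (l2_len 18) and IPv4 options (l3_len 24), the L4 header starts at offset 42 rather than the 34 that fixed lengths would assume.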
Chengchang Tang [Mon, 2 Nov 2020 14:38:13 +0000 (22:38 +0800)]
net/hns3: fix Tx checksum outer header prepare
Currently, there are two mistakes in Tx checksum outer header prepare.
1) Whether the packet outer header is IPv4 is checked based on
PKT_TX_IPV4, which is incorrect.
2) For HIP08, the outer UDP checksum cannot be offloaded, and the driver
should ensure the outer UDP checksum field is set to 0. In the current
code, PKT_TX_UDP_CKSUM is used to determine whether the outer layer of
the packet is a UDP header. Actually, for tunnel TSO, the flag will
never be set.
The first mistake is fixed by replacing PKT_TX_IPV4 with
PKT_TX_OUTER_IPV4. For the second, the protocol number in the L3 header
is used to check whether the outer L4 header is UDP.
Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations") Fixes: 6dca716c9e1d ("net/hns3: support TSO") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
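The two checks can be sketched as follows. The flag bits `TX_F_IPV4` and `TX_F_OUTER_IPV4` are hypothetical placeholders for the real PKT_TX_* flags, and the helper names are invented for illustration; only the UDP protocol number 17 is a fixed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real PKT_TX_* values are defined by the mbuf API. */
#define TX_F_IPV4       (1u << 0) /* inner (or only) L3 header is IPv4 */
#define TX_F_OUTER_IPV4 (1u << 1) /* outer L3 header is IPv4 */

#define IP_PROTO_UDP 17 /* IANA protocol number for UDP */

/* The outer header type must come from the OUTER flag, not the inner one. */
static bool
outer_is_ipv4(uint32_t ol_flags)
{
	return (ol_flags & TX_F_OUTER_IPV4) != 0;
}

/* Decide from the outer L3 protocol field, not from a checksum flag. */
static bool
outer_l4_is_udp(uint8_t outer_l3_proto)
{
	return outer_l3_proto == IP_PROTO_UDP;
}

/* Zero the outer UDP checksum field when the outer L4 header is UDP. */
static void
prepare_outer_udp_cksum(uint8_t outer_l3_proto, uint16_t *udp_cksum)
{
	assert(udp_cksum != NULL);
	if (outer_l4_is_udp(outer_l3_proto))
		*udp_cksum = 0;
}
```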
Chengchang Tang [Mon, 2 Nov 2020 14:38:12 +0000 (22:38 +0800)]
net/hns3: limit promiscuous mode for VF
For Kunpeng920, both Tx and Rx promisc are set when promisc mode
is enabled. In other words, all the ingress packets and the packets sent
from the PF and other VFs on the same physical port will be copied
to the function that enabled promisc mode.
Kunpeng930 supports turning off Tx unicast promisc. A limited promisc
mode is introduced, which turns off Tx unicast promisc when
promisc is set.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
Xiaoyun Wang [Sat, 31 Oct 2020 03:38:35 +0000 (11:38 +0800)]
net/hinic: fix outer L3 length parse
This patch fixes an outer_l3_len parse error when
PKT_TX_OUTER_IP_CKSUM is not set. The error does not affect the
checksum function; the fix only keeps the value consistent with
the mbuf meta information description.
The outer_l3_len was calculated incorrectly because 'vlan_hdr' was
calculated incorrectly; 'vlan_hdr' is fixed and the code refactored.
Zhenghua Zhou [Tue, 27 Oct 2020 06:42:52 +0000 (06:42 +0000)]
app/testpmd: do not allow dynamic change of core number
When the number of forwarding cores is changed at runtime, the
following issue may be encountered:
If nbcore is set lower than the current nbcore, the forwarding threads
will still be running on the extra cores. Therefore, trying to stop
forwarding will hang testpmd, since it will wait for the extra cores to
stop.
So do not allow changing nbcore while forwarding is running.
Fixes: 0c0db76f42ed ("app/testpmd: separate forward config setup from display") Cc: stable@dpdk.org Signed-off-by: Zhenghua Zhou <zhenghuax.zhou@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
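A minimal sketch of such a guard, assuming a hypothetical `fwd_state` structure and `set_nb_cores` helper (testpmd keeps its own globals and prints its own messages; the error codes here are illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical forwarding state, for illustration only. */
struct fwd_state {
	bool running;          /* true while forwarding is active */
	unsigned int nb_cores; /* number of forwarding cores in use */
};

/* Change the number of forwarding cores; refuse while forwarding runs. */
static int
set_nb_cores(struct fwd_state *st, unsigned int nb)
{
	assert(st != NULL);
	if (st->running)
		return -EBUSY;  /* cores beyond nb would never be stopped */
	if (nb == 0)
		return -EINVAL;
	st->nb_cores = nb;
	return 0;
}
```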
Simei Su [Mon, 2 Nov 2020 11:55:47 +0000 (19:55 +0800)]
net/iavf: fix supported RSS type
When an RSS rule uses a symmetric hash function, the RSS type must not
carry L3/L4 SRC_ONLY or DST_ONLY. This patch adds an invalid RSS type
check for this case.
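The check can be sketched as below. The bit values and the `rss_type_valid` helper are hypothetical; they only illustrate rejecting SRC_ONLY/DST_ONLY modifiers when the hash is symmetric.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RSS type bits, for illustration only. */
#define RSS_L3_SRC_ONLY (1ull << 0)
#define RSS_L3_DST_ONLY (1ull << 1)
#define RSS_L4_SRC_ONLY (1ull << 2)
#define RSS_L4_DST_ONLY (1ull << 3)
#define RSS_IPV4_TCP    (1ull << 8)

#define RSS_SRC_DST_ONLY_MASK \
	(RSS_L3_SRC_ONLY | RSS_L3_DST_ONLY | RSS_L4_SRC_ONLY | RSS_L4_DST_ONLY)

/* A symmetric hash must see both source and destination fields. */
static bool
rss_type_valid(bool symmetric, uint64_t rss_type)
{
	if (symmetric && (rss_type & RSS_SRC_DST_ONLY_MASK) != 0)
		return false;
	return true;
}
```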
Junfeng Guo [Mon, 2 Nov 2020 01:54:36 +0000 (09:54 +0800)]
net/ice: delete unsupported ptypes in default hash set
Ptypes for GTPU with inner SCTP are not supported in current DDP pkg.
Thus, delete them in the default hash set config function.
Also clean up the rss vsi when calling the hash set config function.
Dekel Peled [Sun, 1 Nov 2020 17:57:45 +0000 (17:57 +0000)]
common/mlx5: use general object type for cap index
PRM defines the general object types using positive numbers.
The same values are used as index for the relevant bit in HCA
capabilities general_obj_types bit mask.
net/mlx5: support flow tag and packet header miniCQEs
CQE compression allows us to save the PCI bandwidth and improve
the performance by compressing several CQEs together to a miniCQE.
But the miniCQE size is only 8 bytes and this limits the ability
to successfully keep the compression session in case of various
traffic patterns.
The current miniCQE format only keeps the compression session alive
in case of uniform traffic with the Hash RSS as the only difference.
There are requests to keep the compression session in case of tagged
traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic.
Add 2 new miniCQE formats in order to achieve the best performance
for these traffic patterns: Flow Tag and Packet Header miniCQEs.
The existing rxq_cqe_comp_en devarg is modified to specify the
desired miniCQE format. Specifying 2 selects Flow Tag format
for better compression rate in case of RTE Flow Mark traffic.
Specifying 3 selects Checksum format (existing format for MPRQ).
Specifying 4 selects L3/L4 Header format for better compression
rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
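A small sketch of mapping the devarg value to a format. The enum and `parse_mini_cqe_format` are hypothetical names. Values 2, 3 and 4 follow the description above; treating 0 as "compression disabled" and 1 as the existing Hash RSS format is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum mirroring the rxq_cqe_comp_en values.
 * 0 and 1 are assumptions; 2, 3 and 4 come from the text above. */
enum mini_cqe_format {
	MINI_CQE_DISABLED = 0,   /* assumed: compression off */
	MINI_CQE_HASH_RSS = 1,   /* assumed: existing Hash RSS format */
	MINI_CQE_FLOW_TAG = 2,   /* RTE Flow Mark traffic */
	MINI_CQE_CHECKSUM = 3,   /* existing format for MPRQ */
	MINI_CQE_L34_HEADER = 4, /* mixed TCP/UDP and IPv4/IPv6 traffic */
};

/* Validate a devarg value and convert it; returns 0 on success, -1 otherwise. */
static int
parse_mini_cqe_format(int devarg, enum mini_cqe_format *out)
{
	assert(out != NULL);
	if (devarg < MINI_CQE_DISABLED || devarg > MINI_CQE_L34_HEADER)
		return -1;
	*out = (enum mini_cqe_format)devarg;
	return 0;
}
```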
David Marchand [Mon, 19 Oct 2020 09:41:51 +0000 (11:41 +0200)]
net/mlx: remove separate ABI version for glue libraries
The glue libraries are tightly bound to the mlx drivers of a dpdk
version and are packaged with them.
Keeping a separate ABI version prevents us from installing two versions
of dpdk.
Maintaining this separate version just adds confusion.
Align the glue library ABI version to the global ABI version.
Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Ophir Munk [Tue, 27 Oct 2020 10:16:59 +0000 (10:16 +0000)]
common/mlx5/linux: replace malloc and free in glue
This commit replaces mlx5_malloc and mlx5_free calls with Linux calls
malloc and free in file mlx5_glue.c.
The current mlx5_malloc calls have no flags, alignment or socket
selection, so they are equivalent to calling malloc. Rdma-core itself
is using malloc. When using mlx5_malloc the glue library is dependent
on the common_mlx5 library, which must be compiled first. Otherwise,
with ibverbs_link=dlopen, the build fails with:
mlx5_glue.c: undefined reference to `mlx5_malloc'.
To make all of this simpler and remove the common_mlx5 dependency - this
commit does the alloc/free replacements.
Fixes: 66914d19d135 ("common/mlx5: convert control path memory to unified malloc") Cc: stable@dpdk.org Signed-off-by: Ophir Munk <ophirmu@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Tal Shnaiderman [Mon, 26 Oct 2020 18:17:48 +0000 (20:17 +0200)]
common/mlx5: fix DevX SQ object creation
Fix wrong assignment of allow_multi_pkt_send_wqe
in mlx5_devx_cmd_create_sq.
The incorrect assignment was introduced in the initial
mlx5_devx_cmd_create_sq implementation:
sq_attr->flush_in_error_en was mistakenly assigned to both
allow_multi_pkt_send_wqe and flush_in_error_en. It was detected
during Windows PMD development.
The fix assigns sq_attr->allow_multi_pkt_send_wqe to
allow_multi_pkt_send_wqe in mlx5_devx_cmd_create_sq.
Fixes: ae18a1ae9692 ("net/mlx5: support Tx hairpin queues") Cc: stable@dpdk.org Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
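The corrected assignment looks roughly like this. Both structures are hypothetical, trimmed-down stand-ins (the real code fills a DevX command buffer); only the two field names come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down attribute structure. */
struct sq_attr {
	uint32_t allow_multi_pkt_send_wqe:1;
	uint32_t flush_in_error_en:1;
};

/* Hypothetical stand-in for the SQ context being filled. */
struct sq_ctx {
	uint32_t allow_multi_pkt_send_wqe:1;
	uint32_t flush_in_error_en:1;
};

static void
sq_ctx_fill(struct sq_ctx *ctx, const struct sq_attr *attr)
{
	assert(ctx != NULL && attr != NULL);
	/* The bug copied attr->flush_in_error_en into both fields. */
	ctx->allow_multi_pkt_send_wqe = attr->allow_multi_pkt_send_wqe;
	ctx->flush_in_error_en = attr->flush_in_error_en;
}
```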
Long Li [Sat, 31 Oct 2020 00:24:09 +0000 (17:24 -0700)]
net/netvsc: control use of external mbuf on Rx
When receiving packets, netvsp puts data in a buffer mapped through UIO.
Depending on packet size, netvsc may attach the buffer as an external
mbuf. This is not a problem if this mbuf is consumed in the application,
and the application can correctly read data out of an external mbuf.
However, there are two problems with data in an external mbuf.
1. Due to the limitation of the kernel UIO implementation, physical
address of this external buffer is not exposed to the user-mode. If
this mbuf is passed to another driver, the other driver is unable to
map this buffer to iova.
2. Some DPDK applications are not aware of external mbufs, and may
misbehave when they receive an mbuf with an external buffer attached.
Introduce a driver parameter "rx_extmbuf_enable" to control whether
netvsc should use external mbufs for receiving packets. The default
value is 0 (netvsc doesn't use external mbufs; it always allocates an
mbuf and copies data into it). A non-zero value tells netvsc to attach
external buffers to mbufs on receiving packets, thus avoiding memory
copies.
The values for Rx and Tx copy break should be tunable rather
than hard coded constants.
The rx_copybreak sets the threshold where the driver uses an
external mbuf to avoid having to copy data. Setting 0 for copybreak
will cause driver to always create an external mbuf. Setting
a value greater than the MTU would prevent it from ever making
an external mbuf and always copy. The default value is 256 (bytes).
Likewise the tx_copybreak sets the threshold where the driver
aggregates multiple small packets into one request. If tx_copybreak
is 0 then each packet goes as a VMBus request (no copying).
If tx_copybreak is set larger than the MTU, then all packets smaller
than the chunk size of the VMBus send buffer will be copied; larger
packets always have to go as a single direct request. The default
value is 512 (bytes).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Long Li <longli@microsoft.com>
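The copy-break thresholds can be read as two simple predicates. This is a sketch only: the function names are hypothetical and the exact boundary comparisons (`>=` versus `>`) are assumptions chosen to match the behaviour described for 0 and for values above the MTU.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_COPYBREAK_DEFAULT 256u /* bytes, per the description above */
#define TX_COPYBREAK_DEFAULT 512u /* bytes, per the description above */

/* Rx: attach an external mbuf for packets of at least rx_copybreak bytes,
 * otherwise copy. The inclusive boundary is an assumption. */
static bool
rx_use_external_mbuf(uint32_t pkt_len, uint32_t rx_copybreak)
{
	return pkt_len >= rx_copybreak;
}

/* Tx: copy a packet into an aggregated request only if it is below
 * tx_copybreak and also fits within the send buffer chunk. */
static bool
tx_copy_into_chunk(uint32_t pkt_len, uint32_t tx_copybreak,
		   uint32_t chunk_size)
{
	return pkt_len < tx_copybreak && pkt_len < chunk_size;
}
```

With rx_copybreak 0 every packet uses an external mbuf, and with tx_copybreak 0 no packet is copied, matching the two extremes described above.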
Michal Krawczyk [Fri, 30 Oct 2020 11:31:19 +0000 (12:31 +0100)]
net/ena/base: align IO CQ allocation to 4K
Latest generation HW requires IO completion queue descriptors to be
aligned to 4K in order to achieve the best performance.
Because of that, new allocation macros were added, which allow the
driver to allocate memory with a specified alignment.
The previous allocation macros are now wrappers around the macros
doing the alignment, with the alignment value equal to cacheline size.
Fixes: b68309be44c0 ("net/ena/base: update communication layer for the ENAv2") Cc: stable@dpdk.org Signed-off-by: Ido Segev <idose@amazon.com> Signed-off-by: Michal Krawczyk <mk@semihalf.com> Reviewed-by: Igor Chauskin <igorch@amazon.com> Reviewed-by: Amit Bernstein <amitbern@amazon.com>
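The wrapper arrangement can be sketched with plain C11 allocation. The names `mem_alloc_aligned` and `mem_alloc`, and the 64-byte cache line, are assumptions for illustration; the driver uses its own macros on top of DPDK memory allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64u   /* assumed cache line size */
#define IO_CQ_ALIGN     4096u /* 4K alignment for IO CQ descriptors */

/* Allocate size bytes aligned to align (a power of two). Release with free(). */
static void *
mem_alloc_aligned(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	/* aligned_alloc() (C11) expects size to be a multiple of align. */
	size_t rounded = (size + align - 1) & ~(align - 1);
	return aligned_alloc(align, rounded);
}

/* The older entry point becomes a wrapper with cache line alignment. */
static void *
mem_alloc(size_t size)
{
	return mem_alloc_aligned(size, CACHE_LINE_SIZE);
}
```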
Michal Krawczyk [Fri, 30 Oct 2020 11:31:18 +0000 (12:31 +0100)]
net/ena: change name of supported PCI device IDs
The ID 0xEC21 is not associated with LLQ feature of the device, so it
would be misleading for the user. Because of that, the current
identifier is more precise.
Together with the code update, the documentation was changed to reflect
the current changes.
Signed-off-by: Michal Krawczyk <mk@semihalf.com> Reviewed-by: Igor Chauskin <igorch@amazon.com> Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Michal Krawczyk [Fri, 30 Oct 2020 11:31:17 +0000 (12:31 +0100)]
net/ena: fix setting Rx checksum flags in mbuf
The driver was never setting PKT_RX_*_CKSUM_GOOD flags, so the only way
of checking if the checksum was checked was by testing for the
PKT_RX_*_CKSUM_BAD. In that situation, the application couldn't detect
if the checksum was valid or unknown, as unknown flag is equal to 0.
Moreover, the l3_csum_err value is only valid if the l3_proto is
indicating IPv4, so it shouldn't be checked for other protocols.
Fixes: 1173fca25af9 ("ena: add polling-mode driver") Cc: stable@dpdk.org Signed-off-by: Michal Krawczyk <mk@semihalf.com> Reviewed-by: Igor Chauskin <igorch@amazon.com> Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
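A sketch of the three-state result (good, bad, unknown) follows. The flag bits, the `rx_desc_info` structure and in particular the `l4_csum_checked` field are hypothetical; the point shown is that GOOD is set explicitly and that `l3_csum_err` is consulted only for IPv4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; "unknown" is the absence of both GOOD and BAD. */
#define RX_IP_CKSUM_GOOD (1u << 0)
#define RX_IP_CKSUM_BAD  (1u << 1)
#define RX_L4_CKSUM_GOOD (1u << 2)
#define RX_L4_CKSUM_BAD  (1u << 3)

enum l3_proto { L3_OTHER, L3_IPV4, L3_IPV6 };

/* Hypothetical summary of what the Rx descriptor reports. */
struct rx_desc_info {
	enum l3_proto l3_proto;
	bool l3_csum_err;     /* only meaningful when l3_proto is IPv4 */
	bool l4_csum_checked; /* assumed: hardware validated the L4 checksum */
	bool l4_csum_err;
};

static uint32_t
rx_cksum_flags(const struct rx_desc_info *d)
{
	uint32_t flags = 0;

	assert(d != NULL);
	if (d->l3_proto == L3_IPV4)
		flags |= d->l3_csum_err ? RX_IP_CKSUM_BAD : RX_IP_CKSUM_GOOD;
	if (d->l4_csum_checked)
		flags |= d->l4_csum_err ? RX_L4_CKSUM_BAD : RX_L4_CKSUM_GOOD;
	return flags; /* 0 means unknown for both layers */
}
```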
Hyong Youb Kim [Fri, 30 Oct 2020 07:27:49 +0000 (00:27 -0700)]
net/enic: fix header sizes when copying flow patterns
Several functions use sizeof(struct rte_flow_item_eth) and
sizeof(struct rte_flow_item_ipv6) when copying headers. These sizes
used to coincide with the sizes of rte_ether_hdr and
rte_ipv6_hdr. But, with recently added fields, rte_flow_item_eth and
rte_flow_item_ipv6 have grown in size. Use sizeof(rte_ether_hdr) and
sizeof(rte_ipv6_hdr) instead.
Coverity issue: 363572, 363573 Fixes: ea7768b5bba8 ("net/enic: add flow implementation based on Flow Manager API") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hongbo Zheng [Thu, 29 Oct 2020 12:51:56 +0000 (20:51 +0800)]
net/hns3: check setting VF PCI bus return value
Currently hns3vf_reinit_dev only checks whether the return value of
the PCI bus setting function is non-zero, whereas that function
returns a negative value when it fails.
Fixes: 243651cb6c8c ("net/hns3: check PCI config space reads") Cc: stable@dpdk.org Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
Chengchang Tang [Thu, 29 Oct 2020 12:51:55 +0000 (20:51 +0800)]
net/hns3: fix clearing HW ring after queue stop
Currently, the Rx HW ring is not cleared after queue stop.
When packets remain in the HW rings and the queues have been
stopped, an upper-layer call to the rx_burst function
causes an illegal memory access, because the SW rings
have been released.
This patch fixes this by resetting the SW ring after disabling the
queue.
Fixes: fa29fe45a7b4 ("net/hns3: support queue start and stop") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
Huisong Li [Thu, 29 Oct 2020 12:51:54 +0000 (20:51 +0800)]
net/hns3: fix data type to store queue number
Currently, a u8 variable is used to control the release of fake queues
in the hns3_fake_rx/tx_queue_config functions. Although there is no
case in which more than 256 fake queues are created in the hns3 network
engine, it is unreasonable to compare a u8 variable with a u16 variable.
Fixes: a951c1ed3ab5 ("net/hns3: support different numbers of Rx and Tx queues") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
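The hazard of mixing widths can be shown with a small sketch. `release_fake_queues` and `truncate_to_u8` are hypothetical helpers; the loop body stands in for the actual queue release.

```c
#include <stdint.h>

/* Release queues new_nb .. old_nb - 1 and return how many were released.
 * The index has the same width as the bounds, so it cannot wrap early. */
static uint16_t
release_fake_queues(uint16_t old_nb, uint16_t new_nb)
{
	uint16_t released = 0;
	uint16_t i;

	for (i = new_nb; i < old_nb; i++)
		released++; /* stand-in for releasing queue i */
	return released;
}

/* What an 8-bit index would hold for a given 16-bit queue number.
 * A u8 loop index compared against a bound above 255 never reaches it. */
static uint8_t
truncate_to_u8(uint16_t queue_id)
{
	return (uint8_t)queue_id;
}
```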
Hongbo Zheng [Thu, 29 Oct 2020 12:51:53 +0000 (20:51 +0800)]
net/hns3: fix unchecked return value
Coverity reports a defect: "calling
hns3_reset_all_tqps without checking return value
in hns3_do_start".
This patch fixes the warning by adding a "void" cast.
The call is on an exception handling path, and
hns3_reset_all_tqps prints the corresponding error
message if it fails, so checking its return value is
not necessary. ret is kept as the error code that
caused the exception.
Coverity issue: 363048 Fixes: fa29fe45a7b4 ("net/hns3: support queue start and stop") Cc: stable@dpdk.org Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
Chengchang Tang [Thu, 29 Oct 2020 12:51:52 +0000 (20:51 +0800)]
net/hns3: fix packet type report in Rx
Currently, hns3 supports recognizing a lot of ptypes, but most
tunnel packet types are not reported to the API
rte_eth_dev_get_supported_ptypes.
There are also some errors in L2 and L3 packet recognition. ARP
and LLDP are classified in the L3 field of the Rx descriptor, so
the ptype of LLDP and ARP packets is set twice. Because ptypes
are assigned by bitwise OR, the resulting ptype ends up
incorrect.
Besides, when a packet has only an L2 header, its ptype is not
reported by the hns3 PMD. This is because the L2/L3 ptype table is
not initialized properly. In this case, the table query result is 0
by default.
This patch adds the missing supported ptypes, fixes the mistake in
L2/L3 packet recognition, and fixes the unreported L2 packet ptype by
reporting the L2 type when the L3 type is unrecognized.
Fixes: bba636698316 ("net/hns3: support Rx/Tx and related operations") Cc: stable@dpdk.org Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
Huisong Li [Thu, 29 Oct 2020 12:51:51 +0000 (20:51 +0800)]
net/hns3: fix RSS max queue id allowed in multi-TC
Currently, driver uses the maximum number of queues configured by user
as the maximum queue id that can be specified by the RSS rule or the
reta_update API. This is unreasonable and may trigger incorrect
behavior in the multi-TC scenario. The driver must ensure that the queue
id configured in the redirection table is within the range of the
number of queues allocated to a TC.
Fixes: c37ca66f2b27 ("net/hns3: support RSS") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com>
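A minimal sketch of the range check, assuming a hypothetical `reta_check` helper and a redirection table given as a plain array. For instance, with 16 queues spread over 4 TCs, queue id 5 is below the total of 16 but must still be rejected because each TC holds only 4 queues.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Check that every redirection table entry targets a queue that lies
 * within the number of queues allocated to one TC. */
static int
reta_check(const uint16_t *reta, size_t n, uint16_t queues_per_tc)
{
	assert(reta != NULL);
	for (size_t i = 0; i < n; i++) {
		if (reta[i] >= queues_per_tc)
			return -EINVAL;
	}
	return 0;
}
```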
Lijun Ou [Thu, 29 Oct 2020 12:51:50 +0000 (20:51 +0800)]
net/hns3: get number of used descriptors of Rx queue
Implement functions that count the available and used Rx descriptors.
In the Kunpeng series, the NIC hardware supports reading the number of
BDs waiting to be processed from the hardware FBD (Full Buffer
Descriptor) register, and the driver maintains the number of BDs to be
written back to hardware.
Compare the number of FBDs with the number of BDs to be written back to
the hardware.
The number of used descriptors of an Rx queue is computed as follows:
the number of FBDs read from the FBD register plus the number of BDs to
be written back to hardware, as maintained by the driver.
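The computation above amounts to a single addition. In this sketch, `fbd_num` and `pending_writeback` are hypothetical parameter names for the value read from the FBD register and the driver-maintained count.

```c
#include <stdint.h>

/* used = BDs reported by the FBD register
 *      + BDs the driver still has to write back to hardware */
static uint32_t
rxq_used_count(uint32_t fbd_num, uint32_t pending_writeback)
{
	return fbd_num + pending_writeback;
}
```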
Jeff Guo [Fri, 30 Oct 2020 08:40:30 +0000 (16:40 +0800)]
net/iavf: support flex desc metadata extraction
Enable metadata extraction for flexible descriptors in AVF. This
allows a network function to get metadata directly without additional
parsing, which reduces the CPU cost for VFs. The enabled metadata
extractions cover the VLAN/IPv4/IPv6/IPv6-FLOW/TCP/MPLS
flexible descriptors. The VF can negotiate the capability of
the flexible descriptor with the PF and correspondingly configure the
specific offload on receive queues.
Signed-off-by: Jeff Guo <jia.guo@intel.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Haiyue Wang [Tue, 27 Oct 2020 01:23:28 +0000 (09:23 +0800)]
net/ice: rename dynamic mbuf name
Rename the dynamic mbuf name to 'intel_pmd_xxx' format, so that the
Intel PMD which has the protocol extraction feature will share the
same dynamic field/flags space in mbuf.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Bing Zhao [Thu, 29 Oct 2020 05:35:25 +0000 (13:35 +0800)]
app/testpmd: fix eCPRI command line style
In the current implementation of eCPRI flow item parsing of the CLI,
the token items in the list are not connected properly.
A command containing "rtc_ctrl rtc_id spec 14857 rtc_id mask 0xff00"
will be considered invalid. In order to support spec with mask, the
common entry needs to be typed twice and the whole command will be
too long.
By changing the token lists, the CLI can support spec with mask without
going back to the entry of the item.
Intel SSE has __m128i, but ARMv8 has __uint128_t. So, add compat
efsys_uint128_t to be used in driver source and have either __m128i
or __uint128_t behind.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Lijun Ou [Wed, 21 Oct 2020 08:54:43 +0000 (16:54 +0800)]
net/hns3: enable RSS for IPv6-SCTP dst/src port fields
Kunpeng930 NIC hardware supports using the dst/src port in the
RSS hash for the ipv6-sctp packet type. Kunpeng920 NIC
hardware differs: it only supports using the dst/src IP in the
RSS hash for the ipv6-sctp packet type.
Andrew Rybchenko [Thu, 22 Oct 2020 09:42:29 +0000 (10:42 +0100)]
ethdev: remove legacy SYN filter type support
The RTE flow API should be used instead of the SYN filter.
Move corresponding definitions to ethdev internal driver API
since it is used by drivers internally.
Preserve RTE_ETH_FILTER_SYN because of it as well.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Thu, 22 Oct 2020 09:42:26 +0000 (10:42 +0100)]
ethdev: remove legacy EtherType filter type support
The RTE flow API should be used instead of the EtherType filter.
Move corresponding definitions to ethdev internal driver API
since it is used by drivers internally.
Preserve RTE_ETH_FILTER_ETHERTYPE because of it as well.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Xueming Li [Wed, 28 Oct 2020 10:44:39 +0000 (10:44 +0000)]
vdpa/mlx5: specify lag port affinity
If TIS LAG port affinity is set to auto, firmware assigns port affinity
on each creation in round-robin order. With 2 PFs, if a virtq is
created, destroyed and created again, each virtq gets the same port
affinity. To work around this firmware limitation, this patch creates
the TIS with a specified affinity for each PF.
Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Patrick Fu [Tue, 27 Oct 2020 08:50:19 +0000 (16:50 +0800)]
vhost: fix uninitialized local variable
This patch initializes a local variable in the async data path to avoid
compiler warnings.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring") Cc: stable@dpdk.org Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Patrick Fu [Tue, 27 Oct 2020 02:06:08 +0000 (10:06 +0800)]
vhost: fix guest/host physical address conversion
The gpa_to_hpa() function almost always fails due to the wrong setup of
the binary tree search key. Since a similar function,
gpa_to_first_hpa(), is already available in vhost, gpa_to_hpa() is
rewritten as a wrapper around gpa_to_first_hpa() instead of fixing
its original logic, to avoid code redundancy.
Fixes: e246896178e6 ("vhost: get guest/host physical address mappings") Fixes: faa9867c4da2 ("vhost: use binary search in address conversion") Cc: stable@dpdk.org Signed-off-by: Patrick Fu <patrick.fu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
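The wrapper idea can be sketched as follows. The function names match those mentioned above, but the signatures, the `gpa_map` structure and the linear lookup are assumptions for illustration (the real functions take a vhost device and search its guest page table).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping of one physically contiguous guest range. */
struct gpa_map {
	uint64_t gpa;  /* guest physical start address */
	uint64_t hpa;  /* host physical start address */
	uint64_t size; /* length of the contiguous range */
};

/* Translate gpa and report in *hpa_len how many of the requested len bytes
 * are contiguous in the first matching range. Returns 0 on a miss. */
static uint64_t
gpa_to_first_hpa(const struct gpa_map *maps, size_t n,
		 uint64_t gpa, uint64_t len, uint64_t *hpa_len)
{
	assert(maps != NULL && hpa_len != NULL);
	*hpa_len = 0;
	for (size_t i = 0; i < n; i++) {
		const struct gpa_map *m = &maps[i];

		if (gpa >= m->gpa && gpa - m->gpa < m->size) {
			uint64_t avail = m->size - (gpa - m->gpa);

			*hpa_len = len < avail ? len : avail;
			return m->hpa + (gpa - m->gpa);
		}
	}
	return 0;
}

/* Wrapper: succeed only when the whole requested range is contiguous. */
static uint64_t
gpa_to_hpa(const struct gpa_map *maps, size_t n, uint64_t gpa, uint64_t len)
{
	uint64_t hpa_len;
	uint64_t hpa = gpa_to_first_hpa(maps, n, gpa, len, &hpa_len);

	return hpa_len == len ? hpa : 0;
}
```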
Adrian Moreno [Mon, 26 Oct 2020 16:39:30 +0000 (17:39 +0100)]
net/virtio-user: set status on socket reconnect
Newer vhost-user backends will rely on SET_STATUS to start the device,
so this is required to support them.
Fixes: 57912824615f ("net/virtio-user: support vhost status setting") Cc: stable@dpdk.org Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Adrian Moreno [Mon, 26 Oct 2020 16:39:29 +0000 (17:39 +0100)]
net/virtio-user: do not assume vhost status feature
There are some status reads and updates that need to happen before the
protocol features are negotiated. Therefore, assuming the backend does
support this feature can lead to failures.
In server mode, do not assume the backend supports
VHOST_USER_PROTOCOL_F_STATUS. Activate it back on reconnection and
clear it on disconnection.
Fixes: 57912824615f ("net/virtio-user: support vhost status setting") Cc: stable@dpdk.org Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Adrian Moreno [Mon, 26 Oct 2020 16:39:27 +0000 (17:39 +0100)]
net/virtio-user: ignore result if status is unsupported
GET/SET STATUS is an optional feature, so it may not be negotiated. In
that case, the VIRTIO_GET_STATUS call will not update the status (given
as a pointer argument). Failing to identify this case would lead to
undefined behavior as the device status will be updated with the value
of a stack-allocated variable.
To fix this, return ENOTSUP if the feature is not supported and, in that
case, don't update device status.
Fixes: 44102e6298e7 ("net/virtio: check protocol feature in user backend") Cc: stable@dpdk.org Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
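A sketch of the pattern, under assumptions: `backend`, `backend_get_status` and `dev_update_status` are hypothetical, and the negative errno convention (-ENOTSUP) is chosen for the example. It shows the caller leaving the device status untouched when the feature was not negotiated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend, for illustration only. */
struct backend {
	bool status_supported; /* GET/SET STATUS feature negotiated */
	uint8_t status;
};

/* Returns 0 and fills *status, or -ENOTSUP without touching *status. */
static int
backend_get_status(const struct backend *b, uint8_t *status)
{
	assert(b != NULL && status != NULL);
	if (!b->status_supported)
		return -ENOTSUP;
	*status = b->status;
	return 0;
}

/* Update the device status only when the backend actually reported one. */
static int
dev_update_status(const struct backend *b, uint8_t *dev_status)
{
	uint8_t status = 0;
	int ret;

	assert(dev_status != NULL);
	ret = backend_get_status(b, &status);
	if (ret == -ENOTSUP)
		return 0; /* keep the current device status */
	if (ret < 0)
		return ret;
	*dev_status = status;
	return 0;
}
```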
Adrian Moreno [Mon, 26 Oct 2020 16:39:26 +0000 (17:39 +0100)]
net/virtio-user: do not assume features are negotiated
According to the virtio spec, ACK and DRIVER status bits should be set
before feature negotiation.
However, until the protocol features are negotiated, the driver does not
know if the device actually supports those vhost-user messages.
Therefore, until FEATURES_OK is set, the GET/SET_STATUS messages should
not be sent.
Fixes: 57912824615f ("net/virtio-user: support vhost status setting") Cc: stable@dpdk.org Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering and explicit
memory barrier for Clock Queue and timestamps synchronization.
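The relaxed-plus-fence pattern can be sketched as below. This uses the standard C11 `<stdatomic.h>` interface rather than compiler builtins, and the `ts_slot` structure and helpers are hypothetical; it illustrates the ordering idea only, not the driver's Clock Queue code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot holding a timestamp and a "ready" marker. */
struct ts_slot {
	_Atomic uint64_t ts;
	atomic_uint ready;
};

static void
ts_init(struct ts_slot *s)
{
	assert(s != NULL);
	atomic_init(&s->ts, 0);
	atomic_init(&s->ready, 0);
}

/* Writer: relaxed stores, ordered by an explicit release fence. */
static void
ts_publish(struct ts_slot *s, uint64_t ts)
{
	atomic_store_explicit(&s->ts, ts, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->ready, 1, memory_order_relaxed);
}

/* Reader: relaxed loads, ordered by an explicit acquire fence.
 * Returns 1 and fills *ts when a timestamp has been published. */
static int
ts_read(struct ts_slot *s, uint64_t *ts)
{
	assert(ts != NULL);
	if (atomic_load_explicit(&s->ready, memory_order_relaxed) == 0)
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*ts = atomic_load_explicit(&s->ts, memory_order_relaxed);
	return 1;
}
```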