Didier Pallard [Wed, 28 Mar 2018 15:43:49 +0000 (17:43 +0200)]
net/vmxnet3: skip empty segments in transmission
Packets containing empty segments are dropped by the hypervisor. Prevent
this by skipping empty segments in transmission.
Also drop packets that are entirely empty, so that at least one segment
is transmitted for each packet.
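A minimal sketch of the idea, using a hypothetical `struct seg` in place of
struct rte_mbuf (the names below are illustrative, not the driver's): walk
the chain, count only segments with a non-zero length, and treat a packet
with no such segment as one to drop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf segment; not the DPDK structure. */
struct seg {
	uint16_t data_len;  /* bytes of payload in this segment */
	struct seg *next;   /* next segment in the chain, or NULL */
};

/*
 * Count the segments that would get a Tx descriptor: empty segments are
 * skipped. A return value of 0 means the whole packet is empty and should
 * be dropped rather than handed to the device.
 */
unsigned int
count_tx_segments(const struct seg *head)
{
	unsigned int used = 0;

	for (const struct seg *s = head; s != NULL; s = s->next) {
		if (s->data_len == 0)
			continue;  /* skip empty segment */
		used++;
	}
	return used;
}
```

A caller would drop the packet when the function returns 0 and otherwise
fill exactly that many descriptors.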
Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Acked-by: Yong Wang <yongwang@vmware.com>
Didier Pallard [Wed, 28 Mar 2018 15:43:48 +0000 (17:43 +0200)]
net/vmxnet3: ignore empty segments in reception
When several TCP fragments are contained in a packet that is only one mbuf
segment long, vmxnet3 receives an empty segment following the first one,
which contains offload information. In the current version, this segment is
propagated as is to the upper application.
Remove those empty segments directly when receiving buffers, since they may
generate unneeded extra processing in the upper application.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Acked-by: Yong Wang <yongwang@vmware.com>
Didier Pallard [Wed, 28 Mar 2018 15:43:47 +0000 (17:43 +0200)]
net/vmxnet3: guess MSS if not provided in LRO mode
Fairly recent variants of vmxnet3 do not provide the MSS value along with
LRO packets. When this happens, try to guess the MSS value from the
information at hand.
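One plausible way to make such a guess, sketched under assumptions that the
commit message does not state: if the number of coalesced TCP segments is
known, divide the payload by that count (rounding up); otherwise fall back
to the MTU minus the L3/L4 header lengths. The function and parameter names
are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical MSS estimate for a coalesced (LRO) packet.
 *   pkt_len : total length of the coalesced packet, headers included
 *   hdr_len : Ethernet + IP + TCP header length
 *   l2_len  : Ethernet header length (not counted in the MTU)
 *   seg_cnt : number of coalesced TCP segments, 0 if unknown
 *   mtu     : interface MTU
 */
uint16_t
guess_mss(uint32_t pkt_len, uint32_t hdr_len, uint32_t l2_len,
	  uint32_t seg_cnt, uint32_t mtu)
{
	if (seg_cnt != 0 && pkt_len > hdr_len) {
		/* Payload split evenly over seg_cnt segments, rounded up. */
		return (uint16_t)((pkt_len - hdr_len + seg_cnt - 1) / seg_cnt);
	}
	/* No segment count: assume full-sized segments. */
	return (uint16_t)(mtu - (hdr_len - l2_len));
}
```

For instance, a 2950-byte packet with 54 bytes of headers and two coalesced
segments gives 1448, while the fallback with a 1500-byte MTU gives 1460.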
Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Acked-by: Yong Wang <yongwang@vmware.com>
Didier Pallard [Wed, 28 Mar 2018 15:43:45 +0000 (17:43 +0200)]
net/vmxnet3: fix Rx offload information in multiseg packets
When working on a multisegment buffer, most bits are set
in the last segment of the buffer. Correctly look at those bits in the eop
part of the rx_offload function.
Fixes: 2fdd835f992c ("vmxnet3: support jumbo frames") Cc: stable@dpdk.org Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Acked-by: Yong Wang <yongwang@vmware.com>
Didier Pallard [Wed, 28 Mar 2018 15:43:44 +0000 (17:43 +0200)]
net/vmxnet3: gather offload data on first and last segment
Offloads are split between first and last segment of a packet.
Call a single vmxnet3_rx_offload function that will contain all
offload operations. This patch does not introduce any code modification.
Pass a vmxnet3_hw as a parameter to the function; it is not used
in this patch, but will be used later for TSO offloads.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Acked-by: Yong Wang <yongwang@vmware.com>
Didier Pallard [Wed, 28 Mar 2018 15:43:43 +0000 (17:43 +0200)]
net/vmxnet3: return unknown IPv4 extension len ptype
Rather than parsing IP header to get proper ptype to return, just return
RTE_PTYPE_L3_IPV4_EXT_UNKNOWN, that tells application that we have an IP
packet with unknown header length.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Acked-by: Yong Wang <yongwang@vmware.com>
The memory region is [start, end), so if the memseg of 'end' isn't
allocated yet, the returned memseg will have zero entries and this will
make 'end' zero (nil).
Fixes: c2fe5823224a ("net/mlx4: use virt2memseg instead of iteration") Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The memory region is [start, end), so if the memseg of 'end' isn't
allocated yet, the returned memseg will have zero entries and this will
make 'end' zero (nil).
Fixes: 718e35999c96 ("net/mlx5: use virt2memseg instead of iteration") Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Initialize mbuf->data_off to RTE_PKTMBUF_HEADROOM after allocation.
Without this, the DMA address provided to the HW may be out of sync
with the data address indicated to the application
in bnxt_rx_pkt.
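A minimal sketch of why the reset matters, using a hypothetical trimmed-down
buffer descriptor rather than struct rte_mbuf. The headroom value of 128 is
an assumption standing in for RTE_PKTMBUF_HEADROOM, which is configurable.

```c
#include <stdint.h>

#define PKT_HEADROOM 128u  /* assumption: stands in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical, trimmed-down buffer descriptor; not struct rte_mbuf. */
struct rx_buf {
	uint64_t buf_iova;  /* bus address of the start of the buffer */
	uint16_t data_off;  /* offset of packet data from buffer start */
};

/*
 * Reset data_off right after allocation and derive the address given to
 * the hardware from it, so that the device writes exactly where the
 * application will later read.
 */
uint64_t
rx_buf_prepare(struct rx_buf *b)
{
	b->data_off = PKT_HEADROOM;
	return b->buf_iova + b->data_off;
}
```

If data_off kept a stale value from a previous use of the buffer, the
address handed to the device and the data pointer seen by the application
could differ.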
In some cases bnxt_hwrm_cfa_l2_set_rx_mask is being called before
VNICs are allocated. The FW returns an error in such cases.
Move bnxt_init_nic to bnxt_dev_init such that the ids are initialized
to an invalid id.
Send the command to the FW only with a valid vnic id.
Chas Williams [Sun, 18 Mar 2018 01:45:52 +0000 (21:45 -0400)]
net/vmxnet3: keep link state consistent
The vmxnet3 never attempts link speed negotiation. As a virtual device
the link speed is vague at best. However, it is important for certain
applications, like bonding, to see a consistent link_status. 802.3ad
requires that only links of the same cost (link speed) be enslaved.
Keeping the link status consistent in vmxnet3 avoids races with bonding
enslavement.
Fixes: 1e3a958f40b3 ("ethdev: fix link autonegotiation value") Cc: stable@dpdk.org Signed-off-by: Chas Williams <chas3@att.com> Acked-by: Yong Wang <yongwang@vmware.com>
net/vmxnet3: increase Rx data ring descriptor size
Vmxnet3 driver supports receive data ring viz. a set of small sized
buffers that are always mapped by the emulation. If a packet fits into
the receive data ring buffer, the emulation delivers the packet via the
receive data ring.
Increasing the receive data ring descriptor size from 128 to 256 bytes
showed performance gains as high as 5% for packets smaller than 256 bytes.
Signed-off-by: Shraddha Joshi <jshraddha@vmware.com> Acked-by: Jin Heo <heoj@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Acked-by: Boon Ang <bang@vmware.com> Acked-by: Yong Wang <yongwang@vmware.com>
This patch provides a fix for PCI function level reset after an
ungraceful exit from an application. The fix is to enable internal
target read as part of device attach before getting device information
from device config space, device itself and shared memory. In addition
to that, add a 200ms delay for the recovery flow to complete.
app/testpmd: fix missing boolean values in flow command
Original implementation lacks the on/off toggle.
This patch shows up as a fix because it has been a popular request ever
since the first DPDK release with the original implementation but was never
addressed.
Except for a list of queues, RSS configuration (hash key and fields) cannot
be specified from the flow command line and testpmd does not provide safe
defaults either.
In order to validate their implementation with testpmd, PMDs had to
interpret its NULL RSS configuration parameters somehow, however this has
never been valid to begin with.
This patch makes testpmd always provide default values.
The list of RSS types to use is exclusively taken from the global "rss_hf"
variable, itself configured through the "port config all rss" command or
--rss-ip/--rss-udp command-line options.
app/testpmd: fix lack of flow action configuration
Configuration structure is not optional with flow rule actions that expect
one; this pointer is not supposed to be NULL and PMDs should not have to
verify it.
Like pattern item spec/last/mask fields, it is currently set when at least
one configuration parameter is provided on the command line. This patch
sets it as soon as an action is created instead.
When an unsupported hash type is part of a RSS configuration structure, it
is silently ignored instead of triggering an error. This may lead
applications to assume that such types are accepted, while they are in fact
not part of the resulting flow rules.
John Daley [Wed, 18 Apr 2018 00:00:20 +0000 (17:00 -0700)]
net/enic: fix uninitialized variable
A local variable was used without initialization and triggered a
coverity issue.
It is fixed here, but there is no ill effect of not initializing
the variable in this case. 'rxq_interrupt_offset' is irrelevant
if 'rxq_interrupt_enable' is not set (the condition caught by
coverity).
Coverity issue: 268314 Fixes: fc2c8c0668fd ("net/enic: use Tx completion index instead of messages") Cc: stable@dpdk.org Signed-off-by: John Daley <johndale@cisco.com> Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>
net/thunderx: fix MTU configuration for jumbo packets
thunderx pmd driver passes dev_info.max_rx_pktlen as
9200 (via rte_eth_dev_info_get()) to application.
But, when application tries to set MTU as
(9200 - sizeof(ethernet_header_t)) the operation fails
because of missing CRC and VLAN additions.
This patch fixes the following for thunderx pmd driver:
- Sets NIC_HW_MAX_FRS to 9216 (instead of 9200)
- Sets NIC_HW_MAX_MTU to 9190 (NIC_HW_MAX_FRS - ETH_HLEN - ETHER_CRC_LEN - 2*VLAN_HLEN)
- Sets dev_info->max_rx_pkt_len to NIC_HW_MAX_MTU +
ETH_HLEN (instead of 9200)
- Allows rte_eth_dev_set_mtu() to pass if application
(like VPP) calls rte_eth_dev_set_mtu() before
rte_eth_dev_start() by putting appropriate check for
dev->data->dev_started
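The arithmetic behind the new limits can be written out as below. The
macro names come from the list above; the header sizes are the standard
Ethernet values, and `mtu_is_valid` is a hypothetical helper added only for
illustration.

```c
#include <stdint.h>

/* Header sizes in bytes (standard Ethernet values). */
#define ETH_HLEN        14u  /* dst MAC + src MAC + EtherType */
#define ETHER_CRC_LEN    4u  /* frame check sequence */
#define VLAN_HLEN        4u  /* one 802.1Q tag */

#define NIC_HW_MAX_FRS  9216u
/* 9216 - 14 - 4 - 2 * 4 = 9190 */
#define NIC_HW_MAX_MTU  (NIC_HW_MAX_FRS - ETH_HLEN - ETHER_CRC_LEN - 2u * VLAN_HLEN)
/* 9190 + 14 = 9204 */
#define MAX_RX_PKT_LEN  (NIC_HW_MAX_MTU + ETH_HLEN)

/* Hypothetical helper: returns 1 if the MTU fits, 0 otherwise. */
int
mtu_is_valid(uint32_t mtu)
{
	return mtu <= NIC_HW_MAX_MTU;
}
```

With these values an application that derives its MTU as
max_rx_pktlen - ETH_HLEN asks for 9190, which is within the limit.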
Fixes: 65d9804edc05 ("net/thunderx: support MTU configuration") Cc: stable@dpdk.org Signed-off-by: Nitin Saxena <nitin.saxena@caviumnetworks.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Wei Dai [Mon, 16 Apr 2018 08:14:25 +0000 (16:14 +0800)]
net/ixgbe: fix segfault in configuring VF VLAN strip
This patch fixes a segmentation fault in ixgbevf_vlan_offload_set()
when an Rx queue with index < max_rx_queues is not set up.
For such a queue, rxq = dev->data->rx_queues[i] is a null pointer.
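One way to guard against this, sketched with a hypothetical trimmed-down
queue structure (not the ixgbe one; the actual fix may be shaped
differently): skip slots that hold NULL instead of dereferencing them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down Rx queue; not the ixgbe structure. */
struct rx_queue {
	int vlan_strip;  /* 1 if VLAN stripping is enabled on this queue */
};

/*
 * Apply a VLAN strip setting to every configured queue. Slots below
 * nb_queues that were never set up hold NULL and are skipped instead of
 * being dereferenced. Returns the number of queues updated.
 */
unsigned int
set_vlan_strip(struct rx_queue **queues, uint16_t nb_queues, int on)
{
	unsigned int updated = 0;

	for (uint16_t i = 0; i < nb_queues; i++) {
		struct rx_queue *rxq = queues[i];

		if (rxq == NULL)
			continue;  /* queue i not set up */
		rxq->vlan_strip = on;
		updated++;
	}
	return updated;
}
```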
Fixes: 860a94d3c692 ("net/ixgbe: support VLAN strip per queue offloading in VF") Signed-off-by: Wei Dai <wei.dai@intel.com> Tested-by: Xueqin Lin <xueqin.lin@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Tue, 17 Apr 2018 07:43:50 +0000 (15:43 +0800)]
net/ixgbe: fix missing support of multi-segs offloading
This patch adds the missing Tx multi-segs offload to the supported offloads.
Fixes: 51215925a32f ("net/ixgbe: convert to new Tx offloads API") Cc: stable@dpdk.org Signed-off-by: Wei Dai <wei.dai@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add ixgbe MDIO lock/unlock and access APIs to read and write registers
using a specific device address. This provides MDIO access to any devices
that are not associated with the autoprobed PHY. Export these APIs via
the map file.
Signed-off-by: Shweta Choudaha <shweta.choudaha@att.com> Reviewed-by: Chas Williams <chas3@att.com> Reviewed-by: Luca Boccassi <bluca@debian.org> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Since we are storing the mem_zone address for each ring created,
we are freeing the same address multiple times.
For example the memory zone created for Rx is being freed during
Rx ring cleanup, AGG ring cleanup and CQ cleanup.
Avoid this by storing the memory zone address in RXQ instead and
free it as a part of queue_release dev_op.
Do the same for Tx queues.
The fw_l2_filter_id for a ntuple filter is needed only for the lifetime
of the ntuple filter. Once the filter is freed, reset the field.
The associated l2_filter will be freed as a part of its own cleanup.
The hwrm_queue_qportcfg command has been extended to determine
the COS queue that a Tx ring needs to use. This patch adds code
to determine the information from the FW and use it while
creating the Tx rings.
bnxt_hwrm_clear_l2_filter needs to be called only if the filter type
is L2 and not otherwise.
Also check for the return value of bnxt_hwrm_clear_l2_filter().
We are wrongly freeing up a filter in the driver while it is still
configured in the HW. This can cause incorrect L2 filter id to be
used for filters created subsequently.
net/enic: enable overlay offload for VXLAN and GENEVE
Recent NIC models support overlay offload. The overlay offload
feature enables the following on the NIC.
- Rx/Tx checksum offloads for both inner and outer packets.
- Rx inner packet type classification.
- TSO.
- Inner RSS.
TX descriptors do not require any changes, except the header length
for TSO. The NIC parses outer/inner packets and performs offloads on
them as necessary. The header length for tunneled TSO includes both
inner and outer headers.
The NIC actually parses and performs the above for NVGRE as well. DPDK
currently has no offload flags for NVGRE, and the hardware has no
controls to individually enable tunnel types either. So do nothing for
now.
The driver enables overlay offload by default. Add a devargs
'disable-overlay=<0|1>' to allow the app to disable it.
Also update the enic guide doc.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
When non-IP packets are sent on a TUN interface, the logic puts IPv6 in the
protocol field of the header. With this patch, the check distinguishes
IPv4, IPv6 and non-IP packets.
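A sketch of such a three-way check, based on the IP version nibble in the
first byte of the packet. The function name is hypothetical, the value is
returned in host byte order, and using 0 for non-IP traffic is an
assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_IPV4 0x0800u  /* EtherType for IPv4 */
#define PROTO_IPV6 0x86DDu  /* EtherType for IPv6 */

/*
 * Pick the protocol value for a raw L3 packet sent on a TUN interface.
 * Returns 0 for anything that is neither IPv4 nor IPv6.
 */
uint16_t
tun_proto_for(const uint8_t *pkt, size_t len)
{
	if (len == 0)
		return 0;
	switch (pkt[0] >> 4) {  /* IP version nibble */
	case 4:
		return PROTO_IPV4;
	case 6:
		return PROTO_IPV6;
	default:
		return 0;  /* non-IP */
	}
}
```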
Ivan Malov [Tue, 17 Apr 2018 15:18:38 +0000 (16:18 +0100)]
net/sfc: add missing Rx fini on RSS setup fail path
Fixes: 4ec1fc3ba881 ("net/sfc: add basic stubs for RSS support on driver attach") Cc: stable@dpdk.org Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
David Marchand [Mon, 16 Apr 2018 09:40:17 +0000 (11:40 +0200)]
net/enic: add primary MAC address handler
Modified enic_del_mac_address() to get a return value from the vnic layer.
Reused the .mac_addr_add and .mac_addr_del callbacks code to implement
primary mac address handler.
Signed-off-by: David Marchand <david.marchand@6wind.com> Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
This patch allows using a MAC address other than the one that comes
with the NIC by default.
The change requires notifying the vNIC after writing into the port
BAR space. The change will fail if the port is enabled and the
vNIC does not support a live address change.
Xiao Wang [Tue, 17 Apr 2018 07:06:23 +0000 (15:06 +0800)]
net/ifcvf: add ifcvf vDPA driver
The IFCVF vDPA (vhost data path acceleration) driver provides support for
the Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible,
it works as a HW vhost backend which can send/receive packets to/from
virtio directly by DMA.
Different VF devices serve different virtio frontends which are in
different VMs, so each VF needs to have its own DMA address translation
service. During the driver probe a new container is created, with this
container vDPA driver can program DMA remapping table with the VM's memory
region information.
Key vDPA driver ops implemented:
- ifcvf_dev_config:
Enable VF data path with virtio information provided by vhost lib,
including IOMMU programming to enable VF DMA to VM's memory, VFIO
interrupt setup to route HW interrupt to virtio driver, create notify
relay thread to translate virtio driver's kick to a MMIO write onto HW,
HW queues configuration.
- ifcvf_dev_close:
Revoke all the setup in ifcvf_dev_config.
The live migration feature is supported by IFCVF and this driver enables
it. For dirty page logging, the VF logs packet buffer writes and the
driver marks the used ring as dirty when the device stops.
Because vDPA driver needs to set up MSI-X vector to interrupt the
guest, only vfio-pci is supported currently.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Rosen Xu <rosen.xu@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Xiao Wang [Tue, 17 Apr 2018 07:06:21 +0000 (15:06 +0800)]
vfio: add multi container support
This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides an API for IOMMU programming
on a specified container.
A driver could use "rte_vfio_container_create" helper to create a new
container from eal, use "rte_vfio_container_group_bind" to bind a device
to the newly created container. During rte_vfio_setup_device the container
bound with the device will be used for IOMMU setup.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Xiao Wang [Tue, 17 Apr 2018 07:06:20 +0000 (15:06 +0800)]
vfio: extend data structure for multi container
Currently eal vfio framework binds vfio group fd to the default
container fd during rte_vfio_setup_device, while in some cases,
e.g. vDPA (vhost data path acceleration), we want to put vfio group
to a separate container and program IOMMU via this container.
This patch extends the vfio_config structure to contain per-container
user_mem_maps and defines an array of vfio_config. The next patch
builds on this to add the container API.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Maxime Coquelin [Fri, 27 Apr 2018 09:04:38 +0000 (11:04 +0200)]
vhost/crypto: fix build with gcc 4.7.2
Build error has been reported by Intel build system:
SUSE12SP3_64 / Linux 3.7.10-1 / GCC 4.7.2
lib/librte_vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_set_zero_copy’:
lib/librte_vhost/vhost_crypto.c:1192:2: error:
comparison of unsigned expression < 0 is always false
As enums can be either signed or unsigned, this patch removes
the negative check and casts the value to unsigned in the upper limit check.
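A sketch of the pattern with a hypothetical enum (the real option names
live in the vhost crypto API): a single comparison after a cast to unsigned
covers both out-of-range directions.

```c
/* Hypothetical option enum, for illustration only. */
enum zero_copy_option {
	ZERO_COPY_DISABLE = 0,
	ZERO_COPY_ENABLE,
	ZERO_COPY_MAX_OPTIONS
};

/*
 * Returns 1 if the option is in range. The cast to unsigned folds the
 * "negative" case into the upper-bound check: a negative value becomes a
 * very large unsigned one, so no separate `option < 0` test is needed,
 * and nothing is left for the compiler to flag as always false.
 */
int
zero_copy_option_is_valid(enum zero_copy_option option)
{
	return (unsigned int)option < (unsigned int)ZERO_COPY_MAX_OPTIONS;
}
```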
Fixes: 939066d96563 ("vhost/crypto: add public function implementation") Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thomas Monjalon [Fri, 27 Apr 2018 00:54:00 +0000 (02:54 +0200)]
eal: fix build with glibc < 2.16
The fake getauxval function does not use its parameter.
So the compiler raised this error:
lib/librte_eal/common/eal_common_cpuflags.c:25:25: error:
unused parameter 'type'
Fixes: 2ed9bf330709 ("eal: abstract away the auxiliary vector") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If the mempool manager supports object blocks (physically and virtually
contiguous sets of objects), it is sufficient to get only the first
object of a block, and the function avoids filling in
information about each block member.
Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Thomas Monjalon [Wed, 25 Apr 2018 13:03:39 +0000 (15:03 +0200)]
app/pdump: remove unused socket path options
The options --server-socket-path and --client-socket-path
were announced as deprecated and due to be removed soon.
There is no need to wait before removing application options that have
no effect and can confuse the user.
Fixes: 660098d61f57 ("pdump: use generic multi-process channel") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Andrew Rybchenko [Wed, 25 Apr 2018 17:00:37 +0000 (18:00 +0100)]
test/mempool: fix autotest retry
The single producer / single consumer mempool handle is stored in a static
variable, and the mempool is allocated if the stored value is NULL.
If the mempool is freed, NULL should be restored to make sure that
the mempool is allocated once again next time when the test is run.
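The pattern can be sketched as follows, with a hypothetical `struct pool`
and plain calloc()/free() standing in for the mempool API.

```c
#include <stdlib.h>

/* Hypothetical pool type standing in for struct rte_mempool. */
struct pool {
	int unused;
};

static struct pool *cached_pool;  /* NULL means "not allocated" */

/* Allocate the pool on first use, then keep returning the cached handle. */
struct pool *
test_pool_get(void)
{
	if (cached_pool == NULL)
		cached_pool = calloc(1, sizeof(*cached_pool));
	return cached_pool;
}

/* Free the pool and forget the handle so the next run allocates again. */
void
test_pool_put(void)
{
	free(cached_pool);
	cached_pool = NULL;  /* without this, a rerun would reuse a dangling pointer */
}

/* Report whether a handle is currently cached. */
int
test_pool_is_cached(void)
{
	return cached_pool != NULL;
}
```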
Phil Yang [Tue, 6 Feb 2018 02:21:38 +0000 (10:21 +0800)]
test: fix memory flags test for low NUMA nodes number
Since RTE_MAX_NUMA_NODES is configurable, the existing socket number
could be greater than RTE_MAX_NUMA_NODES. Adjust the test case to cover
this situation (e.g. RTE_MAX_NUMA_NODES=1).
Fixes: 45f1b6e8680a ("app: add new tests on eal flags") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com>
A typical distribution will compile with default config and all
buses enabled. Therefore every driver should be silent and not
log anything for this normal case.
This patch gets rid of these messages when running on basic x86
environment such as bare metal or VM.
fslmc: DPAA2: DPRC not available
fslmc: FSLMC Bus Not Available. Skipping
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
eal: make semantics of lcore role function more intuitive
rte_lcore_has_role() returns 0 if role of lcore matches requested
role. The return value of the API is confusing, and this is a known
problem with a deprecation notice announcing the change to more
intuitive semantics:
Commit 064518f68d48 ("doc: announce EAL API change to lcore role function")
Implement changes announced in the deprecation notice, and remove it.
Also, fix usages of this API to reflect the change. Control thread patches
expected new behavior and were broken before, now they are fixed as well.
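The difference between the two semantics, sketched with a hypothetical role
table instead of the EAL configuration. Returning 0 for an out-of-range
lcore in the new variant is a choice of this sketch, not a statement about
the real function.

```c
#include <assert.h>

#define MAX_LCORE 4u

enum lcore_role { ROLE_RTE, ROLE_OFF, ROLE_SERVICE };

/* Hypothetical role table; DPDK keeps this in its global configuration. */
static const enum lcore_role lcore_roles[MAX_LCORE] = {
	ROLE_RTE, ROLE_RTE, ROLE_SERVICE, ROLE_OFF
};

/* Old semantics: 0 on match, non-zero otherwise (strcmp-like). */
int
lcore_has_role_old(unsigned int lcore_id, enum lcore_role role)
{
	assert(lcore_id < MAX_LCORE);
	return lcore_roles[lcore_id] == role ? 0 : -1;
}

/* New semantics: non-zero (true) when the lcore has the role. */
int
lcore_has_role(unsigned int lcore_id, enum lcore_role role)
{
	if (lcore_id >= MAX_LCORE)
		return 0;  /* sketch's choice: out-of-range lcore has no role */
	return lcore_roles[lcore_id] == role;
}
```

A caller that wrote `if (lcore_has_role_old(id, ROLE_SERVICE) == 0)` becomes
`if (lcore_has_role(id, ROLE_SERVICE))`, which reads as intended.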
Fixes: d651ee4919cd ("eal: set affinity for control threads") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
event/octeontx: fix snprintf mempool name overflow
Bugzilla-ID: 28 Fixes: f874c1eb1519 ("event/octeontx: create and free timer adapter") Reported-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com> Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit removes the experimental tags from the
service cores functions, they now become part of the
main DPDK API/ABI.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
eal/linux: remove useless unlock of hugepage when clearing
Coverity complained that the result of the fcntl() call that unlocks
the file was not checked. Leaving aside that an error from an fcntl()
unlock is highly unlikely in the first place, we subsequently call
close() on the same fd, which drops the lock and makes the fcntl()
call unnecessary.
Fix this by removing the call to fcntl() altogether.
Regular expressions are not the best way to match a hierarchical
pattern like dynamic log levels. And the separator for dynamic
log levels is period which is the regex wildcard character.
A better solution is to use filename matching 'globbing' so
that log levels match like file paths. For compatibility,
use colon to separate pattern match style arguments. For
example:
--log-level 'pmd.net.virtio.*:debug'
This also makes the documentation match what really happens
internally.
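A sketch of glob-style matching for such an argument, using POSIX
fnmatch(). The helper name is hypothetical, and splitting at the last colon
is a choice of this sketch; the EAL option parser may do it differently.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Split "pattern:level" at the last colon and test `name` against the glob
 * pattern. On a match, *level points at the level string inside `arg`.
 * Returns 1 on match, 0 otherwise (including a missing colon or a pattern
 * too long for the local buffer).
 */
int
log_arg_matches(const char *arg, const char *name, const char **level)
{
	char pattern[128];
	const char *colon = strrchr(arg, ':');
	size_t n;

	if (colon == NULL)
		return 0;
	n = (size_t)(colon - arg);
	if (n >= sizeof(pattern))
		return 0;
	memcpy(pattern, arg, n);
	pattern[n] = '\0';

	if (fnmatch(pattern, name, 0) != 0)
		return 0;
	*level = colon + 1;
	return 1;
}
```

Unlike in a regular expression, the period in a glob pattern matches only a
literal period, so 'pmd.net.*' does not match a name such as 'pmdXnetXfoo'.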
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Rather than attempting to load the contents of the auxv directly,
prefer to use an exposed API - and if that doesn't exist then attempt
to load the vector. This is because on some systems, when a user
is downgraded, the /proc/self/auxv file retains the old ownership
and permissions. The original method of /proc/self/auxv is retained.
This also removes a potential abort() in the code when compiled with
NDEBUG. A quick read of the code shows that much (if not all) of
the CPU flag parsing isn't used internally, so it should be okay.