net/qede/base: limit number of non ethernet queues to 64
Limit the number of non ethernet queues to 64, allowing a max queues to
status block ratio of 2:1 in case of storage target. Theoretically a
non-target storage PF can have 128 queues and SBs.
This change is to support 64 entries for a target iSCSI/FCoE PF and 128
for a non-target.
Add support for the following OneView APIs:
- ecore_mcp_ov_update_mtu() - Send MTU value to the management FW.
- ecore_mcp_ov_update_mac() - Send MAC address to the management FW.
- ecore_mcp_ov_update_eswitch() - Send eswitch_mode to management FW
after the firmware load.
This fix adds a ecore_mcp_update_stag() handler to handle the STAG update
events from management FW and program the STAG value.
It also clears the stag config on PF, when management FW invalidates
the stag value.
Bruce Richardson [Fri, 31 Aug 2018 13:35:10 +0000 (14:35 +0100)]
net/*/base: allow use of experimental APIs in base code
The driver setting of "allow_experimental_apis" was not being used when
building the base code. To allow this we can manually put in a check
in the base code files for the setting and set the appropriate cflag
if it's needed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Tested-by: Ilya Maximets <i.maximets@samsung.com>
Xiao Wang [Fri, 14 Sep 2018 01:25:17 +0000 (09:25 +0800)]
net/ifc: do not notify before HW ready
If the device is not clearly reset by the previous driver and holds
some invalid ring addr, and the relay thread kicks it before HW is
properly re-configured, a bad DMA request may happen.
Besides, the notify_addr which is used by the relay thread is set in
the vdpa_ifcvf_start function, if a kick relay happens before
vdpa_ifcvf_start finishes, a null addr is accessed.
Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver") Cc: stable@dpdk.org Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
Remove driver log when no interrupt event indicated
in alarm handler for both PF and VF, otherwise there
will be lots of prints which makes console unusable.
Xiaoyun Li [Tue, 18 Sep 2018 02:22:40 +0000 (10:22 +0800)]
net/i40e: add option to use latest vector path
For IA, the AVX2 vector path is only recommended to be used on later
platforms (identified by AVX512 support, like SKL etc.) This is because
performance benchmark shows downgrade when running AVX2 vector path on
early platform (BDW/HSW) in some cases. But we still observe perf gain
with some real work loading.
So this patch introduced the new devarg use-latest-supported-vec to
force the driver always selecting the latest supported vec path. Then
apps are able to take AVX2 path on early platforms. And this logic can
be re-used if we will have AVX512 vec path in future.
This patch only affects IA platforms. The selected vec path would be
like the following:
Without devarg/devarg = 0:
Machine vPMD
AVX512F AVX2
AVX2 SSE4.2
SSE4.2 SSE4.2
<SSE4.2 Not Supported
With devarg = 1
Machine vPMD
AVX512F AVX2
AVX2 AVX2
SSE4.2 SSE4.2
<SSE4.2 Not Supported
Other platforms can also apply the same logic if necessary in future.
Nelio Laranjeiro [Fri, 31 Aug 2018 07:16:05 +0000 (09:16 +0200)]
net/mlx4: support meson build
Compile Mellanox driver when their external dependencies are met. A
glue version of the driver can still be requested by using the
-Denable_driver_mlx_glue=true
Meson will try to find the required external libraries. When they are
not installed system wide, they can be provided though CFLAGS, LDFLAGS
and LD_LIBRARY_PATH environment variables, example (considering
RDMA-Core is installed in /tmp/rdma-core):
Note: LD_LIBRARY_PATH before ninja is necessary when the meson
configuration has changed (e.g. meson configure has been called), in
such situation the LD_LIBRARY_PATH is necessary to invoke the
autoconfiguration script.
Nelio Laranjeiro [Fri, 31 Aug 2018 07:16:05 +0000 (09:16 +0200)]
net/mlx5: support meson build
Compile Mellanox driver when its external dependencies are met. A
glue version of the driver can still be requested by using the
-Denable_driver_mlx_glue=true
Meson will try to find the required external libraries. When they are
not installed system wide, they can be provided though CFLAGS, LDFLAGS
and LD_LIBRARY_PATH environment variables, example (considering
RDMA-Core is installed in /tmp/rdma-core):
Note: LD_LIBRARY_PATH before ninja is necessary when the meson
configuration has changed (e.g. meson configure has been called), in
such situation the LD_LIBRARY_PATH is necessary to invoke the
autoconfiguration script.
Thomas Monjalon [Tue, 25 Sep 2018 22:02:20 +0000 (00:02 +0200)]
bus/ifpga: remove useless driver cast
The rte_afu_driver is assigned to rte_afu_device.driver during probing.
There is no need of accessing the rte_afu_driver via rte_device.driver
and type casting to its container.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Rosen Xu <rosen.xu@intel.com>
When compiling on FreeBSD, lots of warnings/errors are thrown for
unused parameter. Fix these by marking the parameters as unused
in the code.
Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd") Fixes: 3a44687139eb ("mem: allow querying offset into segment fd") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Alex Kiselev [Mon, 4 Jun 2018 10:13:02 +0000 (13:13 +0300)]
ip_frag: add function to delete expired entries
A fragmented packets is supposed to live no longer than max_cycles,
but the lib deletes an expired packet only occasionally when it scans
a bucket to find an empty slot while adding a new packet.
Therefore a fragment might sit in the table forever.
Signed-off-by: Alex Kiselev <alex@therouter.net> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Phil Yang [Wed, 12 Sep 2018 01:54:26 +0000 (09:54 +0800)]
app/testpmd: optimize mbuf pool allocation
By default, testpmd will create membuf pool for all NUMA nodes and
ignore EAL configuration.
Count the number of available NUMA according to EAL core mask or core
list configuration. Optimized by only creating membuf pool for those
nodes.
Fixes: c9cafcc82de8 ("app/testpmd: fix mempool creation by socket id") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Acked-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
mem: support using memfd segments for in-memory mode
Enable using memfd-created segments if supported by the system.
This will allow having real fd's for pages but without hugetlbfs
mounts, which will enable in-memory mode to be used with virtio.
The implementation is mostly piggy-backing on existing real-fd
code, except that we no longer need to unlink any files or track
per-page locks in single-file segments mode, because in-memory
mode does not support secondary processes anyway.
We move some checks from EAL command-line parsing code to memalloc
because it is now possible to use single-file segments mode with
in-memory mode, but only if memfd is supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In a few cases, user may need to query offset into fd for a
particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe. Fix up internal API's to return values we need
without modifying rte_errno internally if called from within EAL.
We do not want calling code to accidentally close an internal fd, so
we make a duplicate of it before we return it to the user. Caller is
therefore responsible for closing this fd.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Enable setting and retrieving segment fd's internally.
For now, retrieving fd's will not be used anywhere until we
get an external API, but it will be useful for things like
virtio, where we wish to share segment fd's.
Setting segment fd's will not be available as a public API
at this time, but internally it is needed for legacy mode,
because we're not allocating our hugepages in memalloc in
legacy mode case, and we still need to store the fd.
Another user of get segment fd API is memseg info dump, to
show which pages use which fd's.
Not supported on FreeBSD.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, we were only tracking lock file fd's in single-file
segments mode, but did not track fd's in non-single file mode
because we didn't need to (mmap() call still kept the lock). Now
that we are going to expose these fd's to the world, we need to
have access to them, so track them even in non-single file
segments mode.
We don't need to close fd's after mmap() because we're still
tracking them in an fd list. Also, for anonymous hugepages mode,
fd will always be -1 so exit early on error.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, we were only using lock lists to store per-page lock fd's
because we cannot use modern fcntl() file description locks to lock
parts of the page in single file segments mode.
Now, we will be using this list to store either lock fd's (along with
memseg list fd) in single file segments mode, or per-page fd's (and set
memseg list fd to -1), so rename the list accordingly.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, when we allocated hugepages, we closed the fd's corresponding
to them after we've done our mappings. Since we did mmap(), we didn't
actually lose the reference, but file descriptors used for mmap() do not
count against the fd limit. Since we are going to store all of our fd's,
we will hit the fd limit much more often when using smaller page sizes.
Fix this to raise the fd limit to maximum unconditionally.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In noshconf mode, no shared files are created, but we're still trying
to unlink them, resulting in detach/destroy failure even though it
should have succeeded. Fix it by exiting early in noshconf mode.
Fixes: 3ee2cde248a7 ("fbarray: support --no-shconf mode") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The strncpy function has long been deemed unsafe for use,
in favor of strlcpy or snprintf.
While snprintf is standard and strlcpy is still largely available,
they both have issues regarding error checking and performance.
Both will force reading the source buffer past the requested size
if the input is not a proper c-string, and will return the expected
number of bytes copied, meaning that error checking needs to verify
that the number of bytes copied is not superior to the destination
size.
This contributes to awkward code flow, unclear error checking and
potential issues with malformed input.
The function strscpy has been discussed for some time already and
has been made available in the linux kernel[1].
David Marchand [Mon, 17 Sep 2018 12:45:09 +0000 (14:45 +0200)]
mbuf: remove deprecated segment free functions
__rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
a long time now (early 17.05), are not part of the abi and are easily
replaced with existing api.
Signed-off-by: David Marchand <david.marchand@6wind.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Note that the library built by meson will not have the _uio suffix:
librte_pmd_vmxnet3.so - as it follows the directory name, while the
legacy makefile rename it to librte_pmd_vmxnet3_uio.so.
Bruce Richardson [Tue, 18 Sep 2018 10:55:52 +0000 (11:55 +0100)]
build: fix compatibility with meson 0.41 onwards
Versions of meson prior to 0.47 flattened the parameters to the
"set_variable" function, which meant that the function could not take
array variables as a parameter. Therefore, we need to disable driver
tracking for those older versions, in order to maintain compatibility
with the minimum supported 0.41 version, and also v0.45 shipped in
Ubuntu 18.04 release.
Fixes: 806c45dd483d ("build: add configuration summary at end of config") Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Tested-by: Timothy Redaelli <tredaelli@redhat.com>
Bruce Richardson [Fri, 14 Sep 2018 16:17:17 +0000 (17:17 +0100)]
devtools: use shared libs to save space in build test
For usability, the default build type in meson is static, so that
binaries can be run from the build directory easily. However, static
builds take more space, so for build-testing purposes default to using
shared builds where possible.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Luca Boccassi <bluca@debian.org>
Bruce Richardson [Wed, 29 Aug 2018 16:19:20 +0000 (17:19 +0100)]
build: add configuration summary at end of config
After running meson to configure a DPDK build, it can be useful to know
what was automatically enabled or disabled. Therefore, print out by way of
summary a categorised list of libraries and drivers to be built.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Thu, 19 Jul 2018 14:37:02 +0000 (15:37 +0100)]
build: simplify logic for default library dependencies
EAL is a standard dependency of all libraries, except for those built
before it. We can therefore simplify the logic by just checking if EAL
has been processed, and make it a standard dependency if so.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Install in $kerneldir/../extra/dpdk. Usually $kerneldir should something
like: /lib/modules/$kver/build, so this directory will match the default
one used by legacy makefiles.
Fixes: a52f4574f798 ("igb_uio: build with meson") Cc: stable@dpdk.org Signed-off-by: Luca Boccassi <bluca@debian.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Mon, 17 Sep 2018 08:18:00 +0000 (09:18 +0100)]
compat: fix symbol version support with meson
For meson builds, the define to enable the symbol version
macros in rte_compat.h was missing. This led to symbols being
omitted from shared objects. For example, checking rte_distributor.so
with objdump and comparing make and meson built versions:
Integrate accelerated networking support into netvsc PMD.
This allows netvsc to manage VF without using failsafe or vdev_netvsc.
For the exception vswitch path some tests like transmit
get a 22% increase in packets/sec.
For the VF path, the code is slightly shorter but has no
real change in performance.
Pro:
* using netvsc is more like other DPDK NIC's
* the exception packet uses less CPU
* much smaller code size
* no locking required on VF transmit/receive path
* no legacy Linux network device to get mangled by userspace
* much simpler (1K vs 9K) LOC
* unified extended statistics
Con:
* using netvsc has more complex startup model
* no bifurcated driver support
* no flow support (since host does not have flow API).
* no tunnel offload support
* no receive interrupt support
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Implement callback functionality on link state changes.
This is not really driven off of interrupt file descriptor like most other
PMD's. Instead, it happens when a link state change message arrives
in the common ring buffer.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
net/netvsc: exhausting Tx descriptors is not an error
If application sends faster than vswitch can keep up, then the
transmit descriptor pool will be exhausted. This is not a failure
so change the name statistic and don't include it in oerrors.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag.
Without any specific Rx offload flag, default behavior by PMDs is to
strip CRC.
PMDs that support keeping CRC should advertise DEV_RX_OFFLOAD_KEEP_CRC
Rx offload capability.
Applications that require keeping CRC should check PMD capability first
and if it is supported can enable this feature by setting
DEV_RX_OFFLOAD_KEEP_CRC in Rx offload flag in rte_eth_dev_configure()
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Tomasz Duszynski <tdu@semihalf.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Jan Remes <remes@netcope.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Igor Romanov [Wed, 29 Aug 2018 07:51:24 +0000 (08:51 +0100)]
net/bonding: do not ignore RSS key on device config
Bonding driver ignores the value of RSS key (that is set in the port RSS
configuration) in bond_ethdev_configure(). So the only way to set
non-default RSS key is by using rss_hash_update(). This is not an
expected behaviour.
Make the bond_ethdev_configure() set default RSS key only if
requested key is set to NULL.
Fixes: 734ce47f71e0 ("bonding: support RSS dynamic configuration") Cc: stable@dpdk.org Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Chas Williams <chas3@att.com>
Igor Romanov [Wed, 29 Aug 2018 07:48:30 +0000 (08:48 +0100)]
net/bonding: use evenly distributed default RSS RETA
Default Redirection Table that is set in bonding driver is distributed
evenly over all Rx queues only within every RETA group (the first RETA
entries in every group are always start with zero). But in the most
drivers, default RETA is distributed over all Rx queues without sequence
resets in the beginning of a new group, which implies more balanced
per-core load.
Change the default RETA to be evenly distributed over all Rx queues
considering the whole table.
Fixes: 734ce47f71e0 ("bonding: support RSS dynamic configuration") Cc: stable@dpdk.org Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Chas Williams <chas3@att.com>
Shagun Agrawal [Mon, 27 Aug 2018 12:52:32 +0000 (18:22 +0530)]
net/cxgbe: add flow ops to match based on dest MAC
Add flow operations to match packets based on destination MAC address.
Allocate and program hardware MPS table with the destination MAC
address to be matched against. The returned MPS index is then used while
offloading flows to LETCAM (maskfull) and HASH (maskless) filter regions.
Also update existing mac_addr_set() to use the new MPS table API.
Shagun Agrawal [Mon, 27 Aug 2018 12:52:31 +0000 (18:22 +0530)]
net/cxgbe: add API to program hardware MPS table
Add API to program and manage hardware Multi Port Switch table. MPS
holds destination MAC addresses to be matched against incoming packets
for further rule processing. Packets not matching any entry in MPS table
will be dropped by default, unless the underlying port is in promiscuous
mode.
Shagun Agrawal [Mon, 27 Aug 2018 12:52:30 +0000 (18:22 +0530)]
net/cxgbe: add flow operations to offload VLAN actions
Add flow API operations to offload vlan push, pop, and rewrite actions.
For vlan push or rewrite actions, allocate and program an entry from
L2T table. Use the L2T index to program vlan actions for LETCAM
(maskfull) and HASH (maskless) filters.
Shagun Agrawal [Mon, 27 Aug 2018 12:52:29 +0000 (18:22 +0530)]
net/cxgbe: add API to program hardware layer 2 table
Add API to program and manage hardware Layer 2 Table. L2T holds
information necessary to rewrite specific fields in packet, such
as destination MAC address and vlan id.
ethdev: fix missing names in Tx offload name array
Patch 5355f443 added two definitions of DEV_TX_OFFLOAD_xxx.
If new Tx offload capabilities are defined, they also must be mentioned
in rte_tx_offload_names in rte_ethdev.c file.
This patch adds the required lines in array rte_tx_offload_names.
Fixes: 5355f4439e2e ("ethdev: introduce generic IP/UDP tunnel checksum and TSO") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Chas Williams [Mon, 6 Aug 2018 20:05:45 +0000 (16:05 -0400)]
net/i40e: stop LLDP before setting local LLDP MIB
>From the Intel Ethernet Controller X710/XXV710/XL710 Specification
Update:
Starting from NVM 5.02, if the Set Local LLDP MIB command is
received while the DCBx specific agent is stopped, the command
returns an EPERM error. If the command is received while the
LLDP agent is stopped, it sets the local MIB without exchanging
LLDP with peer, and returns SUCCESS.
This results in the harmless, but annoying, diagnostic:
default dcb config fails. err = -53, aq_err = 1.
So, if possible (older firmwares cannot safely stop LLDP), stop the
LLDP daemon when we are in software mod before we attempt to call
i40e_set_dcb_config.
Signed-off-by: Chas Williams <chas3@att.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
This patch adds alarm handler, and then i40e
PF will use alarm handler instead of interrupt
handler when device is started and Rx interrupt
mode is disabled. This way will save CPU cycles
during receiving packets.
vhost-user: drop connection on message handling failures
There are a lot of cases where vhost-user massage handling
could fail and end up in a fully not recoverable state. For
example, allocation failures of shadow used ring and batched
copy array are not recoverable and leads to the segmentation
faults like this on the receiving/transmission path:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f913fecf0 (LWP 43625)]
in copy_desc_to_mbuf () at /lib/librte_vhost/virtio_net.c:760
760 batch_copy[vq->batch_copy_nb_elems].dst =
This could be easily reproduced in case of low memory or big
number of vhost-user ports.
Fix that by propagating error to the upper layer which will
end up with disconnection in case we can not report to
the message sender when the error happens.
Fixes: f689586bc060 ("vhost: shadow used ring update") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Eric Zhang [Wed, 29 Aug 2018 15:55:21 +0000 (11:55 -0400)]
net/virtio-user: check negotiated features before set
This patch checks negotiated features to see if necessary to offload
before set the tap device offload capabilities. It also checks if kernel
support the TUNSETOFFLOAD operation.
Ilya Maximets [Wed, 15 Aug 2018 14:54:39 +0000 (17:54 +0300)]
vhost: fix zmbufs array leak after NUMA realloc
'numa_realloc()' allocates 'zmbufs' even if zero copy mode
is not configured. This leads to memory leak, because array
is freed only for zero copy case.
Fixes: 2651726defb7 ("vhost: do deep copy while reallocating queue") CC: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Rami Rosen [Fri, 24 Aug 2018 07:43:02 +0000 (10:43 +0300)]
doc: fix wrong usage of bind command
This patch fixes wrong usage of bind command in vhost.rst.
Using "dpdk-devbind.py -b=uio_pci_generic 0000:00:04.0" gives
an error of "unbind failed". It should be "-b uio_pci_generic" so
it will work correctly.