service: fix race in service on app lcore function
This commit fixes a possible race condition if an application
uses the service-cores infrastructure and the function to run
a service on an application lcore at the same time.
The fix is to change the num_mapped_cores variable to be an
atomic variable. This allows concurrent callers of
rte_service_run_iter_on_app_lcore() to detect whether another
core is currently mapped to the service, and to refuse to run
the service if it is not multi-thread safe.
The run iteration on app lcore function has two arguments: the
service id to run, and whether atomics should be used to serialize
access to multi-thread unsafe services. This allows applications to
choose whether to use the service-cores feature, or to take
responsibility themselves for serializing invocations of a service.
See doxygen documentation for more details.
Two unit tests were added to verify the behaviour of the
function to run a service on an application core, testing both
a multi-thread safe service, and a multi-thread unsafe service.
The doxygen API documentation for the function has been updated
to reflect the current and correct behaviour.
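The check described above can be sketched with plain C11 atomics. This is a minimal illustration and not the DPDK implementation: struct svc_sketch, its fields, the example callback and svc_run_iter_sketch() are hypothetical names, and the sketch assumes that a caller passing serialize == 0 takes full responsibility for serialization.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a registered service (not the DPDK struct). */
struct svc_sketch {
	atomic_int num_mapped_cores; /* cores currently mapped to the service */
	int mt_safe;                 /* non-zero if the callback is multi-thread safe */
	int (*cb)(void *arg);        /* service callback */
	void *arg;
};

/* Example callback: counts how many times the service ran. */
int
count_call_sketch(void *arg)
{
	(*(int *)arg)++;
	return 0;
}

/*
 * Run one iteration of the service on the calling thread.
 * With serialize set, the caller first adds itself to the mapped-core
 * count atomically and only then checks whether it may run, so there is
 * no window between the check and the update.
 */
int
svc_run_iter_sketch(struct svc_sketch *s, int serialize)
{
	int prev, ret;

	if (!serialize)
		return s->cb(s->arg); /* assumption: caller serializes itself */

	prev = atomic_fetch_add(&s->num_mapped_cores, 1);
	if (!s->mt_safe && prev > 0) {
		/* another core is mapped to a multi-thread unsafe service */
		atomic_fetch_sub(&s->num_mapped_cores, 1);
		return -EBUSY;
	}

	ret = s->cb(s->arg);
	atomic_fetch_sub(&s->num_mapped_cores, 1);
	return ret;
}
```

Incrementing before checking is the design choice that matters here: a separate read followed by an increment would let two threads both observe zero and both run the unsafe service.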
Fixes: e9139a32f6e8 ("service: add function to run on app lcore") Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Jingjing Wu [Thu, 24 Aug 2017 02:10:56 +0000 (10:10 +0800)]
eal/linux: add interrupt counter size for vdev
For a virtual device, the rte_intr_handle struct is
initialized by the virtual device driver, including
the event fd assignment. If the event fd needs to be
read to clear it, the read size must be supplied so
that the event fd is read properly.
This patch adds efd_counter_size to the rte_intr_handle
struct to tell the Rx interrupt processing code the read size.
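A minimal sketch of the idea, assuming a handle that carries its own read size. struct intr_handle_sketch and clear_event_sketch() are hypothetical names for illustration; only the efd_counter_size field name comes from the commit message, and treating a size of zero as "nothing to read" is an assumption of this sketch.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, reduced interrupt handle (not the DPDK struct). */
struct intr_handle_sketch {
	int fd;                  /* event fd assigned by the driver */
	size_t efd_counter_size; /* bytes to read to clear the event */
};

/*
 * Clear a pending event by reading exactly efd_counter_size bytes.
 * Returns 0 on success (or when there is nothing to read), -1 on
 * error, short read, or an unsupported size.
 */
int
clear_event_sketch(const struct intr_handle_sketch *h)
{
	unsigned char buf[16];
	ssize_t n;

	if (h->efd_counter_size == 0)
		return 0;
	if (h->efd_counter_size > sizeof(buf))
		return -1;

	n = read(h->fd, buf, h->efd_counter_size);
	return (n == (ssize_t)h->efd_counter_size) ? 0 : -1;
}
```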
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Xiaoyun Li [Fri, 3 Nov 2017 12:47:23 +0000 (20:47 +0800)]
eal/x86: revert select optimized memcpy at run-time
Revert the run-time linking support patchset, which includes the
following 3 commits:
Fixes: 84cc318424d4 ("eal/x86: select optimized memcpy at run-time")
Fixes: c7fbc80fe60f ("test: select memcpy alignment unit at run-time")
Fixes: 5f180ae32962 ("efd: move AVX2 lookup in its own compilation unit")
The patchset causes a performance drop in the vhost/virtio loopback
performance test, because run-time dispatch costs at least one function
call more than compile-time dispatch, and the reference CPU cycles value
is small. In the test, 128-256 byte packets show a 16%-20% performance
drop with the mergeable path, and 256 byte packets show a 13%
performance drop with the vector path.
Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ferruh Yigit [Thu, 2 Nov 2017 00:06:00 +0000 (00:06 +0000)]
eal/linux: force IOVA as PA mode if KNI module inserted
Fix kernel crash with KNI because KNI requires physical addresses.
When IOVA VA mode is used, the physical address fields of memzones and
mbufs contain virtual addresses. But KNI relies on these fields to give
the kernel access to the buffers, and virtual addresses in those fields
cause a crash in the kernel.
This is a workaround until KNI is fixed properly to work with virtual
addresses.
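The workaround can be sketched as follows. This is an illustration under stated assumptions, not the EAL code: the enum, module_loaded_sketch() and pick_iova_mode_sketch() are hypothetical names, and detecting a loaded module through a /sys/module/<name> directory is an assumption about how the check could be done on Linux.

```c
#include <stdio.h>
#include <sys/stat.h>

enum iova_mode_sketch { IOVA_MODE_PA, IOVA_MODE_VA };

/* Return 1 if /sys/module/<name> exists as a directory, else 0. */
int
module_loaded_sketch(const char *name)
{
	char path[256];
	struct stat st;
	int n = snprintf(path, sizeof(path), "/sys/module/%s", name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Force PA mode when VA was selected but the KNI module is present. */
enum iova_mode_sketch
pick_iova_mode_sketch(enum iova_mode_sketch bus_mode, int kni_loaded)
{
	if (bus_mode == IOVA_MODE_VA && kni_loaded)
		return IOVA_MODE_PA;
	return bus_mode;
}
```

A caller would combine the two, for example pick_iova_mode_sketch(mode, module_loaded_sketch("rte_kni")), where the module name "rte_kni" is itself an assumption.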
Harry van Haaren [Wed, 25 Oct 2017 12:29:49 +0000 (13:29 +0100)]
eal: fix version map experimental section
Before this commit, the EXPERIMENTAL version of the ABI
derived from the DPDK_17.08 tag, while a DPDK_17.11 tag
existed in parallel.
The experimental map should always derive from the latest ABI,
so this patch moves the 17.11 section above EXPERIMENTAL,
and updates EXPERIMENTAL to derive from the 17.11 map.
Fixes: aadc3eb002d3 ("pci: export match function") Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Thomas Monjalon [Fri, 20 Oct 2017 12:31:35 +0000 (18:01 +0530)]
doc: add IOVA aware API changes in release notes
The wording changes were made in the API without breaking
the ABI. The deprecated fields and symbols can be removed later,
when another ABI change is required.
The deprecation notice can be removed.
The release notes describe the new available API with IOVA wording.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: John McNamara <john.mcnamara@intel.com>
Thomas Monjalon [Sun, 5 Nov 2017 22:26:24 +0000 (23:26 +0100)]
mempool: rename populate functions to IOVA
The functions rte_mempool_populate_phys() and
rte_mempool_populate_phys_tab() are renamed to
rte_mempool_populate_iova() and rte_mempool_populate_iova_tab().
The deprecated functions are kept as aliases to avoid breaking the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Thomas Monjalon [Sun, 5 Nov 2017 18:02:29 +0000 (19:02 +0100)]
mempool: rename address mapping function to IOVA
The function rte_mempool_virt2phy() is renamed to rte_mempool_virt2iova().
The new function has one fewer parameter, because the dropped parameter was unused.
The deprecated function is kept as an alias to avoid breaking the API.
Thomas Monjalon [Fri, 20 Oct 2017 12:31:31 +0000 (18:01 +0530)]
mempool: rename addresses from physical to IOVA
The struct fields phys_addr_t rte_mempool_objhdr.physaddr and
rte_mempool_memhdr.phys_addr are renamed to rte_iova_t iova.
The deprecated names are kept in an anonymous union to avoid breaking
the API.
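The anonymous-union technique can be illustrated with a reduced example. The type and struct names below are hypothetical; only the field names iova and physaddr are taken from the commit message. Anonymous unions need C11 (or a compiler extension).

```c
#include <stdint.h>

typedef uint64_t phys_addr_sketch_t; /* stand-in for the old address type */
typedef uint64_t iova_sketch_t;      /* stand-in for the new address type */

/* Hypothetical, reduced object header. */
struct objhdr_sketch {
	union {
		iova_sketch_t iova;          /* new name */
		phys_addr_sketch_t physaddr; /* deprecated name, same storage */
	};
};
```

Because both members share storage and have the same representation in this sketch, code that still uses the old field name keeps compiling and sees the same value, and the struct size does not change.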
Thomas Monjalon [Sat, 4 Nov 2017 16:15:04 +0000 (17:15 +0100)]
mem: rename address mapping function to IOVA
The function rte_mem_virt2phy() is kept and used in functions which
work only with physical addresses.
For all other calls this function is replaced by rte_mem_virt2iova()
which does a direct mapping (no conversion) in the VA case.
Note: the new function rte_mem_virt2iova() matches the behaviour
implemented in rte_mem_virt2phy() by the commit
680f6c12600f ("mem: honor IOVA mode in virt2phy")
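The split between the two functions can be sketched like this. All names are hypothetical; in particular the physical-address lookup is passed in as a callback, and fake_virt2phy_sketch() is a made-up translation used only so the example can run.

```c
#include <stdint.h>

enum iova_kind_sketch { IOVA_KIND_PA, IOVA_KIND_VA };
typedef uint64_t iova_addr_sketch_t;

/* Placeholder for a real virtual-to-physical lookup. */
typedef uint64_t (*virt2phy_fn_sketch)(const void *virt);

/* Fake translation for demonstration only: adds a fixed offset. */
uint64_t
fake_virt2phy_sketch(const void *virt)
{
	return (uint64_t)(uintptr_t)virt + 0x1000;
}

/*
 * In VA mode the IOVA is the virtual address itself (direct mapping,
 * no conversion); otherwise fall back to the physical-address lookup.
 */
iova_addr_sketch_t
virt2iova_sketch(enum iova_kind_sketch mode, const void *virt,
		 virt2phy_fn_sketch virt2phy)
{
	if (mode == IOVA_KIND_VA)
		return (uintptr_t)virt;
	return virt2phy(virt);
}
```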
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Thomas Monjalon [Fri, 3 Nov 2017 23:36:47 +0000 (00:36 +0100)]
mem: introduce IOVA type
The IO virtual addresses may be used instead of physical addresses.
As IOVA is more generic, it should be used in most places instead
of physical address wording.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Ferruh Yigit [Thu, 2 Nov 2017 00:25:10 +0000 (00:25 +0000)]
buildtools: fix icc build
There are random build errors in test reports [1]. The build error
is not directly related to DPDK but is observed during the DPDK build.
When I got similar unexpected build errors on my system, I found
that /dev/null was invalid.
It seems ICC overwrites /dev/null with "icc -o /dev/null" instead
of sending output to /dev/null. This is not always reproducible, so
it is hard to say what exactly triggers the error.
I suspect the test-report build errors have the same cause,
and it is good to add protection against this case.
Instead of sending output to /dev/null, save it to the tmp folder and
remove it when done.
Jerin Jacob [Sat, 28 Oct 2017 06:22:55 +0000 (11:52 +0530)]
bus/pci: fix VFIO device reset
If the device is not capable of resetting, then the Linux kernel sets
errno to EINVAL.
http://elixir.free-electrons.com/linux/v4.9/source/drivers/vfio/pci/vfio_pci.c#L887
Honor the EINVAL errno value to avoid pci vfio setup failure.
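The decision can be reduced to a small predicate. This is a sketch with a hypothetical helper name; it takes the return value and errno of the reset ioctl as plain arguments so that it can be exercised without a VFIO device.

```c
#include <errno.h>

/*
 * Decide whether a device reset result should fail the setup.
 * ioctl_ret:   return value of the reset ioctl (0 on success)
 * ioctl_errno: errno observed after a failed ioctl
 * Returns 1 if setup should fail, 0 otherwise.
 */
int
reset_is_fatal_sketch(int ioctl_ret, int ioctl_errno)
{
	if (ioctl_ret == 0)
		return 0;
	/* EINVAL means the device cannot be reset; this is tolerated. */
	return ioctl_errno != EINVAL;
}
```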
Fixes: f25f8f367644 ("bus/pci: check VFIO reset ioctl error") Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Moti Haimovsky [Wed, 25 Oct 2017 15:37:27 +0000 (18:37 +0300)]
net/mlx4: fix no Rx interrupts
This commit addresses the issue of Rx interrupts support with
the new Rx datapath introduced in DPDK version 17.11.
In order to generate an Rx interrupt an event queue is armed with the
consumer index of the Rx completion queue. Since version 17.11 this
index is handled by the PMD so it is now the responsibility of the
PMD to write this value when enabling Rx interrupts.
Moti Haimovsky [Wed, 25 Oct 2017 15:37:26 +0000 (18:37 +0300)]
net/mlx4: introducing consumer index mask
This commit defines MLX4_CQ_DB_CI_MASK which is used when updating
the consumer index of the completion queue instead of the hardcoded
0xffffff used until now.
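A minimal sketch of replacing the hardcoded constant with a named mask. The macro and function names are hypothetical stand-ins; only the value 0xffffff comes from the commit message, and any byte-order conversion or doorbell write the driver performs is left out.

```c
#include <stdint.h>

/* Same value as the previously hardcoded constant. */
#define CQ_DB_CI_MASK_SKETCH 0xffffffu

/* Consumer index as it would be written to the doorbell record. */
uint32_t
cq_ci_db_value_sketch(uint32_t cons_index)
{
	return cons_index & CQ_DB_CI_MASK_SKETCH;
}
```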
Matan Azrad [Sun, 22 Oct 2017 05:51:08 +0000 (05:51 +0000)]
net/failsafe: fix Rx clean race
When removing a device, the fail-safe checks that it is not within its
datapath before cleaning it.
When checking whether an Rx burst should be performed on a device, the
remove flag is not checked. Thus the port could still enter its datapath
and miss a removal round. Furthermore, there is a race between the
thread removing the device and the polling thread.
Check the remove flag before entering a sub-device Rx burst when in safe
mode. This check mitigates the aforementioned race condition.
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC rte_lpm6.o
rte_lpm6.c: In function ‘rte_lpm6_add_v1705’:
rte_lpm6.c:442:11: error: ‘tbl_next’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
if (!tbl[tbl_index].valid) {
^
rte_lpm6.c:521:29: note: ‘tbl_next’ was declared here
struct rte_lpm6_tbl_entry *tbl_next;
^~~~~~~~
This is a false positive from gcc. Fix it by initializing tbl_next
to NULL.
Xueming Li [Thu, 26 Oct 2017 08:29:23 +0000 (16:29 +0800)]
examples/multi_process: fix received message length
The simple_mp example receives messages of fewer than 64 characters
while the send side accepts fewer than 128 characters, which leads to
different results on the two sides when the text sent is longer than 64
characters.
This patch uses the same buffer length for both the message pool and
the command line.
Fixes: af75078fece3 ("first public release") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Pavan Nikhilesh [Wed, 25 Oct 2017 14:50:29 +0000 (20:20 +0530)]
app/testeventdev: use service cores
Use service cores for offloading event scheduling in the case of
centralized scheduling, instead of calling the schedule API directly.
This removes the dependency on a dedicated scheduler core specified
with the --slcore command line option.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Pavan Nikhilesh [Wed, 25 Oct 2017 14:50:28 +0000 (20:20 +0530)]
event/sw: extend service capability
Extend the service capability of the sw event device by exposing service id
to the application.
The application can use service id to configure service cores to run event
scheduling.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Pavan Nikhilesh [Wed, 25 Oct 2017 14:50:27 +0000 (20:20 +0530)]
eventdev: add API to get service id
In the case of the sw event device, the scheduling can be done on a
service core using the service registered at the time of probe.
This patch adds a helper function to get the service id that can be used
by the application to assign an lcore for the service to run on.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Pavan Nikhilesh [Wed, 25 Oct 2017 14:21:42 +0000 (19:51 +0530)]
eventdev: fix inconsistency in queue config
With the current scheme of event queue configuration, the cfg schedule
type macros (RTE_EVENT_QUEUE_CFG_*_ONLY) are inconsistent with the
event schedule types (RTE_SCHED_TYPE_*). This requires unnecessary
conversion between the fastpath and slowpath APIs while scheduling
events or configuring event queues.
This patch aims to fix such inconsistency by using event schedule
types (RTE_SCHED_TYPE_*) for event queue configuration.
This patch also fixes example/eventdev_pipeline_sw_pmd as it doesn't
convert RTE_EVENT_QUEUE_CFG_*_ONLY to RTE_SCHED_TYPE_* which leads to
improper events being enqueued to the eventdev.
Fixes: adb5d5486c39 ("examples/eventdev_pipeline_sw_pmd: add sample app") Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
The original code used movl instead of xchgl; this caused
rte_atomic64_cmpset to use ebx as the lower dword of the source
to cmpxchg8b instead of the lower dword of function argument "src".
Fixes: af75078fece3 ("first public release") Cc: stable@dpdk.org Reported-by: Job Abraham <job.abraham@intel.com> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Tested-by: Job Abraham <job.abraham@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Jianfeng Tan [Tue, 24 Oct 2017 07:44:53 +0000 (07:44 +0000)]
bus/pci: fix UIO bind check
When checking whether any devices are bound to UIO, we did not exclude
those which are blacklisted (or not whitelisted, in the case that a
whitelist is specified).
This patch fixes it by only checking whitelisted devices, or
not-blacklisted devices depending on the bus scan mode.
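The filtering rule can be sketched as below. Every type and function name here is hypothetical and the device model is reduced to two fields; the sketch only shows how the scan mode selects which devices take part in the UIO check.

```c
#include <stddef.h>

enum scan_mode_sketch { SCAN_MODE_WHITELIST, SCAN_MODE_BLACKLIST };

enum dev_policy_sketch {
	DEV_POLICY_NONE,        /* device not mentioned on the command line */
	DEV_POLICY_WHITELISTED,
	DEV_POLICY_BLACKLISTED
};

struct pci_dev_sketch {
	enum dev_policy_sketch policy;
	int bound_to_uio; /* non-zero if bound to a UIO kernel driver */
};

/* Whitelist mode: only whitelisted devices. Otherwise: all but blacklisted. */
int
dev_is_checked_sketch(enum scan_mode_sketch mode,
		      const struct pci_dev_sketch *dev)
{
	if (mode == SCAN_MODE_WHITELIST)
		return dev->policy == DEV_POLICY_WHITELISTED;
	return dev->policy != DEV_POLICY_BLACKLISTED;
}

/* Return 1 if any device that takes part in the scan is bound to UIO. */
int
any_uio_bound_sketch(enum scan_mode_sketch mode,
		     const struct pci_dev_sketch *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (dev_is_checked_sketch(mode, &devs[i]) && devs[i].bound_to_uio)
			return 1;
	return 0;
}
```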
Fixes: 815c7deaed2d ("pci: get IOMMU class on Linux") Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Reviewed-by: Gaetan Rivet <gaetan.rivet@6wind.com> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Gaetan Rivet [Thu, 26 Oct 2017 10:05:55 +0000 (12:05 +0200)]
vfio: move PCI related symbols
These symbols are only relevant to PCI operations.
Move them to a private PCI-related header, which allows removing the
dependency of the PCI subsystem on the private eal_vfio.h.
Gaetan Rivet [Thu, 26 Oct 2017 10:05:50 +0000 (12:05 +0200)]
mem: expose function for physical address use
This function was previously private to the EAL layer.
Other subsystems require it, such as the PCI bus.
In order not to force other components to include stdbool, which is
incompatible with several NIC drivers, the return type has
been changed from bool to int.
Ferruh Yigit [Wed, 25 Oct 2017 08:36:45 +0000 (09:36 +0100)]
eal: fix build with glibc < 2.12
build error:
CC rte_cycles.o
cc1: warnings being treated as errors
...dpdk/lib/librte_eal/common/arch/x86/rte_cycles.c: In function
‘rdmsr’:
...dpdk/lib/librte_eal/common/arch/x86/rte_cycles.c:67:2: error:
implicit declaration of function ‘pread’
...dpdk/lib/librte_eal/common/arch/x86/rte_cycles.c:67:2: error:
nested extern declaration of ‘pread’
from pread man page:
pread(), pwrite():
_XOPEN_SOURCE >= 500
|| /* Since glibc 2.12: */ _POSIX_C_SOURCE >= 200809L
For glibc < 2.12 _XOPEN_SOURCE >= 500 is required.
Add the _GNU_SOURCE define to the file, which implies _XOPEN_SOURCE=700.
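The pattern looks roughly like this. The helper read_u64_at_sketch() is hypothetical and only exists to exercise pread(); the point of the sketch is that the feature-test macro has to be defined before the first system header is included, otherwise it has no effect on which declarations are visible.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* must come before the first system header */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one 64-bit value at the given byte offset of a file.
 * Returns 0 on success, -1 on open failure or short read.
 */
int
read_u64_at_sketch(const char *path, off_t offset, uint64_t *out)
{
	ssize_t n;
	int fd;

	assert(path != NULL && out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, out, sizeof(*out), offset);
	close(fd);
	return (n == (ssize_t)sizeof(*out)) ? 0 : -1;
}
```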
Shreyansh Jain [Thu, 26 Oct 2017 14:09:06 +0000 (19:39 +0530)]
event/dpaa2: fix shared build
Fixes: cbc12b0a96f5 ("mk: do not generate LDLIBS from directory dependencies") Fixes: b677d4c6d281 ("net/dpaa2: add API for event Rx adapter") Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Radu Nicolau [Thu, 26 Oct 2017 14:15:13 +0000 (15:15 +0100)]
examples/ipsec-secgw: fix build without security lib
Build fails when rte_security is disabled; make rte_security mandatory
Fixes: ec17993a145a ("examples/ipsec-secgw: support security offload") Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Tested-by: David Marchand <david.marchand@6wind.com>
David Harton [Fri, 1 Sep 2017 02:36:28 +0000 (22:36 -0400)]
ethdev: allow returning error on VLAN offload ops
Some devices may not support setting the VLAN offload configuration,
or may fail to set it depending on dynamic circumstances, so the
vlan_offload_set_t vector is modified to return an int, letting
the caller determine whether the operation succeeded.
rte_eth_dev_set_vlan_offload is updated to return the
value provided by the vector when it is called, and to restore
the original offload configs on failure.
Existing vlan_offload_set_t vectors are modified to return
an int. The majority of cases return 0, but a few that actually
can fail now return their failure codes.
Finally, a vlan_offload_set_t vector is added to virtio
to facilitate dynamically turning VLAN strip on or off.
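The save, call and restore flow can be sketched with a reduced port model. struct port_sketch, the two example driver callbacks and set_vlan_strip_sketch() are hypothetical names; the sketch tracks a single offload flag instead of the full offload configuration.

```c
#include <errno.h>

/* Hypothetical, reduced port: one offload flag and one driver hook. */
struct port_sketch {
	int hw_vlan_strip; /* currently configured VLAN strip setting */
	int (*vlan_offload_set)(struct port_sketch *port);
};

/* Example driver hook that always succeeds. */
int
vlan_set_ok_sketch(struct port_sketch *port)
{
	(void)port;
	return 0;
}

/* Example driver hook that cannot change the setting. */
int
vlan_set_fail_sketch(struct port_sketch *port)
{
	(void)port;
	return -ENOTSUP;
}

/*
 * Apply a new VLAN strip setting. The driver's return value is
 * propagated, and the previous configuration is restored on failure.
 */
int
set_vlan_strip_sketch(struct port_sketch *port, int enable)
{
	int orig = port->hw_vlan_strip;
	int ret;

	port->hw_vlan_strip = enable;
	ret = port->vlan_offload_set(port);
	if (ret != 0)
		port->hw_vlan_strip = orig; /* roll back on failure */
	return ret;
}
```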
Signed-off-by: David Harton <dharton@cisco.com> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tomasz Kulasek [Fri, 20 Oct 2017 15:49:26 +0000 (17:49 +0200)]
net/bonding: fix check slaves link properties
The result of the slaves link properties validation is not used when
a new slave is added.
This patch uses the value of link_properties_valid() to determine if
the slave can be used in the bonding. If the function fails, an error
is returned, which prevents adding a slave with invalid link properties.
Coverity issue: 158661 Fixes: deba8a2f8b0b ("net/bonding: fix link properties management") Cc: stable@dpdk.org Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Nélio Laranjeiro [Wed, 25 Oct 2017 14:04:36 +0000 (16:04 +0200)]
net/mlx5: fix device stop with multiple regions
The LIST macros are not safe when LIST_REMOVE() is called inside a
LIST_FOREACH() to remove an entry; this behavior is undefined and
causes some entries to disappear from the list.
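One common way to remove entries while walking a <sys/queue.h> list is to fetch the next pointer before the removal. The sketch below shows that pattern with hypothetical type names; it is not necessarily the change made in the driver.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list element. */
struct region_sketch {
	LIST_ENTRY(region_sketch) next;
	int id;
};

LIST_HEAD(region_list_sketch, region_sketch);

/*
 * Remove and free every entry of the list.
 * The successor is saved before LIST_REMOVE() so that the loop never
 * reads the link fields of an element that was already removed.
 * Returns the number of entries removed.
 */
int
region_list_flush_sketch(struct region_list_sketch *head)
{
	struct region_sketch *r = LIST_FIRST(head);
	int removed = 0;

	while (r != NULL) {
		struct region_sketch *nxt = LIST_NEXT(r, next);

		LIST_REMOVE(r, next);
		free(r);
		removed++;
		r = nxt;
	}
	return removed;
}
```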
Roger Melton [Thu, 12 Oct 2017 17:24:35 +0000 (13:24 -0400)]
net/e1000: correct VLAN tag byte order for i35x LB packets
When copying VLAN tags from the RX descriptor to the vlan_tci field
in the mbuf header, igb_rxtx.c:eth_igb_recv_pkts() and
eth_igb_recv_scattered_pkts() both assume that the VLAN tag is always
little endian. While i350, i354 and i350vf VLAN non-loopback
packets are stored little endian, VLAN tags in loopback packets (LB)
for those devices are big endian.
For i350, i354 and i350vf VLAN loopback packets, swap the tag when
copying from the RX descriptor to the mbuf header. This will ensure
that the mbuf vlan_tci is always little endian.
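The conditional swap can be sketched as follows, assuming a little-endian host where the descriptor field has been loaded as a raw 16-bit value. Both function names are hypothetical, and how the driver recognises a loopback packet is not modelled; it is passed in as a flag.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t
bswap16_sketch(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Value to store in the mbuf VLAN field.
 * desc_vlan:   raw tag as loaded from the Rx descriptor
 * is_loopback: non-zero for loopback packets, whose tag is big endian
 */
uint16_t
rx_vlan_tci_sketch(uint16_t desc_vlan, int is_loopback)
{
	return is_loopback ? bswap16_sketch(desc_vlan) : desc_vlan;
}
```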
Signed-off-by: Roger Melton <rmelton@cisco.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Yongseok Koh [Wed, 25 Oct 2017 00:27:25 +0000 (17:27 -0700)]
net/mlx5: fix Tx doorbell memory barrier
Configuring UAR as IO-mapped makes maximum throughput decline by a
noticeable amount. If UAR is configured as a write-combining register,
a write memory barrier is needed on ringing a doorbell.
rte_wmb() is mostly effective when the size of a burst is comparatively
small. Revert the register back to write-combining and enforce a write
memory barrier instead, except for vectorized Tx burst routines.
An application can change this by setting MLX5_SHUT_UP_BF according to
its own needs.
Olivier Matz [Wed, 25 Oct 2017 15:12:57 +0000 (17:12 +0200)]
mbuf: rename deprecated VLAN flags
PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT have been deprecated for a while.
As explained in [1], these flags were kept to let the applications and
PMDs move to the new flag. There is also a need to support Rx vlan
offload without vlan strip (at least for the ixgbe driver).
This patch renames the old flags for this feature, knowing that some
PMDs were using PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT to indicate that
the vlan tci has been saved in the mbuf structure.
It is likely that some PMDs do not set the proper flags when doing vlan
offload, and it would be worth making a pass on all of them.