Shahaf Shuler [Mon, 6 Nov 2017 14:00:25 +0000 (16:00 +0200)]
net/mlx5: fix flow creation on port start
While the PMD avoids creating a hash Rx queue with no hash fields and
an array of queues once the port has been started, it lacks this
protection when re-creating the flows after the port restarts.
This may lead to inconsistent behavior for flows, depending on whether
they were created before or after the port start.
Xiaoyun Li [Mon, 6 Nov 2017 02:41:40 +0000 (10:41 +0800)]
net/igb: fix Rx interrupt with VFIO and MSI-X
When using VFIO in MSI-X interrupt mode, Rx interrupts are not received,
because the offset used when enabling the interrupt vectors is computed in
a way which only supports IGB_UIO. The offset should be different when
using VFIO.
This patch fixes this issue.
Fixes: c3cd3de0ab50 ("igb: enable Rx queue interrupts for PF") Cc: stable@dpdk.org Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
With -fstrict-aliasing enabled by default from -O2, gcc > 5.x gives
undefined behavior in port_groupx4 on ARM. 'pn' and 'pnum' are
two different pointers pointing to the same chunk of memory. With
-fstrict-aliasing the pointers are assumed to point to different
memory, so the compiler reorders instructions that depend on
pnum and pn. This breaks the port grouping algorithm.
This patch eliminates the above problem by introducing a compiler
barrier between the instructions that depend on pnum, pn and lp.
Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation") Cc: stable@dpdk.org Signed-off-by: Guduri Prathyusha <gprathyusha@caviumnetworks.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Jianbo Liu <jianbo.liu@arm.com>
To group consecutive packets with the same destination port in bursts
of 4, the NEON intrinsic data types dp1 and dp2 are calculated such that
if dst_port[]={a,b,c,d,e,f,g,h,i...}, dp1 should contain <a,b,c,d> and
dp2 should contain <b,c,d,e> in the first iteration. dp1 should
be <e,f,g,h> and dp2 should be <f,g,h,i> in the next iteration.
The existing code, however, incorrectly calculates dp1 as <d,e,f,g> from
the second iteration onwards.
This patch fixes the incorrect ARM NEON instructions on dp1.
Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation") Cc: stable@dpdk.org Signed-off-by: Guduri Prathyusha <gprathyusha@caviumnetworks.com> Acked-by: Jianbo Liu <jianbo.liu@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
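The intended dp1/dp2 windows can be illustrated with a scalar model. This
is only a sketch in plain C, not the NEON code from l3fwd: the helper
names load_windows() and match_mask() are hypothetical, and FWDSTEP is
assumed to be the burst size of 4 mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4 /* packets handled per iteration (assumed) */

/*
 * Scalar model of the two vector loads (hypothetical helper).
 * For an iteration starting at index j:
 *   dp1 = dst_port[j .. j+3]
 *   dp2 = dst_port[j+1 .. j+4]
 * The caller must guarantee that dst_port[j+4] is readable.
 */
static void
load_windows(const uint16_t *dst_port, size_t j,
	     uint16_t dp1[FWDSTEP], uint16_t dp2[FWDSTEP])
{
	assert(dst_port != NULL);
	memcpy(dp1, &dst_port[j], FWDSTEP * sizeof(uint16_t));
	memcpy(dp2, &dst_port[j + 1], FWDSTEP * sizeof(uint16_t));
}

/* Bit i is set when packets j+i and j+i+1 share a destination port. */
static unsigned int
match_mask(const uint16_t dp1[FWDSTEP], const uint16_t dp2[FWDSTEP])
{
	unsigned int mask = 0;

	for (size_t i = 0; i < FWDSTEP; i++)
		if (dp1[i] == dp2[i])
			mask |= 1u << i;
	return mask;
}
```

Advancing j by FWDSTEP on every iteration keeps dp1 aligned to the start
of each burst, which is the property the faulty code lost from the second
iteration.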
Pablo de Lara [Mon, 6 Nov 2017 09:36:04 +0000 (09:36 +0000)]
app/crypto-perf: fix crypto op init
The mempool and the physical address of the crypto operation
at mempool initialization were not being set,
leading to incorrect physical addresses.
Fixes: bf9d6702eca9 ("app/crypto-perf: use single mempool") Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Jasvinder Singh [Fri, 27 Oct 2017 09:46:19 +0000 (10:46 +0100)]
app/testpmd: allow TM hierarchy commit on running port
Some drivers might allow committing the traffic management hierarchy
while in the running state. Therefore, remove the port status check
before invoking the hierarchy commit API in the CLI. If needed, a device
can add a port status check at the driver layer.
Jasvinder Singh [Fri, 27 Oct 2017 09:10:18 +0000 (10:10 +0100)]
app/testpmd: fix null pointer dereference
The malloc() function might return NULL when memory allocation fails
due to insufficient space. Therefore, a check for handling memory
allocation failure is added.
Ophir Munk [Thu, 2 Nov 2017 17:27:03 +0000 (17:27 +0000)]
net/failsafe: fix VLAN stripping configuration
The failsafe device has VLAN stripping configured at startup. However,
once a sub device is found to be incapable of VLAN stripping, failsafe
updates its configuration and removes VLAN stripping from it.
This update occurs only once at startup. If a later plugin attempt
finds a VLAN stripping mismatch between the failsafe configuration and
the device capability, failsafe cannot recover and the device remains
permanently in the plugged-out state.
The sequence of events leading to this situation is described as
follows:
1. Start testpmd with failsafe where mlx4 is a sub device (not capable
of vlan stripping). Expected printout:
PMD: net_failsafe: Disabling VLAN stripping offload
2. Execute:
testpmd> port stop all
testpmd> port config all max-pkt-len 2048
testpmd> port start all
3. Do a plug out (e.g. disable sriov)
4. Do a plug in (e.g. enable sriov)
5. Expected result: failsafe successfully configures and starts its sub
devices
Actual result: failsafe is continuously failing with these messages:
PMD: net_failsafe: VLAN stripping offload requested but not supported by
sub_device 0
PMD: net_failsafe: device already configured, cannot fix live
configuration
PMD: net_failsafe: Unable to synchronize sub device state
Root cause analysis: at startup failsafe removes VLAN stripping from its
configuration. After executing "port config all max-pkt-len 2048",
testpmd marks failsafe as needing a configuration update.
After executing "port start all", testpmd overrides the failsafe
configuration with its own configuration, which includes VLAN stripping.
During the plugin attempt failsafe refuses to update its configuration
by removing VLAN stripping since it has already updated its
configuration at startup.
The fix is for failsafe to stop validating and disabling non-supported
offloads in its sub-devices.
Fixes: bbc6a53dda44 ("net/failsafe: support Rx offload capabilities") Cc: stable@dpdk.org Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Memory regions assigned to hardware and used during Tx/Rx are mapped to
mbuf pools. Each Rx queue creates its own MR based on the mempool
provided during queue setup, while each Tx queue looks up and registers
MRs for all existing mbuf pools instead.
Since most applications use few large mbuf pools (usually only a single
one per NUMA node) common to all Tx/Rx queues, the above approach wastes
hardware resources due to redundant MRs. This negatively affects
performance, particularly with large numbers of queues.
This patch therefore makes the entire MR registration common to all
queues using a reference count. A spinlock is added to protect against
asynchronous registration that may occur from the Tx side where new
mempools are discovered based on mbuf data.
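A minimal sketch of the shared, reference-counted MR lookup described
above, assuming a small fixed-size table and a C11 atomic_flag used as the
spinlock. All names (mr_cache, mr_get(), mr_put(), MR_CACHE_SIZE) are
hypothetical and do not reflect the mlx4 PMD's actual data structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MR_CACHE_SIZE 8 /* hypothetical fixed capacity */

struct mr_entry {
	const void *mp;  /* mempool covered by this MR (opaque key) */
	uint32_t refcnt; /* number of queues using the MR; 0 = free slot */
};

struct mr_cache {
	atomic_flag lock; /* spinlock protecting entries[] */
	struct mr_entry entries[MR_CACHE_SIZE];
};

static void
mr_lock(struct mr_cache *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void
mr_unlock(struct mr_cache *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Return the shared entry for mp, claiming a free slot on first use. */
static struct mr_entry *
mr_get(struct mr_cache *c, const void *mp)
{
	struct mr_entry *free_slot = NULL;
	struct mr_entry *found = NULL;

	mr_lock(c);
	for (size_t i = 0; i < MR_CACHE_SIZE; i++) {
		struct mr_entry *e = &c->entries[i];

		if (e->refcnt != 0 && e->mp == mp) {
			found = e;
			break;
		}
		if (e->refcnt == 0 && free_slot == NULL)
			free_slot = e;
	}
	if (found == NULL && free_slot != NULL) {
		/* A real driver would register the memory with the HW here. */
		free_slot->mp = mp;
		found = free_slot;
	}
	if (found != NULL)
		found->refcnt++;
	mr_unlock(c);
	return found; /* NULL when the table is full */
}

/* Drop one reference; at zero the slot becomes reusable. */
static void
mr_put(struct mr_cache *c, struct mr_entry *e)
{
	mr_lock(c);
	assert(e->refcnt > 0);
	e->refcnt--; /* a real driver would deregister the MR at zero */
	mr_unlock(c);
}
```

With this layout, Rx and Tx queues that use the same mempool receive the
same entry, so only one MR exists per pool regardless of the number of
queues.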
Matan Azrad [Thu, 2 Nov 2017 16:42:51 +0000 (16:42 +0000)]
net/mlx4: mitigate Tx path memory barriers
Replace most of the memory barriers by IO memory barriers since they
all target DRAM. This improves code efficiency for
systems which force store order between different addresses.
Only the doorbell register store should be protected by a memory barrier
since it targets the PCI memory domain.
Limit the IO memory barrier issued before the byte count store to systems
with a cache line size smaller than 64B (TXBB size).
This patch improves Tx performance by 0.2MPPS for one segment 64B
packets via 1 queue with 1 core test.
Matan Azrad [Thu, 2 Nov 2017 16:42:49 +0000 (16:42 +0000)]
net/mlx4: separate Tx segment cases
Optimize the single segment case by processing it in a separate block,
which avoids checks, calculations and barriers relevant only for the
multi segment case.
Call a dedicated function for handling the multi segment case.
Ophir Munk [Thu, 2 Nov 2017 16:42:45 +0000 (16:42 +0000)]
net/mlx4: associate MR to MP in a short function
Associate memory region to mempool (on data path) in a short function.
Handle the less common case of adding a new memory region to mempool
in a separate function.
Wei Dai [Fri, 3 Nov 2017 08:47:30 +0000 (16:47 +0800)]
net/i40e: fix Rx queue interrupt mapping in VF
When a VF port is bound to VFIO-PCI, the miscellaneous interrupt is
mapped to MSI-X vector 0 and Rx queue interrupts are mapped to
other vectors in vfio_enable_msix( ). To simplify implementation,
all VFIO-PCI bound i40e VF Rx queue interrupts can be mapped to
vector 1. And as igb_uio currently supports only one vector, the
i40e VF PMD should use vector 0 for igb_uio and vector 1 for
VFIO-PCI. Without this patch, the VF Rx queue interrupt is mapped
to vector 0 in register settings and mapped to VFIO vector 1
in vfio_enable_msix( ), and then all Rx queue interrupts will
be missed.
Also remove 2 unused macro definitions.
Fixes: 4b90a3ff26c5 ("i40evf: support Rx interrupt") Fixes: 975ffea6f671 ("net/i40e: remove DPDK PF version specific code") Cc: stable@dpdk.org Signed-off-by: Wei Dai <wei.dai@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Jingjing Wu <jingjing.wu@intel.com>
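The vector choice described in this commit can be sketched as a small
helper. The enum, macro and function names below are hypothetical
stand-ins for illustration only and are not taken from the i40e VF PMD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel driver the VF is bound to. */
enum kdrv {
	KDRV_IGB_UIO,
	KDRV_VFIO_PCI,
};

#define MISC_VEC_ID  0 /* vector carrying the miscellaneous interrupt */
#define RX_VEC_START 1 /* first vector usable by Rx queues under VFIO */

/*
 * igb_uio exposes a single vector, so Rx queues share vector 0 with the
 * miscellaneous interrupt. Under VFIO-PCI all Rx queues use vector 1.
 */
static uint16_t
rxq_intr_vector(enum kdrv drv)
{
	return drv == KDRV_VFIO_PCI ? RX_VEC_START : MISC_VEC_ID;
}
```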
Wei Dai [Fri, 3 Nov 2017 08:47:29 +0000 (16:47 +0800)]
net/i40e: fix VFIO interrupt mapping in VF
When a VF port is bound to VFIO-PCI, only the miscellaneous interrupt
is mapped to VFIO vector 0 in i40evf_dev_init( ).
In i40evf_dev_interrupt_handler( ) and i40evf_dev_rx_queue_intr_enable( ),
if the previous VFIO interrupt mapping set in i40evf_dev_init( ) is not
cleared, the PMD fails when it tries to map Rx queue interrupts to other
VFIO vectors by calling rte_intr_enable( ).
This patch clears the VFIO interrupt mappings before setting both
miscellaneous and Rx queue interrupt mappings again to avoid failure.
It also removes the call to rte_intr_enable( ) in
i40evf_dev_interrupt_handler( ) as there is no need to map the VFIO
interrupt in this function repeatedly.
Fixes: 4b90a3ff26c5 ("i40evf: support Rx interrupt") Cc: stable@dpdk.org Signed-off-by: Wei Dai <wei.dai@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Jingjing Wu <jingjing.wu@intel.com>
John Daley [Thu, 2 Nov 2017 05:47:10 +0000 (22:47 -0700)]
net/enic: fix TSO for packets greater than 9208 bytes
A check was previously added to drop Tx packets greater than what the NIC
is capable of sending since such packets can freeze the send queue. The
check did not account for TSO packets however, so TSO was limited to 9208
bytes.
Check packet length only for non-TSO packets. Also ensure that the TSO
packet segment size plus the headers does not exceed what the NIC is
capable of, since this also can freeze the send queue.
Use the PKT_TX_TCP_SEG ol_flag, which is the preferred way to check for
TSO, instead of m->tso_segsz.
Fixes: ed6e564c214e ("net/enic: fix memory leak with oversized Tx packets") Cc: stable@dpdk.org Signed-off-by: John Daley <johndale@cisco.com>
Akhil Goyal [Wed, 1 Nov 2017 08:16:41 +0000 (13:46 +0530)]
net/dpaa2: set queues after reconfiguration
If dpaa2_dev_tx_queue_setup is called multiple times, the
assignment of device->data->tx_queues is not done. As a result,
tx_queues remains NULL after reconfiguration.
This patch sets the tx_queues from the device private data to the
usable device Tx queues.
Fixes: 7ae777d064e8 ("net/dpaa2: add support for congestion notification") Cc: stable@dpdk.org Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Adrien Mazarguil [Tue, 31 Oct 2017 10:31:04 +0000 (11:31 +0100)]
net/mlx4: fix Rx after updating number of queues
When not in isolated mode, internal flow rules are automatically
maintained by the PMD to receive traffic according to global device
settings (MAC, VLAN, promiscuous mode and so on).
Since RSS support was added to the mix, it must also check whether Rx
queue configuration has changed when refreshing flow rules to prevent
the following from happening:
- With a smaller number of Rx queues, traffic is implicitly dropped
since the existing RSS context cannot be re-applied.
- With a larger number of Rx queues, traffic remains balanced within the
original (smaller) set of queues.
One workaround before this commit was to temporarily enter/leave
isolated mode to make it regenerate internal flow rules.
Nélio Laranjeiro [Fri, 27 Oct 2017 06:50:00 +0000 (08:50 +0200)]
net/mlx5: fix flow director matching rules
The flow director API does not provide a layer 2 configuration when the
filter is for layers 3 and 4. This causes the translation to the generic
flow API to be wrong, since not providing a mask for a layer results in
the default one being used.
In this case, the Ethernet layer mask is full whereas it must be empty.
Ajit Khaparde [Mon, 30 Oct 2017 16:08:08 +0000 (11:08 -0500)]
net/bnxt: fix HWRM command failures during VF unload
In some cases when a VF driver is unloaded after the PF driver,
certain HWRM commands are returned with an error.
Instead the PF can tell the FW to permit these commands in order
to allow a clean unload.
Maxime Coquelin [Fri, 3 Nov 2017 15:52:35 +0000 (16:52 +0100)]
vhost: postpone ring address translations at kick time only
If multiple queue pairs are created but not all of them are used, the
device is never started, as unused queues aren't enabled and
their ring addresses aren't translated. The device is changed
to the running state only when all ring addresses are translated.
This patch fixes this by postponing ring address translation
to kick time unconditionally, whether VHOST_USER_F_PROTOCOL_FEATURES
is negotiated or not.
Reported-by: Lei Yao <lei.a.yao@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Jacek Piasecki [Thu, 26 Oct 2017 06:21:09 +0000 (08:21 +0200)]
cfgfile: fix leak on creation error
An unsuccessful memory allocation for elements inside the cfgfile
structure could result in a resource leak.
Fixed by verifying the pointer after each malloc;
if a malloc fails, the error branch is taken and frees the memory.
Coverity issue: 195032 Fixes: d4cb8197589d ("cfgfile: support runtime modification") Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com> Acked-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
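The pattern of checking each allocation and unwinding through a single
error branch can be sketched as below. The structure layout and the names
cfg_create(), cfg_close() and NAME_LEN are hypothetical and much simpler
than the real rte_cfgfile structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NAME_LEN 64 /* hypothetical fixed name size */

struct cfg { /* hypothetical, reduced configuration object */
	size_t nb_sections;
	size_t nb_entries;
	char *sections; /* nb_sections names of NAME_LEN bytes each */
	char *entries;  /* nb_entries names of NAME_LEN bytes each */
};

static struct cfg *
cfg_create(size_t nb_sections, size_t nb_entries)
{
	struct cfg *cfg;

	assert(nb_sections > 0 && nb_entries > 0);
	/* calloc() zeroes the struct, so both pointers start as NULL. */
	cfg = calloc(1, sizeof(*cfg));
	if (cfg == NULL)
		return NULL;
	cfg->sections = calloc(nb_sections, NAME_LEN);
	if (cfg->sections == NULL)
		goto error;
	cfg->entries = calloc(nb_entries, NAME_LEN);
	if (cfg->entries == NULL)
		goto error;
	cfg->nb_sections = nb_sections;
	cfg->nb_entries = nb_entries;
	return cfg;
error:
	/* free(NULL) is a no-op, so partially built objects are handled. */
	free(cfg->entries);
	free(cfg->sections);
	free(cfg);
	return NULL;
}

static void
cfg_close(struct cfg *cfg)
{
	if (cfg == NULL)
		return;
	free(cfg->entries);
	free(cfg->sections);
	free(cfg);
}
```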
service: fix race in service on app lcore function
This commit fixes a possible race condition if an application
uses the service-cores infrastructure and the function to run
a service on an application lcore at the same time.
The fix is to change the num_mapped_cores variable to be an
atomic variable. This allows concurrent accesses by multiple
threads to a service using rte_service_run_iter_on_app_lcore()
to detect if another core is currently mapped to the service,
and to refuse to run if it is not multi-thread safe.
The run iteration on app lcore function has two arguments: the
service id to run, and whether atomics should be used to serialize access
to multi-thread unsafe services. This allows applications to choose
if they wish to use the service-cores feature, or if they
take responsibility themselves for serializing invocations of a service.
See doxygen documentation for more details.
Two unit tests were added to verify the behaviour of the
function to run a service on an application core, testing both
a multi-thread safe service, and a multi-thread unsafe service.
The doxygen API documentation for the function has been updated
to reflect the current and correct behaviour.
Fixes: e9139a32f6e8 ("service: add function to run on app lcore") Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
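A minimal sketch of the atomic check described in this commit, written
with C11 atomics instead of the DPDK atomic API. struct service,
run_iter_on_app_lcore() and count_calls() are hypothetical simplified
names and are not the rte_service implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct service { /* hypothetical, reduced service state */
	atomic_uint num_mapped_cores; /* cores currently mapped/running */
	bool mt_safe;                 /* callback may run concurrently */
	int (*cb)(void *);
	void *cb_arg;
};

/*
 * Run one iteration of the service on the calling (application) lcore.
 * With serialize_mt_unsafe set, the counter is used to detect another
 * core running the same multi-thread unsafe service.
 */
static int
run_iter_on_app_lcore(struct service *s, bool serialize_mt_unsafe)
{
	if (serialize_mt_unsafe) {
		unsigned int prev = atomic_fetch_add(&s->num_mapped_cores, 1);

		if (!s->mt_safe && prev > 0) {
			/* Someone else is mapped: back off without running. */
			atomic_fetch_sub(&s->num_mapped_cores, 1);
			return -EBUSY;
		}
	}
	s->cb(s->cb_arg);
	if (serialize_mt_unsafe)
		atomic_fetch_sub(&s->num_mapped_cores, 1);
	return 0;
}

/* Example callback used for illustration: counts its invocations. */
static int
count_calls(void *arg)
{
	(*(int *)arg)++;
	return 0;
}
```

Passing false for serialize_mt_unsafe skips the counter entirely, which
corresponds to the application taking responsibility for serialization
itself.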
Jingjing Wu [Thu, 24 Aug 2017 02:10:56 +0000 (10:10 +0800)]
eal/linux: add interrupt counter size for vdev
For a virtual device, the rte_intr_handle struct is
initialized by the virtual device driver, including
the event fd assignment. If the event fd needs to be
read in order to clear it, the size to read must be
provided for the read to succeed.
This patch adds efd_counter_size to the rte_intr_handle
struct to tell the Rx interrupt process the read size.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
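Why the read size matters can be illustrated with a Linux eventfd, whose
counter must be read as exactly 8 bytes. This is a sketch only:
clear_event_fd() is a hypothetical helper, and only its efd_counter_size
parameter mirrors the field named in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Read efd_counter_size bytes from fd to clear a pending event.
 * Returns 0 on success, -1 if the size is unsupported or the read
 * did not return exactly that many bytes.
 */
static int
clear_event_fd(int fd, size_t efd_counter_size)
{
	uint8_t buf[sizeof(uint64_t)];
	ssize_t n;

	if (efd_counter_size == 0 || efd_counter_size > sizeof(buf))
		return -1;
	n = read(fd, buf, efd_counter_size);
	return n == (ssize_t)efd_counter_size ? 0 : -1;
}
```

For an fd created with eventfd(), a driver would set the size to
sizeof(uint64_t). Other fd types may need a different size, which is why
the value is carried in the handle rather than hard-coded.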
Xiaoyun Li [Fri, 3 Nov 2017 12:47:23 +0000 (20:47 +0800)]
eal/x86: revert select optimized memcpy at run-time
Revert the run-time linking support patchset, which includes the
following 3 commits:
Fixes: 84cc318424d4 ("eal/x86: select optimized memcpy at run-time")
Fixes: c7fbc80fe60f ("test: select memcpy alignment unit at run-time")
Fixes: 5f180ae32962 ("efd: move AVX2 lookup in its own compilation unit")
The patchset causes a performance drop in the vhost/virtio loopback
performance test, because run-time dispatch costs at least one function
call more than compile-time dispatch, and the reference CPU cycle count
is small. In the test, 128-256 byte packets show a 16%-20% performance
drop with the mergeable path, and 256 byte packets show a 13%
performance drop with the vector path.
Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ferruh Yigit [Thu, 2 Nov 2017 00:06:00 +0000 (00:06 +0000)]
eal/linux: force IOVA as PA mode if KNI module inserted
Fix kernel crash with KNI because KNI requires physical addresses.
When IOVA VA mode is used, the memzone and mbuf physical address fields
contain virtual addresses. But KNI relies on these fields to enable
kernel access to the buffers. Virtual addresses in those fields cause
a crash in the kernel.
This is a workaround until KNI is fixed properly to work with virtual
addresses.
Harry van Haaren [Wed, 25 Oct 2017 12:29:49 +0000 (13:29 +0100)]
eal: fix version map experimental section
Before this commit, the EXPERIMENTAL version of ABI
derived from the DPDK_17.08 tag. In parallel there
was a DPDK_17.11 tag.
Experimental map should always derive from the latest ABI,
so this patch moves the 17.11 section above EXPERIMENTAL,
and updates EXPERIMENTAL to derive from the 17.11 map.
Fixes: aadc3eb002d3 ("pci: export match function") Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Thomas Monjalon [Fri, 20 Oct 2017 12:31:35 +0000 (18:01 +0530)]
doc: add IOVA aware API changes in release notes
The wording changes have been done in the API without breaking
the ABI. The deprecated fields and symbols can be removed later
when another ABI change is required.
The deprecation notice can be removed.
The release notes describe the newly available API with IOVA wording.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: John McNamara <john.mcnamara@intel.com>
Thomas Monjalon [Sun, 5 Nov 2017 22:26:24 +0000 (23:26 +0100)]
mempool: rename populate functions to IOVA
The functions rte_mempool_populate_phys() and
rte_mempool_populate_phys_tab() are renamed to
rte_mempool_populate_iova() and rte_mempool_populate_iova_tab().
The deprecated functions are kept as aliases to avoid breaking the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Thomas Monjalon [Sun, 5 Nov 2017 18:02:29 +0000 (19:02 +0100)]
mempool: rename address mapping function to IOVA
The function rte_mempool_virt2phy() is renamed to rte_mempool_virt2iova().
The new function has one parameter fewer because that parameter was unused.
The deprecated function is kept as an alias to avoid breaking the API.
Thomas Monjalon [Fri, 20 Oct 2017 12:31:31 +0000 (18:01 +0530)]
mempool: rename addresses from physical to IOVA
The struct fields phys_addr_t rte_mempool_objhdr.physaddr and
rte_mempool_memhdr.phys_addr are renamed to rte_iova_t iova.
The deprecated names are kept in an anonymous union to avoid breaking
the API.
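Keeping a deprecated field name alive through an anonymous union can be
sketched as follows. The struct is a reduced, hypothetical stand-in for
rte_mempool_objhdr, and the two typedefs are local to this sketch with an
assumed 64-bit width. Anonymous unions require C11.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t; /* local typedef, assumed 64-bit */
typedef uint64_t rte_iova_t;  /* local typedef, assumed 64-bit */

struct objhdr { /* reduced stand-in for rte_mempool_objhdr */
	union {
		rte_iova_t iova;      /* new name */
		phys_addr_t physaddr; /* deprecated name, same storage */
	};
};
```

Because both members share the same storage and type, existing code that
reads or writes physaddr keeps compiling and sees the value stored
through iova, while the struct size is unchanged.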
Thomas Monjalon [Sat, 4 Nov 2017 16:15:04 +0000 (17:15 +0100)]
mem: rename address mapping function to IOVA
The function rte_mem_virt2phy() is kept and used in functions which
work only with physical addresses.
For all other calls this function is replaced by rte_mem_virt2iova()
which does a direct mapping (no conversion) in the VA case.
Note: the new function rte_mem_virt2iova() matches the
behaviour implemented in rte_mem_virt2phy() by the commit 680f6c12600f ("mem: honor IOVA mode in virt2phy")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
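The behaviour described for the new function can be sketched as below.
This is not the EAL implementation: the mode enum, the explicit mode
argument, the callback parameter and fake_virt2phy() are all assumptions
made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t iova_t; /* local typedef, assumed 64-bit */

enum iova_mode { /* hypothetical stand-in for the EAL IOVA mode */
	IOVA_MODE_PA,
	IOVA_MODE_VA,
};

/*
 * In VA mode the IO address is the virtual address itself (direct
 * mapping, no conversion). Otherwise fall back to a physical address
 * lookup supplied by the caller.
 */
static iova_t
virt2iova(enum iova_mode mode, const void *virt,
	  iova_t (*virt2phy)(const void *))
{
	if (mode == IOVA_MODE_VA)
		return (iova_t)(uintptr_t)virt;
	assert(virt2phy != NULL);
	return virt2phy(virt);
}

/* Placeholder lookup for illustration: always reports one fixed page. */
static iova_t
fake_virt2phy(const void *virt)
{
	(void)virt;
	return 0x1000;
}
```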
Thomas Monjalon [Fri, 3 Nov 2017 23:36:47 +0000 (00:36 +0100)]
mem: introduce IOVA type
The IO virtual addresses may be used instead of physical addresses.
As IOVA is more generic, it should be used in most places instead
of physical address wording.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Ferruh Yigit [Thu, 2 Nov 2017 00:25:10 +0000 (00:25 +0000)]
buildtools: fix icc build
There are random build errors in test reports [1]. The build error
is not directly related to DPDK but is observed during the DPDK build.
When I got similar unexpected build errors on my system, I found
that /dev/null was invalid.
It seems ICC overwrites /dev/null with "icc -o /dev/null" instead
of sending output to /dev/null. This is not always reproducible, so
it is hard to say what exactly triggers the error.
I suspect the test-report build errors have the same cause,
and it is good to add protection for this case.
Instead of sending output to /dev/null, save it to the tmp folder and
remove it when done.