dpdk.git
Bruce Richardson [Mon, 12 Nov 2018 12:26:15 +0000 (12:26 +0000)]
mk: allow renaming of build directories

When building using make, the Makefile in the build directory contained
the name of the build directory to be passed as an "O=" parameter to
the DPDK SDK makefiles. Unfortunately, this meant that the compilation
would always fail if the build directory was renamed. To remove this
limitation, we can use $(CURDIR) instead of the directory name.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Thu, 21 Dec 2017 16:53:35 +0000 (16:53 +0000)]
eal/x86: move header to standard BSD license

This updates the license on the rte_rtm.h file to be the standard
BSD-3-Clause license used for the rest of DPDK, thus bringing the file in
compliance with the DPDK licensing policy.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Mon, 12 Nov 2018 10:47:19 +0000 (10:47 +0000)]
test/hash: improve output for r/w test

The hash read-write autotest generates a lot of text, which is very dense
on the screen. Even the summary at the end is hard to follow as everything
is very compact. We can improve readability by highlighting the starts of
the various sections, and by indenting the values within subsections.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Mon, 12 Nov 2018 10:47:18 +0000 (10:47 +0000)]
eal/x86: reduce contention when retrying TSX

When TSX transactions abort, it is generally worth retrying a number of
times before falling back to the traditional locking path, as the
parallelism benefits from TSX can be worth it when a transaction does
succeed. For cases with multiple threads and high contention rates, it
can be useful to have increasing delays between retry attempts, so as to
avoid having the same threads repeatedly collide.
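
A minimal sketch of the retry idea described above, not the actual patch: the attempt callback, the doubling delay and all names here are assumptions made for illustration, and a plain busy loop stands in for whatever pause primitive the real code uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: returns true if one transactional attempt succeeded. */
typedef bool (*attempt_fn)(void *ctx);

/* Busy-wait for roughly `iterations` loop turns; a real implementation
 * would likely issue a CPU pause instruction inside the loop. */
static void spin_delay(unsigned int iterations)
{
	for (volatile unsigned int i = 0; i < iterations; i++)
		;
}

/*
 * Try `attempt` up to `max_retries` times, doubling the delay after each
 * failure. Returns true on success, false if the caller should fall back
 * to the traditional locking path.
 */
static bool retry_with_backoff(attempt_fn attempt, void *ctx,
			       unsigned int max_retries)
{
	unsigned int delay = 1;

	for (unsigned int i = 0; i < max_retries; i++) {
		if (attempt(ctx))
			return true;
		spin_delay(delay);
		delay *= 2; /* increasing delay between retries (assumed policy) */
	}
	return false;
}

/* Example attempt: fails until the counter in ctx reaches zero. */
static bool succeed_after_n(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Threads that aborted together wait for different, growing intervals only if the delay also varies per thread; adding a random component would be a natural extension of this sketch.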

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Yipeng Wang [Mon, 12 Nov 2018 10:47:16 +0000 (10:47 +0000)]
hash: fix TSX aborts with newer gcc

With -O3, gcc 7 and 8 emit vzeroupper from rte_memcpy
inside the TSX region, which may abort the TSX transaction.

This fix replaces rte_memcpy with memcpy, which does not insert
an extra vzeroupper into the library.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Anatoly Burakov [Tue, 13 Nov 2018 18:03:52 +0000 (18:03 +0000)]
ipc: remove panic in async request

EAL should not crash when setting an alarm fails. Also, remove the
profanity in the error message.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Malvika Gupta [Fri, 2 Nov 2018 19:08:08 +0000 (14:08 -0500)]
test/bpf: use hton for endianness

Convert host machine endianness to network endianness for
comparison of incoming packets with the BPF filter.
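
As an illustration of the conversion (a sketch only, not the test code itself), the snippet below compares the EtherType field of a raw frame against a host-order constant converted with htons(). The function name and the constant are assumptions introduced for this example.

```c
#include <arpa/inet.h> /* htons(); POSIX */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHERTYPE_IPV4_HOST 0x0800 /* EtherType for IPv4, host-order constant */

/* `frame` points at an Ethernet header; the EtherType is at byte offset 12. */
static bool frame_is_ipv4(const uint8_t *frame)
{
	uint16_t wire_type;

	/* The bytes copied here are still in network (big-endian) order. */
	memcpy(&wire_type, frame + 12, sizeof(wire_type));
	return wire_type == htons(ETHERTYPE_IPV4_HOST);
}
```

Converting the constant rather than the packet field keeps the comparison correct on both little-endian and big-endian hosts.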

Suggested-by: Brian Brooks <brian.brooks@arm.com>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Konstantin Ananyev [Thu, 8 Nov 2018 12:36:44 +0000 (12:36 +0000)]
test/bpf: add immediate load

New test-case to cover (BPF_LD | BPF_IMM | EBPF_DW) instruction.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Konstantin Ananyev [Thu, 8 Nov 2018 12:36:43 +0000 (12:36 +0000)]
bpf: fix x86 JIT for immediate loads

The x86 JIT can generate invalid code for (BPF_LD | BPF_IMM | EBPF_DW)
instructions when the immediate value is bigger than INT32_MAX.
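
To see why values above INT32_MAX are special: the x86-64 form of mov that takes a 32-bit immediate sign-extends it into the 64-bit register, so only some 64-bit values can be encoded that way. The helper below is a hypothetical sketch of such a check (its name is not taken from the JIT source); it does not reproduce the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `imm` equals its own low 32 bits sign-extended to 64 bits. */
static bool fits_sign_extended_imm32(uint64_t imm)
{
	uint32_t low = (uint32_t)imm;
	uint64_t extended = (low & UINT32_C(0x80000000))
		? (UINT64_C(0xFFFFFFFF00000000) | low)
		: low;

	return extended == imm;
}
```

A value such as 0x80000000 fails this check, so a code generator would have to emit a full 64-bit immediate load for it.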

Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Thomas Monjalon [Sun, 11 Nov 2018 23:58:56 +0000 (00:58 +0100)]
pci: fix parsing of address without function number

If the last part of the PCI address (function number) is missing,
the parsing was successful, assuming function 0.
The call to strtoul does not return an error in such a case,
so an explicit check is inserted before it.
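
The behaviour can be sketched as follows. This is not the DPDK parser: the struct, the function names and the field limits are assumptions chosen for illustration. The point is the explicit check before strtoul(), which would otherwise return 0 for an empty field without reporting an error.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

struct pci_addr_sketch {
	unsigned int domain, bus, devid, function;
};

static int parse_hex_field(const char **cursor, unsigned long max, char sep,
			   unsigned int *out)
{
	char *end;
	unsigned long value;

	/* Explicit check: strtoul() returns 0 without error on an empty field. */
	if (!isxdigit((unsigned char)**cursor))
		return -1;
	errno = 0;
	value = strtoul(*cursor, &end, 16);
	if (errno != 0 || value > max || *end != sep)
		return -1;
	*out = (unsigned int)value;
	*cursor = (sep == '\0') ? end : end + 1;
	return 0;
}

/* Parse "DDDD:BB:DD.F"; returns 0 on success, -1 on malformed input. */
static int parse_pci_addr(const char *str, struct pci_addr_sketch *addr)
{
	if (parse_hex_field(&str, 0xFFFF, ':', &addr->domain) != 0 ||
	    parse_hex_field(&str, 0xFF, ':', &addr->bus) != 0 ||
	    parse_hex_field(&str, 0x1F, '.', &addr->devid) != 0 ||
	    parse_hex_field(&str, 0x7, '\0', &addr->function) != 0)
		return -1;
	return 0;
}
```

With this check, "0000:03:00." and "0000:03:00" are both rejected instead of being read as function 0.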

This bug has always been there in older parsing macros:
- GET_PCIADDR_FIELD
- GET_BLACKLIST_FIELD

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Reported-by: Wisam Jaddo <wisamm@mellanox.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Honnappa Nagarahalli [Sat, 10 Nov 2018 18:55:34 +0000 (12:55 -0600)]
hash: separate lock-free and r/w lock lookup

The lock-free algorithm has caused a significant lookup
performance regression for certain use cases. The
regression is attributed to the use of non-relaxed
memory orderings. Two versions of the lookup functions
are created: one that uses the RW lock and one that
is lock-free. This restores the performance for use
cases that used the RW lock version of the lookup
function.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")

Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Gavin Hu [Fri, 9 Nov 2018 11:42:47 +0000 (19:42 +0800)]
ring/c11: relax ordering for load and store of the head

When calling __atomic_compare_exchange_n, use relaxed ordering for the
success case, as multiple producers/consumers do not release updates to
each other so no need for acquire or release ordering.

Because the thread fence is in place, ordering for the first iteration can
be relaxed.

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.59
MP/MC bulk enq/dequeue (size: 8): 10.54
SP/SC bulk enq/dequeue (size: 32): 1.73
MP/MC bulk enq/dequeue (size: 32): 2.38

Neither a significant improvement nor a regression was seen, as the
optimisation is not on the critical path.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Gavin Hu [Fri, 9 Nov 2018 11:42:46 +0000 (19:42 +0800)]
ring/c11: keep deterministic order allowing retry to work

Use case scenario:
1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
   reason (running out of cpu time, preempted,...)
2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head
   forward.
3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail
   beyond the prod.head read by thread 1.
4) Thread 1 is re-scheduled. It reads cons.tail.

cpu1(producer)      cpu2(producer)          cpu3(consumer)
load r->prod.head
    ^               load r->prod.head
    |               load r->cons.tail
    |               store r->prod.head(+n)
  stalled           <-- enqueue ----->
    |               store r->prod.tail(+n)
    |                                        load r->cons.head
    |                                        load r->prod.tail
    |                                        store r->cons.head(+n)
    |                                        <...dequeue.....>
    v                                        store r->cons.tail(+n)
load r->cons.tail

For thread 1, the __atomic_compare_exchange_n detects the outdated
prod.head and retries the flow with the new one. This retry flow works on
strongly ordered platforms (e.g. x86). On weakly ordered platforms (arm,
ppc), however, the loads of cons.tail and prod.head might be reordered, so
that prod.head is new but cons.tail is too old. The retry flow, which relies
on detecting an outdated head, then does not trigger as expected, and the
outdated cons.tail causes a wrong free_entries value.

Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries.

The fix is to keep the deterministic order of two loads allowing the retry
to work.
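
The ordering requirement can be sketched with C11 atomics as below. This is an illustrative reduction, not the librte_ring code: the struct, the function names and the exact placement of the fence are assumptions, and only the producer-side head reservation is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct ring_sketch {
	_Atomic uint32_t prod_head;
	_Atomic uint32_t cons_tail;
	uint32_t capacity;
};

static void ring_sketch_init(struct ring_sketch *r, uint32_t capacity)
{
	atomic_init(&r->prod_head, 0);
	atomic_init(&r->cons_tail, 0);
	r->capacity = capacity;
}

/*
 * Reserve `n` slots for a producer. Returns n on success or 0 if there is
 * not enough room; *old_head receives the start of the reserved range.
 */
static uint32_t reserve_prod_slots(struct ring_sketch *r, uint32_t n,
				   uint32_t *old_head)
{
	uint32_t head, free_entries;

	head = atomic_load_explicit(&r->prod_head, memory_order_relaxed);
	do {
		/* Keep the cons_tail load below from being reordered before
		 * the prod_head load (or before the head value refreshed by
		 * a failed compare-exchange). */
		atomic_thread_fence(memory_order_acquire);
		uint32_t tail = atomic_load_explicit(&r->cons_tail,
						     memory_order_acquire);

		/* Unsigned arithmetic keeps this correct across wrap-around. */
		free_entries = r->capacity + tail - head;
		if (n > free_entries)
			return 0;
	} while (!atomic_compare_exchange_weak_explicit(&r->prod_head, &head,
			head + n, memory_order_relaxed, memory_order_relaxed));

	*old_head = head;
	return n;
}
```

Because the tail is always read after the head it is compared against, a stale head is caught by the compare-exchange and the loop recomputes free_entries from a fresh pair of values.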

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

The results show that the thread fence degrades the performance slightly, but
it is required for correctness.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Jerin Jacob [Wed, 7 Nov 2018 06:59:06 +0000 (06:59 +0000)]
eal: fix build

Some toolchains define fls() in string.h with an int argument,
which conflicts with the uint32_t argument type used here.

/export/dpdk.org/lib/librte_eal/common/rte_reciprocal.c:47:19:
error: conflicting types for ‘fls’
 static inline int fls(uint32_t x)
                  ^~~

/opt/marvell-tools-201/aarch64-marvell-elf/include/strings.h:59:6:
note: previous declaration of ‘fls’ was here
 int  fls(int) __pure2;

FreeBSD string.h also has fls() with argument as int type.
https://www.freebsd.org/cgi/man.cgi?query=fls&sektion=3

Fix the conflict by using the rte version of fls.

Fixes: ffe3ec811ef5 ("sched: introduce reciprocal divide")
Fixes: faf2b25c9f80 ("fm10k: support VMDQ in multi-queue configuration")
Cc: stable@dpdk.org
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Jerin Jacob [Wed, 7 Nov 2018 06:59:03 +0000 (06:59 +0000)]
eal: introduce rte version of fls

The function returns the last (most-significant) bit set.
Added unit testcase to verify rte_fls_u32().
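
A portable sketch of such a function is shown below. The name is hypothetical, and the 1-based return convention (0 for an input of 0) is assumed here from the BSD fls() convention rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Position (1-based) of the most-significant set bit; 0 when x is 0. */
static int fls_u32_sketch(uint32_t x)
{
	int pos = 0;

	while (x != 0) {
		pos++;
		x >>= 1;
	}
	return pos;
}
```

Taking a uint32_t argument under a distinct name avoids any clash with a toolchain-provided fls(int).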

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Ferruh Yigit [Tue, 6 Nov 2018 14:35:01 +0000 (14:35 +0000)]
test: fix build

Running the "make -C test/" command produces the following warnings:
 awk: cmd. line:1: fatal: cannot open file `/cmdline_test/cmdline_test/'
      for reading (No such file or directory)
 awk: cmd. line:1: fatal: cannot open file
      `/test-pipeline/test-pipeline/' for reading (No such file or
      directory)
 awk: cmd. line:1: fatal: cannot open file `/test-acl/test-acl/'
      for reading (No such file or directory)

This is because an unexpected/invalid MAPFILE param is passed to
check-experimental-syms.sh

There is no easy way to unify MAPFILE for different build options;
instead, add input verification to the script and silently ignore wrong
values.

Fixes: a6ec31597a0b ("mk: add experimental tag check")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Kevin Laatz [Wed, 7 Nov 2018 18:10:18 +0000 (18:10 +0000)]
telemetry: fix shared link with make

Currently, telemetry is not working for shared builds in make.

The --as-needed flag is preventing telemetry from being linked as there are
no direct API calls from the app to telemetry. This is causing the
--telemetry option to not be recognized by EAL.
Telemetry registers its EAL option using the RTE_INIT constructor. Since
EAL's option parsing is done before the plugins init, the --telemetry
option isn't registered at the time of parsing, and as a result, the
--telemetry option is not being recognized.

This patch fixes this issue by explicitly linking telemetry to the
application by setting the "--no-as-needed" flag for the library in
mk/rte.app.mk.

Fixes: 8877ac688b52 ("telemetry: introduce infrastructure")

Reported-by: Yanjie Xu <yanjie.xu@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Thomas Monjalon [Fri, 9 Nov 2018 13:42:54 +0000 (14:42 +0100)]
eal/x86: remove unused memcpy file

The use of rte_memcpy_ptr was removed in the revert below,
but the revert missed removing the file arch/x86/rte_memcpy.c.

Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Thomas Monjalon [Wed, 7 Nov 2018 22:56:45 +0000 (23:56 +0100)]
devargs: do not replace already inserted device

The devargs of a device can be replaced by a newly allocated one
when trying to probe again the same device (multi-process or
multi-ports scenarios). This is breaking some pointer references.

It can be avoided by copying the new content, freeing the new devargs,
and returning the already inserted pointer.
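
The approach can be sketched with a plain linked list. All types and names below are hypothetical stand-ins (this is not the rte_devargs API), and matching devices by name alone is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct devargs_sketch {
	char name[32];
	char args[64];
	struct devargs_sketch *next;
};

static struct devargs_sketch *devargs_list;

/*
 * Insert *da into the list. If a device with the same name is already
 * present, copy the new content into the existing node, free the new node,
 * and redirect *da to the existing one so old pointers remain valid.
 * The caller must pass a heap-allocated node.
 */
static void devargs_insert_sketch(struct devargs_sketch **da)
{
	struct devargs_sketch *cur;

	for (cur = devargs_list; cur != NULL; cur = cur->next) {
		if (strcmp(cur->name, (*da)->name) == 0) {
			memcpy(cur->args, (*da)->args, sizeof(cur->args));
			free(*da);
			*da = cur;
			return;
		}
	}
	(*da)->next = devargs_list;
	devargs_list = *da;
}

/* Helper for the example: allocate a node with the given name and args. */
static struct devargs_sketch *devargs_new(const char *name, const char *args)
{
	struct devargs_sketch *da = calloc(1, sizeof(*da));

	if (da == NULL)
		return NULL;
	/* calloc() zeroed the buffers, so the copies stay NUL-terminated. */
	strncpy(da->name, name, sizeof(da->name) - 1);
	strncpy(da->args, args, sizeof(da->args) - 1);
	return da;
}
```

Any code that kept a pointer to the first node still sees a valid object after the second probe, now holding the updated arguments.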

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Marko Kovacevic [Wed, 7 Nov 2018 12:00:06 +0000 (12:00 +0000)]
examples/fips_validation: fix uninitialized access

Fix a bug reported by Coverity: use of an uninitialized value.

Coverity issue: 325881
Fixes: 527cbf3d5ee3 ("examples/fips_validation: support TDES parsing")

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Alejandro Lucero [Wed, 7 Nov 2018 09:44:56 +0000 (09:44 +0000)]
mem: fix DMA mask width sanity check

Current code has different max DMA mask width values for 32-bit and
64-bit systems. IOMMU hardware could report a higher supported width
than the current MAX_DMA_MASK_BITS when RTE_ARCH_64 is not defined. This
is actually the case with a 32-bit kernel running on a 64-bit server
with IOMMU hardware. This could also be a problem with embedded systems
using an IOMMU designed for 64 bits in a 32-bit system.

This patch leaves a single max DMA mask width, which makes sure the
mask width is within the range of the 64-bit variables used for the DMA
mask. It also avoids wrong values, because any width higher than
64 bits is likely wrong.
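
A minimal sketch of a width check with a single limit, assuming hypothetical names (the macro and function below are not the EAL ones) and returning the mask of addressable bits:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DMA_MASK_BITS_SKETCH 64 /* one limit, matching the uint64_t mask */

/*
 * Build the mask of addressable bits for a device that supports `maskbits`
 * address bits. Returns 0 on success, -1 if the width is out of range.
 */
static int dma_mask_from_bits(uint8_t maskbits, uint64_t *mask)
{
	if (maskbits > MAX_DMA_MASK_BITS_SKETCH)
		return -1;
	/* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
	*mask = (maskbits == 64) ? UINT64_MAX
				 : ((UINT64_C(1) << maskbits) - 1);
	return 0;
}
```

The limit depends only on the width of the variable holding the mask, not on whether the build targets a 32-bit or a 64-bit architecture.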

Fixes: 223b7f1d5ef6 ("mem: add function for checking memseg IOVA")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Anatoly Burakov [Tue, 6 Nov 2018 14:13:29 +0000 (14:13 +0000)]
mem: fix use after free in legacy mem init

Adding an additional failure path in the DMA mask check has exposed an
issue where the `hugepage` pointer may point to memory that has already
been unmapped while the pointer value is still not NULL, so the failure
handler will attempt to unmap it a second time if the DMA mask check
fails. Fix it by setting the `hugepage` pointer to NULL once it is no
longer needed.

Coverity issue: 325730
Fixes: 165c89b84538 ("mem: use DMA mask check for legacy memory")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Marko Kovacevic [Tue, 6 Nov 2018 10:28:34 +0000 (10:28 +0000)]
examples/fips_validation: fix uninitialized variables

Fixed compilation issue with variable which may
be used uninitialized.

Fixes: 527cbf3d5ee3 ("examples/fips_validation: support TDES parsing")

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Thomas Monjalon [Tue, 6 Nov 2018 02:27:49 +0000 (03:27 +0100)]
version: 18.11-rc2

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Dharmik Thakkar [Fri, 26 Oct 2018 21:43:03 +0000 (16:43 -0500)]
test/hash: fix build

Enable print_key_info() function compilation always.

Compilation error message:
'test_hash.c: In function ‘print_key_info’:
test_hash.c:90:15: error: cast discards ‘const’ qualifier from pointer
target type [-Werror=cast-qual]
  uint8_t *p = (uint8_t *)key;
               ^
cc1: all warnings being treated as errors'

Fixes: af75078fece36 ("first public release")
Cc: stable@dpdk.org
Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Naga Suresh Somarowthu [Fri, 2 Nov 2018 12:01:06 +0000 (12:01 +0000)]
test/hash: reduce time for multiwriter test

Reduced test duration for hash_multiwriter_autotest.
Number of entries and total insertions are reduced
such that the duration is less than 10 seconds.

Signed-off-by: Naga Suresh Somarowthu <naga.sureshx.somarowthu@intel.com>
Acked-by: Herakliusz Lipiec <herakliusz.lipiec@intel.com>
Naga Suresh Somarowthu [Wed, 10 Oct 2018 13:15:12 +0000 (14:15 +0100)]
test: reduce time for function reentrancy test

Reduced test duration for func_reentrancy_autotest.
Reduced MAX_LPM_ITER_TIMES, introduced new macro
MAX_ITER_ONCE to reduce the unique key check and
altered the macro MAX_ITER_TIMES to MAX_ITER_MULTI.
Combined for loops, thereby reducing snprintf calls
and repeated iterations, such that the duration is
less than 10 seconds.

Signed-off-by: Naga Suresh Somarowthu <naga.sureshx.somarowthu@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Bruce Richardson [Fri, 12 Oct 2018 15:34:04 +0000 (16:34 +0100)]
test: allow taking extra arguments from environment

When running unit tests automatically, either via script, from meson,
or otherwise, the same set of options may be used for each run, for
example to set a standard coremask to be used for all tests.

To facilitate this, this patch adds support for the test binary taking
additional EAL parameters from the environment and appending them to the
argc/argv list passed to eal init. This allows parameter modification
without having to edit test scripts etc.

There are now two environment variables which can be used for running
tests:
 * DPDK_TEST - (added previously) passes the test name to be run
               automatically rather than running the app interactively.
               Used by "meson test" when running tests individually or
               as part of a suite.

 * DPDK_TEST_PARAMS - new parameter to specify the commandline arguments
               to use with the test binary. For example to run a test,
               or tests, on only 16 lcores, and to skip pci scan we can
               set this to "-l 0-15 --no-pci".
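
The argument handling described above can be sketched as follows. This is not the test binary's code: the function name is hypothetical, tokens are split on spaces and tabs only (no quoting support), and the parameter string is passed in directly so the sketch does not depend on the environment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a new NULL-terminated argv holding the original arguments followed
 * by the whitespace-separated tokens of `extra`; *new_argc gets the count.
 * Returns NULL on allocation failure. The token storage is intentionally
 * kept alive for the lifetime of the returned array.
 */
static char **append_extra_args(int argc, char **argv, const char *extra,
				int *new_argc)
{
	size_t len = strlen(extra);
	/* At most one token per two characters, plus the NULL terminator. */
	char **out = malloc(((size_t)argc + len / 2 + 2) * sizeof(*out));
	char *copy = malloc(len + 1);
	int n = 0;

	if (out == NULL || copy == NULL) {
		free(out);
		free(copy);
		return NULL;
	}
	memcpy(copy, extra, len + 1);

	for (int i = 0; i < argc; i++)
		out[n++] = argv[i];
	for (char *tok = strtok(copy, " \t"); tok != NULL;
	     tok = strtok(NULL, " \t"))
		out[n++] = tok;
	out[n] = NULL;
	*new_argc = n;
	return out;
}
```

A caller would typically read the string with getenv("DPDK_TEST_PARAMS") and, if it is not NULL, pass the merged array and count on to EAL initialisation.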

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Ori Kam [Mon, 5 Nov 2018 09:35:28 +0000 (09:35 +0000)]
examples/flow_filtering: remove VLAN item

Since the VLAN item is not in use and some PMDs can't support vlan = 0,
this item was removed.

Fixes: 4a3ef59a10c8 ("examples/flow_filtering: add simple demo of flow API")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Ori Kam [Mon, 5 Nov 2018 09:35:27 +0000 (09:35 +0000)]
examples/flow_filtering: filter out unsupported offloads

Some of the requested offloads are not supported by all devices.

This patch fixes this issue by setting only the supported offloads.

Fixes: feca6c428a5e ("examples/flow_filtering: add Tx queues setup process")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Wei Zhao <wei.zhao1@intel.com>
Fan Zhang [Thu, 1 Nov 2018 12:10:09 +0000 (12:10 +0000)]
bus/pci: fix config r/w access

The recent change to rte_pci_read/write_config() missed
uio_pci_generic case.

Fixes: 630deed612ca ("bus/pci: compare kernel driver instead of interrupt handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Konstantin Ananyev [Mon, 5 Nov 2018 12:18:58 +0000 (12:18 +0000)]
ip_frag: use key length for key comparison

Right now the reassembly code relies on src_dst[] being all zeroes to
determine whether an entry in the fragments table is free or occupied.
This is suboptimal and error prone: a user can crash the DPDK ip_reassembly
app with something like the following scapy script:
x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
frags=fragment(x, fragsize=500)
sendp(frags, iface=...)
To overcome that issue and reduce the overhead of the
'key invalidate' and 'key is empty' operations,
add key_len to the key comparison procedure.
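
A sketch of a key that carries its own length is shown below. The struct layout, field widths and function names are assumptions for illustration and do not mirror the library's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_WORDS_MAX 4 /* enough for IPv6 source + destination */

struct frag_key_sketch {
	uint64_t src_dst[KEY_WORDS_MAX];
	uint32_t id;
	uint32_t key_len; /* words of src_dst in use; 0 marks a free entry */
};

static bool key_is_empty(const struct frag_key_sketch *k)
{
	return k->key_len == 0;
}

static void key_invalidate(struct frag_key_sketch *k)
{
	k->key_len = 0;
}

static bool key_equal(const struct frag_key_sketch *a,
		      const struct frag_key_sketch *b)
{
	if (a->id != b->id || a->key_len != b->key_len)
		return false;
	return memcmp(a->src_dst, b->src_dst,
		      a->key_len * sizeof(a->src_dst[0])) == 0;
}
```

With this layout, a packet whose addresses and id are all zero still produces a non-empty key, so it can no longer be mistaken for a free table entry.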

Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org
Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Konstantin Ananyev [Mon, 5 Nov 2018 12:18:57 +0000 (12:18 +0000)]
ip_frag: check fragment length of incoming packet

Under some conditions ill-formed fragments might cause
reassembly code to corrupt mbufs and/or crash.
For example, the following fragment sequence:
<ofs=0,len=100, flags=MF>
<ofs=96,len=100, flags=MF>
<ofs=200,len=0,flags=MF>
<ofs=200,len=100,flags=0>
can trigger the problem.
To overcome this situation, add a check that the fragment length
of the incoming packet is greater than zero.

Fixes: 601e279df074 ("ip_frag: move fragmentation/reassembly headers into a library")
Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org
Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Ferruh Yigit [Sun, 28 Oct 2018 01:08:46 +0000 (01:08 +0000)]
vhost: fix possible out of bound access

Fixes: d7280c9fffcb ("vhost: support selective datapath")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Ferruh Yigit [Sun, 28 Oct 2018 01:08:45 +0000 (01:08 +0000)]
service: fix possible null access

Fixes: 21698354c832 ("service: introduce service cores concept")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Ferruh Yigit [Sun, 28 Oct 2018 01:08:44 +0000 (01:08 +0000)]
lib: fix shifting 32-bit signed variable 31 times

Fix cppcheck warning by marking variable as unsigned.
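
For context, a small sketch of the pattern behind this kind of warning (the helper name is hypothetical): shifting the signed constant 1 left by 31 overflows int, which is undefined behaviour in C, whereas the same shift on an unsigned 32-bit value is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Bit `n` (0..31) as an unsigned 32-bit mask; `1 << 31` on a signed int
 * would overflow, which is undefined behaviour in C. */
static uint32_t bit32(unsigned int n)
{
	return UINT32_C(1) << n;
}
```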

Fixes: dc276b5780c2 ("acl: new library")
Fixes: 986ff526fb84 ("net: add CRC computation API")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Thomas Monjalon [Thu, 1 Nov 2018 14:46:33 +0000 (15:46 +0100)]
ethdev: remove experimental tag for iterator API

After removing the function rte_eth_dev_attach(),
there are two replacement solutions possible:
one using probe event notification, and one using a new iterator.
So the application can get the new probed ports either asynchronously
or synchronously.

The iterator API is new in DPDK 18.11 so it got the experimental
tag by policy. It causes an issue for strict applications which do
not use experimental functions, and want to use the synchronous method.

The replacement for removed API should not be experimental.
That's why the experimental status of the ethdev iterator is removed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Thomas Monjalon [Thu, 1 Nov 2018 14:46:32 +0000 (15:46 +0100)]
eal: remove experimental tag for probe/remove

The functions rte_dev_probe() and rte_dev_remove() are new
in DPDK 18.11 so they got the experimental tag by policy.
However, they are too basic to be skipped
by strict applications which do not use experimental functions.

The alternative is to use rte_eal_hotplug_add() and
rte_eal_hotplug_remove(), but their API requires the application
to parse the devargs string in order to provide bus name,
device name and driver arguments.

The new function rte_dev_probe() is really simpler to use and
more flexible by accepting any devargs string.
Let's encourage applications to use it.

The old functions rte_eal_hotplug_* may be deprecated later.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Anatoly Burakov [Mon, 5 Nov 2018 17:26:56 +0000 (17:26 +0000)]
malloc: fix invalid argument handling

When adding memory to an external heap, do not go to the unlock failure
handler, because the memory hotplug lock hasn't been taken yet.

Fixes: 7d75c31014f7 ("malloc: allow adding memory to named heaps")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Stephen Hemminger [Mon, 5 Nov 2018 18:51:15 +0000 (10:51 -0800)]
net/netvsc: fix VF link update

The netvsc device calls VF (if present) to update the link status
with the wrong device. This leads to errors in mlx5 device when it
can't find the ifindex.

Fixes: dc7680e8597c ("net/netvsc: support integrated VF")

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Ferruh Yigit [Sun, 28 Oct 2018 04:35:42 +0000 (04:35 +0000)]
net/bnxt: fix uninitialized variable access

ag_cons is used uninitialized, but only when DEBUG is enabled. Remove the
debug code.

Fixes: 0958d8b6435d ("net/bnxt: support LRO")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Beilei Xing [Mon, 5 Nov 2018 03:18:12 +0000 (11:18 +0800)]
net/i40e: fix Rx instability with vector mode

Previously, vector Rx was unstable if the descriptor
number was not a power of 2, e.g. the process hung and some Rx packets
were unexpectedly empty. That's because vector Rx mode assumes the Rx
descriptor number is a power of 2 when doing bit masking.
This patch allows vector mode only when the number of Rx descriptors
is a power of 2.
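
A short sketch of why the bit mask only works for power-of-2 ring sizes; the helper names are hypothetical and unrelated to the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Advance a ring index using a bit mask; only valid for power-of-2 sizes. */
static uint32_t next_index_masked(uint32_t idx, uint32_t nb_desc)
{
	return (idx + 1) & (nb_desc - 1);
}
```

With 512 descriptors the index after 511 wraps to 0 as intended, while with 500 descriptors the index after 499 becomes 496 rather than 0, so a check such as is_power_of_2() is needed before choosing the masked path.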

Fixes: 8e109464c022 ("i40e: allow vector Rx and Tx usage")
Fixes: a3c83a2527e1 ("net/i40e: enable runtime queue setup")
Cc: stable@dpdk.org
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Ferruh Yigit [Sun, 28 Oct 2018 03:51:33 +0000 (03:51 +0000)]
net/avf/base: fix shifting 32-bit signed variable 31 times

Fixes: e5b2a9e957e7 ("net/avf/base: add base code for avf PMD")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Tom Barbette [Sat, 27 Oct 2018 15:10:55 +0000 (17:10 +0200)]
net/mlx5: support Rx queue count API

This patch adds support for the rx_queue_count API in the mlx5 driver.

Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Ophir Munk [Sat, 3 Nov 2018 15:54:45 +0000 (15:54 +0000)]
app/testpmd: set default RSS key as null

When creating an RSS rule without specifying a key (see [1]) it is
expected that the device will use the default key.
A NULL key is used to indicate to a PMD it should use
its default key, however testpmd assigns a non-NULL dummy key
(see [2]) instead.
This does not enable testing any PMD behavior when the RSS key is not
specified. This commit fixes this limitation by setting key to NULL.

[1]
RSS rule example without specifying a key:
flow create 0 ingress <pattern> / end actions rss queues 0 1 end / end
[2]
Testpmd default key assignment:
.key= "testpmd's default RSS hash key, "
"override it for better balancing"

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Dekel Peled [Sun, 4 Nov 2018 10:15:45 +0000 (12:15 +0200)]
doc: clarify testpmd guide for flow API

The description of prefix for mask creation was misunderstood.
I updated the description, so it is clearly understood which
mask will be created by a certain prefix.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Zhirun Yan [Mon, 5 Nov 2018 12:56:44 +0000 (12:56 +0000)]
net/igb: update Tx offload mask

The Tx offload mask was updated in commit 1037ed842c37
("mbuf: fix Tx offload mask"). Currently, the newly added offload
flags are not supported in the PMD, and the application will fail when
calling the PMD transmit prepare function.

This patch updates IGB_TX_OFFLOAD_MASK.

Fixes: 1037ed842c37 ("mbuf: fix Tx offload mask")
Cc: stable@dpdk.org
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Yongseok Koh [Mon, 5 Nov 2018 07:20:47 +0000 (07:20 +0000)]
net/mlx5: remove flags setting from flow preparation

Even though flow_drv_prepare() takes item_flags and action_flags to be
filled in, those are not used and will be overwritten by parsing of
flow_drv_translate(). There's no reason to keep the flags and fill them.
Appropriate notes are added to the documentation of flow_drv_prepare() and
flow_drv_translate().

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Yongseok Koh [Mon, 5 Nov 2018 07:20:45 +0000 (07:20 +0000)]
net/mlx5: fix Direct Verbs flow tunnel

1) Fix layer parsing
In translation of tunneled flows, dev_flow->layers must not be used to
check tunneled layer as it contains all the layers parsed from
flow_drv_prepare(). Checking tunneled layer is needed to distinguish
between outer and inner item. This should be based on dynamic parsing. With
dev_flow->layers on a tunneled flow, items will always be interpreted as
inner as dev_flow->layers already has all the items. Dynamic parsing
(item_flags) is added as there's no such code.

2) Refactoring code
- flow_dv_create_item() and flow_dv_create_action() are merged into
  flow_dv_translate() for consistency with Verbs and *_validate().

Fixes: 246636411536 ("net/mlx5: fix flow tunnel handling")
Fixes: d02cb0691299 ("net/mlx5: add Direct Verbs translate actions")
Fixes: fc2c498ccb94 ("net/mlx5: add Direct Verbs translate items")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
5 years agonet/mlx5: fix Verbs flow tunnel
Yongseok Koh [Mon, 5 Nov 2018 07:20:44 +0000 (07:20 +0000)]
net/mlx5: fix Verbs flow tunnel

1) Fix layer parsing
In translation of tunneled flows, dev_flow->layers must not be used to
check tunneled layer as it contains all the layers parsed from
flow_drv_prepare(). Checking tunneled layer is needed to set
IBV_FLOW_SPEC_INNER and it should be based on dynamic parsing. With
dev_flow->layers on a tunneled flow, items will always be interpreted as
inner as dev_flow->layers already has all the items.

2) Refactoring code
It is partly because flow_verbs_translate_item_*() sets layer flag. Same
code is repeating in multiple locations and that could be error-prone.

- Introduce VERBS_SPEC_INNER() to unify setting IBV_FLOW_SPEC_INNER.
- flow_verbs_translate_item_*() doesn't set parsing result -
  MLX5_FLOW_LAYER_*.
- flow_verbs_translate_item_*() doesn't set priority or adjust hashfields
  but does only item translation. Both have to be done outside.
- Make more consistent between Verbs and DV.

3) Remove flow_verbs_mark_update()
This code can never be reached as validation prohibits specifying mark and
flag actions together. No need to convert flag to mark.

Fixes: 84c406e74524 ("net/mlx5: add flow translate function")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
5 years agonet/mlx5: support default RSS key as null
Ophir Munk [Sun, 4 Nov 2018 12:10:20 +0000 (12:10 +0000)]
net/mlx5: support default RSS key as null

Applications which add RSS rules must supply an RSS key and length.
If an application is only interested in default RSS operation it
should not care about the exact RSS key.
By setting the key to NULL - the PMD will use the default RSS key.
In addition if the application does not care about the RSS type it can
set it to 0 and the PMD will use the default type (ETH_RSS_IP).
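
For illustration only, the fallback described above could be sketched as
follows. All names here (sketch_rss_conf, sketch_rss_resolve, the
placeholder key and the SKETCH_ETH_RSS_IP value) are hypothetical stand-ins,
not the PMD's actual code or constants:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real key bytes and ETH_RSS_IP value differ. */
#define SKETCH_RSS_KEY_LEN 40
#define SKETCH_ETH_RSS_IP  UINT64_C(0x1)

static const uint8_t sketch_default_rss_key[SKETCH_RSS_KEY_LEN] = { 0 };

struct sketch_rss_conf {
	const uint8_t *key; /* NULL means "use the driver default key" */
	uint32_t key_len;
	uint64_t types;     /* 0 means "use the driver default type" */
};

/* Return a copy of the request with the defaults filled in. */
static struct sketch_rss_conf
sketch_rss_resolve(struct sketch_rss_conf in)
{
	if (in.key == NULL) {
		in.key = sketch_default_rss_key;
		in.key_len = SKETCH_RSS_KEY_LEN;
	}
	if (in.types == 0)
		in.types = SKETCH_ETH_RSS_IP;
	return in;
}
```

An application-supplied key or type passes through unchanged; only the
NULL key and the zero type are replaced.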

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: limit priority range for Linux TC flower driver
Yongseok Koh [Sat, 3 Nov 2018 17:10:33 +0000 (17:10 +0000)]
net/mlx5: limit priority range for Linux TC flower driver

Due to a limitation in the driver/FW, priority ranges from 1 to 16 in the
kernel. Priority in the rte_flow attribute starts from 0 and is incremented
by 1 in translation. This is subject to change: once the restriction is
lifted or the range is extended, the max priority can be determined by
trial and error, as in the Verbs driver.
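
A minimal sketch of the mapping described above, assuming hypothetical
names (sketch_tcf_priority, SKETCH_TCF_PRIO_MIN/MAX) and an assumed
-ENOTSUP return for out-of-range values; the actual driver code may differ:

```c
#include <errno.h>
#include <stdint.h>

/* Kernel-side priority range stated in the commit message. */
#define SKETCH_TCF_PRIO_MIN 1u
#define SKETCH_TCF_PRIO_MAX 16u

/*
 * Translate a 0-based rte_flow priority into a 1-based TC priority.
 * Returns 0 on success, -ENOTSUP when the result would exceed the range.
 */
static int
sketch_tcf_priority(uint32_t flow_prio, uint16_t *tc_prio)
{
	if (flow_prio > SKETCH_TCF_PRIO_MAX - SKETCH_TCF_PRIO_MIN)
		return -ENOTSUP;
	*tc_prio = (uint16_t)(flow_prio + SKETCH_TCF_PRIO_MIN);
	return 0;
}
```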

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: make vectorized Tx threshold configurable
Yongseok Koh [Thu, 1 Nov 2018 17:20:32 +0000 (17:20 +0000)]
net/mlx5: make vectorized Tx threshold configurable

Add txqs_max_vec parameter to configure the maximum number of Tx queues to
enable vectorized Tx. Its default value is set according to the
architecture and device type.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: move device spawn configuration to probing
Yongseok Koh [Thu, 1 Nov 2018 17:20:31 +0000 (17:20 +0000)]
net/mlx5: move device spawn configuration to probing

When a device is spawned, it does make more sense that the configuration
parameters are passed by callee. Furthermore, setting default value for
some configuration would need PCIe device ID which can be found in the
probe function.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add E-switch VXLAN rule cleanup routines
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:46 +0000 (06:18 +0000)]
net/mlx5: add E-switch VXLAN rule cleanup routines

The last part of the patchset contains the rule cleanup routines.
These are part of the outer interface initialization at
the moment of VXLAN VTEP attaching. These routines query
the list of attached VXLAN devices, the list of local IP
addresses with peer and link scope attributes and the list
of permanent neigh rules, then all the abovementioned
items found on the specified outer device are flushed.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: add E-Switch VXLAN encapsulation rules
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:45 +0000 (06:18 +0000)]
net/mlx5: add E-Switch VXLAN encapsulation rules

VXLAN encap rules are applied to the VF ingress traffic and have the
VTEP as actual redirection destinations instead of outer PF.
The encapsulation rule should provide:
- redirection action VF->PF
- VF port ID
- some inner network parameters (MACs/IP)
- the tunnel outer source IP (v4/v6)
- the tunnel outer destination IP (v4/v6). Current
- VNI - Virtual Network Identifier

No direct way was found to provide the kernel with all required
encapsulation header parameters. The encapsulation VTEP is created
attached to the outer interface and assumed to be the default path for
egress encapsulated traffic. The outer tunnel IP addresses are
assigned to the interface using Netlink, and the implicit route is
created like this:

  ip addr add <src_ip> peer <dst_ip> dev <outer> scope link

The peer address provides the implicit route, and scope link reduces
the risk of conflicts. At initialization time all local scope
link addresses are flushed from the device (see next part of patchset).

The destination MAC address is provided via a permanent neigh rule:

  ip neigh add dev <outer> lladdr <dst_mac> to <dst_ip> nud permanent

At initialization time all neigh rules of this type are flushed
from device (see the next part of patchset).

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: add E-switch VXLAN tunnel devices management
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:44 +0000 (06:18 +0000)]
net/mlx5: add E-switch VXLAN tunnel devices management

VXLAN interfaces are dynamically created for each local UDP port
of outer networks and then used as targets for TC "flower" filters
in order to perform encapsulation. These VXLAN interfaces are
system-wide: only one device with a given UDP port can exist
in the system (an attempt to create another device with the
same local UDP port returns EEXIST), so the PMD should support a
shared device instance database for PMD instances. These
implicitly created VXLAN devices are called VTEPs (Virtual Tunnel
End Points).

Creation of the VTEP occurs at the moment of rule applying. The
link is set up, root ingress qdisc is also initialized.

Encapsulation VTEPs are created on per port basis, the single
VTEP is attached to the outer interface and is shared for all
encapsulation rules on this interface. The source UDP port is
automatically selected in range 30000-60000.

For decapsulation one VTEP is created for every unique local UDP
port to accept tunnel traffic. The name of the created
VTEP consists of the prefix "vmlx_" and the UDP port number in
decimal digits without leading zeros (vmlx_4789). The VTEP
can be created in the system in advance, before launching the
application, which allows UDP ports to be shared between primary
and secondary processes.
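
The naming rule above can be sketched as follows. The helper name
sketch_vtep_name and the SKETCH_VXLAN_DEVICE_PFX macro are hypothetical;
only the "vmlx_" prefix and the decimal port suffix come from this message:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_VXLAN_DEVICE_PFX "vmlx_"

/*
 * Build the VTEP device name: prefix followed by the UDP port in decimal,
 * without leading zeros. Returns 0 on success, -1 if the buffer is too
 * small or formatting fails.
 */
static int
sketch_vtep_name(char *buf, size_t len, uint16_t udp_port)
{
	int n = snprintf(buf, len, SKETCH_VXLAN_DEVICE_PFX "%u",
			 (unsigned int)udp_port);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For UDP port 4789 this yields "vmlx_4789", matching the example in the text.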

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: fix E-Switch flow counter deletion
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:43 +0000 (06:18 +0000)]
net/mlx5: fix E-Switch flow counter deletion

The counters for E-Switch rules were erroneously deleted in
flow_tcf_remove() routine. The counters deletion is moved to
flow_tcf_destroy() routine.

Fixes: e1114ff6a5ab ("net/mlx5: support e-switch flow count action")

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: update E-Switch VXLAN netlink routines
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:42 +0000 (06:18 +0000)]
net/mlx5: update E-Switch VXLAN netlink routines

This part of the patchset updates the Netlink exchange routine. Message
sequence numbers are no longer random, multipart reply messages
are supported, errors are not propagated to subsequent socket calls,
and the Netlink reply buffer size is increased to MNL_SOCKET_BUFFER_SIZE.
The buffer is now preallocated at context creation time instead of on the
stack. This update is needed to support Netlink query operations.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: add VXLAN to flow translate routine
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:41 +0000 (06:18 +0000)]
net/mlx5: add VXLAN to flow translate routine

This part of the patchset adds support for VXLAN-related items and
actions to the flow translation routine. Other tunnel types
besides VXLAN (e.g. GRE) can be added later. No VTEP devices are created
at this point, the flow rule is just translated, not applied yet.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: add VXLAN to flow prepare routine
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:40 +0000 (06:18 +0000)]
net/mlx5: add VXLAN to flow prepare routine

The E-Switch flow prepare function is updated to support VXLAN
encapsulation and decapsulation actions. The function calculates
the buffer size for the Netlink message and flow description structures,
including optional ones for tunneling purposes.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: add E-Switch VXLAN to validation routine
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:39 +0000 (06:18 +0000)]
net/mlx5: add E-Switch VXLAN to validation routine

This patch adds VXLAN support for flow item/action lists validation.
The following entities are now supported:

- RTE_FLOW_ITEM_TYPE_VXLAN, contains the tunnel VNI

- RTE_FLOW_ACTION_TYPE_VXLAN_DECAP, if this action is specified
  the items in the flow items list are treated as outer network
  parameters for the tunnel outer header match. The ethernet layer
  addresses are always treated as inner ones.

- RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP, contains the item list to
  build the encapsulation header. In the current implementation the
  values are subject to some constraints:
    - outer source MAC address will always be unconditionally
      set to one of the MAC addresses of the outer egress interface
    - no way to specify source UDP port
    - all abovementioned parameters are ignored if specified
      in the rule, warning messages are sent to the log

Minimal tunneling support is also added. If VXLAN decapsulation
action is specified the ETH item can follow the VXLAN VNI item,
the content of this ETH item is treated as inner MAC addresses
and type. The outer ETH item for VXLAN decapsulation action
is always ignored.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: swap items/actions validations for E-Switch rules
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:38 +0000 (06:18 +0000)]
net/mlx5: swap items/actions validations for E-Switch rules

The rule validation function for E-Switch checks the item list first,
then the action list. This patch swaps the validation order, so
actions are now checked first. This is preparation for updating the
validation function with VXLAN tunnel actions. The VXLAN decapsulation
action requires checking the items in a special way. We can do
this special check in the single item check pass if the action
flags have been gathered beforehand. This is the reason to swap the
item/action checking loops.
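
The reordering can be illustrated with a simplified sketch. The enums,
the flag and the rule "a VXLAN item is only accepted together with a decap
action" are hypothetical simplifications for illustration, not the driver's
real validation logic:

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_action {
	SKETCH_ACTION_END,
	SKETCH_ACTION_DROP,
	SKETCH_ACTION_VXLAN_DECAP,
};

enum sketch_item {
	SKETCH_ITEM_END,
	SKETCH_ITEM_ETH,
	SKETCH_ITEM_IPV4,
	SKETCH_ITEM_UDP,
	SKETCH_ITEM_VXLAN,
};

#define SKETCH_FLAG_DECAP (UINT32_C(1) << 0)

/* Pass 1: walk the actions first and gather their flags. */
static uint32_t
sketch_collect_action_flags(const enum sketch_action *actions)
{
	uint32_t flags = 0;

	for (; *actions != SKETCH_ACTION_END; actions++)
		if (*actions == SKETCH_ACTION_VXLAN_DECAP)
			flags |= SKETCH_FLAG_DECAP;
	return flags;
}

/* Pass 2: a single item pass that can already consult the action flags. */
static bool
sketch_validate(const enum sketch_item *items,
		const enum sketch_action *actions)
{
	uint32_t action_flags = sketch_collect_action_flags(actions);

	for (; *items != SKETCH_ITEM_END; items++) {
		/* Simplified rule: tunnel item requires the decap action. */
		if (*items == SKETCH_ITEM_VXLAN &&
		    !(action_flags & SKETCH_FLAG_DECAP))
			return false;
	}
	return true;
}
```

With the items checked first, the same decision would need either a second
item pass or deferred checks after the action loop.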

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: add necessary structures for E-Switch VXLAN
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:37 +0000 (06:18 +0000)]
net/mlx5: add necessary structures for E-Switch VXLAN

This patch introduces the data structures needed to implement VXLAN
encapsulation/decapsulation hardware offload support for E-Switch.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: add necessary definitions for E-Switch VXLAN
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:36 +0000 (06:18 +0000)]
net/mlx5: add necessary definitions for E-Switch VXLAN

This patch contains tc flower related and some other definitions
needed to implement VXLAN encapsulation/decapsulation hardware
offload support for E-Switch.

mlx5 driver dynamically creates and manages the VXLAN virtual
tunnel endpoint devices, the following definitions control
the parameters of these network devices:

- MLX5_VXLAN_PORT_MIN - minimal allowed UDP port for VXLAN device
- MLX5_VXLAN_PORT_MAX - maximal allowed UDP port for VXLAN device
- MLX5_VXLAN_DEVICE_PFX - name prefix of driver created VXLAN device

The mlx5 driver creates the VXLAN devices with a UDP port within the
specified range. Device names consist of the specified prefix
followed by the decimal digits of the UDP port.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: prepare meson build for adding E-Switch VXLAN
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:36 +0000 (06:18 +0000)]
net/mlx5: prepare meson build for adding E-Switch VXLAN

This patch updates meson.build before adding E-Switch VXLAN
encapsulation/decapsulation hardware offload support.
E-Switch rules are controlled via tc Netlink commands,
so we need to include tc related headers, and check for
some tunnel specific key definitions.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/mlx5: prepare makefile for adding E-Switch VXLAN
Viacheslav Ovsiienko [Sat, 3 Nov 2018 06:18:35 +0000 (06:18 +0000)]
net/mlx5: prepare makefile for adding E-Switch VXLAN

This patch updates makefile before adding E-Switch VXLAN
encapsulation/decapsulation hardware offload support.
E-Switch rules are controlled via tc Netlink commands,
so we need to include tc related headers, and check for
some tunnel specific key definitions.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/enic: use macro for attribute weak
Hyong Youb Kim [Fri, 2 Nov 2018 05:49:17 +0000 (22:49 -0700)]
net/enic: use macro for attribute weak

Fixes: 8a6ff33d6d36 ("net/enic: add AVX2 based vectorized Rx handler")

Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
5 years agovhost/crypto: fix inferred misuse of enum
Fan Zhang [Thu, 1 Nov 2018 14:15:04 +0000 (14:15 +0000)]
vhost/crypto: fix inferred misuse of enum

Fix inferred misuse of enums rte_crypto_cipher_algorithm and
rte_crypto_auth_algorithm.

Coverity issue: 277202
Fixes: e80a98708166 ("vhost/crypto: add session message handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
5 years agoapp/testpmd: fix port status for new bonded devices
Radu Nicolau [Thu, 1 Nov 2018 11:20:32 +0000 (11:20 +0000)]
app/testpmd: fix port status for new bonded devices

Set port status to stopped for newly added devices.

Fixes: 2950a769315e ("bond: testpmd support")
Cc: stable@dpdk.org
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
5 years agodoc: clarify TSO Tx offload prerequisite
Jerin Jacob [Thu, 1 Nov 2018 08:46:42 +0000 (08:46 +0000)]
doc: clarify TSO Tx offload prerequisite

Based on the PKT_TX_TCP_SEG definition,
the application needs to set PKT_TX_IPV4 or PKT_TX_IPV6
(depending on whether the packet is IPv4 or IPv6) and PKT_TX_IP_CKSUM
in ol_flags to enable Tx TSO offload.
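
As a rough sketch, the flag selection could look like the code below. The
SKETCH_TX_* bit values are stand-ins so the snippet compiles on its own; a
real application would use the PKT_TX_* macros from rte_mbuf.h. Requesting
the IP checksum flag only for IPv4 is an assumption made here, on the
grounds that IPv6 has no header checksum:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit positions, NOT the real PKT_TX_* values. */
#define SKETCH_TX_TCP_SEG  (UINT64_C(1) << 0)
#define SKETCH_TX_IP_CKSUM (UINT64_C(1) << 1)
#define SKETCH_TX_IPV4     (UINT64_C(1) << 2)
#define SKETCH_TX_IPV6     (UINT64_C(1) << 3)

/* Compute the ol_flags an application would set to request TSO. */
static uint64_t
sketch_tso_ol_flags(bool is_ipv4)
{
	uint64_t flags = SKETCH_TX_TCP_SEG;

	if (is_ipv4)
		flags |= SKETCH_TX_IPV4 | SKETCH_TX_IP_CKSUM;
	else
		flags |= SKETCH_TX_IPV6;
	return flags;
}
```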

Fixes: dad1ec72a377 ("doc: document NIC features")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
5 years agonet/bonding: fix crash on probe
Radu Nicolau [Wed, 31 Oct 2018 15:50:08 +0000 (15:50 +0000)]
net/bonding: fix crash on probe

After the patch below the call to rte_eth_bond_8023ad_agg_selection_set
from probe() segfaults; there is no need to call the function, just set
the mode directly.
Also, reverted 1620175b400e.

Fixes: 391797f04208 ("drivers/bus: move driver assignment to end of probing")
Fixes: 1620175b400e ("net/bonding: fix invalid port id")
Cc: stable@dpdk.org
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Chas Williams <chas3@att.com>
5 years agonet/ena: remove resources when port is being closed
Michal Krawczyk [Wed, 31 Oct 2018 14:53:16 +0000 (15:53 +0100)]
net/ena: remove resources when port is being closed

The new API introduced in 18.11 suggests that the driver should
release all its resources in the dev_close routine.

All resources previously released in the uninit routine during PCI removal
are now released in dev_close, and the PMD indicates that
it supports the API change by setting the RTE_ETH_DEV_CLOSE_REMOVE flag.

As the device does not allocate MAC addresses dynamically, it sets the
mac_addrs field to NULL, so it won't be released by
rte_eth_dev_release_port().

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
5 years agonet/qede/base: fix to initialize HW for LLH filters
Rasesh Mody [Wed, 31 Oct 2018 00:27:03 +0000 (00:27 +0000)]
net/qede/base: fix to initialize HW for LLH filters

During initialization of leading PF, we need to initialize HW for LLH
filters. Set HW init parameter to set the engine affinity for
multiple engine adapters.

Fixes: 3eed444a9621 ("net/qede/base: changes for 100G")

Signed-off-by: Rasesh Mody <rasesh.mody@cavium.com>
5 years agoapp/testpmd: fix Tx offload flags
Ferruh Yigit [Sun, 28 Oct 2018 02:16:39 +0000 (02:16 +0000)]
app/testpmd: fix Tx offload flags

ol_flags can be wrong if DEV_TX_OFFLOAD_VLAN_INSERT is not set in
tx_offloads

Fixes: 3eecba267cd6 ("app/testpmd: cleanup internal Tx offloads flags field")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
5 years agoethdev: fix redundant function pointer check
Ferruh Yigit [Sun, 28 Oct 2018 01:46:50 +0000 (01:46 +0000)]
ethdev: fix redundant function pointer check

RTE_FUNC_PTR_OR_ERR_RET() already does the `ethdev_uninit` NULL check.

Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
5 years agovhost: advertize packed ring layout support
Maxime Coquelin [Wed, 31 Oct 2018 10:26:40 +0000 (11:26 +0100)]
vhost: advertize packed ring layout support

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
5 years agovhost: add packed ring support to vring base requests
Maxime Coquelin [Wed, 31 Oct 2018 10:26:39 +0000 (11:26 +0100)]
vhost: add packed ring support to vring base requests

For the packed ring layout, we need to save the avail index and its wrap
counter value. At restore time, the used index and its wrap counter
are set to the available ones, as ring processing is stopped
at vring base get time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
5 years agonet/virtio: do not re-enter clean up routines
Chas Williams [Mon, 17 Jul 2017 23:05:22 +0000 (19:05 -0400)]
net/virtio: do not re-enter clean up routines

.dev_uninit calls .dev_stop and .dev_close.  The work that is done in
those routines doesn't need to be repeated.  Use started and opened to
track the adapter's status.
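
A minimal sketch of this guard pattern, assuming a hypothetical
sketch_hw structure (the call counters exist only to make the effect
observable); it is not the virtio PMD's actual code:

```c
#include <stdbool.h>

struct sketch_hw {
	bool started;    /* set while the port is running */
	bool opened;     /* set until the port has been closed */
	int stop_calls;  /* counts how often the stop work really ran */
	int close_calls; /* counts how often the close work really ran */
};

static void
sketch_dev_stop(struct sketch_hw *hw)
{
	if (!hw->started)
		return; /* already stopped, nothing to repeat */
	hw->started = false;
	hw->stop_calls++;
}

static void
sketch_dev_close(struct sketch_hw *hw)
{
	if (!hw->opened)
		return; /* already closed, nothing to repeat */
	sketch_dev_stop(hw);
	hw->opened = false;
	hw->close_calls++;
}

/* Uninit may run after the application already stopped/closed the port. */
static void
sketch_dev_uninit(struct sketch_hw *hw)
{
	sketch_dev_stop(hw);
	sketch_dev_close(hw);
}
```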

Fixes: c1f86306a026 ("virtio: add new driver")
Cc: stable@dpdk.org
Signed-off-by: Chas Williams <ciwillia@brocade.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/ixgbe: fix busy polling while fiber link update
Ilya Maximets [Thu, 1 Nov 2018 16:04:59 +0000 (19:04 +0300)]
net/ixgbe: fix busy polling while fiber link update

If the multispeed fiber link is in DOWN state, ixgbe_setup_link()
could take around a second of busy polling. This is highly
inconvenient when a single thread periodically
checks the link statuses. For example, the OVS main thread
periodically updates the link statuses and hangs for a really
long time busy waiting on ixgbe_setup_link() for DOWN fiber
ports. With 3 down ports it hangs for 3 seconds and is
unable to do anything, including packet processing.
Fix that by moving the workaround to a separate thread, using an
alarm handler that will try to set up the link if it is DOWN.

Fixes: c12d22f65b13 ("net/ixgbe: ensure link status is updated")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
5 years agonet/mlx5: enable loopback by configured mode
Dekel Peled [Thu, 1 Nov 2018 07:11:04 +0000 (09:11 +0200)]
net/mlx5: enable loopback by configured mode

Enable NIC loopback mode based on rte_eth_conf.lpbk_mode
configuration.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add caching of encap/decap actions
Dekel Peled [Thu, 1 Nov 2018 09:37:33 +0000 (11:37 +0200)]
net/mlx5: add caching of encap/decap actions

Make flow encap and decap Verbs actions cacheable resources.
Store created actions in local database.
This enables MLX5 PMD reuse of existing actions.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add raw data encap/decap to Direct Verbs
Dekel Peled [Thu, 1 Nov 2018 09:37:32 +0000 (11:37 +0200)]
net/mlx5: add raw data encap/decap to Direct Verbs

This patch implements the encap and decap actions, using raw data,
in DV flow for MLX5 PMD.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add NVGRE decap action to Direct Verbs
Dekel Peled [Thu, 1 Nov 2018 09:37:31 +0000 (11:37 +0200)]
net/mlx5: add NVGRE decap action to Direct Verbs

This patch implements the NVGRE decap action in DV flow for MLX5 PMD.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add NVGRE encap action to Direct Verbs
Dekel Peled [Thu, 1 Nov 2018 09:37:30 +0000 (11:37 +0200)]
net/mlx5: add NVGRE encap action to Direct Verbs

This patch implements the NVGRE encap action in DV flow for MLX5 PMD.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add VXLAN decap action to Direct Verbs
Dekel Peled [Thu, 1 Nov 2018 09:37:29 +0000 (11:37 +0200)]
net/mlx5: add VXLAN decap action to Direct Verbs

This patch implements the VXLAN decap action in DV flow for MLX5 PMD.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add VXLAN encap action to Direct Verbs
Dekel Peled [Thu, 1 Nov 2018 09:37:28 +0000 (11:37 +0200)]
net/mlx5: add VXLAN encap action to Direct Verbs

This patch implements the VXLAN encap action in DV flow for MLX5 PMD.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add flow action functions to glue
Dekel Peled [Thu, 1 Nov 2018 09:37:27 +0000 (11:37 +0200)]
net/mlx5: add flow action functions to glue

This patch adds glue functions for operations:
- Create packet reformat (encap/decap) flow action.
- Destroy flow action.

The new operations depend on HAVE_IBV_FLOW_DV_SUPPORT.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet: fix build with pedantic
Shahaf Shuler [Thu, 1 Nov 2018 12:46:45 +0000 (14:46 +0200)]
net: fix build with pedantic

The following error popped when compiling with -pedantic:

In file included from
 drivers/net/mlx5/mlx5_flow_dv.c:28:0:
 include/rte_gre.h:20:2:
 error: type of bit-field 'res2' is a GCC  extension [-Werror=pedantic]
 uint16_t res2:4; /**< Reserved */

Fixing by adding the __extension__ attribute.
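
For illustration, a simplified sketch of the technique. The structure
layout and the names sketch_gre_hdr and sketch_gre_init are hypothetical
and do not reproduce rte_gre.h. ISO C only guarantees int, unsigned int and
_Bool as bit-field types, so a uint16_t bit-field is a compiler extension,
and __extension__ asks GCC-compatible compilers not to warn about it under
-pedantic:

```c
#include <stdint.h>

/* Simplified header with uint16_t bit-fields, marked as an extension. */
__extension__
struct sketch_gre_hdr {
	uint16_t res2:4;  /* reserved */
	uint16_t flags:4; /* flag bits, collapsed into one field here */
	uint16_t ver:3;   /* version number */
	uint16_t res3:5;  /* reserved */
	uint16_t proto;   /* protocol type */
};

/* Build a zeroed header carrying only the given protocol type. */
static struct sketch_gre_hdr
sketch_gre_init(uint16_t proto)
{
	struct sketch_gre_hdr h = { 0 };

	h.proto = proto;
	return h;
}
```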

Fixes: 894f71a3805d ("net: add GRE header structure")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
5 years agonet/mlx5: fix validation of MPLS-in-GRE
Yongseok Koh [Tue, 30 Oct 2018 07:53:07 +0000 (07:53 +0000)]
net/mlx5: fix validation of MPLS-in-GRE

Multiple tunnels aren't allowed, but MPLS over GRE should be accepted.

Fixes: 23c1d42c7138 ("net/mlx5: split flow validation to dedicated function")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: add missing flow director delete
Yongseok Koh [Tue, 30 Oct 2018 07:51:27 +0000 (07:51 +0000)]
net/mlx5: add missing flow director delete

Deleting an FDIR flow was left unimplemented by mistake. Also, the static
functions are properly renamed.

Fixes: b42c000e37a8 ("net/mlx5: remove flow support")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
5 years agonet/mlx5: fix memory leak on Direct Verbs error
Dekel Peled [Mon, 29 Oct 2018 16:09:10 +0000 (18:09 +0200)]
net/mlx5: fix memory leak on Direct Verbs error

Add freeing of allocated memory before exiting on mlx5dv error.

Fixes: fc2c498ccb94 ("net/mlx5: add Direct Verbs translate items")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
5 years agonet/virtio: fix guest announce support
Tiwei Bie [Mon, 29 Oct 2018 05:28:08 +0000 (13:28 +0800)]
net/virtio: fix guest announce support

We need to check the status field in virtio net config structure
instead of the bits read from ISR register to know whether we need
to do guest announce.

Fixes: 7365504f77e3 ("net/virtio: support guest announce")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/virtio-user: simplify device features preparation
Tiwei Bie [Mon, 29 Oct 2018 05:28:07 +0000 (13:28 +0800)]
net/virtio-user: simplify device features preparation

Get rid of the duplicated code in device features preparation
which looks awful.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/virtio-user: fix device features for server mode
Tiwei Bie [Mon, 29 Oct 2018 05:28:06 +0000 (13:28 +0800)]
net/virtio-user: fix device features for server mode

We need to save the supported frontend features (which won't be
announced by the vhost backend), otherwise we will lose them when the
connection to the vhost-user backend is established in server mode.

Fixes: 201a41651715 ("net/virtio-user: fix multiple queues fail in server mode")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/virtio-user: do not reset owner when driver resets
Tiwei Bie [Mon, 29 Oct 2018 05:28:05 +0000 (13:28 +0800)]
net/virtio-user: do not reset owner when driver resets

When driver resets the device, virtio-user just needs to send
GET_VRING_BASE messages to stop the vhost backend, and that's
what QEMU does. With this change, we won't need to set owner
when starting virtio-user device anymore. This will help us to
get rid of below error message on startup:

vhost_kernel_ioctl(): VHOST_SET_OWNER failed: Device or resource busy

Fixes: bce7e9050f9b ("net/virtio-user: fix start with kernel vhost")
Fixes: 0d6a8752ac9d ("net/virtio-user: fix crash as features change")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/virtio-user: do not make vhost channel non-block
Tiwei Bie [Mon, 29 Oct 2018 05:28:04 +0000 (13:28 +0800)]
net/virtio-user: do not make vhost channel non-block

There is no need to make the vhost user channel nonblock, and
making it nonblock will make vhost_user_read() fail with EAGAIN
when vhost messages need a reply.
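
The failure mode can be reproduced outside of DPDK with a small POSIX
sketch. It uses a plain socketpair() instead of a real vhost-user channel,
and the helper name sketch_nonblocking_read_errno is hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Read from an empty non-blocking socket, as happens when a reply has not
 * arrived yet. Returns the errno of the failed read(), 0 if the read
 * unexpectedly succeeded, or -1 if the setup failed.
 */
static int
sketch_nonblocking_read_errno(void)
{
	int sv[2];
	int flags;
	int err = 0;
	char byte;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	flags = fcntl(sv[0], F_GETFL);
	if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0)
		err = -1;
	else if (read(sv[0], &byte, 1) < 0)
		err = errno; /* no data queued: read fails instead of waiting */
	close(sv[0]);
	close(sv[1]);
	return err;
}
```

On a blocking descriptor the same read() would simply wait until the peer
sends the reply, which is the behaviour a request/reply exchange needs.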

Fixes: bd8f50a45d0f ("net/virtio-user: support server mode")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/virtio-user: do not stop stopped device again
Tiwei Bie [Mon, 29 Oct 2018 05:28:03 +0000 (13:28 +0800)]
net/virtio-user: do not stop stopped device again

Without this change, virtio-user still works, but it will show
annoying error messages like this on shutdown:

vhost_kernel_set_backend(): VHOST_NET_SET_BACKEND fails, Operation not permitted
vhost_kernel_ioctl(): VHOST_RESET_OWNER failed: Operation not permitted

Fixes: e3b434818bbb ("net/virtio-user: support kernel vhost")
Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
Cc: stable@dpdk.org
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/vhost: fix parameters string
Tiwei Bie [Thu, 25 Oct 2018 09:46:59 +0000 (17:46 +0800)]
net/vhost: fix parameters string

Add the missing params to the param string.

Fixes: 39cac2adcad0 ("net/vhost: add client option")
Fixes: 4ce97c6f6b4f ("net/vhost: add an option to enable dequeue zero copy")
Fixes: 447e0d379756 ("net/vhost: add parameter to enable IOMMU feature")
Fixes: 6d6e95cec455 ("net/vhost: add parameter to enable postcopy")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years agonet/virtio: drop duplicated reset method
Tiwei Bie [Thu, 25 Oct 2018 09:46:58 +0000 (17:46 +0800)]
net/virtio: drop duplicated reset method

Drop the duplicated reset() method in virtio_pci_ops. Currently
vtpci_reset() is implemented on set_status() and get_status()
directly. The reset() method in virtio_pci_ops isn't used and
its implementation in the legacy device isn't right.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>