Currently, if configuration fails (for example if a 100G card is used
with an odd number of RX/TX queues) QEDE crashes due to a null pointer
dereference.
This commit fixes it by checking that the pointer is not NULL before
using it.
Fixes: 7105b24f4bb8 ("net/qede: fix memory alloc for multiple port reconfig") Cc: stable@dpdk.org Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Rasesh Mody <rasesh.mody@cavium.com>
Shahaf Shuler [Mon, 12 Nov 2018 05:58:22 +0000 (07:58 +0200)]
doc: add mlx5 Direct Verbs flow engine limitation
It would also be good to add code which disables the dv_flow_en
the user requested. However, such support would need to use a new netlink
command to query the switchdev mode from the underlying kernel.
Considering the current 18.11 release is close to RC3, only
documentation is added.
net/mlx5: fix VXLAN device rollback if rule apply fails
If a rule contains a tunneling action (like VXLAN encapsulation),
the VTEP (Virtual Tunneling EndPoint) device is pre-configured
before applying the rule. If the kernel returns an error, this
VTEP configuration should be rolled back to the original state.
The patch adds the missing VTEP configuration restoration.
The VXLAN related rule cleanup routine queries and gathers all
existing local IP and neigh rules into buffer list. One buffer
may contain multiple rule deletion commands and is prepared
to be sent to Netlink as a single message. But if an error occurs
for some deletion commands in the buffer, multiple ACK
messages with errors can be sent back by the kernel. This breaks
the Netlink communication sequence numbers, because we expect
only one ACK message, and it disrupts further Netlink
communication.
The workaround for this problem is to send rule deletion commands
from the buffer one by one and get an ACK message for every
command sent. We do not expect too many rules to preexist, so there
should not be critical performance degradation at VXLAN outer
interface initialization.
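The one-by-one approach can be sketched as a walk over the multi-message buffer. This is only an illustration, not the mlx5 code: send_buffer_one_by_one(), send_one_fn and count_message() are hypothetical names, and the callback stands in for "send one message and wait for its ACK". NLMSG_OK() and NLMSG_NEXT() are the standard macros from <linux/netlink.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical callback: send one message and wait for its ACK. */
typedef int (*send_one_fn)(const struct nlmsghdr *nlh, void *ctx);

/* Stand-in sender for illustration: it only counts the calls. */
static int
count_message(const struct nlmsghdr *nlh, void *ctx)
{
	(void)nlh;
	++*(int *)ctx;
	return 0;
}

/*
 * Walk a buffer holding several Netlink messages and pass them to
 * send_one() one at a time. Returns the number of messages sent, or -1
 * if the buffer is inconsistent or a send fails.
 */
static int
send_buffer_one_by_one(void *buf, size_t size, send_one_fn send_one, void *ctx)
{
	struct nlmsghdr *nlh = buf;
	int left = (int)size;
	int sent = 0;

	assert(buf != NULL && send_one != NULL);
	while (left > 0) {
		/* Consistency check: the whole message must fit in what is left. */
		if (!NLMSG_OK(nlh, left))
			return -1;
		if (send_one(nlh, ctx) != 0)
			return -1;
		sent++;
		/* Advance to the next aligned message and shrink 'left'. */
		nlh = NLMSG_NEXT(nlh, left);
	}
	return sent;
}
```

Because every command gets its own ACK, a failing deletion affects only that command and the sequence numbers stay in step.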
net/mlx5: add Netlink message size check in rule cleanup
This patch is preparation for the following fix, in which we are going to
send Netlink messages from the buffer one by one. It is highly
desirable to check multimessage buffer consistency for debug purposes.
net/mlx5: fix buffer allocation check in rule cleanup
When the Netlink message buffer is allocated, a typo causes
a different pointer to be checked instead of the returned one. If no
memory is allocated and NULL is returned by the allocation routine,
the bug causes a segmentation fault. The patch fixes the typo so that
the returned pointer is validated.
Dekel Peled [Thu, 8 Nov 2018 21:29:45 +0000 (23:29 +0200)]
net/mlx5: fix flow director add and delete
Fix the flow_fdir_cmp() function, used by flow_fdir_filter_lookup().
This function is used by flow_fdir_filter_add() to check if same rule
exists, and by flow_fdir_filter_delete() to find flow rule to delete.
The function compared the actions' conf pointers; it is changed to compare
the action types only.
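A minimal sketch of why the pointer comparison fails, using a hypothetical, simplified action structure (struct action and action_same_type() are not the mlx5 definitions): two identical rules usually carry separately allocated conf objects, so their pointers differ even when the rules match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a flow action. */
struct action {
	int type;         /* kind of action, e.g. queue or drop */
	const void *conf; /* per-rule configuration, allocated separately */
};

/* Compare by type only: conf pointers differ between identical rules. */
static bool
action_same_type(const struct action *a, const struct action *b)
{
	assert(a != NULL && b != NULL);
	return a->type == b->type;
}
```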
Radu Nicolau [Thu, 8 Nov 2018 15:26:42 +0000 (15:26 +0000)]
net/bonding: fix crash when stopping mode 4 port
When stopping a bonded port all slaves are deactivated. Attempting
to deactivate a slave that was never activated will result in a segfault
when mode 4 is used.
Fixes: 7486331308f6 ("net/bonding: stop and deactivate slaves on stop") Cc: stable@dpdk.org Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: Chas Williams <chas3@att.com>
Rasesh Mody [Thu, 8 Nov 2018 21:19:30 +0000 (21:19 +0000)]
net/bnx2x: fix VF link state update
In general the VF driver should not access the chip. For VF link status
update, VF driver should not use HW lock, use bnx2x_link_report_locked()
instead.
Add a few prints for releasing previously held HW locks.
Rasesh Mody [Thu, 8 Nov 2018 21:19:26 +0000 (21:19 +0000)]
net/bnx2x: fix dynamic logging
Use rte_log() rather than RTE_LOG() for dynamic logging. Rearrange
dynamic log types to the top and configurable log types to bottom.
Remove unused RTE_LIBRTE_BNX2X_DEBUG_TX_FREE
Fixes: ba7eeb035a5f ("net/bnx2x: fix logging to include device name") Cc: stable@dpdk.org Signed-off-by: Rasesh Mody <rasesh.mody@cavium.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Fan Zhang [Tue, 6 Nov 2018 16:22:48 +0000 (16:22 +0000)]
vhost/crypto: fix packet copy in chaining mode
This patch fixes the incorrect packet content copy in
chaining mode. Originally the content before the cipher offset was
overwritten by all zeros. This patch fixes the problem by
making sure the write back source and destination
settings are correct during set up.
Fixes: 3bb595ecd682 ("vhost/crypto: add request handler") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
net/thunderx: fix Tx desc corruption in scatter-gather mode
For performance reasons, word1 of the send_hdr_s
sub descriptor was not cleared, on the assumption that it always
has the default value of zero since it comes from fixed
offsets of the SQ buffer.
This causes issues in SG mode because
the size of the send command might change, and hence word1
of send_hdr_s is not always at fixed offsets of the SQ buffer
and does not always have the default value of zero.
This fixes the issue by clearing word1 in SG mode
for every packet.
Fixes: 1c421f18e095 ("net/thunderx: add single and multi-segment Tx") Cc: stable@dpdk.org Signed-off-by: Subrahmanyam Nilla <snilla@caviumnetworks.com> Signed-off-by: Nithin Dabilpuram <nithin.dabilpuram@caviumnetworks.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
net/cxgbevf: fix illegal memory access when freeing MPS TCAM
Individual MPS TCAM entries are not allocated as separate entities.
All entries are allocated once as an array. So, fix bug with attempting
to free illegal memory location.
Also add missing MPS TCAM initialization for CXGBEVF.
Fixes: 6fda3f0ddda9 ("net/cxgbe: add API to program hardware MPS table") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
net/cxgbe: increase completion wait time for flow operations
Under heavy load, flow related operations can take more time to
complete. Increase max completion wait time to 10 seconds. Also
increase max receive budget to read more replies from firmware
in every cycle.
Thomas Monjalon [Wed, 7 Nov 2018 16:00:28 +0000 (17:00 +0100)]
net/mlx5: fix build on PPC64
The AltiVec header file breaks boolean type:
error: incompatible types when initializing type
'__vector _bool int' {aka '_vector(4) __bool int'} using type 'int'
If __APPLE_ALTIVEC__ is defined, then bool type is redefined
and conflicts with stdbool.h.
There is no good solution to fix it for the whole project without
breaking something else, so a workaround is inserted in mlx5 PMD.
This workaround is not compatible with C++ but there is no C++ in DPDK.
Suggested-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Tested-by: David Wilder <dwilder@us.ibm.com> Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Yunjian Wang [Tue, 6 Nov 2018 07:57:01 +0000 (15:57 +0800)]
net/e1000/base: fix uninitialized variable
This patch fixes the case where the variable 'phy_word' may be used uninitialized.
Fixes: 5b6439cf03a4 ("e1000/base: support different EEARBC for i210") Cc: stable@dpdk.org Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Zhao [Wed, 7 Nov 2018 06:14:29 +0000 (14:14 +0800)]
app/testpmd: fix Rx offload search
There is an error in function search_rx_offload():
it breaks out of the search when it gets an unexpected return value from
rte_eth_dev_rx_offload_name(), but rte_eth_dev_rx_offload_name()
does in fact return such values.
Fixes: c73a9071877a ("app/testpmd: add commands to test new offload API") Cc: stable@dpdk.org Signed-off-by: Wei Zhao <wei.zhao1@intel.com> Tested-by: Yuan Peng <yuan.peng@intel.com>
Yongseok Koh [Tue, 6 Nov 2018 08:14:18 +0000 (08:14 +0000)]
net/mlx5: fix L4 protocol validation
- Currently, no device supports partial mask for protocol in IP header.
- As there could be multiple IP items, next_protocol variable in flow
validation has to be reset for inner layer. Otherwise, inner TCP/UDP
will see protocol number of outer IP header.
- Remove redundant protocol checking for MPLS, which is done in
mlx5_flow_validate_item_mpls().
Static analysis tools don't like the fact that fd could be zero
in the error path. This won't happen in real world because
stdin would have to be closed, then other error occurring.
Coverity issue: 14079 Fixes: 02f96a0a82d1 ("net/tap: add TUN/TAP device PMD") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Keith Wiles <keith.wiles@intel.com>
Bruce Richardson [Mon, 12 Nov 2018 12:26:15 +0000 (12:26 +0000)]
mk: allow renaming of build directories
When building using make, the Makefile in the build directory contained
the name of the build directory to be passed as an "O=" parameter to
the DPDK SDK makefiles. Unfortunately, this meant that the compilation
would always fail if the build directory was renamed. To remove this
limitation, we can use $(CURDIR) instead of the directory name.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Thu, 21 Dec 2017 16:53:35 +0000 (16:53 +0000)]
eal/x86: move header to standard BSD license
This updates the license on the rte_rtm.h file to be the standard
BSD-3-Clause license used for the rest of DPDK, thus bringing the file in
compliance with the DPDK licensing policy.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Mon, 12 Nov 2018 10:47:19 +0000 (10:47 +0000)]
test/hash: improve output for r/w test
The hash read-write autotest generates a lot of text, which is very dense
on the screen. Even the summary at the end is hard to follow as everything
is very compact. We can improve readability by highlighting the starts of
the various sections, and by indenting the values within subsections.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Mon, 12 Nov 2018 10:47:18 +0000 (10:47 +0000)]
eal/x86: reduce contention when retrying TSX
When TSX transactions abort, it is generally worth retrying a number of
times before falling back to the traditional locking path, as the
parallelism benefits from TSX can be worth it when a transaction does
succeed. For cases with multiple threads and high contention rates, it
can be useful to have increasing delays between retry attempts, so as to
avoid having the same threads repeatedly collide.
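A minimal sketch of retrying with increasing delays, under stated assumptions: try_with_backoff(), cpu_pause(), succeeds_after() and MAX_RETRIES are hypothetical names, the attempt callback stands in for starting a transaction, and the linear growth of the delay is one possible choice rather than a statement of what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RETRIES 20 /* assumed retry budget */

typedef bool (*try_fn)(void *ctx);

/* Placeholder for a CPU pause hint; here it only counts the calls. */
static unsigned long pause_count;

static void
cpu_pause(void)
{
	pause_count++;
}

/* Example attempt: fails until the counter behind ctx reaches zero. */
static bool
succeeds_after(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left == 0)
		return true;
	--*failures_left;
	return false;
}

/* Retry attempt(), waiting a little longer after every failure. */
static bool
try_with_backoff(try_fn attempt, void *ctx)
{
	unsigned int retries, i;

	assert(attempt != NULL);
	for (retries = 0; retries < MAX_RETRIES; retries++) {
		if (attempt(ctx))
			return true;
		/* Delay grows with each failure: 1, 2, 3, ... pauses. */
		for (i = 0; i <= retries; i++)
			cpu_pause();
	}
	return false; /* the caller falls back to the locking path */
}
```

Threads that aborted at the same moment spend different amounts of time waiting once their retry counts diverge, which lowers the chance of colliding again.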
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Thomas Monjalon [Sun, 11 Nov 2018 23:58:56 +0000 (00:58 +0100)]
pci: fix parsing of address without function number
If the last part of the PCI address (function number) is missing,
the parsing was successful, assuming function 0.
The call to strtoul is not returning an error in such a case,
so an explicit check is inserted before.
This bug has always been there in older parsing macros:
- GET_PCIADDR_FIELD
- GET_BLACKLIST_FIELD
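The explicit check can be illustrated with a small stand-alone parser. This is a sketch, not the DPDK implementation: struct pci_addr, parse_field() and parse_pci_addr() are hypothetical, and the "DDDD:BB:DD.F" layout with a decimal function number is an assumption. The point is that strtoul() returns 0 for an empty field, so the field has to be checked before the call.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

struct pci_addr {
	unsigned int domain, bus, devid, function;
};

/* Parse one numeric field that must be followed by 'delim'. */
static int
parse_field(const char **s, int base, char delim, unsigned long max,
	    unsigned int *out)
{
	char *end;
	unsigned long val;

	/* strtoul() accepts an empty field and returns 0: reject it first. */
	if (!isxdigit((unsigned char)**s))
		return -1;
	errno = 0;
	val = strtoul(*s, &end, base);
	if (errno != 0 || *end != delim || val > max)
		return -1;
	*out = (unsigned int)val;
	*s = (delim == '\0') ? end : end + 1;
	return 0;
}

/* Parse "DDDD:BB:DD.F"; returns 0 on success, -1 on malformed input. */
static int
parse_pci_addr(const char *in, struct pci_addr *addr)
{
	assert(in != NULL && addr != NULL);
	if (parse_field(&in, 16, ':', 0xffff, &addr->domain) != 0 ||
	    parse_field(&in, 16, ':', 0xff, &addr->bus) != 0 ||
	    parse_field(&in, 16, '.', 0x1f, &addr->devid) != 0 ||
	    parse_field(&in, 10, '\0', 7, &addr->function) != 0)
		return -1;
	return 0;
}
```

With this check, an address that ends right after the dot, or that has no dot at all, is rejected instead of silently resolving to function 0.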
Fixes: af75078fece3 ("first public release") Cc: stable@dpdk.org Reported-by: Wisam Jaddo <wisamm@mellanox.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The lock-free algorithm has caused significant lookup
performance regression for certain use cases. The
regression is attributed to the use of non-relaxed
memory orderings. Two versions of the lookup functions
are created: one that uses the RW lock and one that
is lock-free. This restores the performance
for use cases that used the RW lock version of the
lookup function.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency") Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Gavin Hu [Fri, 9 Nov 2018 11:42:47 +0000 (19:42 +0800)]
ring/c11: relax ordering for load and store of the head
When calling __atomic_compare_exchange_n, use relaxed ordering for the
success case, as multiple producers/consumers do not release updates to
each other, so there is no need for acquire or release ordering.
Because the thread fence is in place, ordering for the first iteration can
be relaxed.
Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i
Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34
With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.59
MP/MC bulk enq/dequeue (size: 8): 10.54
SP/SC bulk enq/dequeue (size: 32): 1.73
MP/MC bulk enq/dequeue (size: 32): 2.38
Neither significant improvement nor regression was seen, as the optimisation
is not on the critical path.
Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Gavin Hu [Fri, 9 Nov 2018 11:42:46 +0000 (19:42 +0800)]
ring/c11: keep deterministic order allowing retry to work
Use case scenario:
1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
reasons (running out of cpu time, preempted,...)
2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head
forward.
3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail
beyond the prod.head read by thread 1.
4) Thread 1 is re-scheduled. It reads cons.tail.
cpu1(producer)          cpu2(producer)          cpu3(consumer)
load r->prod.head
     ^                  load r->prod.head
     |                  load r->cons.tail
     |                  store r->prod.head(+n)
  stalled               <-- enqueue ----->
     |                  store r->prod.tail(+n)
     |                                          load r->cons.head
     |                                          load r->prod.tail
     |                                          store r->cons.head(+n)
     |                                          <...dequeue.....>
     v                                          store r->cons.tail(+n)
load r->cons.tail
For thread 1, the __atomic_compare_exchange_n detects the outdated
prod.head and retries the flow with the new one. This retry flow works on
strongly ordered platforms (e.g. x86). But on weakly ordered platforms (arm,
ppc), the loads of cons.tail and prod.head might be re-ordered: prod.head is
new but cons.tail is too old. The retry flow, based on the detection of an
outdated head, does not trigger as expected, and thus the outdated cons.tail
causes wrong free_entries.
Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries.
The fix is to keep the deterministic order of two loads allowing the retry
to work.
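The ordering requirement can be sketched with the GCC __atomic builtins. This is a simplified illustration, not the rte_ring source: struct ring, struct headtail and move_prod_head() are hypothetical, and only the producer side is shown. The fence keeps the load of cons.tail from being ordered before the load of prod.head, so a stale head is always caught by the compare-exchange.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct headtail {
	uint32_t head;
	uint32_t tail;
};

struct ring {
	uint32_t capacity;
	struct headtail prod;
	struct headtail cons;
};

/* Reserve up to n slots for a producer; returns the number reserved. */
static uint32_t
move_prod_head(struct ring *r, uint32_t n, uint32_t *old_head)
{
	uint32_t cons_tail, free_entries, new_head;
	int success;

	assert(r != NULL && old_head != NULL);
	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
	do {
		/* Keep the cons.tail load after the prod.head load. */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		cons_tail = __atomic_load_n(&r->cons.tail, __ATOMIC_ACQUIRE);
		/* Unsigned wrap-around keeps this within [0, capacity]. */
		free_entries = r->capacity + cons_tail - *old_head;
		if (n > free_entries)
			n = free_entries;
		if (n == 0)
			return 0;
		new_head = *old_head + n;
		/* On failure *old_head is refreshed and the loop retries. */
		success = __atomic_compare_exchange_n(&r->prod.head, old_head,
						      new_head, 0,
						      __ATOMIC_RELAXED,
						      __ATOMIC_RELAXED);
	} while (!success);
	return n;
}
```

The consumer side would mirror this with cons.head and prod.tail to compute the available entries.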
Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i
Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30
With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34
The results showed the thread fence degrades the performance slightly, but
it is required for correctness.
Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Ferruh Yigit [Tue, 6 Nov 2018 14:35:01 +0000 (14:35 +0000)]
test: fix build
With "make -C test/" command getting following warnings:
awk: cmd. line:1: fatal: cannot open file `/cmdline_test/cmdline_test/'
for reading (No such file or directory)
awk: cmd. line:1: fatal: cannot open file
`/test-pipeline/test-pipeline/' for reading (No such file or
directory)
awk: cmd. line:1: fatal: cannot open file `/test-acl/test-acl/'
for reading (No such file or directory)
This is because an unexpected/invalid MAPFILE param is passed to
check-experimental-syms.sh.
There is no easy way to unify MAPFILE for different build options;
instead, add input verification to the script and silently ignore wrong
values.
Fixes: a6ec31597a0b ("mk: add experimental tag check") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>
Kevin Laatz [Wed, 7 Nov 2018 18:10:18 +0000 (18:10 +0000)]
telemetry: fix shared link with make
Currently, telemetry is not working for shared builds in make.
The --as-needed flag is preventing telemetry from being linked as there are
no direct API calls from the app to telemetry. This is causing the
--telemetry option to not be recognized by EAL.
Telemetry registers its EAL option using the RTE_INIT constructor. Since
EAL's option parsing is done before the plugins init, the --telemetry
option isn't registered at the time of parsing, and as a result, the
--telemetry option is not being recognized.
This patch fixes this issue by explicitly linking telemetry to the
application by setting the "--no-as-needed" flag for the library in
mk/rte.app.mk.
Thomas Monjalon [Wed, 7 Nov 2018 22:56:45 +0000 (23:56 +0100)]
devargs: do not replace already inserted device
The devargs of a device can be replaced by a newly allocated one
when trying to probe again the same device (multi-process or
multi-ports scenarios). This is breaking some pointer references.
It can be avoided by copying the new content, freeing the new devargs,
and returning the already inserted pointer.
Current code has different max DMA mask width values for 32 and 64
bits systems. IOMMU hardware could report a higher supported width
than current MAX_DMA_MASK_BITS when RTE_ARCH_64 is not defined. This
is actually true with a 32 bits kernel running in a 64 bits server
with IOMMU hardware. This could also be a problem with embedded systems
using an IOMMU designed for 64 bits in a 32 bits system.
This patch leaves a single max DMA mask width which will make sure the
mask width is within the range for 64 bits variables used for DMA mask.
This also will avoid wrong values because any value higher than
64 bits is likely wrong.
Anatoly Burakov [Tue, 6 Nov 2018 14:13:29 +0000 (14:13 +0000)]
mem: fix use after free in legacy mem init
Adding an additional failure path in DMA mask check has exposed an
issue where `hugepage` pointer may point to memory that has already
been unmapped, but pointer value is still not NULL, so failure
handler will attempt to unmap it a second time if the DMA mask check
fails. Fix it by setting `hugepage` pointer to NULL once it is no
longer needed.
Coverity issue: 325730 Fixes: 165c89b84538 ("mem: use DMA mask check for legacy memory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Dharmik Thakkar [Fri, 26 Oct 2018 21:43:03 +0000 (16:43 -0500)]
test/hash: fix build
Enable print_key_info() function compilation always.
Compilation error message:
'test_hash.c: In function ‘print_key_info’:
test_hash.c:90:15: error: cast discards ‘const’ qualifier from pointer
target type [-Werror=cast-qual]
uint8_t *p = (uint8_t *)key;
^
cc1: all warnings being treated as errors'
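One common way to avoid this class of error is to keep the const qualifier on the local pointer instead of casting it away. The helper below is a hypothetical illustration (format_key() is not the test code) that formats key bytes as hex without triggering -Wcast-qual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format key bytes as hex into out; returns the characters written. */
static int
format_key(char *out, size_t out_len, const void *key, size_t key_len)
{
	/* The pointer stays const-qualified, so no cast is needed. */
	const uint8_t *p = key;
	size_t i, used = 0;

	assert(out != NULL && out_len > 0 && key != NULL);
	out[0] = '\0';
	/* Stop while two hex digits and the terminator still fit. */
	for (i = 0; i < key_len && used + 2 < out_len; i++)
		used += (size_t)snprintf(out + used, out_len - used,
					 "%02x", (unsigned int)p[i]);
	return (int)used;
}
```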
Reduced test duration for hash_multiwriter_autotest.
Number of entries and total insertions are reduced
such that the duration is less than 10 seconds.
Reduced test duration for func_reentrancy_autotest.
Reduced MAX_LPM_ITER_TIMES, introduced new macro
MAX_ITER_ONCE to reduce the unique key check and
altered the macro MAX_ITER_TIMES to MAX_ITER_MULTI.
Combined for loops, thereby reducing snprintf calls
and repeated iterations,
such that the duration is less than 10 seconds.
Bruce Richardson [Fri, 12 Oct 2018 15:34:04 +0000 (16:34 +0100)]
test: allow taking extra arguments from environment
When running unit tests automatically, either via script, from meson,
or otherwise, the same set of options may be used for each run, for
example to set a standard coremask to be used for all tests.
To facilitate this, this patch adds support for the test binary taking
additional EAL parameters from the environment and appending them to the
argc/argv list passed to eal init. This allows parameter modification
without having to edit test scripts etc.
There are now two environment variables which can be used for running
tests:
* DPDK_TEST - (added previously) passes the test name to be run
automatically rather than running the app interactively.
Used by "meson test" when running tests individually or
as part of a suite.
* DPDK_TEST_PARAMS - new parameter to specify the commandline arguments
to use with the test binary. For example to run a test,
or tests, on only 16 lcores, and to skip pci scan we can
set this to "-l 0-15 --no-pci".
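A minimal sketch of appending environment-supplied parameters to the argument list, under stated assumptions: append_params() and MAX_EXTRA_ARGS are hypothetical, tokens are split on spaces only (no quoting), and the caller is expected to pass a writable copy of the variable's value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EXTRA_ARGS 32 /* assumed cap on appended arguments */

/*
 * Build a new argv: the original arguments followed by the
 * space-separated tokens of params. params is modified in place.
 * Returns the new argc, or -1 on allocation failure.
 */
static int
append_params(int argc, char **argv, char *params, char ***out_argv)
{
	char **all;
	char *tok;
	int n = argc;

	assert(argc >= 0 && argv != NULL && params != NULL && out_argv != NULL);
	all = calloc((size_t)argc + MAX_EXTRA_ARGS + 1, sizeof(*all));
	if (all == NULL)
		return -1;
	memcpy(all, argv, (size_t)argc * sizeof(*all));
	for (tok = strtok(params, " ");
	     tok != NULL && n < argc + MAX_EXTRA_ARGS;
	     tok = strtok(NULL, " "))
		all[n++] = tok;
	all[n] = NULL; /* keep the conventional NULL terminator */
	*out_argv = all;
	return n;
}
```

A caller would typically read the value with getenv("DPDK_TEST_PARAMS"), duplicate it, and pass the resulting argc/argv on to EAL initialisation.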
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Tested-by: Luca Boccassi <bluca@debian.org>
Right now the reassembly code relies on src_dst[] being all zeroes to
determine whether an entry in the fragments table is free or occupied.
This is suboptimal and error prone - a user can crash the DPDK ip_reassembly
app with something like the following scapy script:
x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
frags=fragment(x, fragsize=500)
sendp(frags, iface=...)
To overcome that issue and reduce the overhead of the
'key invalidate' and 'key is empty' operations,
add key_len to the key comparison procedure.
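The idea can be sketched as follows. The layout and names here (struct frag_key, KEY_WORDS, key_is_empty(), key_invalidate(), key_equal()) are illustrative assumptions rather than the library's definitions: a key_len of 0 marks a free entry, so a packet whose addresses and id are all zero no longer looks like an empty slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_WORDS 4 /* assumed: room for an IPv6 source/destination pair */

struct frag_key {
	uint64_t src_dst[KEY_WORDS];
	uint32_t id;
	uint32_t key_len; /* 64-bit words in use; 0 marks a free entry */
};

static bool
key_is_empty(const struct frag_key *k)
{
	return k->key_len == 0;
}

static void
key_invalidate(struct frag_key *k)
{
	k->key_len = 0;
}

/* Keys match only if id, length and the used address words all match. */
static bool
key_equal(const struct frag_key *a, const struct frag_key *b)
{
	assert(a->key_len <= KEY_WORDS && b->key_len <= KEY_WORDS);
	if (a->id != b->id || a->key_len != b->key_len)
		return false;
	return memcmp(a->src_dst, b->src_dst,
		      a->key_len * sizeof(a->src_dst[0])) == 0;
}
```

Both 'is empty' and 'invalidate' become a single integer test or store instead of a scan over the address words.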
Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly") Cc: stable@dpdk.org Reported-by: Ryan E Hall <ryan.e.hall@intel.com> Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Under some conditions ill-formed fragments might cause
reassembly code to corrupt mbufs and/or crash.
For example, the following fragment sequence:
<ofs=0,len=100, flags=MF>
<ofs=96,len=100, flags=MF>
<ofs=200,len=0,flags=MF>
<ofs=200,len=100,flags=0>
can trigger the problem.
To overcome this situation, a check is added that the length
of an incoming fragment is greater than zero.
Fixes: 601e279df074 ("ip_frag: move fragmentation/reassembly headers into a library") Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly") Cc: stable@dpdk.org Reported-by: Ryan E Hall <ryan.e.hall@intel.com> Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Thomas Monjalon [Thu, 1 Nov 2018 14:46:33 +0000 (15:46 +0100)]
ethdev: remove experimental tag for iterator API
After removing the function rte_eth_dev_attach(),
there are two replacement solutions possible:
one using probe event notification, and one using a new iterator.
So the application can get the new probed ports either asynchronously
or synchronously.
The iterator API is new in DPDK 18.11 so it got the experimental
tag by policy. It causes an issue for strict applications which do
not use experimental functions, and want to use the synchronous method.
The replacement for removed API should not be experimental.
That's why the experimental status of the ethdev iterator is removed.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Kevin Traynor <ktraynor@redhat.com> Tested-by: Kevin Traynor <ktraynor@redhat.com>
Thomas Monjalon [Thu, 1 Nov 2018 14:46:32 +0000 (15:46 +0100)]
eal: remove experimental tag for probe/remove
The functions rte_dev_probe() and rte_dev_remove() are new
in DPDK 18.11 so they got the experimental tag by policy.
However, they are too basic to be skipped
by strict applications which do not use experimental functions.
The alternative is to use rte_eal_hotplug_add() and
rte_eal_hotplug_remove(), but their API requires the application
to parse the devargs string in order to provide bus name,
device name and driver arguments.
The new function rte_dev_probe() is really simpler to use and
more flexible by accepting any devargs string.
Let's encourage applications to use it.
The old functions rte_eal_hotplug_* may be deprecated later.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Kevin Traynor <ktraynor@redhat.com> Tested-by: Kevin Traynor <ktraynor@redhat.com>
The netvsc device calls VF (if present) to update the link status
with the wrong device. This leads to errors in mlx5 device when it
can't find the ifindex.
Fixes: dc7680e8597c ("net/netvsc: support integrated VF") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Beilei Xing [Mon, 5 Nov 2018 03:18:12 +0000 (11:18 +0800)]
net/i40e: fix Rx instability with vector mode
Previously, there was instability during vector Rx if the descriptor
number was not a power of 2, e.g. process hangs and some Rx packets
being unexpectedly empty. That's because vector Rx mode assumes the Rx
descriptor number is a power of 2 when doing bit masking.
This patch allows vector mode only when the number of Rx descriptors
is a power of 2.
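A short sketch of why the mask assumption matters; is_power_of_2() and next_index() are illustrative helpers, not the driver's functions. Wrapping an index with (nb_desc - 1) is only equivalent to a modulo when nb_desc is a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit is set. */
static bool
is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Ring index wrap by mask; valid only if nb_desc is a power of 2. */
static uint32_t
next_index(uint32_t idx, uint32_t nb_desc)
{
	return (idx + 1) & (nb_desc - 1);
}
```

With 512 descriptors, next_index(511, 512) wraps to 0 as intended. With 1000 descriptors, next_index(999, 1000) yields 992 instead of 0, so the ring position is wrong.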
Ophir Munk [Sat, 3 Nov 2018 15:54:45 +0000 (15:54 +0000)]
app/testpmd: set default RSS key as null
When creating an RSS rule without specifying a key (see [1]) it is
expected that the device will use the default key.
A NULL key is used to indicate to a PMD it should use
its default key, however testpmd assigns a non-NULL dummy key
(see [2]) instead.
This does not enable testing any PMD behavior when the RSS key is not
specified. This commit fixes this limitation by setting key to NULL.
[1]
RSS rule example without specifying a key:
flow create 0 ingress <pattern> / end actions rss queues 0 1 end / end
[2]
Testpmd default key assignment:
.key= "testpmd's default RSS hash key, "
"override it for better balancing"
Dekel Peled [Sun, 4 Nov 2018 10:15:45 +0000 (12:15 +0200)]
doc: clarify testpmd guide for flow API
The description of prefix for mask creation was misunderstood.
I updated the description, so it is clearly understood which
mask will be created by a certain prefix.
Zhirun Yan [Mon, 5 Nov 2018 12:56:44 +0000 (12:56 +0000)]
net/igb: update Tx offload mask
Tx offload mask is updated in the following commit: 1037ed842c37
("mbuf: fix Tx offload mask"). Currently, the newly added offload
flags are not supported in the PMD and the application will fail to call
the PMD transmit prepare function.
Yongseok Koh [Mon, 5 Nov 2018 07:20:47 +0000 (07:20 +0000)]
net/mlx5: remove flags setting from flow preparation
Even though flow_drv_prepare() takes item_flags and action_flags to be
filled in, those are not used and will be overwritten by parsing of
flow_drv_translate(). There's no reason to keep the flags and fill it.
Appropriate notes are added to the documentation of flow_drv_prepare() and
flow_drv_translate().
Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>
Yongseok Koh [Mon, 5 Nov 2018 07:20:45 +0000 (07:20 +0000)]
net/mlx5: fix Direct Verbs flow tunnel
1) Fix layer parsing
In translation of tunneled flows, dev_flow->layers must not be used to
check tunneled layer as it contains all the layers parsed from
flow_drv_prepare(). Checking tunneled layer is needed to distinguish
between outer and inner item. This should be based on dynamic parsing. With
dev_flow->layers on a tunneled flow, items will always be interpreted as
inner as dev_flow->layer already has all the items. Dynamic parsing
(item_flags) is added as there's no such code.
2) Refactoring code
- flow_dv_create_item() and flow_dv_create_action() are merged into
flow_dv_translate() for consistency with Verbs and *_validate().
Fixes: 246636411536 ("net/mlx5: fix flow tunnel handling") Fixes: d02cb0691299 ("net/mlx5: add Direct Verbs translate actions") Fixes: fc2c498ccb94 ("net/mlx5: add Direct Verbs translate items") Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>
Yongseok Koh [Mon, 5 Nov 2018 07:20:44 +0000 (07:20 +0000)]
net/mlx5: fix Verbs flow tunnel
1) Fix layer parsing
In translation of tunneled flows, dev_flow->layers must not be used to
check tunneled layer as it contains all the layers parsed from
flow_drv_prepare(). Checking tunneled layer is needed to set
IBV_FLOW_SPEC_INNER and it should be based on dynamic parsing. With
dev_flow->layers on a tunneled flow, items will always be interpreted as
inner as dev_flow->layer already has all the items.
2) Refactoring code
It is partly because flow_verbs_translate_item_*() sets layer flag. Same
code is repeating in multiple locations and that could be error-prone.
- Introduce VERBS_SPEC_INNER() to unify setting IBV_FLOW_SPEC_INNER.
- flow_verbs_translate_item_*() doesn't set parsing result -
MLX5_FLOW_LAYER_*.
- flow_verbs_translate_item_*() doesn't set priority or adjust hashfields
but does only item translation. Both have to be done outside.
- Make more consistent between Verbs and DV.
3) Remove flow_verbs_mark_update()
This code can never be reached as validation prohibits specifying mark and
flag actions together. No need to convert flag to mark.
Fixes: 84c406e74524 ("net/mlx5: add flow translate function") Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>
Ophir Munk [Sun, 4 Nov 2018 12:10:20 +0000 (12:10 +0000)]
net/mlx5: support default RSS key as null
Applications which add RSS rules must supply an RSS key and length.
If an application is only interested in default RSS operation it
should not care about the exact RSS key.
By setting the key to NULL - the PMD will use the default RSS key.
In addition if the application does not care about the RSS type it can
set it to 0 and the PMD will use the default type (ETH_RSS_IP).
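The defaulting rule can be sketched as below. All names are hypothetical (struct rss_conf, rss_resolve_defaults(), DEFAULT_KEY_LEN, DEFAULT_RSS_TYPES), the key contents are a zero-filled placeholder, and the type constant merely stands in for ETH_RSS_IP; none of this is the PMD's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_KEY_LEN 40       /* assumed default key length */
#define DEFAULT_RSS_TYPES 0x1ULL /* placeholder standing in for ETH_RSS_IP */

/* Placeholder contents; a real driver would hold its own default key. */
static const uint8_t default_rss_key[DEFAULT_KEY_LEN];

struct rss_conf {
	const uint8_t *key; /* NULL means "use the default key" */
	uint32_t key_len;
	uint64_t types;     /* 0 means "use the default type" */
};

/* Fill in the defaults for anything the application left unset. */
static void
rss_resolve_defaults(struct rss_conf *conf)
{
	assert(conf != NULL);
	if (conf->key == NULL) {
		conf->key = default_rss_key;
		conf->key_len = DEFAULT_KEY_LEN;
	}
	if (conf->types == 0)
		conf->types = DEFAULT_RSS_TYPES;
}
```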
Yongseok Koh [Sat, 3 Nov 2018 17:10:33 +0000 (17:10 +0000)]
net/mlx5: limit priority range for Linux TC flower driver
Due to a limitation on driver/FW, priority ranges from 1 to 16 in kernel.
Priority in rte_flow attribute starts from 0 and is added by 1 in
translation. This is subject to be changed to determine the max priority
based on trial-and-error like Verbs driver once the restriction is lifted
or the range is extended.
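The translation described above amounts to a shift by one with a range check. A minimal sketch, with hypothetical names (flow_prio_to_tc(), TC_PRIO_MIN, TC_PRIO_MAX) and the 1 to 16 range taken from the text:

```c
#include <assert.h>
#include <stdint.h>

#define TC_PRIO_MIN 1  /* lowest priority value accepted by the kernel */
#define TC_PRIO_MAX 16 /* highest priority value accepted by the kernel */

/* Map a 0-based rte_flow priority to a kernel priority, or -1 if invalid. */
static int
flow_prio_to_tc(uint32_t prio)
{
	if (prio > TC_PRIO_MAX - TC_PRIO_MIN)
		return -1;
	return (int)prio + TC_PRIO_MIN;
}
```

So rte_flow priorities 0 through 15 map to kernel priorities 1 through 16, and anything larger is rejected.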
Yongseok Koh [Thu, 1 Nov 2018 17:20:32 +0000 (17:20 +0000)]
net/mlx5: make vectorized Tx threshold configurable
Add txqs_max_vec parameter to configure the maximum number of Tx queues to
enable vectorized Tx. And its default value is set according to the
architecture and device type.
Yongseok Koh [Thu, 1 Nov 2018 17:20:31 +0000 (17:20 +0000)]
net/mlx5: move device spawn configuration to probing
When a device is spawned, it does make more sense that the configuration
parameters are passed by callee. Furthermore, setting default value for
some configuration would need PCIe device ID which can be found in the
probe function.
The last part of the patchset contains the rule cleanup routines.
These are part of the outer interface initialization at
the moment of VXLAN VTEP attaching. These routines query
the list of attached VXLAN devices, the list of local IP
addresses with peer and link scope attributes and the list
of permanent neigh rules, then all of the abovementioned
items found on the specified outer device are flushed.
VXLAN encap rules are applied to the VF ingress traffic and have the
VTEP as actual redirection destinations instead of outer PF.
The encapsulation rule should provide:
- redirection action VF->PF
- VF port ID
- some inner network parameters (MACs/IP)
- the tunnel outer source IP (v4/v6)
- the tunnel outer destination IP (v4/v6). Current
- VNI - Virtual Network Identifier
No direct way was found to provide the kernel with all required
encapsulation header parameters. The encapsulation VTEP is created
attached to the outer interface and assumed to be the default path for
egress encapsulated traffic. The outer tunnel IP addresses are
assigned to the interface using Netlink, and the implicit route is
created like this:
ip addr add <src_ip> peer <dst_ip> dev <outer> scope link
The peer address provides the implicit route, and scope link reduces
the risk of conflicts. At initialization time all local scope
link addresses are flushed from the device (see the next part of the patchset).
The destination MAC address is provided via a permanent neigh rule:
ip neigh add dev <outer> lladdr <dst_mac> to <dst_ip> nud permanent
At initialization time all neigh rules of this type are flushed
from the device (see the next part of the patchset).
VXLAN interfaces are dynamically created for each local UDP port
of outer networks and then used as targets for TC "flower" filters
in order to perform encapsulation. These VXLAN interfaces are
system-wide: only one device with a given UDP port can exist
in the system (an attempt to create another device with the
same local UDP port returns EEXIST), so the PMD should maintain a
database of device instances shared between PMD instances. These
implicitly created VXLAN devices are called VTEPs (Virtual Tunnel
End Points).
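A shared device database keyed by local UDP port could look roughly like the sketch below. It is only an illustration of the idea under simplifying assumptions: the names (vtep_entry, vtep_acquire, vtep_release), the fixed-size table and the plain reference counter are hypothetical, and no kernel device is actually created.

```c
#include <stddef.h>
#include <stdint.h>

#define VTEP_DB_SIZE 16 /* arbitrary capacity for this sketch */

struct vtep_entry {
	uint16_t udp_port;   /* local UDP port, the unique key */
	unsigned int refcnt; /* 0 means the slot is free */
};

static struct vtep_entry vtep_db[VTEP_DB_SIZE];

/*
 * Return the entry for udp_port, taking a reference. An existing entry
 * is reused, since only one device per UDP port can exist system-wide.
 * Returns NULL if the table is full.
 */
static struct vtep_entry *
vtep_acquire(uint16_t udp_port)
{
	struct vtep_entry *free_slot = NULL;
	size_t i;

	for (i = 0; i < VTEP_DB_SIZE; i++) {
		if (vtep_db[i].refcnt && vtep_db[i].udp_port == udp_port) {
			vtep_db[i].refcnt++;
			return &vtep_db[i];
		}
		if (!vtep_db[i].refcnt && !free_slot)
			free_slot = &vtep_db[i];
	}
	if (!free_slot)
		return NULL;
	/* A real implementation would create the kernel device here. */
	free_slot->udp_port = udp_port;
	free_slot->refcnt = 1;
	return free_slot;
}

/* Drop one reference; the slot becomes free when the count reaches 0. */
static void
vtep_release(struct vtep_entry *e)
{
	if (e && e->refcnt)
		e->refcnt--;
}
```

Two acquisitions of the same port return the same entry, which mirrors the requirement that a second device with the same UDP port cannot be created.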
Creation of the VTEP occurs at the moment the rule is applied. The
link is set up, and the root ingress qdisc is also initialized.
Encapsulation VTEPs are created on a per-port basis: a single
VTEP is attached to the outer interface and is shared by all
encapsulation rules on this interface. The source UDP port is
automatically selected in the range 30000-60000.
For decapsulation, one VTEP is created for every unique local UDP
port to accept tunnel traffic. The name of the created
VTEP consists of the prefix "vmlx_" and the UDP port number in
decimal digits without leading zeros (vmlx_4789). The VTEP
can be created in the system in advance, before launching the
application, which allows UDP ports to be shared between primary
and secondary processes.
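The naming rule for decapsulation VTEPs can be illustrated with a small formatter. The helper name vtep_name is hypothetical; only the "vmlx_" prefix and the decimal port number without leading zeros come from the description above.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the VTEP interface name for a local UDP
 * port, e.g. 4789 -> "vmlx_4789". Returns 0 on success, -1 if the
 * name does not fit into the supplied buffer.
 */
static int
vtep_name(char *buf, size_t len, uint16_t udp_port)
{
	int n = snprintf(buf, len, "vmlx_%u", (unsigned int)udp_port);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The longest possible name, "vmlx_65535", is 10 characters, so a 16-byte buffer is sufficient.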
This part of the patchset updates the Netlink exchange routine. Message
sequence numbers are no longer random, multipart reply messages
are supported, errors are not propagated to the following socket calls,
and the Netlink reply buffer size is increased to MNL_SOCKET_BUFFER_SIZE;
the buffer is now preallocated at context creation time instead of being
placed on the stack. This update is needed to support Netlink query operations.
This part of the patchset adds support for VXLAN-related items and
actions to the flow translation routine. Later, tunnel types
other than VXLAN (e.g. GRE) can be added. No VTEP devices are created at
this point; the flow rule is just translated, not applied yet.
The E-Switch flow prepare function is updated to support VXLAN
encapsulation and decapsulation actions. The function calculates
the buffer size for the Netlink message and flow description structures,
including optional ones for tunneling purposes.
net/mlx5: add E-Switch VXLAN to validation routine
This patch adds VXLAN support for flow item/action lists validation.
The following entities are now supported:
- RTE_FLOW_ITEM_TYPE_VXLAN, contains the tunnel VNI
- RTE_FLOW_ACTION_TYPE_VXLAN_DECAP, if this action is specified
the items in the flow items list are treated as outer network
parameters for tunnel outer header match. The Ethernet layer
addresses are always treated as inner ones.
- RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP, contains the item list to
build the encapsulation header. In the current implementation the
values are subject to some constraints:
- the outer source MAC address is always unconditionally
set to one of the MAC addresses of the outer egress interface
- there is no way to specify the source UDP port
- all the above-mentioned parameters are ignored if specified
in the rule, and warning messages are sent to the log
Minimal tunneling support is also added. If the VXLAN decapsulation
action is specified, the ETH item can follow the VXLAN VNI item;
the content of this ETH item is treated as inner MAC addresses
and type. The outer ETH item for the VXLAN decapsulation action
is always ignored.
net/mlx5: swap items/actions validations for E-Switch rules
The rule validation function for E-Switch checks the item list first,
then the action list. This patch swaps the validation order so that
actions are checked first. This prepares the validation
function for the VXLAN tunnel actions update. The VXLAN decapsulation
action requires the items to be checked in a special way. We can do
this special check in a single item check pass if the action
flags have been gathered beforehand. This is the reason to swap the
item/actions checking loops.
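The benefit of checking actions first can be sketched as a two-pass scheme. This is a simplified illustration, not the mlx5 validation code: the enums, flag macros and function names are hypothetical stand-ins for the rte_flow types, and the item pass only counts items that would be treated as outer header parameters.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for rte_flow action/item types. */
enum act_type { ACT_END, ACT_VXLAN_DECAP, ACT_VXLAN_ENCAP, ACT_OTHER };
enum item_type { ITEM_END, ITEM_ETH, ITEM_IPV4, ITEM_UDP, ITEM_VXLAN };

#define FLAG_VXLAN_DECAP (1u << 0)
#define FLAG_VXLAN_ENCAP (1u << 1)

/* Pass 1: walk the action list and gather the tunnel action flags. */
static uint32_t
collect_action_flags(const enum act_type *actions)
{
	uint32_t flags = 0;

	for (; *actions != ACT_END; actions++) {
		if (*actions == ACT_VXLAN_DECAP)
			flags |= FLAG_VXLAN_DECAP;
		else if (*actions == ACT_VXLAN_ENCAP)
			flags |= FLAG_VXLAN_ENCAP;
	}
	return flags;
}

/*
 * Pass 2: a single walk over the items that already knows the action
 * flags. With decapsulation, non-ETH items are counted as outer header
 * match parameters; ETH is always treated as inner.
 */
static unsigned int
count_outer_items(const enum item_type *items, uint32_t action_flags)
{
	unsigned int outer = 0;

	for (; *items != ITEM_END; items++) {
		if (!(action_flags & FLAG_VXLAN_DECAP))
			continue;
		if (*items != ITEM_ETH)
			outer++;
	}
	return outer;
}
```

Because the flags are known before the item loop starts, the items need to be walked only once.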