Maxime Coquelin [Tue, 19 Mar 2019 10:54:17 +0000 (11:54 +0100)]
vhost: support requests only handled by external backend
External backends may have specific requests to handle, and so
we don't want the vhost-user lib to handle these requests as
errors.
This patch also changes the experimental API by introducing
RTE_VHOST_MSG_RESULT_NOT_HANDLED so that vhost-user lib
can report an error if a message is handled neither by
the vhost-user library nor by the external backend.
The logic changes a bit so that if the callback returns
with ERR, OK or REPLY, it is considered the message
is handled by the external backend so it won't be
handled by the vhost-user library.
It is still possible for an external backend to listen
to requests that have to be handled by the vhost-user
library like SET_MEM_TABLE, but the callback have to
return NOT_HANDLED in that case.
Vhost-crypto backend is also adapted to this API change.
Tiwei Bie [Tue, 19 Mar 2019 06:43:05 +0000 (14:43 +0800)]
net/virtio: add barrier in interrupt enable
Typically, after enabling Rx interrupt, a check should be done
to make sure that there is no new incoming packets before going
to sleep. So a barrier is needed to make sure that any following
check won't happen before the interrupt is actually enabled.
Jiayu Hu [Sun, 17 Mar 2019 06:38:32 +0000 (14:38 +0800)]
vhost: fix interrupt suppression for the split ring
The VIRTIO_RING_F_EVENT_IDX feature of split ring might
be broken, as the value of signalled_used is invalid
after live migration, start up and virtio driver reload.
This patch fixes it by using signalled_used_valid.
In addition, this patch makes the VIRTIO_RING_F_EVENT_IDX
implementation of split ring match kernel backend to suppress
more interrupts.
Fixes: e37ff954405a ("vhost: support virtqueue interrupt/notification suppression") Cc: stable@dpdk.org Signed-off-by: Jiayu Hu <jiayu.hu@intel.com> Tested-by: Yinan Wang <yinan.wang@intel.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Tiwei Bie [Tue, 12 Mar 2019 07:13:07 +0000 (15:13 +0800)]
net/virtio-user: fix multiqueue with vhost kernel
The multiqueue support in virtio-user with vhost kernel backend
is broken when tap name isn't specified by users explicitly,
because the tap name returned by ioctl(TUNSETIFF) isn't saved
properly, and multiple tap interfaces will be created in this
case. Fix this by saving the dynamically allocated tap name
first before reusing the ifr structure. Besides, also make it
possible to support the format string in tap name (e.g. foo%d)
specified by users explicitly.
Fixes: 791b43e08842 ("net/virtio-user: specify MAC of the tap") Cc: stable@dpdk.org Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Maxime Coquelin [Thu, 28 Feb 2019 17:57:04 +0000 (18:57 +0100)]
vhost: prevent disabled rings to be processed with zero-copy
The vhost-user spec says that once the vring is disabled, the
client has to stop processing it. But it can happen when
dequeue zero-copy is enabled if outstanding descriptors buffers
are still being processed by an external NIC or another guest.
The fix consists in draining the zmbufs list to ensure no more
descriptors buffers are in the wild.
Note that this fix is only working in the case REPLY_ACK
protocol feature is enabled, which is not the case by default
for now (it is only enabled when IOMMU feature is enabled in
the vhost library).
Fixes: b0a985d1f340 ("vhost: add dequeue zero copy") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Hyong Youb Kim [Thu, 14 Mar 2019 11:05:32 +0000 (04:05 -0700)]
net/enic: fix max MTU calculation
The maximum packet length (max_pkt_len) from the firmware does not
include CRC, so do not subtract 4 when deriving the max MTU. This
change effectively increases the max MTU by 4B. Apps often assume max
MTU = max_rx_pkt_len - 14 (ethernet header), and attempt to set the
MTU to that value (i.e. set MTU to max HW value). This change
incidentally allows such apps to change MTU to max value successfully.
Fixes: bb34ffb848a0 ("net/enic: determine max egress packet size and max MTU") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Thomas Monjalon [Wed, 13 Mar 2019 10:09:09 +0000 (11:09 +0100)]
examples/ethtool: remove query of default config
The default config is used if the setup parameter is NULL.
No need to query the default config with rte_eth_dev_info_get().
The function call will be removed with another useless info.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Rami Rosen <ramirose@gmail.com>
Igor Russkikh [Tue, 12 Mar 2019 15:25:05 +0000 (15:25 +0000)]
net/atlantic: eliminate excessive log levels on Rx/Tx
Default rxtx logging used ERR level, that caused logger to always
trigger. That may cause perf degradation even if logger was not enabled
but compiled in.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Igor Russkikh [Tue, 12 Mar 2019 15:25:03 +0000 (15:25 +0000)]
net/atlantic: fix link configuration
In case link speed is re configured after port start, it does not
takes the requested speed value, but instead just sets full autoneg
mask.
Fixes: 7943ba05f67c ("net/atlantic: add link status and interrupt management") Cc: stable@dpdk.org Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Pavel Belous [Tue, 12 Mar 2019 15:24:59 +0000 (15:24 +0000)]
net/atlantic: use EEPROM magic as a device address
Default dev addr is replaced with magic field from the request.
Length is allowed to be less than maximum.
SMBUS access bit definitions also better organised now.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Pavel Belous [Tue, 12 Mar 2019 15:24:57 +0000 (15:24 +0000)]
net/atlantic: fix buffer overflow
Found by Coverity scan. This is a real memory corruption.
There is no need in extra RTE_ALIGN macros since the
request/result structures are 4-byte aligned by definition.
Coverity issue: 323518, 323520 Fixes: ce4e8d418097 ("net/atlantic: implement EEPROM get/set") Cc: stable@dpdk.org Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Shahed Shaikh [Tue, 12 Mar 2019 16:51:14 +0000 (09:51 -0700)]
net/qede: fix Rx packet drop
There is a corner case in which driver won't post
receive buffers when driver has processed all received packets
in single loop (i.e. hw_consumer == sw_consumer) and then
HW will start dropping packets since it did not see new receive
buffers posted.
This corner case is seen when size of Rx ring is less than or equals
Rx packet burst count for dev->rx_pkt_burst().
Dekel Peled [Sun, 17 Mar 2019 06:23:03 +0000 (08:23 +0200)]
net/mlx5: support new representor naming format
Kernel update [1] introduce new format of representors names.
This patch implements RFC [2], updating MLX5 PMD to support the new
format, while maintaining support of the existing format.
Ruifeng Wang [Tue, 12 Mar 2019 05:35:27 +0000 (13:35 +0800)]
app/testpmd: optimize MAC swap for Arm
Improved MAC swap performance for ARM platform.
The improvement was achieved by using neon intrinsics
to save CPU cycles and doing swap for four packets
at a time.
The optimization had 15% - 20% throughput boost
in testpmd MAC swap mode.
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
The driver multiple rxq allocation logs a message at error level
but it really is a debug message.
Fixes: 51fafb89a9a0 ("net/bnxt: get rid of ff pools and use VNIC info array") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
When using bnxt on bare-metal with vfio-pci, the driver logs an
unnecessary warning. Hardware works fine, message is not urgent.
Change it to INFO level.
Fixes: 62196f4e0941 ("mem: rename address mapping function to IOVA") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Rami Rosen <ramirose@gmail.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Shahaf Shuler [Sun, 10 Mar 2019 08:14:10 +0000 (10:14 +0200)]
net/mlx5: fix packet inline on Tx queue wraparound
Inlining a packet to WQE that cross the WQ wraparound, i.e. the WQE
starts on the end of the ring and ends on the beginning, is not
supported and blocked by the data path logic.
However, in case of TSO, an extra inline header is required before
inlining. This inline header is not taken into account when checking if
there is enough room left for the required inline size.
On some corner cases were
(ring_tailroom - inline header) < inline size < ring_tailroom ,
this can lead to WQE being written outsize of the ring buffer.
Fixing it by always assuming the worse case that inline of packet will
require the inline header.
Kevin Traynor [Fri, 8 Mar 2019 09:28:55 +0000 (09:28 +0000)]
net/qede: support IOVA VA mode
Set RTE_PCI_DRV_IOVA_AS_VA in drv_flags. This allows initializing qede
PMD as non-root also on Linux v4.x, where /proc/self/pagemap can't be
acccessed without CAP_SYS_ADMIN privileges.
The flag was introduced generically but not in pmds in:
commit 815c7deaed2d ("pci: get IOMMU class on Linux")
The set_port_owner was copying a string between structures of the
same type, therefore the name could never be truncated (unless source
string was not null terminated). Use strlcpy which does it better.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Qi Zhang [Mon, 11 Mar 2019 07:42:20 +0000 (15:42 +0800)]
net/i40e: fix time sync for 25G
Time sync increment value is not configured for 25G device.
The patch fix this issue by setting the same value as 40G, this
aligned with kernel driver's behaviour.
Fixes: 75d133dd3296 ("net/i40e: enable 25G device") Cc: stable@dpdk.org Reported-by: Michael Luo <michael.luo@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Tested-by: Michael Luo <michael.luo@intel.com>
Pablo Cascón [Fri, 8 Mar 2019 15:40:47 +0000 (15:40 +0000)]
net/nfp: fix setting MAC address
Some firmwares, mostly for VFs, do not advertise the feature /
capability of changing the MAC address while the interface is up. With
such firmware a request to change the MAC address that at the same
time also tries to enable the not available feature will be denied by
the firmware resulting in an error message like:
nfp_net_reconfig(): Error nfp_net reconfig for ctrl: 80000000 update: 800
Fix set_mac_addr by not trying to enable a feature if it is not
advertised by the firmware.
Fixes: 2fe669f4bcd2 ("net/nfp: support MAC address change") Cc: stable@dpdk.org Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Kevin Traynor [Tue, 5 Mar 2019 16:30:39 +0000 (16:30 +0000)]
net/i40e: update queue number check for rounding
Since rounding up the requested queue pairs to allow the VF to
request a non-aligned number was added, it may happen that the
requested number is less than the available num of queues but the
rounded up number is greater. In this case, it is not caught with
the usual checks but later when there is a reset and failed setup.
By rounding earlier the checks can be done before a failed reset
occurs, and a rounded max amount of available queues can be returned
to the VF.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Tomasz Jozwiak [Fri, 1 Mar 2019 08:46:16 +0000 (09:46 +0100)]
malloc: add NUMA-aware realloc function
Currently, rte_realloc will not respect original allocation's
NUMA node when memory cannot be resized, and there is no
NUMA-aware equivalent of rte_realloc. This patch adds such a function.
The new API will ensure that reallocated memory stays on
requested NUMA node, as well as allow moving allocated memory
to a different NUMA node.
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Pavan Nikhilesh [Mon, 11 Mar 2019 06:49:28 +0000 (06:49 +0000)]
app/eventdev: configure optimum timers per adapter
Previously, the total number of event timers per adapter was set to an
arbitrary value, set it to mempool size instead as it defines the max
event timers that can be armed.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Pavan Nikhilesh [Fri, 1 Mar 2019 07:16:42 +0000 (07:16 +0000)]
examples/eventdev: probe max events
Some eventdevs support configuring max events to be -1 (open system).
Check eventdev and event port configuration with eventdev info before
configuring them.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Bruce Richardson [Mon, 11 Mar 2019 10:57:32 +0000 (10:57 +0000)]
git: ignore build directories
test-meson-build.sh generates multiple build directories for various
targets. As these follow a known pattern, and since they don't need
to be tracked in git, we can add them to the gitignore file,
along with the default build directory "build".
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Rami Rosen <ramirose@gmail.com>
Bruce Richardson [Mon, 11 Mar 2019 10:57:31 +0000 (10:57 +0000)]
git: ignore hidden files
Generally hidden files are hidden for good reason and we don't want to
track them in git. They can always be manually added to git tracking
individually if needed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Tue, 12 Mar 2019 10:18:28 +0000 (10:18 +0000)]
devtools: fix meson build test to exit on failure
When piping the ninja command through cat, we lose the error value from
the call to ninja in the case of failure. This prevents the script from
exiting at the first broken build. Fix this by setting the "pipefail"
shell option.
Fixes: 4bcb9b768604 ("devtools: add verbose option to meson build test") Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Rather than using linuxapp and bsdapp everywhere, we can change things to
use the, more readable, terms "linux" and "freebsd" in our build configs.
Rather than renaming the configs we can just duplicate the existing ones
with the new names using symlinks, and use the new names exclusively
internally. ["make showconfigs" also only shows the new names to keep the
list short] The result is that backward compatibility is kept fully but any
new builds or development can be done using the newer names, i.e. both
"make config T=x86_64-native-linuxapp-gcc" and "T=x86_64-native-linux-gcc"
work.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Arek Kusztal [Thu, 7 Feb 2019 10:54:39 +0000 (11:54 +0100)]
crypto/openssl: fix big numbers after computations
After performing mod exp and mod inv big numbers (BIGNUM) should
be cleared as data already is copied into op fields and this BNs would
very likely contain private information for unspecified amount of time
(duration of the session).
Fiona Trahe [Thu, 7 Feb 2019 18:46:27 +0000 (18:46 +0000)]
doc: fix table of kernel drivers in qat guide
Added missing line informing which kernel driver can
be used for device DH895xcc for compression service.
Moved service columns to start of table for better visibility
and to prepare for future asymmetric crypto service.
Fixes: e2e35849ea78 ("compress/qat: add compression on DH895x") Cc: stable@dpdk.org Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Wei Zhao [Fri, 8 Mar 2019 02:46:17 +0000 (10:46 +0800)]
net/ixgbe: support VF promiscuous by PF driver
The patch adds the PF counterpart changes to support VF promiscuous
mode by DPDK PF driver.
For ixgbe, in order to support VF VLAN promiscuous or unicast
promiscuous, PF need to set register PFVML2FLT of bit UPE and VPE.
The patch aligned to kernel driver's implementation.
Dekel Peled [Thu, 28 Feb 2019 15:20:30 +0000 (17:20 +0200)]
net/mlx5: fix sync when handling Tx completions
Function mlx5_tx_complete() reads completion entry information
from Tx queue.
For some processors not having strongly-ordered memory model,
there has to be a memory barrier between reading the entry index
and the entry fields, in order to guarantee data is valid.
Fixes: 54d3fe948dba ("net/mlx5: poll completion queue once per a call") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Dekel Peled [Thu, 28 Feb 2019 15:18:45 +0000 (17:18 +0200)]
net/mlx5: fix hex dump of error completion
struct mlx5_cqe is defined in MLX5 PMD code (mlx5_prm.h).
It includes 64 bytes padding in case of (RTE_CACHE_LINE_SIZE == 128).
struct mlx5_err_cqe is defined in kernel, and doesn't include padding.
When running in debug mode, in case an error CQE is detected
it is printed using rte_hexdump().
The size of data to print should be sizeof(*cqe) instead of
sizeof(*err_cqe), to handle the case of (RTE_CACHE_LINE_SIZE == 128),
and print the full data in any case.
Dekel Peled [Sun, 24 Feb 2019 09:41:09 +0000 (11:41 +0200)]
net/mlx4: fix default flow rule create
Original patch changed logic of function mlx4_flow_merge_eth().
The setting of flow->promisc was wrongly removed.
This patch adds the removed setting of flow->promisc, to restore
the required behavior.
Fixes: c0d239263156 ("net/mlx4: support flow w/o ETH spec and with VLAN") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
The mlx5 PMD probes the Verbs flow priorities supported with
ibv_create_flow() function. If rdma-core or kernel fails for
some reason, the returned error causes the drop queue is not
destroyed, and pd is locked by not freed resource.
Also the mlx5_flow_discover_priorities() returned negative value
as error, and this code was reported "as is", without sign
changing (eventually causing assert(err > 0)).
Yunjian Wang [Wed, 13 Feb 2019 02:48:52 +0000 (10:48 +0800)]
net/ixgbe: fix crash on remove
The NIC's interrupt source has some active handler when the
port removed. We should cancel the delay handler before removing
dev to prevent executing the delay handler.
Call Trace:
#0 ixgbe_disable_intr (hw=0x0, hw=0x0)
at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:852
#1 ixgbe_dev_interrupt_delayed_handler (param=0xadb9c0
<rte_eth_devices@@DPDK_2.2+33024>)
at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:4386
#2 0x00007f05782147af in eal_alarm_callback (arg=<optimized out>)
at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
eal_alarm.c:90
#3 0x00007f057821320a in eal_intr_process_interrupts (nfds=1,
events=0x7f056cbf3e88) at /usr/src/debug/dpdk-18.11/lib/
librte_eal/linuxapp/eal/eal_interrupts.c:838
#4 eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=18)
at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
eal_interrupts.c:885
#5 eal_intr_thread_main (arg=<optimized out>)
at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
eal_interrupts.c:965
#6 0x00007f05708a0e45 in start_thread () from /usr/lib64/libpthread.so.0
#7 0x00007f056eb4ab5d in clone () from /usr/lib64/libc.so.6
Fixes: 2866c5f1b87e ("ixgbe: support port hotplug") Cc: stable@dpdk.org Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Rami Rosen [Fri, 1 Mar 2019 11:10:10 +0000 (13:10 +0200)]
doc: fix tag for inner RSS feature
This patch fixes a wrong tag in guides/nics/features.rst.
The features tags should be, according to the
"Features Overview" section in this doc, one of the following:
"uses", "implements", "provides", or "related".
Hence in Inner RSS section, it should be "uses"
instead of "users".
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:51 +0000 (02:42 -0800)]
net/enic: fix inner packet matching
Inner packet matching is currently buggy in many cases.
1. Mishandling null spec ("match any").
The copy_item functions do nothing if spec is null. This is incorrect,
as all patterns should be appended to the L5 pattern buffer even for
null spec (treated as all zeros).
2. Accessing null spec causing segfault.
3. Not setting protocol fields.
The NIC filter API currently has no flags for "match inner IPv4, IPv6,
UDP, TCP, and so on". So, the driver needs to explicitly set EtherType
and IP protocol fields in the L5 pattern buffer to avoid false
positives (e.g. reporting IPv6 as IPv4).
Instead of keep adding "if inner, do something differently" cases to
the existing copy_item functions, introduce separate functions for
inner packet patterns and address the above issues in those
functions. The changes to the previous outer-packet copy_item
functions are mechanical, due to reduced indentation.
Fixes: 6ced137607d0 ("net/enic: flow API for NICs with advanced filters enabled") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:50 +0000 (02:42 -0800)]
net/enic: fix endianness in VLAN match
The VLAN fields in the NIC filter use little endian. The VLAN item is
in big endian, so swap bytes.
Fixes: 6ced137607d0 ("net/enic: flow API for NICs with advanced filters enabled") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:49 +0000 (02:42 -0800)]
net/enic: fix VXLAN match
The filter API does not have flags for "match VXLAN". Explicitly set
the UDP destination port and mask in the L4 pattern. Otherwise, UDP
packets with non-VXLAN ports may be falsely reported as VXLAN.
1400 series VIC adapters have hardware VXLAN parsing. The L5 buffer on
the NIC starts with the inner Ethernet header, and the VXLAN header is
now in the L4 buffer following the UDP header. So the VXLAN spec/mask
needs to be in the L4 pattern, not L5. Older models still expect the
VXLAN spec/mask in the L5 pattern. Fix up the L4/L5 patterns
accordingly.
Fixes: 6ced137607d0 ("net/enic: flow API for NICs with advanced filters enabled") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:48 +0000 (02:42 -0800)]
net/enic: reset VXLAN port regardless of overlay offload
Currently, the driver resets the vxlan port register only if overlay
offload is enabled. But, the register is actually tied to hardware
vxlan parsing, which is an independent feature and is always enabled
even if overlay offload is disabled. If left uninitialized, it can
affect flow rules that match vxlan. So always reset the port number
when HW vxlan parsing is available.
Fixes: 8a4efd17410c ("net/enic: add handlers to add/delete vxlan port number") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:47 +0000 (02:42 -0800)]
net/enic: enable limited support for raw flow item
Some apps like VPP use a raw item to match UDP tunnel headers like
VXLAN or GENEVE. The NIC hardware supports such usage via L5 match,
which does pattern match on packet data immediately following the
outer L4 header. Accept raw items for these limited use cases.
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:46 +0000 (02:42 -0800)]
net/enic: move arguments into struct
There are many copy_item functions, all with the same arguments, which
makes it difficult to add/change arguments. Move the arguments into a
struct to help subsequent commits that will add/fix features. Also
remove self-explanatory verbose comments for these local functions.
These changes are purely mechanical and have no impact on
functionalities.
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:45 +0000 (02:42 -0800)]
net/enic: enable limited passthru flow action
Some apps like VPP use PASSTHRU+MARK flow rules to offload packet
matching to the NIC. Just like MARK+RSS used by OVS-DPDK and others,
PASSTHRU+MARK is used to "mark and then receive normally". Recent VIC
adapters support such flow rules, so enable PASSTHRU for this limited
use case.
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:44 +0000 (02:42 -0800)]
net/enic: enable limited RSS flow action
Some apps like OVS-DPDK use MARK+RSS flow rules in order to offload
packet matching to the NIC. The RSS action in such flow rules simply
indicates "receive packet normally", not trying to override the port
wide RSS. The action is included in the flow rules simply to terminate
them, as MARK is not a fate-deciding action. And, the RSS action has a
most basic config: default hash, level, types, null key, and identity
queue mapping.
Recent VIC adapters can support these "mark and receive" flow
rules. So, enable support for RSS action for this limited use case.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:43 +0000 (02:42 -0800)]
net/enic: check for unsupported flow item types
Currently a pattern with an unsupported item type causes segfault,
because the flow handler is using the type as an array index without
checking bounds. Add an explicit check for unsupported item types and
avoid out-of-bound accesses.
Fixes: 6ced137607d0 ("net/enic: flow API for NICs with advanced filters enabled") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:42 +0000 (02:42 -0800)]
net/enic: allow flow mark ID 0
The driver currently accepts mark ID 0 but does not report it in
matching packet's mbuf. For example, the following testpmd command
succeeds. But, the mbuf of a matching IPv4 UDP packet does not have
PKT_RX_FDIR_ID set.
flow create 0 ingress pattern ... actions mark id 0 / queue index 0 / end
The problem has to do with mapping mark IDs (32-bit) to NIC filter
IDs. Filter ID is currently 16-bit, so values greater than 0xffff are
rejected. The firmware reserves filter ID 0 for filters that do not
mark (e.g. steer w/o mark). And, the driver reserves 0xffff for the
flag action. This leaves 1...0xfffe for app use.
It is possible to simply reject mark ID 0 as unsupported. But, 0 is
commonly used (e.g. OVS-DPDK and VPP). So, when adding a filter, set
filter ID = mark ID + 1 to support mark ID 0. The receive handler
subtracts 1 from filter ID to get back the original mark ID.
Fixes: dfbd6a9cb504 ("net/enic: extend flow director support for 1300 series") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:41 +0000 (02:42 -0800)]
net/enic: fix SCTP match for flow API
The driver needs to explicitly set the protocol number (132) in the IP
header pattern, as the current firmware filter API lacks "match SCTP
packet" flag. Otherwise, the resulting NIC filter may lead to false
positives (i.e. NIC reporting non-SCTP packets as SCTP packets). The
flow director handler does the same (enic_clsf.c).
Fixes: 6ced137607d0 ("net/enic: flow API for NICs with advanced filters enabled") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:40 +0000 (02:42 -0800)]
net/enic: fix flow director SCTP matching
The firmware filter API does not have flags indicating "match SCTP
packet". Instead, the driver needs to explicitly add an IP match and
set the protocol number (132 for SCTP) in the IP header.
The existing code (copy_fltr_v2) has two bugs.
1. It sets the protocol number (132) in the match value, but not the
mask. The mask remains 0, so the match becomes a wildcard match. The
NIC ends up matching all protocol numbers (i.e. thinks non-SCTP
packets are SCTP).
2. It modifies the input argument (rte_eth_fdir_input). The driver
tracks filters using rte_hash_{add,del}_key(input). So, addding
(RTE_ETH_FILTER_ADD) and deleting (RTE_ETH_FILTER_DELETE) must use the
same input argument for the same filter. But, overwriting the protocol
number while adding the filter breaks this assumption, and causes
delete operation to fail.
So, set the mask as well as protocol value. Do not modify the input
argument, and use const in function signatures to make the intention
clear. Also move a couple function declarations to enic_clsf.c from
enic.h as they are strictly local.
Fixes: dfbd6a9cb504 ("net/enic: extend flow director support for 1300 series") Cc: stable@dpdk.org Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Hyong Youb Kim [Sat, 2 Mar 2019 10:42:39 +0000 (02:42 -0800)]
net/enic: remove unused functions
Remove unused functions. Specifically, vnic_set_rss_key() is
obsolete. enic_{add,del}_vlan() has never been supported in the
firmware. And, remove vnic_rss.c altogether as it becomes empty. These
were discovered by cppcheck.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
David Marchand [Wed, 13 Feb 2019 20:06:59 +0000 (21:06 +0100)]
eal: fix core list validation with disabled cores
-l and -c options are two ways to select the cores used by DPDK.
Their format differs, but the checks on the selected cores are the same.
Use an intermediate array to separate the specific parsing checks from
the common consistency checks.
The parsing functions now concentrate on validating the passed string
and do nothing more.
We can report all invalid core indexes rather than only the first error.
In the error log message, reporting [0, cfg->lcore_count - 1] as a valid
range is then wrong when the core list is not continuous.
Example on my 8 cpus laptop with core 2 and 6 disabled.
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu6/online
Before:
./master/app/testpmd -l 0-7 --no-huge -m 512 -- --total-num-mbufs 2048
EAL: Detected 6 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: invalid core list, please check core numbers are in [0, 5] range
...
After:
./master/app/testpmd -l 0-7 --no-huge -m 512 -- --total-num-mbufs 2048
EAL: Detected 6 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: lcore 2 unavailable
EAL: lcore 6 unavailable
EAL: invalid core list, please check specified cores are part of 0-1,3-5,7
...
Fixes: d888cb8b9613 ("eal: add core list input format") Fixes: b38693b612b4 ("eal: fix core number validation") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com>
David Marchand [Tue, 19 Feb 2019 20:41:11 +0000 (21:41 +0100)]
eal: restrict control threads to startup CPU affinity
Spawning the ctrl threads on anything that is not part of the eal
coremask is not that polite to the rest of the system, especially
when you took good care to pin your processes on cpu resources with
tools like taskset (linux) / cpuset (freebsd).
Rather than introduce yet another eal options to control on which cpu
those ctrl threads are created, let's take the startup cpu affinity
as a reference and remove the eal coremask from it.
If no cpu is left, then we default to the master core.
The cpuset is computed once at init before the original cpu affinity
is lost.
Introduced a RTE_CPU_AND macro to abstract the differences between linux
and freebsd respective macros.