Neil Horman [Mon, 22 Jan 2018 01:48:04 +0000 (20:48 -0500)]
compat: add experimental tag macro
The __rte_experimental macro tags a given exported function as being part of
the EXPERIMENTAL api. Use of this tag will cause any caller of the
function (that isn't removed by dead code elimination) to emit a warning
that the user is making use of an API whos stabilty isn't guaranteed.
It also places the function in the .text.experimental section, which is
used to validate the tag against the corresponding library version map
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Neil Horman [Mon, 22 Jan 2018 01:48:03 +0000 (20:48 -0500)]
buildtools: add script to check experimental API exports
This tools reads the given version map for a directory, and checks to
ensure that, for each symbol listed in the export list, the corresponding
definition is tagged as __rte_experimental, erroring out if its not. In this
way, we can ensure that the EXPERIMENTAL api is kept in sync with the tags
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Bruce Richardson [Thu, 25 Jan 2018 11:12:25 +0000 (11:12 +0000)]
pmdinfogen: allow using stdin and stdout
Rather than having to work off files all the time, allow stdin and stdout
to be used as the source and destination for pmdinfogen. This will allow
other possible usages from scripts, e.g. taking files from ar archive and
building a single .pmd.c file from all the .o files in it.
for f in `ar t librte_pmd_xyz.a` ; do
ar p librte_pmd_xyz.a $f | pmdinfogen - - >> xyz_info.c
done
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>
Harry van Haaren [Mon, 29 Jan 2018 16:37:32 +0000 (16:37 +0000)]
app/procinfo: call EAL cleanup before exit
This patch adds a call to the newly introduced cleanup()
function just before quitting the app.
Adding this function call before quitting from a secondary processes
is important, as otherwise it will leak hugepage memory. For a secondary
process that is run multiple times, this could cause hugepage memory
to become depleted and stop a secondary process from starting.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Vipin Varghese <vipin.varghese@intel.com>
Harry van Haaren [Mon, 29 Jan 2018 16:37:31 +0000 (16:37 +0000)]
app/pdump: call EAL cleanup before exit
This patch adds a call to the newly introduced cleanup()
function just before quitting the pdump app.
Adding this function call before quitting from a secondary processes
is important, as otherwise it will leak hugepage memory. For a secondary
process that is run multiple times, this could cause hugepage memory
to become depleted and stop a secondary process from starting.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Vipin Varghese <vipin.varghese@intel.com>
Harry van Haaren [Mon, 29 Jan 2018 16:37:30 +0000 (16:37 +0000)]
eal: add function to release internal resources
This commit adds a new function rte_eal_cleanup().
The function serves as a hook to allow DPDK to release
internal resources (e.g.: hugepage allocations).
This function allows DPDK to become more like an ordinary
library, where the library context itself can be initialized
and cleaned up by the application.
The rte_exit() and rte_panic() functions must be considered,
particularly if they should call rte_eal_cleanup() to release any
resources or not. This patch adds the cleanup to rte_exit(),
but does not clean up on rte_panic(). The reason to not clean
up on panicing is that the developer may wish to inspect the
exact internal state of EAL and hugepages.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Vipin Varghese <vipin.varghese@intel.com>
Harry van Haaren [Mon, 29 Jan 2018 16:37:29 +0000 (16:37 +0000)]
service: restrict finalize to internal usage
This commit moves the rte_service_finalize() function
to be in the component header, and marks it as @internal.
The function is only called internally by rte_eal_finalize().
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Vipin Varghese <vipin.varghese@intel.com>
Hemant Agrawal [Mon, 29 Jan 2018 08:10:45 +0000 (13:40 +0530)]
mbuf: add pool ops selection functions
This patch add support for various mempool ops config helper APIs.
1.User defined mempool ops
2.Platform detected HW mempool ops (active).
3.Best selection of mempool ops by looking into user defined,
platform registered and compile time configured.
Olivier Matz [Mon, 29 Jan 2018 09:37:07 +0000 (10:37 +0100)]
mbuf: rename Tx VLAN flags
For consistency with the Rx flags, the flags PKT_TX_VLAN_PKT and
PKT_TX_QINQ_PKT are respectively renamed as PKT_TX_VLAN and
PKT_TX_QINQ. The old defines are deprecated but will stay for some time
for compatibility.
eal/x86: use lock-prefixed instructions for SMP barrier
On x86 it is possible to use lock-prefixed instructions to get
the similar effect as mfence.
As pointed by Java guys, on most modern HW that gives a better
performance than using mfence:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
That patch adopts that technique for rte_smp_mb() implementation.
On BDW 2.2 mb_autotest on single lcore reports 2X cycle reduction,
i.e. from ~110 to ~55 cycles per operation.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Simple functional test for rte_smp_mb() implementations.
Also when executed on a single lcore could be used as rough
estimation how many cycles particular implementation of rte_smp_mb()
might take.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Jerin Jacob [Sun, 3 Dec 2017 12:37:30 +0000 (18:07 +0530)]
config/thunderx: disable C11 memory model ring
On thunderx and octeontx, ring_perf_autotest and
ring_pmd_perf_autotest test shows better performance
when disabling CONFIG_RTE_RING_USE_C11_MEM_MODEL.
On the other hand, Enabling CONFIG_RTE_RING_USE_C11_MEM_MODEL
shows better performance on thunderx2.
Since thunderx2 is using the default armv8 config,
no particular change is required.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Jia He [Mon, 22 Jan 2018 04:41:28 +0000 (20:41 -0800)]
ring: introduce C11 memory model barrier option
This patch is to support C11 memory model barrier in librte_ring.
There are 2 barrier implementation options in librte_ring (suggested
by Jerin).
1. use rte_smp_rmb
2. use load_acquire/store_release(refer to [1]).
The reason why providing 2 options is the performance benchmark
difference in different arm machines, refer to [2].
CONFIG_RTE_RING_USE_C11_MEM_MODEL is provided, and by default it is "n"
on any architectures and only "y" on arm64 so far.
Jia He [Mon, 22 Jan 2018 04:41:26 +0000 (20:41 -0800)]
eal/arm64: remove the braces in memory barrier macros
for the code as follows:
if (condition)
rte_smp_rmb();
else
rte_smp_wmb();
Without this patch, compiler will report this error:
error: 'else' without a previous 'if'
Yongseok Koh [Thu, 25 Jan 2018 21:02:50 +0000 (13:02 -0800)]
net/mlx5: fix synchronization on polling Rx completions
Polling a new packet is basically sensing the generation bit in a
completion entry. For some processors not having strongly-ordered memory
model, there has to be a memory barrier between reading the generation bit
and other fields of the entry in order to guarantee data is not stale.
Yongseok Koh [Thu, 25 Jan 2018 21:02:43 +0000 (13:02 -0800)]
eal: introduce coherent I/O memory barriers
This commit introduces rte_cio_wmb() and rte_cio_rmb(), in order to
guarantee the ordering of coherent shared memory between the CPU and a DMA
capable device.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Pavan Nikhilesh [Fri, 26 Jan 2018 05:04:49 +0000 (10:34 +0530)]
eal: introduce integer divide through reciprocal
In some use cases of integer division, denominator remains constant and
numerator varies. It is possible to optimize division for such specific
scenarios.
The librte_sched uses rte_reciprocal to optimize division so, moving it to
eal/common would allow other libraries and applications to use it.
Vipin Varghese [Fri, 26 Jan 2018 20:55:36 +0000 (02:25 +0530)]
service: fix memory leak with new function
The rte_service_finalize routine checks if service is initialized
or not. If yes; releases internal memory for services and lcore
states are freed. This routine is to be invoked at end of application
termination.
Fixes: 21698354c832 ("service: introduce service cores concept") Cc: stable@dpdk.org Signed-off-by: Vipin Varghese <vipin.varghese@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Xueming Li [Fri, 19 Jan 2018 18:16:10 +0000 (02:16 +0800)]
cmdline: fix dynamic tokens parsing
When using dynamic tokens, the result buffer contains pointers to some
location inside the result buffer. When the content of the temporary
buffer is copied in the final one, these pointers still point to the
temporary buffer.
This works until the temporary buffer is kept intact, but the next
commit introduces a memset() that breaks this assumption.
This commit keeps the successfully parsed buffers, and ensures that the
pointers point to the valid location, by using temp buffer for following
parsing.
Harry van Haaren [Wed, 24 Jan 2018 17:02:47 +0000 (17:02 +0000)]
service: fix possible mem leak on initialize
This commit ensures that if that if we run out of memory
during the initialization of the service library, that the
first allocated memory is correctly freed instead of leaked.
Fixes: 21698354c832 ("service: introduce service cores concept") Cc: stable@dpdk.org Reported-by: Vipin Varghese <vipin.varghese@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Marko Kovacevic [Wed, 24 Jan 2018 15:07:45 +0000 (15:07 +0000)]
doc: fix build of bbdev test guide
Fix build issue with pdf guides. Some indentations in the bbdev test
application doc were causing build failures. Latex Log message:
doc.log:! LaTeX Error: Too deeply nested.
Fixes: f714a18885a6 ("app/testbbdev: add test application for bbdev") Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com> Tested-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Ophir Munk [Wed, 24 Jan 2018 14:12:13 +0000 (14:12 +0000)]
net/vdev_netvsc: fix build without C11 and pedantic
Remove CFLAGS -std=c11 and -pedantic in order to guarantee
a successful vdev_netvsc compilation on old Linux distributions.
Otherwise old GCC compilers may complain as follows:
cc1: error: unrecognized command line option -std=c11
Ophir Munk [Tue, 23 Jan 2018 21:54:09 +0000 (21:54 +0000)]
net/tap: use local eBPF definitions
eBPF has a graceful approach: it must successfully compile on all Linux
distributions. If a specific kernel cannot support eBPF it will gracefully
refuse the eBPF netlink message sent to it.
The kernel header file linux/bpf.h (if present) on different Linux
distributions may not include all definitions required for TAP
compilation.
In order to guarantee a successful eBPF compilation everywhere all the
required definitions for TAP have been locally added instead of including
file <linux/bpf.h>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
Harry van Haaren [Mon, 22 Jan 2018 10:04:10 +0000 (10:04 +0000)]
event/opdl: rework loops to comply with dpdk style
This commit reworks the loop counter variable declarations
to be in line with the DPDK source code.
Fixes: 3c7f3dcfb099 ("event/opdl: add PMD main body and helper function") Fixes: 8ca8e3b48eff ("event/opdl: add event queue config get/set") Fixes: d548ef513cd7 ("event/opdl: add unit tests") Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Liang Ma <liang.j.ma@intel.com>
Jerin Jacob [Fri, 19 Jan 2018 06:38:29 +0000 (12:08 +0530)]
event/sw: fix debug logging config option
align the config option name with config/common_base
Fixes: aaa4a221da26 ("event/sw: add new software-only eventdev driver") Cc: stable@dpdk.org Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Ferruh Yigit [Mon, 22 Jan 2018 00:16:23 +0000 (00:16 +0000)]
ethdev: separate internal structures into own header
rte_ethdev_core.h created. Internal data structures are moved here.
These structures are mostly intended to be used by drivers, but they
need to be in the public header file because of the inline functions
in the ethdev.h header, and those inline functions are preferred to
kept because of the performance concerns.
The accessibility of the data structures are not changed, only logically
grouped to show that they are not intended to be used by applications.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Ferruh Yigit [Mon, 22 Jan 2018 00:16:22 +0000 (00:16 +0000)]
ethdev: separate driver APIs
Create a rte_ethdev_driver.h file and move PMD specific APIs here.
Drivers updated to include this new header file.
There is no update in header content and since ethdev.h included by
ethdev_driver.h, nothing changed from driver point of view, only
logically grouping of APIs. From applications point of view they can't
access to driver specific APIs anymore and they shouldn't.
More PMD specific data structures still remain in ethdev.h because of
inline functions in header use them. Those will be handled separately.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Matan Azrad [Sat, 20 Jan 2018 21:12:24 +0000 (21:12 +0000)]
net/failsafe: fix removed device handling
There is time between the physical removal of the device until
sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
applications still don't know about the removal and may call sub-device
control operation which should return an error.
In previous code this error is reported to the application contrary to
fail-safe principle that the app should not be aware of device removal.
Add an removal check in each relevant control command error flow and
prevent an error report to application when the sub-device is removed.
Matan Azrad [Sat, 20 Jan 2018 21:12:19 +0000 (21:12 +0000)]
ethdev: add devop to check removal status
There is time between the physical removal of the device until PMDs get
a RMV interrupt. At this time DPDK PMDs and applications still don't
know about the removal.
Current removal detection is achieved only by registration to device RMV
event and the notification comes asynchronously. So, there is no option
to detect a device removal synchronously.
Applications and other DPDK entities may want to check a device removal
synchronously and to take an immediate decision accordingly.
Add new dev op called is_removed to allow DPDK entities to check an
Ethernet device removal status immediately.
Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Ophir Munk [Sat, 20 Jan 2018 21:11:36 +0000 (21:11 +0000)]
net/tap: implement RSS using eBPF
TAP PMD is required to support RSS queue mapping based on rte_flow API. An
example usage for this requirement is failsafe transparent switching from a
PCI device to TAP device while keep redirecting packets to the same RSS
queues on both devices.
TAP RSS implementation is based on eBPF programs sent to Linux kernel
through BPF system calls and using netlink messages to reference the
programs as part of traffic control commands.
TC uses eBPF programs as classifiers and actions.
eBPF classification: packets marked with an RSS queue will be directed
to this queue using TC with "skbedit" action.
BPF classifiers are downloaded to the kernel once on TAP creation for
each TAP Rx queue.
eBPF action: calculate the Toeplitz RSS hash based on L3 addresses and
L4 ports. Mark the packet with the RSS queue according the resulting
RSS hash, then reclassify the packet.
BPF actions are downloaded to the kernel for each new RSS rule.
TAP eBPF requires Linux version 4.9 configured with BPF. TAP PMD will
successfully compile on systems with old or non-BPF configured kernels but
RSS rules creation on TAP devices will not be successful
Ophir Munk [Sat, 20 Jan 2018 21:11:35 +0000 (21:11 +0000)]
net/tap: add eBPF API
This commit include BPF API to be used by TAP.
tap_flow_bpf_cls_q() - download to kernel BPF program that classifies
packets to their matching queues
tap_flow_bpf_calc_l3_l4_hash() - download to kernel BPF program that
calculates per packet layer 3 and layer 4 RSS hash
tap_flow_bpf_rss_map_create() - create BPF RSS map for storing RSS
parameters per RSS rule
tap_flow_bpf_update_rss_elem() - update BPF map entry with RSS rule
parameters
Ophir Munk [Sat, 20 Jan 2018 21:11:34 +0000 (21:11 +0000)]
net/tap: add eBPF bytes code
File tap_bpf_insns.h was added. It includes eBPF bytes code
which corresponds to source file tap_bpf_program.c
(see "net/tap: add eBPF program file").
The bytes code is in the format of C arrays of struct bpf_insn and
was generated from the C file tap_bpf_program.c
1. The C file was compiled via LLVM into an object file in ELF
format as:
clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
-filetype=obj -o <tap_bpf_program.o>
clang version must be 3.7 and above
The C functions are under different ELF sections and are considered
different BPF programs to be downloaded to the kernel
2. Using an external tool the ELF sections are parsed and the C arrays
of struct bpf_insn are generated. Each C array (corresponding to a
different function under an ELF section) is downloaded to the kernel
using an BPF systm call. The external tool that generates the C arrays
will be added in separate commits.
Ophir Munk [Sat, 20 Jan 2018 21:11:33 +0000 (21:11 +0000)]
net/tap: add eBPF program file
File tap_bpf_program.c was added with two ELF sections
corresponding to two BPF programs and one BPF map.
Section cls_q - BPF classifier to classify packets to their
corresponding queue after an RSS hash was calculated on the packet
and saved in skb->cb[1]
Section l3_l4 - BPF action to calculate RSS hash on packet
layers 3 and 4
This file is not part of DPDK tree compilation.
Ophir Munk [Sat, 20 Jan 2018 21:11:32 +0000 (21:11 +0000)]
net/tap: support actions for different classifiers
Add a generic TC actions handling for TC actions: "mirred",
"gact", "skbedit". This will be useful when introducing
BPF actions, as it uses TCA_BPF_ACT instead of TCA_FLOWER_ACT
Yongseok Koh [Fri, 19 Jan 2018 07:52:55 +0000 (23:52 -0800)]
net/mlx5: fix memory region lookup
This patch reverts:
commit 3a6f2eb8c5c5 ("net/mlx5: fix Memory Region registration")
Although granularity of chunks in a mempool is a cacheline, addresses are
extended to align to page boundary for performance reason in device when
registering a MR (Memory Region). This could make some regions overlap,
then can cause Tx completion error due to incorrect LKEY search. If the
error occurs, the Tx queue will get stuck. It is because buffer address is
compared against aligned addresses for Memory Region. Saving original
addresses of mempool for comparison doesn't create any overlap.
Fixes: b0b093845793 ("net/mlx5: use buffer address for LKEY search") Fixes: 3a6f2eb8c5c5 ("net/mlx5: fix Memory Region registration") Cc: stable@dpdk.org Reported-by: Xueming Li <xuemingl@mellanox.com> Signed-off-by: Xueming Li <xuemingl@mellanox.com> Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Beilei Xing [Fri, 19 Jan 2018 07:50:04 +0000 (15:50 +0800)]
net/i40e: fix fail to update packet type table
Fail to update SW ptype mapping table when loading
PPP profile, though profile can be loaded successfully.
It will cause fail to parse SW ptype during receiving
packets. This patch fixes this issue.
Beilei Xing [Fri, 19 Jan 2018 05:23:44 +0000 (13:23 +0800)]
net/i40e: fix flow director Rx resource defect
FDIR Rx ring isn't initialized and Rx queue HW tail isn't updated
when there's error detected during programming FDIR flow. There'll
be some potential risk.
This patch updates FDIR Rx resource.
Fixes: a778a1fa2e4e ("i40e: set up and initialize flow director") Fixes: 05999aab4ca6 ("i40e: add or delete flow director") Cc: stable@dpdk.org Signed-off-by: Beilei Xing <beilei.xing@intel.com> Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Matan Azrad [Thu, 18 Jan 2018 13:51:43 +0000 (13:51 +0000)]
net/vdev_netvsc: implement core functionality
As described in more details in the attached documentation (see patch
contents), this virtual device driver manages NetVSC interfaces in virtual
machines hosted by Hyper-V/Azure platforms.
This driver does not manage traffic nor Ethernet devices directly; it acts
as a thin configuration layer that automatically instantiates and controls
fail-safe PMD instances combining tap and PCI sub-devices, so that each
NetVSC interface is exposed as a single consolidated port to DPDK
applications.
PCI sub-devices being hot-pluggable (e.g. during VM migration),
applications automatically benefit from increased throughput when present
and automatic fallback on NetVSC otherwise without interruption thanks to
fail-safe's hot-plug handling.
Once initialized, the sole job of the vdev_netvsc driver is to regularly
scan for PCI devices to associate with NetVSC interfaces and feed their
addresses to corresponding fail-safe instances.
This patch lays the groundwork for this driver (draft documentation,
copyright notices, code base skeleton and build system hooks). While it can
be successfully compiled and invoked, it's an empty shell at this stage.
Andy Moreton [Fri, 19 Jan 2018 06:47:06 +0000 (06:47 +0000)]
net/sfc/base: fix unused argument warning
The type_data argument to ef10_rx_qcreate is only used
in builds with EFSYS_OPT_RX_PACKED_STREAM. note this as
an unused argument to avoid warnings in builds without
packed stream support.
Fixes: b749646dade4 ("net/sfc/base: add function to create packed stream RxQ") Signed-off-by: Andy Moreton <amoreton@solarflare.com> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Yong Wang [Thu, 18 Jan 2018 11:48:56 +0000 (06:48 -0500)]
net/dpaa: fix potential memory leak
There are several func calls to rte_zmalloc() which don't do null
pointer check on the return value. And before return, the memory is not
freed. Fix it by adding null pointer check and rte_free().
Fixes: 37f9b54bd3cf ("net/dpaa: support Tx and Rx queue setup") Fixes: 62f53995caaf ("net/dpaa: add frame count based tail drop with CGR") Cc: stable@dpdk.org Signed-off-by: Yong Wang <wang.yong19@zte.com.cn> Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Yongseok Koh [Wed, 17 Jan 2018 17:44:13 +0000 (09:44 -0800)]
net/mlx5: fix handling link status event
Even though link of a port gets down, device still can receive traffic.
That is the reason why mlx5_set_link_up/down() switches rx/tx_pkt_burst().
However, if link gets down by an external command (e.g. ifconfig), it isn't
effective. It is better to change burst functions when link status change
is detected.
Fixes: 62072098b54e ("mlx5: support setting link up or down") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Victor Kaplansky [Wed, 17 Jan 2018 13:49:25 +0000 (15:49 +0200)]
vhost: protect active rings from async ring changes
When performing live migration or memory hot-plugging,
the changes to the device and vrings made by message handler
done independently from vring usage by PMD threads.
This causes for example segfaults during live-migration
with MQ enable, but in general virtually any request
sent by qemu changing the state of device can cause
problems.
These patches fixes all above issues by adding a spinlock
to every vring and requiring message handler to start operation
only after ensuring that all PMD threads related to the device
are out of critical section accessing the vring data.
Each vring has its own lock in order to not create contention
between PMD threads of different vrings and to prevent
performance degradation by scaling queue pair number.
See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
Cc: stable@dpdk.org Signed-off-by: Victor Kaplansky <victork@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Junjie Chen [Wed, 17 Jan 2018 15:45:53 +0000 (10:45 -0500)]
vhost: fix mbuf free
dequeue zero copy change buf_addr and buf_iova of mbuf, and return
to mbuf pool without restore them, it breaks vm memory if others allocate
mbuf from same pool since mbuf reset doesn't reset buf_addr and buf_iova.
Fixes: b0a985d1f340 ("vhost: add dequeue zero copy") Cc: stable@dpdk.org Signed-off-by: Junjie Chen <junjie.j.chen@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Xiao Wang [Thu, 18 Jan 2018 02:20:38 +0000 (10:20 +0800)]
net/virtio: support guest announce
When live migration is done, for the backup VM, either the virtio
frontend or the vhost backend needs to send out gratuitous RARP packet
to announce its new network location.
This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support live
migration scenario where the vhost backend doesn't have the ability to
generate RARP packet.
Brief introduction of the work flow:
1. QEMU finishes live migration, pokes the backup VM with an interrupt.
2. Virtio interrupt handler reads out the interrupt status value, and
realizes it needs to send out RARP packet to announce its location.
3. Pause device to stop worker thread touching the queues.
4. Inject a RARP packet into a Tx Queue.
5. Ack the interrupt via control queue.
6. Resume device to continue packet processing.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Xiao Wang [Thu, 18 Jan 2018 02:32:24 +0000 (10:32 +0800)]
net: fix RARP generation
Due to a mistake operation from me, older version (v10) was merged to
master branch. It's the v11 should be applied. However, the master branch
is not rebase-able. Thus, this patch is made, from the diff between v10
and v11.
The diffs are:
- Add check for parameter and tailroom in rte_net_make_rarp_packet
- Allocate mbuf in rte_net_make_rarp_packet
Besides that, a link error is fixed when shared lib is enabled.
Fixes: 45ae05df824c ("net: add a helper for making RARP packet") Fixes: c3ffdba0e88a ("vhost: use API to make RARP packet") Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Junjie Chen [Mon, 15 Jan 2018 11:32:19 +0000 (06:32 -0500)]
vhost: do deep copy while reallocating queue
When vhost reallocate dev and vq for NUMA enabled case, it doesn't perform
deep copy, which lead to 1) zmbuf list not valid 2) remote memory access.
This patch is to re-initlize the zmbuf list and also do the deep copy.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com> Reviewed-by: Zhiyong Yang <zhiyong.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Tomasz Duszynski [Thu, 18 Jan 2018 10:57:36 +0000 (11:57 +0100)]
net/mrvl: allow changing MTU before port init
DPDK updates MTU once mtu_set() callback returns success.
Since PMD changes port's MTU to dev->mtu every time device is
started it is safe to call mtu_set() before MUSDK ppio was initialized.
Fixes: c0511a8f741f ("net/mrvl: check if ppio is initialized") Cc: stable@dpdk.org Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
Ivan Malov [Thu, 18 Jan 2018 09:44:31 +0000 (09:44 +0000)]
net/sfc: convert to new Tx offload API
Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit support the new Tx offloads API.
The code which fills in txq_flags in default_txconf is preserved
because rte_eth_dev_info_get() lacks conversion between offloads
and txq_flags fields which means that a legacy application which
relies on default_txconf will fail to configure Tx queues in the
case when some bits in txq_flags are mandatory.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ivan Malov [Thu, 18 Jan 2018 09:44:30 +0000 (09:44 +0000)]
net/sfc: factor out function to report Tx capabilities
The patch adds a separate function to report supported
Tx capabilities because this function will be required
in more places across the code in the upcoming patches.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ivan Malov [Thu, 18 Jan 2018 09:44:28 +0000 (09:44 +0000)]
net/sfc: factor out function to report Rx capabilities
The patch adds a separate function to report supported
Rx capabilities because this function will be required
in more places across the code in the upcoming patches.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>