Rafal Kozik [Fri, 14 Dec 2018 13:18:43 +0000 (14:18 +0100)]
net/ena: add new way of getting Rx drops
Rx drops cannot be acquired using the older API. Now, they must be
read from the keep-alive message.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Solganik Alexander [Fri, 14 Dec 2018 13:18:42 +0000 (14:18 +0100)]
net/ena: expose extended stats
The ENA PMD has its own custom statistics counters. They are exposed
to the application through the DPDK xstats API.
The deprecated and unused statistics are removed, together with the old API.
Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com>
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Michal Krawczyk [Fri, 14 Dec 2018 13:18:41 +0000 (14:18 +0100)]
net/ena: add per-queue software counters stats
Those counters provide information about sent/received bytes and
packets per queue.
Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com>
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Mon, 17 Dec 2018 11:06:18 +0000 (12:06 +0100)]
net/ena: fix cleanup for out of order packets
When a wrong req_id is detected, some previous mbufs could be used for
receiving different segments of received packets. In such cases chained
mbufs would be returned to the pool twice.
To prevent it, the chained mbuf is now freed right after the error is
detected.
To simplify cleanup, pointers taken from the Rx ring are set to NULL.
As the queues are not used after ena_rx_queue_release_bufs and
ena_tx_queue_release_bufs, updating the next_to_clean pointer is not
necessary.
Fixes: c2034976673d ("net/ena: add Rx out of order completion")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:39 +0000 (14:18 +0100)]
net/ena: fix invalid reference to variable in union
Use empty_rx_reqs instead of empty_tx_reqs.
As those two variables are part of a union, this does not cause any
failure, but it should be changed for consistency.
Fixes: c2034976673d ("net/ena: add Rx out of order completion")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:38 +0000 (14:18 +0100)]
net/ena: add supported RSS offloads types
The PMD was not advertising the RSS offload values although it supports
RSS. The missing information was added to allow applications to probe
the PMD for RSS support.
Fixes: 1173fca25af9 ("ena: add polling-mode driver")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:37 +0000 (14:18 +0100)]
net/ena: adjust new line in log messages
Only PMD_*_LOG appends a new line character to the log message.
All printouts were adjusted for consistency.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:36 +0000 (14:18 +0100)]
net/ena: do not reconfigure queues on reset
The reset function should return the port to its initial state, in which
no Tx and Rx queues are set up. The application should then reconfigure
the queues.
According to the DPDK documentation, rte_eth_dev_reset() itself is a
generic function which only does some hardware reset operations through
calling dev_uninit() and dev_init().
ena_com_dev_reset, which performs the NIC register reset, should be
called during stop.
Fixes: 2081d5e2e92d ("net/ena: add reset routine")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:35 +0000 (14:18 +0100)]
net/ena: destroy queues if start failed
If the start function fails, previously created queues have to be
removed.
ena_queue_restart_all() and ena_queue_restart() are renamed to
ena_queue_start_all() and ena_queue_start().
ena_free_io_queues_all() is renamed to ena_queue_stop_all().
Fixes: df238f84c0a2 ("net/ena: recreate HW IO rings on start and stop")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:34 +0000 (14:18 +0100)]
net/ena: call additional doorbells if needed
Before sending the next packet, check whether calling the doorbell is
needed.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:33 +0000 (14:18 +0100)]
net/ena: increase maximum Rx ring size
Some ENA devices support 8k Rx rings. The maximum supported size is
received upon device initialization.
As the ENA_DEFAULT_RING_SIZE_RX macro is the upper limit, it needs to be
adjusted.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Michal Krawczyk [Fri, 14 Dec 2018 13:18:32 +0000 (14:18 +0100)]
net/ena: support LLQv2
LLQ (Low Latency Queue) is a feature that allows pushing the packet
header directly to the device through PCI before the DMA is even
triggered.
It reduces latency, because the device can start preparing the packet
before the payload is sent through DMA.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:31 +0000 (14:18 +0100)]
net/ena: skip packet with wrong request id
When an invalid req_id is received, the reset should be handled by the
application, as it indicates an invalid ring state, so further Rx does
not make any sense.
Fixes: c2034976673d ("net/ena: add Rx out of order completion")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:30 +0000 (14:18 +0100)]
net/ena: add HW queues depth setup
The device now allows the driver to reconfigure the Tx and Rx queue
depths independently. Moreover, the maximum sizes for Tx and Rx can be
different. Those maximum values are received from the device.
After a reset, the previous ring configuration is restored.
If the number of descriptors is set to RTE_ETH_DEV_FALLBACK_RX_RINGSIZE
or RTE_ETH_DEV_FALLBACK_TX_RINGSIZE, the maximum value is restored.
Remove the checks verifying that the provided number is not too big, as
this is done in the generic functions (rte_eth_rx_queue_setup and
rte_eth_tx_queue_setup).
The maximum number of segments is set for Rx packets and provided to
ena_com_rx_pkt() for validation.
Unused definitions were removed.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:29 +0000 (14:18 +0100)]
net/ena: add reset reason in Rx error
Whenever the driver receives too many descriptors from the device, it
should trigger a device reset with the reset reason set to
ENA_REGS_RESET_TOO_MANY_RX_DESCS.
Fixes: 241da076b1f7 ("net/ena: adjust error checking and cleaning")
Cc: stable@dpdk.org
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:28 +0000 (14:18 +0100)]
net/ena: pass number of CPUs to the host info structure
The new ena_com allows the number of CPUs to be passed to the device in
the host info structure.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
Rafal Kozik [Fri, 14 Dec 2018 13:18:27 +0000 (14:18 +0100)]
net/ena/base: update communication layer for the ENAv2
ena_com is the communication layer provided by the vendor and common to
all ENA drivers.
This patch updates it to the version from 2018.09.26.
It adds support for the ENAv2 device together with the LLQ feature, a
doorbell optimization, and independent reconfiguration of the HW queue
depths.
The driver was adjusted to the changes in the HAL.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Qi Zhang [Sun, 16 Dec 2018 00:58:35 +0000 (08:58 +0800)]
app/testpmd: batch MAC swap for performance on x86
Do the MAC swap for four packets in the same loop iteration to squeeze
out more CPU cycles.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
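As an aside, a minimal hedged sketch of the batching idea (not the
actual testpmd code; names are illustrative):
    #include <stdint.h>

    /* Handle four packets per loop iteration so independent loads and
     * stores can overlap, then finish the remainder with a scalar loop. */
    static void
    macswap_batch(uint8_t *pkts[], uint16_t nb_pkts,
                  void (*macswap_one)(uint8_t *eth_hdr))
    {
        uint16_t i;

        for (i = 0; i + 3 < nb_pkts; i += 4) {
            macswap_one(pkts[i]);
            macswap_one(pkts[i + 1]);
            macswap_one(pkts[i + 2]);
            macswap_one(pkts[i + 3]);
        }
        for (; i < nb_pkts; i++)
            macswap_one(pkts[i]);
    }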
Qi Zhang [Sun, 16 Dec 2018 00:58:34 +0000 (08:58 +0800)]
app/testpmd: improve MAC swap performance for x86
The patch optimizes the MAC swap operation by taking advantage of SSE
instructions; it only impacts the x86 platform.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
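For illustration, a hedged sketch of how the 6-byte source and
destination MAC addresses can be swapped with a single SSSE3 byte
shuffle (an approximation, not the exact testpmd code):
    #include <stdint.h>
    #include <tmmintrin.h>

    /* The 16-byte load/store covers the 14-byte Ethernet header plus the
     * first two payload bytes; the shuffle mask leaves bytes 12-15 alone
     * and exchanges bytes 0-5 (dst MAC) with bytes 6-11 (src MAC). */
    static inline void
    macswap_sse(uint8_t *eth_hdr)
    {
        const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
                                              5, 4, 3, 2, 1, 0,
                                              11, 10, 9, 8, 7, 6);
        __m128i hdr = _mm_loadu_si128((const __m128i *)eth_hdr);

        hdr = _mm_shuffle_epi8(hdr, shfl_msk);
        _mm_storeu_si128((__m128i *)eth_hdr, hdr);
    }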
Qi Zhang [Sun, 16 Dec 2018 00:58:33 +0000 (08:58 +0800)]
app/testpmd: move MAC swap functions
Move the MAC swap workload to a dedicated function, so we can later
enable platform-specific optimized versions.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Xiao Wang [Tue, 18 Dec 2018 08:02:07 +0000 (16:02 +0800)]
doc: update ifc NIC document
Add the SW-assisted VDPA live migration feature to the NIC doc.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:02:06 +0000 (16:02 +0800)]
net/ifc: support SW assisted VDPA live migration
In SW-assisted live migration mode, the driver will stop the device and
set up a mediated virtio ring to relay the communication between the
virtio driver and the VDPA device.
This data path intervention allows SW to help with guest dirty page
logging for live migration.
This SW fallback is an event-driven relay thread, so when the network
throughput is low, it takes little CPU resource, but when the
throughput goes up, the relay thread's CPU usage goes up accordingly.
The user needs to take all the factors, including CPU usage, guest
performance degradation, etc., into consideration when selecting the
live migration support mode.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:02:05 +0000 (16:02 +0800)]
net/ifc: use vhost lib function for used ring logging
The vhost lib already provides a helper for used ring logging; the
driver can use it to reduce code.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:02:04 +0000 (16:02 +0800)]
net/ifc: add LM mode parameter
This patch series enables a new method for live migration, i.e.
software-assisted live migration. This patch provides a device argument
for the user to choose the method.
When "sw-live-migration=1", the driver/device will do live migration
with a relay thread dealing with dirty page logging. Without this
parameter, the device will do dirty page logging and there is no relay
thread consuming CPU resources.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
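For illustration only (the PCI address below is hypothetical), both
device arguments from this series would be combined into a single EAL
whitelist entry such as:
    -w 0000:06:00.3,vdpa=1,sw-live-migration=1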
Xiao Wang [Tue, 18 Dec 2018 08:02:03 +0000 (16:02 +0800)]
net/ifc: detect if VDPA mode is specified
If the user wants the VF to be used in VDPA (vhost data path
acceleration) mode, the user can add a "vdpa=1" parameter for the
device.
So if the driver does not find this option, it should quit and let the
bus continue the probe.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:02:02 +0000 (16:02 +0800)]
net/ifc: store only registered device instance
If the driver fails to register the ifc VF device with the vhost lib,
this device should not be stored.
Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:02:01 +0000 (16:02 +0800)]
net/ifc: add probing error logs
Driver probe may fail for different reasons; a debug message is helpful
for diagnosing the issue.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:02:00 +0000 (16:02 +0800)]
vhost: provide helpers for virtio ring relay
This patch provides two helpers for a VDPA device driver to perform a
relay between the guest virtio ring and a mediated virtio ring.
The available ring relay will synchronize the available entries and help
to do descriptor validity checking.
The used ring relay will synchronize the used entries from the mediated
ring to the guest ring and help to do dirty page logging for live
migration.
A later patch will leverage these two helpers.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:01:59 +0000 (16:01 +0800)]
vhost: provide helper for host notifier ctrl
The VDPA driver can decide whether it needs to enable/disable the host
notifier mapping, so exposing an API allows flexibility. A later patch
will build on this.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Xiao Wang [Tue, 18 Dec 2018 08:01:58 +0000 (16:01 +0800)]
vhost: remove unused function
vhost_detach_vdpa_device() is defined internally but not used; remove it
in this patch.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:39 +0000 (22:31 +0100)]
net/virtio: enable packed virtqueues by default
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:38 +0000 (22:31 +0100)]
net/virtio-user: fail if cq used with packed vq
Until we have support for control virtqueues, disable them and fail
device initialization if a control queue is specified as a parameter.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Yuanhan Liu [Mon, 17 Dec 2018 21:31:37 +0000 (22:31 +0100)]
net/virtio-user: add option to use packed queues
Add option to enable packed queue support for virtio-user
devices.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:36 +0000 (22:31 +0100)]
net/virtio: support packed queue in send command
Use packed virtqueue format when reading and writing descriptors
to/from the ring.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:35 +0000 (22:31 +0100)]
net/virtio: implement Rx path for packed queues
Implement the receive part.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:34 +0000 (22:31 +0100)]
net/virtio: implement Tx path for packed queues
This implements the transmit path for devices with
support for packed virtqueues.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:33 +0000 (22:31 +0100)]
net/virtio: dump packed virtqueue data
Add support to dump packed virtqueue data to the
VIRTQUEUE_DUMP() macro.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:32 +0000 (22:31 +0100)]
net/virtio: vring init for packed queues
Add and initialize descriptor data structures.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:31 +0000 (22:31 +0100)]
net/virtio: add packed virtqueue helpers
Add helper functions to set/clear and check descriptor flags.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Jens Freimann [Mon, 17 Dec 2018 21:31:30 +0000 (22:31 +0100)]
net/virtio: add packed virtqueue defines
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Matthias Gatto [Thu, 6 Dec 2018 16:00:07 +0000 (16:00 +0000)]
vhost: fix race condition when adding fd in the fdset
fdset_add can call fdset_shrink_nolock, which calls fdset_move
concurrently with the poll that is called in fdset_event_dispatch.
This patch adds a mutex to protect poll from being called at the same
time as fdset_add calls fdset_shrink_nolock.
Fixes: 1b815b89599c ("vhost: try to shrink pfdset when fdset_add fails")
Cc: stable@dpdk.org
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tiago Lam [Mon, 17 Dec 2018 09:14:22 +0000 (09:14 +0000)]
doc: add af_packet PMD guide
As of commit 364e08f2bbc0, DPDK allows an application to send and
receive raw packets using an AF_PACKET socket and PACKET_MMAP when
running on the Linux kernel. This complements it by adding a simple
guide with the following information:
- An introduction, where a brief explanation of this driver is given,
pointing out the dependency on PACKET_MMAP;
- Which options are supported at configuration time, while setting up an
interface, and its inherent limitations;
- What the prerequisites are;
- A command line example of how to set up a DPDK port using the
af_packet driver.
Since there is a dependency on PACKET_MMAP, the guide also points to the
original kernel documentation, so the reader can get more details.
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Stephen Hemminger [Fri, 14 Dec 2018 01:26:21 +0000 (17:26 -0800)]
net/netvsc: fix probe when VF not found
It is possible that the VF device exists but DPDK doesn't know about
it. This could happen if the device was blacklisted or, more likely, the
necessary device (Mellanox) was not part of the DPDK configuration.
In either case, the right thing to do is to just keep working, but only
with the slower para-virtual device.
Fixes: dc7680e8597c ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 14 Dec 2018 01:26:20 +0000 (17:26 -0800)]
net/netvsc: fix transmit descriptor pool cleanup
On device close or startup errors, the transmit descriptor pool
was being left behind.
Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Stephen Hemminger [Fri, 14 Dec 2018 01:26:19 +0000 (17:26 -0800)]
net/netvsc: support receive without VLAN strip
In some cases, VLAN stripping is not desirable. If necessary, re-insert
the stripped VLAN tag.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Xiaoyun Li [Thu, 6 Dec 2018 06:03:42 +0000 (14:03 +0800)]
net/i40e: fix statistics inconsistency
While calculating the input packet count per port, discarded packets
should be subtracted; right now only PF VSI discarded packets are
subtracted. But while calculating the input byte count per port, the Rx
byte count is used, which should take all discarded packets into
account, including the VF VSI ones.
This causes inconsistency in the stat counters in some cases.
This patch takes all VSI stats as the packet and byte count to address
the issue.
Fixes: 763de290cbd1 ("net/i40e: fix packet count for PF")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Zhirun Yan [Thu, 13 Dec 2018 15:46:45 +0000 (15:46 +0000)]
net/i40e: clear VF reset flags after reset
The reset flags vf->vf_reset and vf->pend_msg are set when the VF
receives VIRTCHNL_EVENT_RESET_IMPENDING, so after the reset is done
these flags should be cleared.
Fixes: 8cacf78469a7 ("net/i40e: fix VF initialization error")
Cc: stable@dpdk.org
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Anatoly Burakov [Fri, 21 Dec 2018 11:29:03 +0000 (11:29 +0000)]
test/mem: check non-heap external memory API
Currently, extmem autotest only covers the external malloc heap
API. Extend it to also cover the non-heap, register/unregister
external memory API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Fri, 21 Dec 2018 11:29:02 +0000 (11:29 +0000)]
test/mem: check external memseg list
Extend the extmem autotest to check whether the memseg lists for
externally allocated memory are always marked as external.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Fri, 21 Dec 2018 11:29:01 +0000 (11:29 +0000)]
test/mem: check external memory without IOVA table
Currently, only the scenario with a valid IOVA table is tested. Fix
this by also testing without an IOVA table - in that case, EAL should
always return RTE_BAD_IOVA for all memsegs, and contiguous memzone
allocation should fail.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Fri, 21 Dec 2018 11:29:00 +0000 (11:29 +0000)]
test/mem: refactor and rename functions
We will be adding a new extmem test that behaves roughly like the
existing one, so clarify the function names to distinguish between these
tests, and factor out the common parts.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Fri, 21 Dec 2018 12:26:05 +0000 (12:26 +0000)]
malloc: fix deadlock when reading stats
Currently, the malloc statistics and external heap creation code
use the memory hotplug lock as a way to synchronize accesses to
heaps (as in, locking the hotplug lock to prevent the list of heaps
from changing under our feet). At the same time, the malloc
statistics code will also lock the heap, because it needs to
access heap data and does not want any other thread to allocate
anything from that heap.
In such a scheme, it is possible to enter a deadlock with the
following sequence of events:
- thread 1: rte_malloc() takes the heap lock
- thread 2: rte_malloc_dump_stats() takes the hotplug lock
- thread 1: the allocation fails, so it attempts to take the hotplug lock
- thread 2: attempts to take the heap lock
Neither thread will be able to continue, as both of them are
waiting for the other one to drop the lock. Adding an
additional lock would require an ABI change, so instead,
make the malloc statistics calls thread-unsafe with
respect to creating/destroying heaps.
Fixes: 72cf92b31855 ("malloc: index heaps using heap ID rather than NUMA node")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Arnon Warshavsky [Tue, 18 Dec 2018 15:19:00 +0000 (17:19 +0200)]
devtools: fix return of forbidden addition checks
Explicitly collect the error code of the multiple awk script calls.
Bugzilla ID: 165
Fixes: 4d4c612e6a30 ("devtools: check wrong svg include in guides")
Cc: stable@dpdk.org
Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
Honnappa Nagarahalli [Thu, 22 Nov 2018 02:51:56 +0000 (20:51 -0600)]
hash: fix out-of-bound write while freeing key slot
Add a debug check for out-of-bound write while freeing the key slot.
Coverity issue: 325733
Fixes: e605a1d36ca7 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Jeff Shaw [Sat, 8 Dec 2018 00:01:26 +0000 (16:01 -0800)]
hash: fix return of bulk lookup
The __rte_hash_lookup_bulk() function returns void, and therefore
should not return with an expression. This commit fixes the following
compiler warning when attempting to compile with "-pedantic -std=c11".
warning: ISO C forbids ‘return’ with expression, in function
returning void [-Wpedantic]
Fixes: 9eca8bd7a61c ("hash: separate lock-free and r/w lock lookup")
Cc: stable@dpdk.org
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
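A hedged, generic illustration of this class of fix (not the actual
hash code): in a function returning void, forward the call as a plain
statement instead of returning its (void) result.
    static void do_work(void) { /* ... */ }

    /* before (rejected by -Wpedantic in a void function):
     *         return do_work();
     * after: */
    static void
    wrapper(void)
    {
        do_work();
    }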
Liang Ma [Thu, 20 Dec 2018 14:43:42 +0000 (14:43 +0000)]
power: add p-state driver compatibility
Previously, in order to use the power library, it was necessary
for the user to disable the intel_pstate driver by adding
“intel_pstate=disable” to the kernel command line for the system,
which causes the acpi_cpufreq driver to be loaded in its place.
This patch adds the ability for the power library to use the
intel_pstate driver.
It adds a new suite of functions behind the current power library API,
and will seamlessly set up the user-facing API function pointers to
the relevant functions depending on whether the system is running with
the acpi_cpufreq kernel driver, the intel_pstate kernel driver, or in a
guest using KVM. The library API and ABI are unchanged.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Qi Zhang [Thu, 20 Dec 2018 12:51:14 +0000 (20:51 +0800)]
eal: close multi-process socket during cleanup
When a secondary process quits, the mp_socket* file still exists, which
causes rte_mp_request_sync to fail when trying to send a message on a
floating socket.
The patch fixes the issue by introducing the function
rte_mp_channel_cleanup. This function will be called by rte_eal_cleanup;
it closes the mp socket and deletes the mp_socket* file.
Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Anatoly Burakov [Thu, 20 Dec 2018 12:09:50 +0000 (12:09 +0000)]
eal: add 64-bit log2 function
Add missing implementation for 64-bit log2 function, and extend
the unit test to test this new function. Also, remove duplicate
reimplementation of this function from testpmd and memalloc.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Thu, 20 Dec 2018 12:09:49 +0000 (12:09 +0000)]
eal: add 64-bit fls function
Add missing implementation for 64-bit fls function, and extend
unit test to test the new function as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
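A minimal standalone sketch of how such helpers can be expressed on top
of the compiler's count-leading-zeros builtin (names are illustrative;
this is not DPDK's exact implementation):
    #include <stdint.h>

    /* 1-based index of the most significant set bit; 0 when x == 0. */
    static inline uint32_t
    fls_u64(uint64_t x)
    {
        return x == 0 ? 0 : (uint32_t)(64 - __builtin_clzll(x));
    }

    /* Exact for powers of two, floor(log2(x)) otherwise; 0 for x == 0. */
    static inline uint32_t
    log2_u64(uint64_t x)
    {
        return x == 0 ? 0 : fls_u64(x) - 1;
    }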
Anatoly Burakov [Thu, 20 Dec 2018 12:09:48 +0000 (12:09 +0000)]
eal: add 64-bit bsf and 32-bit safe bsf functions
Add an rte_bsf64 function that follows the convention of existing
rte_bsf32 function. Also, add missing implementation for safe
version of rte_bsf32, and implement unit tests for all recently
added bsf varieties.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Thu, 20 Dec 2018 12:09:47 +0000 (12:09 +0000)]
bitmap: remove deprecated 64-bit bsf function
The function rte_bsf64 was deprecated in a previous release, so
remove the function, and the deprecation notice associated with
it.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Thu, 20 Dec 2018 11:11:48 +0000 (11:11 +0000)]
eal: fix runtime directory cleanup in noshconf mode
When using --no-shconf or --in-memory modes, there is no runtime
directory to be created, so there is no point in attempting to
clean it.
Fixes: 0a529578f162 ("eal: clean up unused files on initialization")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Wed, 5 Dec 2018 16:26:00 +0000 (16:26 +0000)]
test/fbarray: add to meson
Currently, fbarray autotest is only built by make, but is
missing from meson build files.
Fixes: 7985860c18af ("test/fbarray: add autotests")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Anatoly Burakov [Wed, 5 Dec 2018 16:25:48 +0000 (16:25 +0000)]
test/mem: add external mem autotest to meson
The 'external_mem_autotest' was defined in the meson build, but
the actual source file was not being compiled by meson.
Fixes: b270daa43b3d ("test: support external memory")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Thu, 6 Dec 2018 09:30:33 +0000 (09:30 +0000)]
doc: remove note on memory mode limitation in multi-process
Memory mode flags are now shared between primary and secondary
processes, so the note in the documentation about this limitation is no
longer necessary.
Fixes: 64cdfc35aaad ("mem: store memory mode flags in shared config")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Thu, 13 Dec 2018 11:43:19 +0000 (11:43 +0000)]
test/mem: check segment fd API
Use the memory autotest to also test the segment fd API. This does not
do any data checks - it just verifies that the relevant APIs return
success or indicate that the API is not supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Anatoly Burakov [Thu, 13 Dec 2018 11:43:18 +0000 (11:43 +0000)]
mem: use memfd for no-huge mode
When running in no-huge mode, we anonymously allocate our memory.
While this works for regular NICs and vdevs, it is not suitable
for memory sharing scenarios such as virtio with a vhost-user
backend.
To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This will enable
using the rte_memseg_get_fd() API with the --no-huge EAL flag.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
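A hedged standalone sketch of the memfd-backed idea (not the actual EAL
code): create a memfd, size it, and map it; the resulting fd can later
be shared with a vhost-user backend.
    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void *
    alloc_no_huge(size_t len, int *out_fd)
    {
        int fd = memfd_create("no_huge_seg", 0);
        void *va;

        if (fd < 0)
            return NULL;
        if (ftruncate(fd, len) < 0) {
            close(fd);
            return NULL;
        }
        va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (va == MAP_FAILED) {
            close(fd);
            return NULL;
        }
        *out_fd = fd;   /* register this fd like any other memseg fd */
        return va;
    }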
Anatoly Burakov [Thu, 13 Dec 2018 11:43:17 +0000 (11:43 +0000)]
mem: allow setting up segment list fd
Currently, only segment fds for multi-file segments are supported,
while for memfd-backed no-huge memory we need single-file segments
mode. Add support for single-file segments in the internal API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Anatoly Burakov [Thu, 13 Dec 2018 11:43:16 +0000 (11:43 +0000)]
mem: check for memfd support in segment fd API
If memfd support was not compiled in, or hugepage memfd support
is not available at runtime, the API will now return a proper
error code, indicating that this API is unsupported. This
changes the API, so document the changes.
Fixes: 41dbdb68723b ("mem: add external API to retrieve page fd")
Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Anatoly Burakov [Thu, 13 Dec 2018 11:43:15 +0000 (11:43 +0000)]
mem: fix segment fd API error code for external segment
The segment fd API does not support getting segment fds from
externally allocated memory, so return a proper error code
on any attempt to do so. This changes the API behavior, so
document the change as well.
Fixes: 5282bb1c3695 ("mem: allow memseg lists to be marked as external")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Anatoly Burakov [Thu, 20 Dec 2018 15:32:41 +0000 (15:32 +0000)]
mem: allow usage of non-heap external memory in multiprocess
Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Anatoly Burakov [Thu, 20 Dec 2018 15:32:40 +0000 (15:32 +0000)]
mem: allow registering external memory areas
The general use case of external memory is well covered by the
existing external memory APIs. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that APIs like
``rte_virt2memseg`` work on such external memory segments.
This commit adds such an API to DPDK. The new functions allow
registering and unregistering externally allocated memory areas;
documentation for them is added as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Anatoly Burakov [Thu, 20 Dec 2018 15:32:39 +0000 (15:32 +0000)]
malloc: separate destroying memseg list and heap data
Currently, destroying an external heap chunk and its memseg list is
part of one process. Once we gain the ability to unregister
external memory from DPDK that doesn't have any heap structures
associated with it, we will need to be able to find and destroy
memseg lists as well as heap data separately.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Anatoly Burakov [Thu, 20 Dec 2018 15:32:38 +0000 (15:32 +0000)]
malloc: separate creating memseg list and malloc heap
Currently, creating an external malloc heap also involves creating
a memseg list backing that malloc heap. We need to have them as
separate functions, to allow creating memseg lists without
creating a malloc heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Anatoly Burakov [Fri, 14 Dec 2018 11:54:01 +0000 (11:54 +0000)]
malloc: make alignment requirements more stringent
The external heaps API already implicitly expects the start address
of the external memory area to be page-aligned, but this is neither
enforced nor documented. Fix it by implementing additional
parameter checks in the memory add call, and document the page
alignment requirement explicitly.
Fixes: 7d75c31014f7 ("malloc: allow adding memory to named heaps")
Cc: stable@dpdk.org
Suggested-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Anatoly Burakov [Tue, 11 Dec 2018 16:48:28 +0000 (16:48 +0000)]
malloc: fix duplicate mem event notification
We already trigger a mem event notification inside the walk function;
there is no need to do it twice.
Fixes: f32c7c9de961 ("malloc: enable event callbacks for external memory")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Seth Howell [Fri, 7 Dec 2018 20:10:42 +0000 (13:10 -0700)]
malloc: notify primary process about hotplug in secondary
When secondary process hotplugs memory, it sends a request
to primary, which then performs the real mmap() and sends
sync requests to all secondary processes. Upon receiving
such sync request, each secondary process will notify the
upper layers of hotplugged memory (and will call all
locally registered event callbacks).
In the end we'll end up with memory event callbacks fired
in all the processes except the primary, which is a bug.
This gets critical if memory is hotplugged while a VFIO
device is attached, as the VFIO memory registration -
which is done from a memory event callback present in the
primary process only - is never called.
After this patch, a primary process fires memory event
callbacks before secondary processes start their
synchronizations - both for hotplug and hotremove.
Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Seth Howell <seth.howell@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Yongseok Koh [Wed, 12 Dec 2018 11:10:54 +0000 (03:10 -0800)]
malloc: fix finding maximum contiguous IOVA size
malloc_elem_find_max_iova_contig() could return invalid size due to a
missing sanity check. The following gdb output shows how 'cur_size' can be
invalid in find_biggest_element().
(gdb) p/x cur_size
$4 = 0xffffffffffe42900
(gdb) p elem
$1 = (struct malloc_elem *) 0x12e842000
(gdb) p *elem
$2 = {heap = 0x7ffff7ff387c, prev = 0x12e831fc0, next = 0x12e842900,
     free_list = {le_next = 0x109538000, le_prev = 0x7ffff7ff3894},
     msl = 0x7ffff7ff107c, state = ELEM_FREE, pad = 0, size = 2304}
(gdb) p *elem->msl
$5 = {{base_va = 0x100200000, addr_64 = 4297064448}, page_sz = 2097152,
     socket_id = 0, version = 790, len = 17179869184, external = 0,
     memseg_arr = {name = "memseg-2048k-0-0", '\000' <repeats 47 times>,
     count = 493, len = 8192, elt_sz = 48, data = 0x10002e000,
     rwlock = {cnt = 0}}}
Fixes: 9fe6bceafd51 ("malloc: add finding biggest free IOVA-contiguous element")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Jim Harris [Fri, 14 Dec 2018 17:13:03 +0000 (10:13 -0700)]
malloc: add option --match-allocations
SPDK uses the rte_mem_event_callback_register API to
create RDMA memory regions (MRs) for newly allocated regions
of memory. This is used in both the SPDK NVMe-oF target
and the NVMe-oF host driver.
DPDK creates internal malloc_elem structures for these
allocated regions. As users malloc and free memory, DPDK
will sometimes merge malloc_elems that originated from
different allocations that were notified through the
registered mem_event callback routine. This results
in subsequent allocations that can span across multiple
RDMA MRs. This requires SPDK to check each DPDK buffer to
see if it crosses an MR boundary and, if so, to add considerable
logic and complexity to describe that buffer before it can be
accessed by the RNIC. It is somewhat analogous to rte_malloc
returning a buffer that is not IOVA-contiguous.
As a malloc_elem gets split and some of these elements
get freed, it can also result in DPDK sending an
RTE_MEM_EVENT_FREE notification for a subset of the
original RTE_MEM_EVENT_ALLOC notification. This is also
problematic for RDMA memory regions, since unregistering
the memory region is all-or-nothing. It is not possible
to unregister part of a memory region.
To support these types of applications, this patch adds
a new --match-allocations EAL init flag. When this
flag is specified, malloc elements from different
hugepage allocations will never be merged. Memory will
also only be freed back to the system (with the requisite
memory event callback) exactly as it was originally
allocated.
Since part of this patch is extending the size of struct
malloc_elem, we also fix up the malloc autotests so they
do not assume its size exactly fits in one cacheline.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Gao Feng [Fri, 7 Dec 2018 01:20:08 +0000 (09:20 +0800)]
memzone: fix unlock on initialization failure
The RTE_PROC_PRIMARY error handler was missing the unlock statement in
the current code. Now unlock and return in one place to fix it.
Fixes: 49df3db84883 ("memzone: replace memzone array with fbarray")
Cc: stable@dpdk.org
Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tone Zhang [Wed, 12 Dec 2018 11:25:59 +0000 (19:25 +0800)]
vfio: support 64KB kernel page size
With a larger PAGE_SIZE it is possible for the MSI table to be very
close to the end of the BAR, such that when we align the start and end
of the MSI table to the PAGE_SIZE, the end offset of the MSI table is
outside the PCI BAR boundary.
This patch addresses the issue by comparing both the start and the
end offset of the MSI table with the BAR size, and skipping the mapping
if it is out of the BAR's scope.
The patch fixes the debug log as below:
EAL: Skipping BAR0
Signed-off-by: Tone Zhang <tone.zhang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
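A hedged sketch of the described check (names and exact arithmetic are
illustrative, not the actual VFIO code):
    #include <stdint.h>

    /* After aligning the MSI-X table window to the kernel page size, the
     * mapping is skipped when either edge of that window falls outside
     * the BAR. */
    static int
    msix_table_within_bar(uint64_t tbl_off, uint64_t tbl_len,
                          uint64_t page_sz, uint64_t bar_size)
    {
        uint64_t start = tbl_off & ~(page_sz - 1);           /* round down */
        uint64_t end = (tbl_off + tbl_len + page_sz - 1) &
                       ~(page_sz - 1);                        /* round up */

        return start < bar_size && end <= bar_size;
    }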
Gao Feng [Wed, 5 Dec 2018 06:19:24 +0000 (14:19 +0800)]
eal: check peer allocation in multi-process request
Add a check for a null peer pointer, like the existing check for the
bundle pointer, in the mp request handler. They should follow the same
style. Also add some logs for the no-memory cases.
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Gao Feng [Wed, 5 Dec 2018 02:50:25 +0000 (10:50 +0800)]
eal: fix leak on multi-process request error
When rte_eal_alarm_set fails, the bundle memory needs to be freed in
the error handlers of handle_primary_request and
handle_secondary_request.
Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
Cc: stable@dpdk.org
Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Gaetan Rivet [Mon, 17 Dec 2018 09:25:59 +0000 (10:25 +0100)]
eal: fix detection of duplicate option register
Missing brackets around the if means that the loop will end at
its first iteration.
Fixes: 2395332798d0 ("eal: add option register infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
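A hedged, generic illustration of the bug class (not the actual EAL
code): without braces, only the first statement is guarded by the if,
so the unconditional return ends the loop on its first pass.
    #include <stdio.h>
    #include <string.h>

    struct opt { const char *name; };

    static int
    is_registered_buggy(const struct opt *opts, int n, const char *name)
    {
        int i;

        for (i = 0; i < n; i++) {
            if (strcmp(opts[i].name, name) == 0)
                printf("option %s already registered\n", name);
                return 1;   /* runs on every iteration; needs { } */
        }
        return 0;
    }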
Keith Wiles [Sun, 16 Dec 2018 23:01:03 +0000 (17:01 -0600)]
eal: fix missing newline in a log
Add a missing newline to a RTE_LOG message.
Fixes: 2395332798d0 ("eal: add option register infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Chas Williams [Tue, 27 Nov 2018 04:56:13 +0000 (23:56 -0500)]
ip_frag: fix IPv6 when MTU sizes not aligned to 8 bytes
The same issue was fixed for the IPv4 version of this routine in commit
8d4d3a4f7337 ("ip_frag: handle MTU sizes not aligned to 8 bytes").
Briefly, the size of an IPv6 header is always 40 bytes. With an MTU of
1500, this will never produce a multiple of 8 bytes for the frag_size
and this routine can never succeed. Since RTE_ASSERTs are disabled by
default, this failure is typically ignored.
To fix this, round down to the nearest 8 bytes and use this when
producing the fragments.
Fixes: 0aa31d7a5929 ("ip_frag: add IPv6 fragmentation support")
Cc: stable@dpdk.org
Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
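A hedged sketch of the rounding described above (not the library code
itself):
    #include <stdint.h>

    #define IPV6_HDR_LEN    40
    #define FRAG_ALIGN      8

    /* The payload carried by each non-final fragment must be a multiple
     * of 8 bytes, so round the usable size down before slicing, e.g.
     * MTU 1500 -> 1460 usable -> 1456 per fragment. */
    static inline uint32_t
    ipv6_frag_payload_size(uint32_t mtu_size)
    {
        uint32_t frag_size = mtu_size - IPV6_HDR_LEN;

        return frag_size & ~(uint32_t)(FRAG_ALIGN - 1);
    }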
Wei Zhao [Wed, 21 Nov 2018 09:49:17 +0000 (17:49 +0800)]
examples/ipv4_multicast: remove useless mbuf info copy
There is no need for this useless information, and it is better removed
in order not to confuse users.
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
David Hunt [Fri, 14 Dec 2018 13:31:37 +0000 (13:31 +0000)]
examples/power: increase max cores to 256
Increase the number of addressable cores from 64 to 256. Also remove the
warning that increasing this number beyond 64 will cause problems
(because of the previous use of uint64_t masks). Now this number can be
increased significantly without causing problems.
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
David Hunt [Fri, 14 Dec 2018 13:31:36 +0000 (13:31 +0000)]
examples/power: allow VM to use lcores over 63
Extend the functionality to allow VMs to power manage cores beyond 63.
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
David Hunt [Fri, 14 Dec 2018 13:31:35 +0000 (13:31 +0000)]
examples/power: remove mask functions
Since we're moving to allowing more than 64 cores, the mask functions
that use uint64_t to operate on a masked set of cores are no longer
needed, so remove them.
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
David Hunt [Fri, 14 Dec 2018 13:31:34 +0000 (13:31 +0000)]
examples/power: change 64-bit masks to arrays
vm_power_manager currently makes use of uint64_t masks to keep track of
cores in use, limiting use of the app to only being able to manage the
first 64 cores in a multi-core system. Many modern systems have core
counts greater than 64, so this limitation needs to be removed.
This patch converts the relevant 64-bit masks to character arrays.
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
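A hedged sketch of the data-structure change (names are illustrative,
not the application's exact code): a byte array indexed by core id
replaces the uint64_t bitmask, so cores beyond 63 can be tracked.
    #include <stdint.h>

    #define MAX_MANAGED_CORES 256

    struct core_map {
        uint8_t in_use[MAX_MANAGED_CORES];  /* 1 if the core is managed */
    };

    static inline void
    core_set(struct core_map *m, unsigned int core)
    {
        if (core < MAX_MANAGED_CORES)
            m->in_use[core] = 1;
    }

    static inline int
    core_is_set(const struct core_map *m, unsigned int core)
    {
        return core < MAX_MANAGED_CORES ? m->in_use[core] : 0;
    }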
Konstantin Ananyev [Wed, 19 Dec 2018 18:07:17 +0000 (18:07 +0000)]
test/rwlock: add new test-cases
Add a few functional and performance tests
for rte_rwlock_read_trylock() and rte_rwlock_write_trylock().
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Konstantin Ananyev [Wed, 19 Dec 2018 18:07:16 +0000 (18:07 +0000)]
rwlock: introduce try semantics
Introduce rte_rwlock_read_trylock() and rte_rwlock_write_trylock().
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
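A hedged usage sketch, assuming the trylock variants return 0 when the
lock is acquired and a negative value (e.g. -EBUSY) when it is not:
    #include <rte_rwlock.h>

    /* Take a read-side snapshot only if no writer currently holds the
     * lock; otherwise report back so the caller can retry or skip. */
    static int
    stats_snapshot(rte_rwlock_t *lock, void (*copy_stats)(void))
    {
        if (rte_rwlock_read_trylock(lock) != 0)
            return -1;

        copy_stats();
        rte_rwlock_read_unlock(lock);
        return 0;
    }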
Erik Gabriel Carrillo [Wed, 19 Dec 2018 16:09:34 +0000 (10:09 -0600)]
timer: fix race condition
rte_timer_manage() adds expired timers to a "run list", and walks the
list, transitioning each timer from the PENDING to the RUNNING state.
If another lcore resets or stops the timer at precisely this
moment, the timer state would instead be set to CONFIG by that other
lcore, which would cause timer_manage() to skip over it. This is
expected behavior.
However, if a timer expires quickly enough, there exists the
following race condition that causes the timer_manage() routine to
misinterpret a timer in CONFIG state, resulting in lost timers:
- Thread A:
- starts a timer with rte_timer_reset()
- the timer is moved to CONFIG state
- the spinlock associated with the appropriate skiplist is acquired
- timer is inserted into the skiplist
- the spinlock is released
- Thread B:
- executes rte_timer_manage()
- find above timer as expired, add it to run list
- walk run list, see above timer still in CONFIG state, unlink it from
run list and continue on
- Thread A:
- move timer to PENDING state
- return from rte_timer_reset()
- timer is now in PENDING state, but not actually linked into a
pending list or a run list and will never get processed further
by rte_timer_manage()
This commit fixes this race condition by only releasing the spinlock
after the timer state has been transitioned from CONFIG to PENDING,
which prevents rte_timer_manage() from seeing an incorrect state.
Fixes: 9b15ba895b9f ("timer: use a skip list")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Amr Mokhtar [Wed, 19 Dec 2018 10:00:27 +0000 (10:00 +0000)]
bbdev: add missing experimental tags and map entries
- add missing APIs to map file
- add experimental tag to all bbdev APIs
Signed-off-by: Amr Mokhtar <amr.mokhtar@intel.com>
Tomasz Jozwiak [Wed, 12 Dec 2018 12:08:05 +0000 (13:08 +0100)]
app/compress-perf: refactor code
Code refactoring to separate the validation from the benchmarking part.
Added checking of the op status after the rte_compressdev_dequeue_burst
function.
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>
Tomasz Jozwiak [Wed, 12 Dec 2018 12:08:04 +0000 (13:08 +0100)]
app/compress-perf: add dynamic compression test
Added the dynamic compression feature to the compression perf test.
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>
Tomasz Jozwiak [Wed, 12 Dec 2018 12:08:03 +0000 (13:08 +0100)]
doc: add details for compress-perf app
Added:
- an initial version of the compression performance test description
file
- a release note in release_18_11.rst
Updated the index.rst file.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>
Tomasz Jozwiak [Wed, 12 Dec 2018 12:08:02 +0000 (13:08 +0100)]
app/compress-perf: add performance measurement
Added the performance measurement part to the compression perf test.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>
Tomasz Jozwiak [Wed, 12 Dec 2018 12:08:01 +0000 (13:08 +0100)]
app/compress-perf: add parser
Added the parser part to the compression perf test.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>