dpdk.git
Dariusz Stojaczyk [Wed, 30 Aug 2017 10:50:58 +0000 (12:50 +0200)]
vhost: add user callbacks for socket open/close

Added new callbacks to notify about socket connection status.
As destroy_device is used for virtqueue processing *pause* as well as
connection close, the user cannot distinguish between the two.

Consider the following scenario:
rte_vhost: received SET_VRING_BASE message,
           calling destroy_device() as usual

user:  end-user asks to remove the device (together with socket file),
       OK, device is not *in use* - that's NOT the behavior we want
       calling rte_vhost_driver_unregister() etc.

Instead of changing new_device/destroy_device callbacks and breaking
the ABI, a set of new functions new_connection/destroy_connection
has been added.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Kuba Kozak [Fri, 22 Sep 2017 12:17:40 +0000 (14:17 +0200)]
vhost: check poll error code

Add return value check for poll() call.

Coverity issue: 140740
Fixes: 59317cef249c ("vhost: allow many vhost-user ports")
Cc: stable@dpdk.org
Signed-off-by: Kuba Kozak <kubax.kozak@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Sebastian Basierski [Tue, 19 Sep 2017 11:41:04 +0000 (13:41 +0200)]
net/virtio-user: fix TAP name string termination

Fix a strncpy() call whose maximum size was equal to the destination
array size, which could leave the string unterminated.
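
The usual pattern for this kind of fix is sketched below. IFNAME_LEN
and copy_ifname() are hypothetical stand-ins, not names taken from the
driver.

```c
#include <assert.h>
#include <string.h>

#define IFNAME_LEN 16 /* assumption: stands in for the real array size */

/* Copy at most IFNAME_LEN - 1 bytes and always terminate the result. */
void copy_ifname(char dst[IFNAME_LEN], const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, IFNAME_LEN - 1);
    dst[IFNAME_LEN - 1] = '\0';
}
```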

Coverity issue: 140732
Fixes: e3b434818bbb ("net/virtio-user: support kernel vhost")
Cc: stable@dpdk.org
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Zhiyong Yang [Fri, 11 Aug 2017 02:13:18 +0000 (10:13 +0800)]
net/virtio: use pointer to replace memcpy

Using a pointer instead of memcpy saves many cycles in the function
virtio_send_command.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Jay Zhou [Tue, 22 Aug 2017 02:34:36 +0000 (10:34 +0800)]
net/virtio: fix a typo

Fixed a comment in struct virtionet_ctl, referring to the ring type

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:27 +0000 (10:36 +0200)]
vhost: enable IOMMU support

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:26 +0000 (10:36 +0200)]
vhost: invalidate vring in case of matching IOTLB invalidate

As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:25 +0000 (10:36 +0200)]
vhost: postpone device creation until rings are mapped

Translating the start addresses of the rings is not enough; we need to
be sure the whole ring is made available by the guest.

It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be safe against vring page
invalidations.

This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.
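
To illustrate why the ring size matters, the sketch below computes the
byte length of each split-ring area from the queue size, following the
layout in the virtio specification. Whether the vhost library counts
the trailing event fields when mapping is not stated here, so that part
is an assumption; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor table: 16 bytes per descriptor. */
size_t desc_table_size(uint16_t num)
{
    return 16u * (size_t)num;
}

/* Available ring: flags + idx + num 2-byte entries + used_event. */
size_t avail_ring_size(uint16_t num)
{
    return 6u + 2u * (size_t)num;
}

/* Used ring: flags + idx + num 8-byte entries + avail_event. */
size_t used_ring_size(uint16_t num)
{
    return 6u + 8u * (size_t)num;
}

/* A ring area is usable only if the translation covers all of it. */
bool ring_fully_mapped(uint64_t mapped_len, size_t needed)
{
    assert(needed > 0);
    return mapped_len >= needed;
}
```

For a 256-entry queue this gives 4096, 518 and 2054 bytes, so a
translation covering only the first page of a ring is not necessarily
enough.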

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:24 +0000 (10:36 +0200)]
vhost: translate ring addresses when IOMMU enabled

When IOMMU is enabled, the ring addresses set by the
VHOST_USER_SET_VRING_ADDR requests are the guest's IO virtual addresses,
whereas they are Qemu virtual addresses when IOMMU is disabled.

When enabled and the required translation is not in the IOTLB cache,
an IOTLB miss request is sent, but being called by the vhost-user
socket handling thread, the function does not wait for the requested
IOTLB update.

The function will be called again on the next IOTLB update message
reception if matching the vring addresses.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:23 +0000 (10:36 +0200)]
vhost: postpone rings addresses translation

This patch postpones ring address translations and checks, as
addresses sent by the master should not be interpreted as long as
the ring is not started and enabled[0].

When protocol features aren't negotiated, the ring is started in
enabled state, so the address translations are postponed to
vhost_user_set_vring_kick().
Otherwise, they are postponed until the ring is enabled, in
vhost_user_set_vring_enable().

[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:22 +0000 (10:36 +0200)]
vhost: fix dereferencing invalid pointer after realloc

numa_realloc() reallocates the virtio_net device structure and
updates the vhost_devices[] table with the new pointer if the rings
are allocated on a different NUMA node.

The problem is that vhost_user_msg_handler() still dereferences the old
pointer afterward.

This patch prevents this by fetching the dev pointer again from
vhost_devices[] after messages have been handled.
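
The pattern can be sketched in isolation as follows. The table,
struct and function names are hypothetical and much simpler than the
real vhost code; the point is only the re-fetch after a call that may
move the object.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEVS 4

struct dev {
    int vid;
    int numa_node;
};

/* Hypothetical device table, indexed by vid. */
struct dev *dev_table[MAX_DEVS];

/* Move the device to another node; the old pointer becomes invalid. */
int dev_move(int vid, int node)
{
    struct dev *old = dev_table[vid];
    struct dev *copy = malloc(sizeof(*copy));

    if (copy == NULL)
        return -1;
    *copy = *old;
    copy->numa_node = node;
    dev_table[vid] = copy;
    free(old);
    return 0;
}

/* Returns the node the device ends up on, or -1 on allocation failure. */
int handle_msg(int vid, int wanted_node)
{
    struct dev *dev;

    assert(vid >= 0 && vid < MAX_DEVS && dev_table[vid] != NULL);
    dev = dev_table[vid];
    if (dev->numa_node != wanted_node && dev_move(vid, wanted_node) < 0)
        return -1;

    /* dev may be dangling here: fetch the pointer from the table again. */
    dev = dev_table[vid];
    return dev->numa_node;
}
```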

Fixes: af295ad4698c ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:21 +0000 (10:36 +0200)]
vhost: enable rings at the right time

When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but enabled through dedicated
VHOST_USER_SET_VRING_ENABLE request.

When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:20 +0000 (10:36 +0200)]
vhost: use the guest IOVA to host VA helper

Replace rte_vhost_gpa_to_vva() calls with vhost_iova_to_vva(), which
also requires passing the mapped length and the access permissions needed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:19 +0000 (10:36 +0200)]
vhost: introduce guest IOVA to backend VA helper

This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.

When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.

When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.
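
A simplified sketch of the cache lookup step, assuming a flat array of
entries. The iotlb_entry struct and iotlb_lookup() are hypothetical and
do not reproduce the vhost_iova_to_vva() signature; on a miss the
caller would send the IOTLB_MISS request and retry after the update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: one contiguous IOVA -> host VA mapping. */
struct iotlb_entry {
    uint64_t iova;
    uint64_t uaddr;
    uint64_t size;
    uint8_t perm; /* bit mask of granted accesses */
};

/*
 * Translate iova. On a hit, return the host address and trim *len to
 * the number of bytes that are contiguously mapped. On a miss, or when
 * the needed permission is not granted, set *len to 0 and return 0.
 */
uint64_t iotlb_lookup(const struct iotlb_entry *tbl, size_t n,
                      uint64_t iova, uint64_t *len, uint8_t perm)
{
    size_t i;

    assert(len != NULL);
    for (i = 0; i < n; i++) {
        const struct iotlb_entry *e = &tbl[i];
        uint64_t off;

        if (iova < e->iova || iova - e->iova >= e->size)
            continue;
        if ((e->perm & perm) != perm)
            break; /* mapped, but without the needed access */
        off = iova - e->iova;
        if (*len > e->size - off)
            *len = e->size - off;
        return e->uaddr + off;
    }
    *len = 0;
    return 0;
}
```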

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:18 +0000 (10:36 +0200)]
vhost: handle IOTLB update and invalidate requests

Vhost-user device IOTLB protocol extension introduces
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in the kernel, which in this case can
be either an IOTLB update or invalidate message.

On IOTLB update, the virtqueues get notified of a new entry.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:17 +0000 (10:36 +0200)]
vhost: initialize vrings IOTLB caches

The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes vring id as parameter,
so that the IOTLB cache mempool name can be generated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:16 +0000 (10:36 +0200)]
vhost: support IOTLB miss slave requests

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:15 +0000 (10:36 +0200)]
vhost: add pending IOTLB miss request list and helpers

In order to be able to handle other ports or queues while waiting
for an IOTLB miss reply, a pending list is created so that the waiter
can return and restart later on by sending a miss request again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:14 +0000 (10:36 +0200)]
vhost: add IOTLB helper functions

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:13 +0000 (10:36 +0200)]
vhost: add IOMMU-related macros for old kernels

These defines and enums have been introduced in upstream kernel v4.8,
and backported to RHEL 7.4.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:12 +0000 (10:36 +0200)]
vhost: support slave requests channel

Currently, only QEMU sends requests, the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, like IOTLB miss events when IOMMU is
supported.

This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:11 +0000 (10:36 +0200)]
vhost: prepare for slave requests

send_vhost_message() is currently only used to send
replies, so it modifies message flags to prepare the
reply.

With the upcoming channel for backend-initiated requests,
this function can be used to send requests.

This patch introduces a new send_vhost_reply() that
does the message flags modifications, and makes
send_vhost_message() generic.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:10 +0000 (10:36 +0200)]
vhost: make error handling consistent in Rx path

In the non-mergeable receive case, when the copy_mbuf_to_desc()
call fails, the packet is skipped, the corresponding used element
len field is set to the vnet header size, and processing continues
with the next packet/desc. This could be a problem because the code
does not know why the call failed, and assumes the desc buffer is
large enough.

In the mergeable receive case, when copy_mbuf_to_desc_mergeable()
fails, the packet burst is simply stopped.

This patch makes the non-mergeable error path behave like the
mergeable one, as it seems the safest way. Doing it this way
will also simplify pending IOTLB miss request handling.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Maxime Coquelin [Thu, 5 Oct 2017 08:36:09 +0000 (10:36 +0200)]
vhost: revert workaround MQ fails to startup

This reverts commit 04d81227960b ("vhost: workaround MQ fails to
startup").

As agreed when this workaround was introduced, it can be reverted
as Qemu v2.10 that fixes the issue is now out.

The reply-ack feature is required for vhost-user IOMMU support.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Daniel Mrzyglod [Fri, 22 Sep 2017 15:21:49 +0000 (17:21 +0200)]
net/virtio: fix untrusted scalar value

The unscrutinized value may be incorrectly assumed to be within a certain
range by later operations.

In vhost_user_read, an unscrutinized value from an untrusted source is
used in a trusted context: the value of sz_payload may be harmful, so we
need to limit it to the maximum payload size.
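
A generic version of such a bound check is sketched below. The
function name and the way the maximum is passed in are assumptions for
illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate a payload size read from the peer before using it as a
 * read length. Returns 0 when the size fits, -1 otherwise.
 */
int check_payload_size(uint32_t sz_payload, size_t max_payload)
{
    assert(max_payload > 0);
    if (sz_payload > max_payload)
        return -1;
    return 0;
}
```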

Coverity issue: 139601
Fixes: 6a84c37e3975 ("net/virtio-user: add vhost-user adapter layer")
Cc: stable@dpdk.org
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:47 +0000 (14:13 +0200)]
net/virtio: fix Rx handler when checksum is requested

The simple Rx handler is selected even if Rx checksum offload is
requested by the application, but this handler does not support
offloads. This results in broken received packets (no checksum flag but
invalid checksum in the mbuf data).

Disable the simple Rx handler in that case.

Fixes: 96cb6711939e ("net/virtio: support Rx checksum offload")

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:46 +0000 (14:13 +0200)]
net/virtio: keep Rx handler whatever the Tx queue config

Split use_simple_rxtx into use_simple_rx and use_simple_tx,
and ensure that only use_simple_tx is updated when the txq flags
force the use of the standard Tx handler.

This change is also useful for the next commit (disable simple Rx
path when Rx checksum is requested).

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:45 +0000 (14:13 +0200)]
net/virtio: remove SSE check

Since commit f27769f796a0 ("mk: require SSE4.2 support on all x86
platforms"), SSE4.2 is a requirement when compiling on x86 platforms.

We can remove this check in the virtio driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:44 +0000 (14:13 +0200)]
net/virtio: rationalize setting of Rx/Tx handlers

The selection of Rx/Tx handlers is done in several places;
group them in one function, set_rxtx_funcs().

The update of hw->use_simple_rxtx is also rationalized:
- initialized to 1 (prefer simple path)
- in dev configure or rx/tx queue setup, if something prevents the use
  of the simple path, change it to 0.
- in dev start, set the handlers according to hw->use_simple_rxtx.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:43 +0000 (14:13 +0200)]
net/virtio: fix queue setup consistency

In rx/tx queue setup functions, some code is executed only if
use_simple_rxtx == 1. The value of this variable can change depending on
the offload flags or sse support. If Rx queue setup is called before Tx
queue setup, it can result in an invalid configuration:

- dev_configure is called: use_simple_rxtx is initialized to 0
- rx queue setup is called: queues are initialized without simple path
  support
- tx queue setup is called: use_simple_rxtx switches to 1, and simple
  Rx/Tx handlers are selected

Fix this by postponing part of the Rx/Tx queue initialization to
dev_start(), as was the case in the initial implementation.

Fixes: 48cec290a3d2 ("net/virtio: move queue configure code to proper place")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:42 +0000 (14:13 +0200)]
net/virtio: fix mbuf port for simple Rx function

The mbuf->port was not properly set for the first received
mbufs. Fix this by setting it in virtqueue_enqueue_recv_refill_simple(),
which is used to enqueue the first mbuf in the ring.

The function virtio_rxq_rearm_vec(), which is used to rearm the ring
with new mbufs, is correct and does not need to be updated.

Fixes: cab0461234e7 ("virtio: fill Rx avail ring with blank mbufs")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:41 +0000 (14:13 +0200)]
net/virtio: fix log levels in configure

On error, we should log with error level.

Fixes: 9f4f2846ef76 ("virtio: support vlan filtering")
Fixes: 86d59b21468a ("net/virtio: support LRO")
Fixes: 96cb6711939e ("net/virtio: support Rx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:40 +0000 (14:13 +0200)]
doc: fix description of L4 Rx checksum offload

As described in API documentation, the field hw_ip_checksum
requests both L3 and L4 offload.

Fixes: dad1ec72a377 ("doc: document NIC features")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:39 +0000 (14:13 +0200)]
net/virtio: revert not claiming IP checksum offload

This reverts
commit 4dab342b7522 ("net/virtio: do not falsely claim to do IP checksum").

The description of rxmode->hw_ip_checksum is:

     hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. */

Despite its name, this field can be set by an application to enable L3
and L4 checksums. In case of virtio, only L4 checksum is supported and
L3 checksums flags will always be set to "unknown".

Fixes: 4dab342b7522 ("net/virtio: do not falsely claim to do IP checksum")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Olivier Matz [Thu, 7 Sep 2017 12:13:38 +0000 (14:13 +0200)]
net/virtio: revert not claiming LRO support

This reverts
commit 701a64622c26 ("net/virtio: do not claim to support LRO")

Setting rxmode->enable_lro is a way to tell the host that the guest is
ok to receive TSO packets. From the guest's point of view, it is like
enabling LRO on a physical driver.

Fixes: 701a64622c26 ("net/virtio: do not claim to support LRO")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Steven Luong [Tue, 1 Aug 2017 16:17:36 +0000 (09:17 -0700)]
net/virtio-user: send kick notify backend on init

According to the vhost-user spec [0], the client must start the ring
upon receiving a kick (that is, detecting that the file descriptor
is readable) on the descriptor specified by VHOST_USER_SET_VRING_KICK.

The code sends a kick to the rx queue, but does not send one to the
tx queue. This patch adds the missing code to comply with the spec.

[0]: https://fossies.org/linux/qemu/docs/specs/vhost-user.txt

Signed-off-by: Steven Luong <sluong@cisco.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Tiwei Bie [Fri, 8 Sep 2017 12:50:46 +0000 (20:50 +0800)]
vhost: batch small guest memory copies

This patch adaptively batches the small guest memory copies.
By batching the small copies, the efficiency of executing the
memory LOAD instructions can be improved greatly, because the
memory LOAD latency can be effectively hidden by the pipeline.
We saw great performance boosts in the small-packet PVP test.

This patch improves the performance for small packets, and
distinguishes packets by size. So although the performance
for big packets doesn't change, it makes it relatively easy to
do some special optimizations for the big packets too.
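
A simplified sketch of the batching idea: small copies are recorded
and executed together later, large ones are done immediately. The
struct names, BATCH_MAX and SMALL_COPY_MAX are hypothetical values
chosen for illustration, and the sketch assumes the copied regions do
not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_MAX      32  /* hypothetical batch capacity */
#define SMALL_COPY_MAX 256 /* hypothetical "small copy" threshold */

struct copy_elem {
    void *dst;
    const void *src;
    size_t len;
};

struct copy_batch {
    struct copy_elem elems[BATCH_MAX];
    size_t count;
};

/* Execute all recorded copies back to back, then empty the batch. */
void batch_flush(struct copy_batch *b)
{
    size_t i;

    assert(b != NULL);
    for (i = 0; i < b->count; i++)
        memcpy(b->elems[i].dst, b->elems[i].src, b->elems[i].len);
    b->count = 0;
}

/* Defer small copies while there is room; copy everything else now. */
void batch_copy(struct copy_batch *b, void *dst, const void *src, size_t len)
{
    assert(b != NULL);
    if (len > SMALL_COPY_MAX || b->count == BATCH_MAX) {
        memcpy(dst, src, len);
        return;
    }
    b->elems[b->count].dst = dst;
    b->elems[b->count].src = src;
    b->elems[b->count].len = len;
    b->count++;
}
```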

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Zhiyong Yang [Thu, 3 Aug 2017 01:21:50 +0000 (09:21 +0800)]
net/virtio: replace magic number with PCI constant

Use a macro instead of a magic number in order to improve code
readability.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Zhiyong Yang [Thu, 3 Aug 2017 01:21:49 +0000 (09:21 +0800)]
net/virtio: fix indent

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Jonas Pfefferle [Fri, 6 Oct 2017 14:40:20 +0000 (16:40 +0200)]
vfio: refactor PCI BAR mapping

Split pci_vfio_map_resource for primary and secondary processes.
Save all relevant mapping data in primary process to allow
the secondary process to perform mappings.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Jonas Pfefferle [Tue, 8 Aug 2017 11:16:42 +0000 (13:16 +0200)]
vfio: fix sPAPR IOMMU DMA window size

DMA window size needs to be big enough to span all memory segment's
physical addresses. We do not need multiple levels of IOMMU tables
as we already span ~70TB of physical memory with 16MB hugepages.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Shreyansh Jain [Tue, 10 Oct 2017 09:34:58 +0000 (15:04 +0530)]
bus/dpaa: fix memory allocation during scan

With the IOVA auto detection changes, bus scan is performed before
memory initialization. DPAA bus scan must not use rte_malloc in
its path.

Fixes: cf408c22476c ("eal: auto detect IOVA mode")

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Eelco Chaudron [Mon, 2 Oct 2017 10:01:40 +0000 (12:01 +0200)]
doc: add use of mlockall to programmers guide

When I was adding mlockall() to the testpmd application it was
suggested to add a reference to the use case of mlockall(). This patch
adds it.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Eelco Chaudron [Fri, 29 Sep 2017 08:11:10 +0000 (10:11 +0200)]
app/testpmd: avoid pages being swapped out

Call the mlockall() function to attempt to lock all of the process
memory into physical RAM, preventing the kernel from paging any
of it to disk.

When using testpmd for performance testing, depending on the code path
taken, we see a couple of page faults in a row. These faults affect
the overall drop-rate of testpmd. On Linux the mlockall() call will
prefault all the pages of testpmd (and the DPDK libraries if linked
dynamically), even without LD_BIND_NOW.
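
A minimal sketch of such a call. The choice of MCL_CURRENT | MCL_FUTURE
and the decision to report a failure without aborting are assumptions,
and lock_process_memory() is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/mman.h>

/*
 * Try to lock current and future pages into RAM. Returns 0 on
 * success, -1 if the kernel refuses (for example because of
 * RLIMIT_MEMLOCK); the caller can keep running either way.
 */
int lock_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```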

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Patrick MacArthur [Fri, 4 Aug 2017 18:53:57 +0000 (14:53 -0400)]
eal: copy raw strings taken from command line

Normally, command line argument strings are considered immutable, but
SPDK [1] and urdma [2] construct argv arrays to pass to rte_eal_init().
These strings are allocated using malloc() and freed after DPDK
initialization with free(). However, in the case of --file-prefix and
--huge-dir, DPDK takes the pointer to these strings in argv directly. If
a secondary process calls rte_eal_pci_probe() after rte_eal_init()
returns, as is done by SPDK, this causes a use-after-free error because
the strings have been freed by the calling code immediately after
rte_eal_init() returns.

This problem was observed when running SPDK example programs as a
secondary process and causes the secondary processes to fail:

Starting DPDK 16.11.1 initialization...
[ DPDK EAL parameters: identify -c 4 --file-prefix=spdk3260 --base-virtaddr=0x1000000000 --proc-type=auto ]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: SECONDARY
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   cannot connect to primary process!
EAL: Error - exiting with code: 1
Cause: Requested device 0000:81:00.0 cannot be used

Running strace shows that the file prefix has been zero'd out by the
time that the secondary process attempts to probe the NVMe device.

The use-after-free errors can be easily detected with valgrind:

==8489== Invalid read of size 1
==8489==    at 0x4C30D22: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x58DB955: vfprintf (vfprintf.c:1637)
==8489==    by 0x59A4685: __vsnprintf_chk (vsnprintf_chk.c:63)
==8489==    by 0x59A45E7: __snprintf_chk (snprintf_chk.c:34)
==8489==    by 0x1246AB: get_socket_path.constprop.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x124B09: vfio_mp_sync_connect_to_primary (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x123BE4: vfio_get_group_fd.part.1 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x124366: vfio_setup_device (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x126C8A: pci_vfio_map_resource (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x12B115: pci_probe_all_drivers.part.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x12B596: rte_eal_pci_probe (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x11D5B5: spdk_pci_enumerate (pci.c:147)
==8489==  Address 0x63f362e is 14 bytes inside a block of size 32 free'd
==8489==    at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x11E6FB: spdk_free_args (init.c:136)
==8489==    by 0x11EBF5: spdk_env_init (init.c:309)
==8489==    by 0x10D2AA: main (identify.c:976)
==8489==  Block was alloc'd at
==8489==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x11E7D7: _sprintf_alloc (init.c:76)
==8489==    by 0x11EA78: spdk_build_eal_cmdline (init.c:251)
==8489==    by 0x11EA78: spdk_env_init (init.c:282)
==8489==    by 0x10D2AA: main (identify.c:976)
==8489==

Fix this by using strdup() to create separate memory buffers for these
strings. Note that this patch will cause valgrind to report memory
leaks of these buffers as there is nowhere to free them. Using static
buffers is an option but would make these strings have a fixed maximum
length whereas there is currently no limit defined by the API.
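
The idea of the fix, sketched with hypothetical names (the config
struct and config_set_prefix() are not EAL identifiers). Unlike the
patch as described, this sketch frees a previously stored copy when the
option is set again.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for an option such as --file-prefix. */
struct config {
    char *file_prefix;
};

/* Store a private copy so the caller may free or reuse its argv. */
int config_set_prefix(struct config *cfg, const char *arg)
{
    char *copy;

    assert(cfg != NULL && arg != NULL);
    copy = strdup(arg);
    if (copy == NULL)
        return -1;
    free(cfg->file_prefix);
    cfg->file_prefix = copy;
    return 0;
}
```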

[1] http://spdk.io
[2] https://github.com/zrlio/urdma

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Patrick MacArthur <patrick@patrickmacarthur.net>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Seth Howell [Mon, 28 Aug 2017 21:49:12 +0000 (14:49 -0700)]
mem: check mmap failure

If mmap fails, it will return the value MAP_FAILED. Checking for this
return code allows us to properly identify mmap failures and report
them as such to the calling function.
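
For illustration, a checked anonymous mapping could look like this.
map_anonymous() is a hypothetical helper, and converting the failure to
NULL is just one way to report it to the caller.

```c
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private anonymous memory; NULL on failure. */
void *map_anonymous(size_t len)
{
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* mmap() signals failure with MAP_FAILED, not with NULL. */
    if (addr == MAP_FAILED)
        return NULL;
    return addr;
}
```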

Signed-off-by: Seth Howell <seth.howell@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Xueming Li [Sat, 9 Sep 2017 07:33:19 +0000 (15:33 +0800)]
mem: fix malloc element free in debug mode

malloc_elem_free() clears (sets to 0) the trailer cookie when
RTE_MALLOC_DEBUG is enabled. When the element is joined with a free
neighbor element, part of the joined memory is not cleared because the
length of the trailer cookie in the middle is not accounted for.

This patch fixes calculation of free memory length to be cleared in
malloc_elem_free() by including trailer cookie.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Xueming Li [Sat, 9 Sep 2017 07:33:18 +0000 (15:33 +0800)]
mem: fix malloc debug config

This patch replaces broken macro RTE_LIBRTE_MALLOC_DEBUG with
RTE_MALLOC_DEBUG.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Xueming Li [Thu, 24 Aug 2017 08:23:10 +0000 (16:23 +0800)]
config: add option to enable asserts

Currently, enabling assertions requires setting CONFIG_RTE_LOG_LEVEL to
RTE_LOG_DEBUG. CONFIG_RTE_LOG_LEVEL is the default log level of the
control path, and RTE_LOG_DP_LEVEL is the log level of the data path.
It is not obvious that assertions are controlled by the control path
LOG_LEVEL, especially for assertions used on the data path.

On the other hand, DPDK needs an assertion switch that does not affect
the log output level, assuming "--log-level" is not specified.

Assertion is an important API to balance DPDK high performance and
robustness. To promote assertion usage, it's valuable to decouple
assertions from CONFIG_RTE_LOG_LEVEL.

In one word, log is log, assertion is assertion, debug is hot pot :)

The rationale of this patch is to introduce a dedicated switch for
assertions: RTE_ENABLE_ASSERT
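
The general shape of such a compile-time switch is sketched below with
hypothetical APP_* names; it is not a copy of the DPDK macros, only an
illustration of an assertion that is independent of any log level.

```c
#include <stdio.h>
#include <stdlib.h>

#define APP_ENABLE_ASSERT 1 /* hypothetical switch, normally set by the build */

#ifdef APP_ENABLE_ASSERT
/* Enabled: report the failed expression and abort. */
#define APP_ASSERT(exp) do {                                      \
        if (!(exp)) {                                             \
            fprintf(stderr, "assert \"%s\" failed at %s:%d\n",    \
                    #exp, __FILE__, __LINE__);                    \
            abort();                                              \
        }                                                         \
    } while (0)
#else
/* Disabled: the check compiles to nothing. */
#define APP_ASSERT(exp) do { } while (0)
#endif

/* Example data-path helper guarded by assertions. */
int ring_slot(int index, int ring_size)
{
    APP_ASSERT(ring_size > 0);
    APP_ASSERT(index >= 0);
    return index % ring_size;
}
```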

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Zhiyong Yang [Tue, 19 Sep 2017 02:25:31 +0000 (10:25 +0800)]
test: add check for AVX512F

The CPUs which support AVX512 have been released. Add support for
checking AVX512F instruction set.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Pablo de Lara [Mon, 9 Oct 2017 05:21:59 +0000 (06:21 +0100)]
mempool/octeontx: fix icc build

drivers/mempool/octeontx/octeontx_fpavf.c(789):
error #592: variable "fpa" is used before its value is set
        RTE_SET_USED(fpa);

Fixes: 1c842786fe6c ("mempool/octeontx: probe fpavf PCIe devices")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Jianfeng Tan [Thu, 14 Sep 2017 02:40:29 +0000 (02:40 +0000)]
eal: remove Xen dom0 support

Remove Xen-specific code from EAL, including the option --xen-dom0,
memory initialization code, build dependencies, etc.

Related documents are removed or updated, and the EAL library version
is bumped.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
7 years agomem: remove API to get physical address in dom0
Jianfeng Tan [Thu, 14 Sep 2017 02:40:28 +0000 (02:40 +0000)]
mem: remove API to get physical address in dom0

Previously, to get the MFN address in dom0, this API was a wrapper to
obtain the "physical address".

As Xen dom0 support is being removed, this API is no longer necessary.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agoxen: remove dependency in libraries
Jianfeng Tan [Thu, 14 Sep 2017 02:40:27 +0000 (02:40 +0000)]
xen: remove dependency in libraries

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agoxen: remove dependency in applications
Jianfeng Tan [Thu, 14 Sep 2017 02:40:26 +0000 (02:40 +0000)]
xen: remove dependency in applications

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agonet/xenvirt: remove
Jianfeng Tan [Thu, 14 Sep 2017 02:40:25 +0000 (02:40 +0000)]
net/xenvirt: remove

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agoexamples/vhost_xen: remove
Jianfeng Tan [Thu, 14 Sep 2017 02:40:24 +0000 (02:40 +0000)]
examples/vhost_xen: remove

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agotest/cfgfile: add realloc scenario
Jacek Piasecki [Fri, 22 Sep 2017 09:44:50 +0000 (11:44 +0200)]
test/cfgfile: add realloc scenario

Load a huge realloc_sections.ini file to check the malloc/realloc
handling of the cfgfile library.

Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agocfgfile: rework load function
Jacek Piasecki [Fri, 22 Sep 2017 09:44:49 +0000 (11:44 +0200)]
cfgfile: rework load function

New functions added to cfgfile library make it possible
to significantly simplify the code of rte_cfgfile_load_with_params()

This patch shows the new body of this function.

Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agocfgfile: support runtime modification
Jacek Piasecki [Fri, 22 Sep 2017 09:44:48 +0000 (11:44 +0200)]
cfgfile: support runtime modification

Extend the existing cfgfile library with new API functions:

rte_cfgfile_create() - create new cfgfile object
rte_cfgfile_add_section() - add new section to existing cfgfile
object
rte_cfgfile_add_entry() - add new entry to existing cfgfile
object in specified section
rte_cfgfile_set_entry() - update existing entry in cfgfile object
rte_cfgfile_save() - save existing cfgfile object to INI file

This modification allows a cfgfile to be created at
runtime and opens up the possibility for applications to
dynamically build up a proper DPDK configuration, rather than
requiring a pre-existing one.
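
The create/add/set/save flow listed above can be illustrated with a
self-contained toy model. This is a sketch only: the toy_* names, the fixed
array sizes, and the return conventions are assumptions made for
illustration and are not the rte_cfgfile API or its internal layout.

```c
#include <stdio.h>
#include <string.h>

#define TOY_MAX_SECTIONS 4
#define TOY_MAX_ENTRIES  8
#define TOY_STR_LEN      32

struct toy_entry {
    char name[TOY_STR_LEN];
    char value[TOY_STR_LEN];
};

struct toy_section {
    char name[TOY_STR_LEN];
    int num_entries;
    struct toy_entry entries[TOY_MAX_ENTRIES];
};

struct toy_cfg {
    int num_sections;
    struct toy_section sections[TOY_MAX_SECTIONS];
};

static struct toy_section *toy_find_section(struct toy_cfg *cfg, const char *name)
{
    for (int i = 0; i < cfg->num_sections; i++)
        if (strcmp(cfg->sections[i].name, name) == 0)
            return &cfg->sections[i];
    return NULL;
}

/* Add an empty section; fails if the table is full or the name exists. */
static int toy_add_section(struct toy_cfg *cfg, const char *name)
{
    if (cfg->num_sections == TOY_MAX_SECTIONS || toy_find_section(cfg, name))
        return -1;
    struct toy_section *s = &cfg->sections[cfg->num_sections++];
    snprintf(s->name, sizeof(s->name), "%s", name);
    s->num_entries = 0;
    return 0;
}

/* Append a name/value pair to an existing section. */
static int toy_add_entry(struct toy_cfg *cfg, const char *section,
                         const char *name, const char *value)
{
    struct toy_section *s = toy_find_section(cfg, section);
    if (s == NULL || s->num_entries == TOY_MAX_ENTRIES)
        return -1;
    struct toy_entry *e = &s->entries[s->num_entries++];
    snprintf(e->name, sizeof(e->name), "%s", name);
    snprintf(e->value, sizeof(e->value), "%s", value);
    return 0;
}

/* Overwrite the value of an existing entry; fails if it does not exist. */
static int toy_set_entry(struct toy_cfg *cfg, const char *section,
                         const char *name, const char *value)
{
    struct toy_section *s = toy_find_section(cfg, section);
    if (s == NULL)
        return -1;
    for (int i = 0; i < s->num_entries; i++) {
        struct toy_entry *e = &s->entries[i];
        if (strcmp(e->name, name) == 0) {
            snprintf(e->value, sizeof(e->value), "%s", value);
            return 0;
        }
    }
    return -1;
}

/* Write the object out in INI form. */
static int toy_save(const struct toy_cfg *cfg, FILE *out)
{
    for (int i = 0; i < cfg->num_sections; i++) {
        const struct toy_section *s = &cfg->sections[i];
        fprintf(out, "[%s]\n", s->name);
        for (int j = 0; j < s->num_entries; j++)
            fprintf(out, "%s=%s\n", s->entries[j].name, s->entries[j].value);
    }
    return ferror(out) ? -1 : 0;
}
```

An application would zero a struct toy_cfg, add a section and its entries,
optionally update an entry, and then call toy_save() on an open FILE.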

Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agocfgfile: rework to flat arrays
Jacek Piasecki [Fri, 22 Sep 2017 09:44:47 +0000 (11:44 +0200)]
cfgfile: rework to flat arrays

The change to flat arrays in the cfgfile struct forces slightly
different data access in most cfgfile functions.
This patch makes the necessary changes to the existing API.

Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agocfgfile: remove EAL dependency
Jacek Piasecki [Fri, 22 Sep 2017 09:44:46 +0000 (11:44 +0200)]
cfgfile: remove EAL dependency

This patch removes the dependency on EAL in the cfgfile library.

Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agodoc: add membership documentation
Yipeng Wang [Wed, 4 Oct 2017 03:12:25 +0000 (20:12 -0700)]
doc: add membership documentation

This patch adds the documentation for the membership library.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
7 years agotest/member: add functional and perf tests
Yipeng Wang [Wed, 4 Oct 2017 03:12:24 +0000 (20:12 -0700)]
test/member: add functional and perf tests

This patch adds functional and performance tests for the membership
library.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
7 years agomember: add AVX for HT mode
Yipeng Wang [Wed, 4 Oct 2017 03:12:22 +0000 (20:12 -0700)]
member: add AVX for HT mode

For key search, the signatures of all entries are compared against
the signature of the key that is being looked up. Since all
signatures are contiguously put in a bucket, they can be compared
with vector instructions (AVX2), achieving higher lookup performance.

This patch adds an AVX2 implementation in a separate header file.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
7 years agomember: implement vBF mode
Yipeng Wang [Wed, 4 Oct 2017 03:12:21 +0000 (20:12 -0700)]
member: implement vBF mode

Bloom Filter (BF) [1] is a well-known space-efficient
probabilistic data structure that answers set membership queries.
Vector of Bloom Filters (vBF) is an extension to traditional BF
that supports multi-set membership testing. Traditional BF will
return found or not-found for each key. vBF will also return
which set the key belongs to if it is found.

Since each set requires a BF, vBF should be used when the set count
is small. vBF's false positive rate can be set appropriately so
that its memory requirement and lookup speed are better in certain
cases compared to the HT based set-summary.
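
A minimal sketch of the vBF idea, under stated assumptions: the names
(struct vbf, vbf_add, vbf_lookup), the filter size, the number of hashes,
and the integer mixer used as a hash are all illustrative choices and do not
reflect the library's implementation. It keeps one Bloom filter per set and
returns the first set whose filter matches.

```c
#include <stdint.h>

#define VBF_NUM_SETS   4      /* one Bloom filter per set */
#define VBF_BITS       1024   /* bits per filter */
#define VBF_NUM_HASHES 3

struct vbf {
    uint8_t bits[VBF_NUM_SETS][VBF_BITS / 8];
};

/* Simple integer mixer used as a seeded hash (illustrative choice). */
static uint32_t vbf_hash(uint32_t key, uint32_t seed)
{
    uint32_t h = key ^ (seed * 0x9e3779b9u);
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Insert key into the filter of set_id (valid IDs: 1..VBF_NUM_SETS). */
static void vbf_add(struct vbf *f, uint32_t key, int set_id)
{
    for (uint32_t k = 0; k < VBF_NUM_HASHES; k++) {
        uint32_t bit = vbf_hash(key, k) % VBF_BITS;
        f->bits[set_id - 1][bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Return the first set whose filter matches, or 0 for not-found.
 * A match can be a false positive; a miss is always correct. */
static int vbf_lookup(const struct vbf *f, uint32_t key)
{
    for (int s = 0; s < VBF_NUM_SETS; s++) {
        int hit = 1;
        for (uint32_t k = 0; k < VBF_NUM_HASHES && hit; k++) {
            uint32_t bit = vbf_hash(key, k) % VBF_BITS;
            hit = (f->bits[s][bit / 8] >> (bit % 8)) & 1;
        }
        if (hit)
            return s + 1;
    }
    return 0;
}
```

Memory grows linearly with the number of sets here, which is why the text
recommends vBF only when the set count is small.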

This patch adds the vBF implementation.

[1] B H Bloom, “Space/Time Trade-offs in Hash Coding with Allowable
Errors,” Communications of the ACM, 1970.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
7 years agomember: implement HT mode
Yipeng Wang [Wed, 4 Oct 2017 03:12:20 +0000 (20:12 -0700)]
member: implement HT mode

One of the set-summary structures is hash-table based
set-summary (HTSS). One example is cuckoo filter [1].

Compared to a traditional hash table, HTSS has a much more
compact structure. For each element, only one signature and
its corresponding set ID are stored. No key comparison is required
during lookup. For the table structure, there are multiple entries
in each bucket, and the table is composed of many buckets.

Two modes are supported for HTSS, "cache" and "non-cache" modes.
The non-cache mode is similar to the cuckoo filter [1].
When a bucket is full, one entry will be evicted to its
alternative bucket to make space for the new key. The table could
be full and then no more keys could be inserted. This mode has
a false-positive rate but no false negatives. Multiple entries
with the same signature could stay in the same bucket.

The "cache" mode does not evict a key to its alternative bucket
when a bucket is full. Instead, an existing key is evicted out of
the table, like a cache. Thus, the table never rejects keys when
it is full. Another property is that each bucket cannot hold
multiple entries with the same signature. This mode can have both
false-positive and false-negative probability.
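
The "cache" mode behaviour can be sketched as follows. This is an
illustration under assumptions, not the library code: the struct layout, the
hash mixer, the 16-bit signature, and the round-robin victim choice are all
hypothetical. Only a signature and a set ID are stored per entry, a matching
signature is updated in place (so a bucket never holds duplicates), and a
full bucket evicts an existing entry rather than rejecting the insert.

```c
#include <stdint.h>

#define HT_BUCKETS 8
#define HT_ENTRIES 4   /* entries per bucket */

struct ht_entry {
    uint16_t sig;      /* short signature stored instead of the key */
    uint16_t set_id;   /* 0 marks an empty slot */
};

struct htss {
    struct ht_entry bucket[HT_BUCKETS][HT_ENTRIES];
    unsigned int next_victim[HT_BUCKETS];   /* round-robin eviction cursor */
};

/* Integer mixer used as the hash (illustrative choice). */
static uint32_t ht_hash(uint32_t key)
{
    uint32_t h = key;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Cache-mode insert: never fails. set_id must be non-zero. */
static void htss_insert_cache(struct htss *t, uint32_t key, uint16_t set_id)
{
    uint32_t h = ht_hash(key);
    uint32_t idx = h % HT_BUCKETS;
    struct ht_entry *b = t->bucket[idx];
    uint16_t sig = (uint16_t)(h >> 16);
    int slot = -1;

    for (int i = 0; i < HT_ENTRIES; i++) {
        if (b[i].set_id != 0 && b[i].sig == sig) {
            b[i].set_id = set_id;   /* same signature: update in place */
            return;
        }
        if (b[i].set_id == 0 && slot < 0)
            slot = i;               /* remember the first free slot */
    }
    if (slot < 0)                   /* bucket full: evict an existing entry */
        slot = (int)(t->next_victim[idx]++ % HT_ENTRIES);
    b[slot].sig = sig;
    b[slot].set_id = set_id;
}

/* Return the set ID stored for the key's signature, or 0 for not-found. */
static uint16_t htss_lookup(const struct htss *t, uint32_t key)
{
    uint32_t h = ht_hash(key);
    const struct ht_entry *b = t->bucket[h % HT_BUCKETS];
    uint16_t sig = (uint16_t)(h >> 16);

    for (int i = 0; i < HT_ENTRIES; i++)
        if (b[i].set_id != 0 && b[i].sig == sig)
            return b[i].set_id;
    return 0;
}
```

Eviction is what introduces false negatives in this mode: a key that was
inserted may later be pushed out by unrelated keys in the same bucket.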

This patch adds the implementation of HTSS.

[1] B Fan, D G Andersen and M Kaminsky, “Cuckoo Filter: Practically
Better Than Bloom,” in Conference on emerging Networking
Experiments and Technologies, 2014.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
7 years agomember: implement main API
Yipeng Wang [Wed, 4 Oct 2017 03:12:19 +0000 (20:12 -0700)]
member: implement main API

The membership library is an extension and generalization of a
traditional filter structure (for example Bloom filter and cuckoo filter).
In general, the membership library is a data structure that provides a
"set-summary" and responds to set-membership queries of whether a
certain element belongs to one or more sets. A membership test for an
element will return the set this element belongs to, or not-found if the
element was never inserted into the set-summary.

The results of the membership test are not 100% accurate. A certain
false positive or false negative probability could exist. However,
compared to a "full-blown" complete list of elements, a "set-summary"
is memory efficient and fast on lookup.

This patch adds the main API definition.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
7 years agojobstats: fix a doxygen comment
CongWen Zhang [Mon, 21 Aug 2017 00:44:51 +0000 (08:44 +0800)]
jobstats: fix a doxygen comment

Signed-off-by: CongWen Zhang <zhang.congwen@zte.com.cn>
Reviewed-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
7 years agomempool/octeontx: support memory area ops
Santosh Shukla [Sun, 8 Oct 2017 12:40:10 +0000 (18:10 +0530)]
mempool/octeontx: support memory area ops

Add support for the register_memory_area op in the mempool driver.

Allow more than one HW pool when using the OcteonTx mempool driver
by storing each pool's information in a list and finding the
appropriate list element by matching the rte_mempool pointers.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: support capabilities query
Santosh Shukla [Sun, 8 Oct 2017 12:40:09 +0000 (18:10 +0530)]
mempool/octeontx: support capabilities query

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: support count query
Santosh Shukla [Sun, 8 Oct 2017 12:40:08 +0000 (18:10 +0530)]
mempool/octeontx: support count query

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: support enqueue and dequeue
Santosh Shukla [Sun, 8 Oct 2017 12:40:07 +0000 (18:10 +0530)]
mempool/octeontx: support enqueue and dequeue

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: support freeing
Santosh Shukla [Sun, 8 Oct 2017 12:40:06 +0000 (18:10 +0530)]
mempool/octeontx: support freeing

Upon a pool free request from the application, the Octeontx FPA free
routine does the following:
- Uses mbox to reset the fpapf pool setup.
- Frees fpavf resources.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: support allocation
Santosh Shukla [Sun, 8 Oct 2017 12:40:05 +0000 (18:10 +0530)]
mempool/octeontx: support allocation

Upon a pool allocation request by the application, the Octeontx FPA
alloc routine does the following:
- Gets a free pool from the pci fpavf array.
- Uses mbox to tell the fpapf driver about:
  * gpool-id
  * pool block_sz
  * alignment
- Programs the fpavf pool boundary.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: probe fpavf PCIe devices
Santosh Shukla [Sun, 8 Oct 2017 12:40:04 +0000 (18:10 +0530)]
mempool/octeontx: probe fpavf PCIe devices

A mempool device is a set of PCIe VFs.
On Octeontx HW, each mempool device is enumerated as a
separate SRIOV VF PCIe device.

In order to expose it as a mempool device:
On PCIe probe, the driver stores the information associated with the
PCIe device and later, upon an application pool request
(e.g. rte_mempool_create_empty), the infrastructure creates a pool
device from the earlier probed PCIe VF devices.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: add build and log infrastructure
Santosh Shukla [Sun, 8 Oct 2017 12:40:03 +0000 (18:10 +0530)]
mempool/octeontx: add build and log infrastructure

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agomempool/octeontx: add HW constants
Santosh Shukla [Sun, 8 Oct 2017 12:40:02 +0000 (18:10 +0530)]
mempool/octeontx: add HW constants

Add HW constants of the octeontx fpa mempool device.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agoigb_uio: fix build on arm64 kernel
Hemant Agrawal [Sun, 8 Oct 2017 09:17:32 +0000 (14:47 +0530)]
igb_uio: fix build on arm64 kernel

IGB_UIO compilation recently got enabled for ARM64 by default.

The igb_uio compilation against an ARM64 based stock 4.x (e.g. 4.13)
kernel gives compilation errors:

igb_uio.c: In function ‘igbuio_pci_irqcontrol’:
igb_uio.c:115:25: error: implicit declaration of function
‘irq_get_irq_data’ [-Werror=implicit-function-declaration]
  struct irq_data *irq = irq_get_irq_data(udev->info.irq);
                         ^
igb_uio.c:115:25: error: initialization makes pointer from integer without
a cast [-Werror=int-conversion]

Fixes: d196343a258e ("igb_uio: use kernel functions for masking MSI-X")

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
7 years agohash: optimize Toeplitz RSS computation
Yangchao Zhou [Tue, 22 Aug 2017 12:02:35 +0000 (20:02 +0800)]
hash: optimize Toeplitz RSS computation

Use rte_bsf32 and fast bit unset operation to optimize the
softrss computation.
The following measurements show the improvement over the default
softrss computation function.

tuple len  old (cycles)  new (cycles)
    3          1225           337
    9          3743           992
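
The optimization can be illustrated with a self-contained sketch. It is not
the DPDK code: the function names are hypothetical, the key is assumed to be
given as host-order 32-bit words holding len + 1 entries, and
__builtin_ctz (a GCC/Clang builtin) stands in for rte_bsf32. The reference
version tests all 32 bits of every input word; the bit-scan version visits
only the set bits and clears the lowest one with map &= map - 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference: test every input bit, most significant bit first.
 * key must hold at least len + 1 words. */
static uint32_t toeplitz_bitwise(const uint32_t *in, size_t len,
                                 const uint32_t *key)
{
    uint32_t hash = 0;

    for (size_t j = 0; j < len; j++) {
        /* 64-bit view of the key stream covering this input word */
        uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

        for (unsigned int i = 0; i < 32; i++)
            if (in[j] & (UINT32_C(1) << (31 - i)))
                hash ^= (uint32_t)(window >> (32 - i));
    }
    return hash;
}

/* Bit-scan variant: iterate over set bits only. */
static uint32_t toeplitz_bitscan(const uint32_t *in, size_t len,
                                 const uint32_t *key)
{
    uint32_t hash = 0;

    for (size_t j = 0; j < len; j++) {
        uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

        /* map &= map - 1 clears the lowest set bit on each pass */
        for (uint32_t map = in[j]; map != 0; map &= map - 1) {
            unsigned int lsb = (unsigned int)__builtin_ctz(map);
            hash ^= (uint32_t)(window >> (lsb + 1));
        }
    }
    return hash;
}
```

Both functions compute the same value; the second does work proportional to
the number of set bits in the input rather than to its total bit length.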

Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
7 years agohash: fix eviction counter
Pablo de Lara [Fri, 22 Sep 2017 04:25:43 +0000 (05:25 +0100)]
hash: fix eviction counter

When adding a new entry in a hash table, there is
a maximum number of evictions that can be
performed. When the counter of these evictions reaches
this maximum, the entry cannot be added, as it is considered
that the algorithm has encountered an infinite loop.

The problem with the current implementation is that this
counter was declared as a static variable.
If there are multiple threads adding entries in the same table
or in different tables, they should access different counters,
one per core and per table.

Therefore, the variable has been modified to be non-static.
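
A minimal sketch of the fix pattern, with hypothetical names and a toy
recursion in place of the real cuckoo displacement: the counter lives on the
stack of each add operation and is passed down by pointer, so it starts from
zero for every insert and is never shared between threads or tables.

```c
#define SKETCH_MAX_PUSHES 100   /* hypothetical eviction limit */

/* Toy stand-in for the displacement step: each level counts as one push.
 * The counter belongs to the caller, so concurrent adds do not share it. */
static int sketch_make_space(int depth_needed, unsigned int *nr_pushes)
{
    if (++(*nr_pushes) > SKETCH_MAX_PUSHES)
        return -1;              /* give up: treated as an endless loop */
    if (depth_needed == 0)
        return 0;
    return sketch_make_space(depth_needed - 1, nr_pushes);
}

static int sketch_add_key(int depth_needed)
{
    unsigned int nr_pushes = 0; /* per-call counter, not static */

    return sketch_make_space(depth_needed, &nr_pushes);
}
```

With a static counter, the pushes of earlier or concurrent inserts would
accumulate and later inserts could be rejected even though each one stays
below the limit on its own.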

Fixes: 243e93a5046f ("hash: fix unlimited cuckoo path")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years agoigb_uio: use UIO macro instead of hardcoded value
Tonghao Zhang [Mon, 18 Sep 2017 07:46:57 +0000 (00:46 -0700)]
igb_uio: use UIO macro instead of hardcoded value

This is not a bugfix, but it makes the igb_uio code easier for
developers to review and maintain.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
7 years agoigb_uio: add MSI IRQ mode
Markus Theil [Tue, 5 Sep 2017 12:04:06 +0000 (14:04 +0200)]
igb_uio: add MSI IRQ mode

This patch adds MSI IRQ mode in a way that should
also work on older kernel versions. The base for my patch
was an attempt to do this in cf705bc36c which was later
reverted in d8ee82745a. Compilation was tested on Linux 3.2,
4.10 and 4.12.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
7 years agoigb_uio: use kernel functions for masking MSI-X
Markus Theil [Tue, 5 Sep 2017 12:04:05 +0000 (14:04 +0200)]
igb_uio: use kernel functions for masking MSI-X

This patch removes the custom MSI-X mask/unmask code and
uses already existing kernel functions.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
7 years agoigb_uio: release in exact reverse order
Markus Theil [Tue, 5 Sep 2017 12:04:04 +0000 (14:04 +0200)]
igb_uio: release in exact reverse order

For better readability throughout the module, the destruction
order is changed to the exact inverse construction order.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
7 years agoigb_uio: fix MSI-X IRQ assignment with new IRQ function
Markus Theil [Tue, 5 Sep 2017 12:04:03 +0000 (14:04 +0200)]
igb_uio: fix MSI-X IRQ assignment with new IRQ function

The patch which introduced the usage of pci_alloc_irq_vectors
came after the patch which switched to non-threaded ISR (f0d1896fa1),
but did not use the non-threaded ISR when pci_alloc_irq_vectors
is used.

Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
7 years agoigb_uio: fix IRQ disable on recent kernels
Markus Theil [Tue, 5 Sep 2017 12:04:02 +0000 (14:04 +0200)]
igb_uio: fix IRQ disable on recent kernels

igb_uio already allocates IRQs using pci_alloc_irq_vectors on
recent kernels (>= 4.8). Before this fix, the interrupt disable code
did not use the corresponding pci_free_irq_vectors, but the also
deprecated pci_disable_msix.

Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
7 years agoigb_uio: refactor IRQ enable/disable into own functions
Markus Theil [Tue, 5 Sep 2017 12:04:01 +0000 (14:04 +0200)]
igb_uio: refactor IRQ enable/disable into own functions

Interrupt setup code in igb_uio has to deal with multiple
types of interrupts and kernel versions. This patch moves
the setup and teardown code into their own functions, to make
it more readable.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Tested-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
7 years agoapp/procinfo: fix compilation with -O3
Keith Wiles [Tue, 15 Aug 2017 13:53:05 +0000 (08:53 -0500)]
app/procinfo: fix compilation with -O3

When using EXTRA_CFLAGS="-g -O3" in the build, -O3 causes
compiler warnings. Seen with the Ubuntu 17.04 gcc compiler.

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
7 years agokni: fix SLE version detection
Nirmoy Das [Fri, 11 Aug 2017 16:33:14 +0000 (18:33 +0200)]
kni: fix SLE version detection

Detect SLE versions in reverse chronological order, as ">=" is being used.

Fixes: 2972254ce163 ("kni: fix build on Suse 12 SP3")
Cc: stable@dpdk.org
Signed-off-by: Nirmoy Das <ndas@suse.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
7 years agoconfig: enable igb_uio on arm64
Jianbo Liu [Thu, 14 Sep 2017 03:53:47 +0000 (11:53 +0800)]
config: enable igb_uio on arm64

The kernel patch was merged to support pci resource mapping.
https://patchwork.kernel.org/patch/9677441/

So enable igb_uio in the default arm64 configuration.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
7 years agomempool: notify memory area to pool
Santosh Shukla [Sun, 1 Oct 2017 09:29:02 +0000 (14:59 +0530)]
mempool: notify memory area to pool

HW pool managers, e.g. on the Octeontx SoC, require software to program
the start and end address of the pool. Currently, there is no such API in
external mempool. Introduce the rte_mempool_ops_register_memory_area API,
which lets the HW pool manager know when the common layer selects a
hugepage: for each hugepage, notify its start/end address to the HW pool
manager.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agomempool: introduce block size alignment flag
Santosh Shukla [Sun, 1 Oct 2017 09:29:01 +0000 (14:59 +0530)]
mempool: introduce block size alignment flag

Some mempool HW, like the octeontx/fpa block, demands an object start
address aligned to the block size (/total_elt_sz).

Introduce a MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag.
If this flag is set:
- Align object start address (vaddr) to a multiple of total_elt_sz.
- Allocate one additional object. The additional object is needed to make
  sure that the requested 'n' objects get correctly populated.

Example:
- Let's say that we get a memory chunk of size 'x' from memzone.
- And the application has requested 'n' objects from the mempool.
- Ideally, we start using objects at start address 0 to...(x-block_sz)
  for n objects.
- The first object address, i.e. 0, is not necessarily aligned to block_sz.
- So we derive an offset value, 'off', for block_sz alignment purposes.
- That 'off' makes sure that the start address of each object is
  block_sz aligned.
- Calculating 'off' may end up sacrificing the first block_sz area of
  memzone area x. So the total number of objects which can fit in the
  pool area is n-1, which is incorrect behavior.

Therefore we request one additional object (/block_sz area) from memzone
when MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag is set.
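
The arithmetic in the example can be checked with a small sketch. The
function names are hypothetical and the offset formula is one
straightforward way to round up to the next multiple of the element size; it
is not claimed to match the mempool code line for line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes to skip so that (vaddr + off) is a multiple of total_elt_sz. */
static size_t blk_align_offset(uintptr_t vaddr, size_t total_elt_sz)
{
    assert(total_elt_sz > 0);
    size_t rem = (size_t)(vaddr % total_elt_sz);
    return rem ? total_elt_sz - rem : 0;
}

/* Number of block-aligned objects that fit in [vaddr, vaddr + len). */
static size_t blk_aligned_objs(uintptr_t vaddr, size_t len, size_t total_elt_sz)
{
    size_t off = blk_align_offset(vaddr, total_elt_sz);

    if (off > len)
        return 0;
    return (len - off) / total_elt_sz;
}
```

For a 256-byte element and a chunk that starts 64 bytes past a 256-byte
boundary, a chunk sized for 8 objects holds only 7 after alignment, while a
chunk sized for 9 objects holds the requested 8.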

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agomempool: detect physical contiguous objects
Santosh Shukla [Sun, 1 Oct 2017 09:29:00 +0000 (14:59 +0530)]
mempool: detect physical contiguous objects

The memory area containing all the objects must be physically
contiguous.
Introduce the MEMPOOL_F_CAPA_PHYS_CONTIG flag for such a use case.

The flag is used to detect whether the pool area has sufficient space
to fit all objects. If not, -ENOSPC is returned.
This way, we make sure that all objects within a pool are contiguous.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agomempool: get capabilities
Santosh Shukla [Sun, 1 Oct 2017 09:28:59 +0000 (14:58 +0530)]
mempool: get capabilities

Allow the mempool driver to advertise its pool capabilities.
For that purpose, an API (rte_mempool_ops_get_capabilities)
and a ->get_capabilities() handler have been introduced.
- Upon a ->get_capabilities() call, the mempool driver advertises
its capabilities in the mempool flags param.
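
A minimal sketch of the handler pattern, under assumptions: the sk_* types,
the flag values, and the choice of -ENOTSUP for a driver without a handler
are illustrative and not taken from the mempool headers. An optional
callback in the ops table lets a driver OR its capability bits into a flags
word owned by the caller.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical capability bits (values chosen for illustration only). */
#define SK_F_CAPA_PHYS_CONTIG         0x1u
#define SK_F_CAPA_BLK_ALIGNED_OBJECTS 0x2u

/* Hypothetical ops table with an optional capability handler. */
struct sk_pool_ops {
    int (*get_capabilities)(unsigned int *flags);
};

/* Example HW driver handler: advertise both capabilities. */
static int sk_hw_get_capabilities(unsigned int *flags)
{
    *flags |= SK_F_CAPA_PHYS_CONTIG | SK_F_CAPA_BLK_ALIGNED_OBJECTS;
    return 0;
}

/* Common-layer wrapper: drivers without a handler advertise nothing. */
static int sk_ops_get_capabilities(const struct sk_pool_ops *ops,
                                   unsigned int *flags)
{
    if (ops->get_capabilities == NULL)
        return -ENOTSUP;
    return ops->get_capabilities(flags);
}
```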

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agodoc: remove mempool deprecation notice
Santosh Shukla [Sun, 1 Oct 2017 09:28:58 +0000 (14:58 +0530)]
doc: remove mempool deprecation notice

Removed mempool deprecation notice and
updated change info in release_17.11.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agomempool: add flags arg in xmem size and usage
Santosh Shukla [Sun, 1 Oct 2017 09:28:57 +0000 (14:58 +0530)]
mempool: add flags arg in xmem size and usage

xmem_size and xmem_usage need to know the status of mempool flags,
so add 'flags' arg in _xmem_size/usage() api.

A following patch will make use of that.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agomempool: change flags from int to unsigned int
Santosh Shukla [Sun, 1 Oct 2017 09:28:56 +0000 (14:58 +0530)]
mempool: change flags from int to unsigned int

mp->flags is int and mempool API writes unsigned int
value in 'flags', so fix the 'flags' data type.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agomempool: remove unused flags argument
Santosh Shukla [Sun, 1 Oct 2017 09:28:55 +0000 (14:58 +0530)]
mempool: remove unused flags argument

* Remove redundant 'flags' API description from
  - __mempool_generic_put
  - __mempool_generic_get
  - rte_mempool_generic_put
  - rte_mempool_generic_get

* Remove unused 'flags' argument from
  - rte_mempool_generic_put
  - rte_mempool_generic_get

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
7 years agoethdev: get supported mempool per port
Santosh Shukla [Fri, 6 Oct 2017 07:45:30 +0000 (13:15 +0530)]
ethdev: get supported mempool per port

DPDK now supports more than one mempool driver, and
each mempool driver works best for a specific PMD, for example:
- sw ring based mempool for Intel PMD drivers.
- dpaa2 HW mempool manager for dpaa2 PMD driver.
- fpa HW mempool manager for Octeontx PMD driver.

An application would like to know the best mempool handle
for any port.

Introduce the rte_eth_dev_pool_ops_supported() API,
which allows a PMD driver to advertise
its supported pool capability to the application.

Supported pools are categorized by the following priorities:
- Best mempool handle for this port (Highest priority '0')
- Port supports this mempool handle (Priority '1')
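
How an application might act on these priorities can be sketched as
follows. Everything here is hypothetical: sk_port_pool_ops_supported() is a
stand-in for a per-port query rather than the real ethdev call, the negative
return for an unsupported handle is an assumption, and the pool names are
used only as example strings.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-port query:
 * 0 = best handle for the port, 1 = supported, negative = not supported. */
static int sk_port_pool_ops_supported(const char *port_best, const char *pool)
{
    if (strcmp(pool, port_best) == 0)
        return 0;
    if (strcmp(pool, "ring_mp_mc") == 0)  /* assume a SW ring pool works */
        return 1;
    return -ENOTSUP;
}

/* Pick the candidate with the lowest non-negative priority value,
 * or NULL if the port supports none of the candidates. */
static const char *sk_pick_pool(const char *port_best,
                                const char *const *cands, size_t n)
{
    const char *choice = NULL;
    int best_prio = INT_MAX;

    for (size_t i = 0; i < n; i++) {
        int prio = sk_port_pool_ops_supported(port_best, cands[i]);

        if (prio >= 0 && prio < best_prio) {
            best_prio = prio;
            choice = cands[i];
        }
    }
    return choice;
}
```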

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>