dpdk.git
6 years agodoc: add GSO programmer's guide
Mark Kavanagh [Sat, 7 Oct 2017 14:56:44 +0000 (22:56 +0800)]
doc: add GSO programmer's guide

Add programmer's guide doc to explain the design and use of the
GSO library.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agoapp/testpmd: enable TCP/IPv4 VxLAN and GRE GSO
Jiayu Hu [Sat, 7 Oct 2017 14:56:43 +0000 (22:56 +0800)]
app/testpmd: enable TCP/IPv4 VxLAN and GRE GSO

This patch adds GSO support to the csum forwarding engine. Oversized
packets transmitted over a GSO-enabled port will undergo segmentation
(with the exception of packet-types unsupported by the GSO library).
GSO support is disabled by default.

GSO support may be toggled on a per-port basis, using the command:

        "set port <port_id> gso on|off"

The maximum packet length (including the packet header and payload) for
GSO segments may be set with the command:

        "set gso segsz <length>"

Show GSO configuration for a given port with the command:

        "show port <port_id> gso"

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agogso: support GRE GSO
Mark Kavanagh [Sat, 7 Oct 2017 14:56:42 +0000 (22:56 +0800)]
gso: support GRE GSO

This patch adds GSO support for GRE-tunneled packets. Supported GRE
packets must contain an outer IPv4 header, and inner TCP/IPv4 headers.
They may also contain a single VLAN tag. GRE GSO doesn't check if all
input packets have correct checksums and doesn't update checksums for
output packets. Additionally, it doesn't process IP fragmented packets.

As with VxLAN GSO, GRE GSO uses a two-segment MBUF to organize each
output packet, which requires multi-segment mbuf support in the TX
functions of the NIC driver. Also, if a packet is GSOed, GRE GSO reduces
its MBUF refcnt by 1. As a result, when all of its GSOed segments are
freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agogso: support VxLAN GSO
Mark Kavanagh [Sat, 7 Oct 2017 14:56:41 +0000 (22:56 +0800)]
gso: support VxLAN GSO

This patch adds a framework that allows GSO on tunneled packets.
Furthermore, it leverages that framework to provide GSO support for
VxLAN-encapsulated packets.

Supported VxLAN packets must have an outer IPv4 header (prepended by an
optional VLAN tag), and contain an inner TCP/IPv4 packet (with an optional
inner VLAN tag).

VxLAN GSO doesn't check if input packets have correct checksums and
doesn't update checksums for output packets. Additionally, it doesn't
process IP fragmented packets.

As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each
output packet, which mandates support for multi-segment mbufs in the TX
functions of the NIC driver. Also, if a packet is GSOed, VxLAN GSO
reduces its MBUF refcnt by 1. As a result, when all of its GSO'd segments
are freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agogso: support TCP/IPv4 GSO
Jiayu Hu [Sat, 7 Oct 2017 14:56:40 +0000 (22:56 +0800)]
gso: support TCP/IPv4 GSO

This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input
packets have correct checksums, and doesn't update checksums for
output packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.

TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect MBUF simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.

If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.
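
As an illustration of this layout, the following sketch (not the
library's internal code; the mempool names, offsets and lengths are
assumptions) shows how one two-segment output packet could be built
from the input mbuf:

    #include <rte_mbuf.h>
    #include <rte_memcpy.h>

    /* Sketch only: build one two-segment GSO output packet from "pkt".
     * hdr_pool/ind_pool, hdr_len, off and payload_len are assumptions. */
    static struct rte_mbuf *
    gso_make_segment(struct rte_mbuf *pkt, struct rte_mempool *hdr_pool,
                     struct rte_mempool *ind_pool, uint16_t hdr_len,
                     uint16_t off, uint16_t payload_len)
    {
            struct rte_mbuf *hdr = rte_pktmbuf_alloc(hdr_pool);   /* direct   */
            struct rte_mbuf *pay = rte_pktmbuf_alloc(ind_pool);   /* indirect */

            if (hdr == NULL || pay == NULL) {
                    if (hdr != NULL)
                            rte_pktmbuf_free(hdr);
                    if (pay != NULL)
                            rte_pktmbuf_free(pay);
                    return NULL;
            }

            /* Direct MBUF: a copy of the packet headers. */
            rte_memcpy(rte_pktmbuf_mtod(hdr, void *),
                       rte_pktmbuf_mtod(pkt, void *), hdr_len);
            hdr->data_len = hdr->pkt_len = hdr_len;

            /* Indirect MBUF: points into the original packet's payload. */
            rte_pktmbuf_attach(pay, pkt);
            rte_pktmbuf_adj(pay, hdr_len + off);
            pay->data_len = pay->pkt_len = payload_len;

            /* Chain them into the "two-segment MBUF". */
            hdr->next = pay;
            hdr->nb_segs = 2;
            hdr->pkt_len += pay->pkt_len;
            return hdr;
    }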

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
6 years agogso: add Generic Segmentation Offload API framework
Jiayu Hu [Sat, 7 Oct 2017 14:56:39 +0000 (22:56 +0800)]
gso: add Generic Segmentation Offload API framework

Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To give applications more flexibility, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. Segmenting a packet requires two steps: the first
is to set the proper flags in mbuf->ol_flags (the same flags as for
TSO); the second is to call the segmentation API,
rte_gso_segment(). This patch introduces the GSO API framework to DPDK.
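
A minimal sketch of those two steps is shown below (pkt is an oversized
TCP/IPv4 mbuf to be segmented; the mempool names, the 1400-byte GSO size
and MAX_SEGS are assumptions, and the context is assumed to be passed by
pointer as declared in rte_gso.h):

    #include <rte_ethdev.h>
    #include <rte_gso.h>
    #include <rte_mbuf.h>

    #define MAX_SEGS 64                     /* illustrative output burst size */

    struct rte_gso_ctx ctx = {
            .direct_pool   = direct_pool,   /* pre-created mempools (assumed) */
            .indirect_pool = indirect_pool,
            .gso_types     = DEV_TX_OFFLOAD_TCP_TSO,
            .gso_size      = 1400,
            .flag          = 0,
    };
    struct rte_mbuf *segs[MAX_SEGS];
    int nb_segs;

    /* Step 1: mark the packet exactly as for TSO. */
    pkt->ol_flags |= PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_TCP_CKSUM;
    pkt->l2_len = 14;                       /* Ethernet header length */
    pkt->l3_len = 20;                       /* IPv4 header length     */
    pkt->l4_len = 20;                       /* TCP header length      */

    /* Step 2: segment it; on success segs[] holds the GSO segments. */
    nb_segs = rte_gso_segment(pkt, &ctx, segs, MAX_SEGS);
    if (nb_segs < 0)
            rte_pktmbuf_free(pkt);          /* e.g. unsupported packet type */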

rte_gso_segment() splits an input packet into small ones in each
invocation. The GSO library refers to these small packets generated
by rte_gso_segment() as GSO segments. Each of the newly-created GSO
segments is organized as a two-segment MBUF, where the first segment is a
standard MBUF, which stores a copy of packet header, and the second is an
indirect MBUF which points to a section of data in the input packet.
rte_gso_segment() reduces the refcnt of the input packet by 1. Therefore,
when all GSO segments are freed, the input packet is freed automatically.
Additionally, since each GSO segment consists of multiple MBUFs (i.e. 2
MBUFs), the driver of the interface to which the GSO segments are sent
must support transmitting multi-segment packets.

The GSO framework clears the PKT_TX_TCP_SEG flag for both the input
packet, and all produced GSO segments in the event of success, since
segmentation in hardware is no longer required at that point.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agoapp/testpmd: enable the heavyweight mode TCP/IPv4 GRO
Jiayu Hu [Sat, 7 Oct 2017 07:45:57 +0000 (15:45 +0800)]
app/testpmd: enable the heavyweight mode TCP/IPv4 GRO

The GRO library provides two modes to reassemble packets. Currently, the
csum forwarding engine already supports the lightweight mode for
reassembling TCP/IPv4 packets. This patch introduces the heavyweight mode
for TCP/IPv4 GRO in the csum forwarding engine.

With the command "set port <port_id> gro on|off", users can enable
TCP/IPv4 GRO for a given port. With the command "set gro flush <cycles>",
users can determine when the GROed TCP/IPv4 packets are flushed from
reassembly tables. With the command "show port <port_id> gro", users can
display GRO configuration.

The GRO library doesn't re-calculate checksums for merged packets. If
users want the merged packets to have correct IP and TCP checksums, they
should enable HW IP checksum calculation and HW TCP checksum calculation
for the port to which the merged packets are transmitted.
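
A possible csum-engine session on port 0 (the port id, the cycle count
and the offload selection are illustrative; HW IP/TCP checksum is
enabled so the merged packets carry valid checksums) could therefore be:

        testpmd> port stop 0
        testpmd> csum set ip hw 0
        testpmd> csum set tcp hw 0
        testpmd> port start 0
        testpmd> set port 0 gro on
        testpmd> set gro flush 2
        testpmd> show port 0 gro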

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
6 years agonet/ark: fix loop counter
John Miller [Fri, 6 Oct 2017 18:03:27 +0000 (14:03 -0400)]
net/ark: fix loop counter

Change a loop counter that should be based on the number
of Rx queues, not Tx queues.  This only affects debug
output.

Fixes: 727b3fe292bc ("net/ark: integrate PMD")
Cc: stable@dpdk.org
Signed-off-by: John Miller <john.miller@atomicrules.com>
6 years agonet/bonding: fix LACP slave deactivate behavioral
Declan Doherty [Fri, 6 Oct 2017 09:21:12 +0000 (10:21 +0100)]
net/bonding: fix LACP slave deactivate behavioral

During a link down event on a port participating in an LACP 802.3ad
bond, the current behavior can cause all ports to be deselected
and temporarily stop all traffic on the bond, causing unexpected
traffic loss across all ports and not just the port which was
affected by the link down event.

Fixes: 46fb43683679 ("bond: add mode 4")
Cc: stable@dpdk.org
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
6 years agodoc: add API documentation for bnxt PMD
Ferruh Yigit [Mon, 11 Sep 2017 16:33:35 +0000 (17:33 +0100)]
doc: add API documentation for bnxt PMD

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
6 years agonet/virtio: fix compilation with -Og
Olivier Matz [Mon, 11 Sep 2017 15:13:26 +0000 (17:13 +0200)]
net/virtio: fix compilation with -Og

The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:

  CC virtio_rxtx.o
  virtio_rxtx.c: In function ‘virtio_rx_offload’:
  virtio_rxtx.c:680:10: error: ‘csum’ may be used uninitialized in
                        this function [-Werror=maybe-uninitialized]
       csum = ~csum;
       ~~~~~^~~~~~~

The function rte_raw_cksum_mbuf() may indeed return an error, and
in this case, csum won't be initialized. Fix it by initializing csum
to 0.
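
The resulting pattern is roughly the following (a simplified sketch of
the affected code path, not the exact diff; the error handling shown
here is illustrative):

    /* Initialize csum so it has a defined value even when
     * rte_raw_cksum_mbuf() fails and leaves it untouched. */
    uint16_t csum = 0;

    if (rte_raw_cksum_mbuf(m, hdr->csum_start,
                           rte_pktmbuf_pkt_len(m) - hdr->csum_start,
                           &csum) < 0)
            return -EINVAL;

    if (likely(csum != 0xffff))
            csum = ~csum;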

Fixes: 96cb6711939e ("net/virtio: support Rx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
6 years agonet/mlx5: add operations for secondary process
Xueming Li [Fri, 6 Oct 2017 15:45:51 +0000 (23:45 +0800)]
net/mlx5: add operations for secondary process

Add operations that are safe for secondary processes:
* (x)stats
* device info get
* rx/tx descriptor status

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
6 years agonet/mlx5: allocate verbs object into shared memory
Xueming Li [Fri, 6 Oct 2017 15:45:50 +0000 (23:45 +0800)]
net/mlx5: allocate verbs object into shared memory

The PMD uses Verbs objects which were not available in shared memory.

This patch modifies the location where Verbs objects are allocated (from
the process memory address space to the shared memory address space), and
thus allows a secondary process to use those objects by mapping this
shared memory space into its own memory space.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
6 years agonet/mlx5: install a socket to exchange a file descriptor
Xueming Li [Fri, 6 Oct 2017 15:45:49 +0000 (23:45 +0800)]
net/mlx5: install a socket to exchange a file descriptor

Use a unix socket to get back the communication channel with the kernel
driver from the primary process; this is necessary to remap those pages
in the secondary process memory space and thus use the same Tx queues.

This is only supported from rdma-core (v15).

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
6 years agonet/mlx5: change eth device reference for secondary process
Xueming Li [Fri, 6 Oct 2017 15:45:48 +0000 (23:45 +0800)]
net/mlx5: change eth device reference for secondary process

The rte_eth_dev objects created by the primary process were not
available in the secondary process, so it was not possible to use the
primary process's local memory objects from a secondary process.

This patch changes the reference to the primary rte_eth_dev object to
use the secondary process's local rte_eth_dev instead.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
6 years agoexamples/vm_power_mgr: set MAC address of VF
David Hunt [Wed, 11 Oct 2017 16:18:55 +0000 (17:18 +0100)]
examples/vm_power_mgr: set MAC address of VF

We need to set the VF MAC address from the host, so that it is in sync on
the guest and the host. Otherwise, we'll have a random MAC on the guest,
and a 00:00:00:00:00:00 MAC on the host.

Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agoexamples/guest_cli: add send policy to host
Rory Sexton [Wed, 11 Oct 2017 16:18:54 +0000 (17:18 +0100)]
examples/guest_cli: add send policy to host

Here we're adding an example of setting up a policy, and allowing the
vm_cli_guest app to send it to the host using the cli command
"send_policy now"

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agopower: add send channel msg function to map file
David Hunt [Wed, 11 Oct 2017 16:18:53 +0000 (17:18 +0100)]
power: add send channel msg function to map file

Add a new wrapper function, with an rte_power_ prefix, around an
existing private (but until now unused) function.

The plan is to clean up all the header files in the next release so
that only the intended public functions are in the map file and only
the relevant headers have the rte_ prefix so that only they are
included in the documentation.

Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agoexamples/vm_power_mgr: add port initialisation
David Hunt [Wed, 11 Oct 2017 16:18:52 +0000 (17:18 +0100)]
examples/vm_power_mgr: add port initialisation

We need to initialise the ports we're monitoring to be able to see
the throughput.

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agoexamples/vm_power_mgr: add policy to channels
Rory Sexton [Wed, 11 Oct 2017 16:18:51 +0000 (17:18 +0100)]
examples/vm_power_mgr: add policy to channels

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agoexamples/vm_power_mgr: add scale to medium freq fn
David Hunt [Wed, 11 Oct 2017 16:18:50 +0000 (17:18 +0100)]
examples/vm_power_mgr: add scale to medium freq fn

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agoexamples/vm_power_mgr: add VCPU to PCPU mapping
David Hunt [Wed, 11 Oct 2017 16:18:49 +0000 (17:18 +0100)]
examples/vm_power_mgr: add VCPU to PCPU mapping

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agopower: add extra msg type for policies
David Hunt [Wed, 11 Oct 2017 16:18:48 +0000 (17:18 +0100)]
power: add extra msg type for policies

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agonet/i40e: support converting VF MAC to VF id
Rory Sexton [Wed, 11 Oct 2017 16:18:47 +0000 (17:18 +0100)]
net/i40e: support converting VF MAC to VF id

We need a way to convert a VF id to a PF id on the host so as to query the
PF for relevant statistics which are used for the frequency changes in
the vm_power_manager app.

Used when profiles are passed down from the guest to the host, allowing
the host to map the VFs to PFs.

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
6 years agobus: ignore scan and probe failures
Shreyansh Jain [Sat, 12 Aug 2017 10:22:20 +0000 (15:52 +0530)]
bus: ignore scan and probe failures

Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan, but that should
not prevent other buses from being scanned.

The same is the case for probing. It is possible that some devices which
were scanned didn't have a specific driver. That should not prevent
other buses from being probed.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
6 years agotimer: use 64-bit specific code on more platforms
Jerin Jacob [Sun, 13 Aug 2017 12:33:38 +0000 (18:03 +0530)]
timer: use 64-bit specific code on more platforms

64-bit loads and stores are atomic operations on all
64-bit processors.
Change RTE_ARCH_X86_64 to RTE_ARCH_64 to reflect this.
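
The idiom in question looks roughly like this (a generic sketch, not the
rte_timer code itself; t, old_expire and new_expire are illustrative):

    /* A plain 64-bit store is atomic on any 64-bit CPU, so the guard
     * becomes RTE_ARCH_64 instead of the x86-only RTE_ARCH_X86_64. */
    #ifdef RTE_ARCH_64
    t->expire = new_expire;                 /* single atomic 64-bit store */
    #else
    /* 32-bit targets keep using an atomic primitive for the 64-bit store */
    rte_atomic64_cmpset(&t->expire, old_expire, new_expire);
    #endif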

Fixes: 9b15ba895b9f ("timer: use a skip list")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agotimer: allow reset on service cores
Pavan Nikhilesh [Thu, 21 Sep 2017 20:10:03 +0000 (01:40 +0530)]
timer: allow reset on service cores

The rte_timer_reset function should be able to register timers on service
lcores as they are EAL threads.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
6 years agoeal: add function to check lcore role
Pavan Nikhilesh [Thu, 21 Sep 2017 10:59:17 +0000 (16:29 +0530)]
eal: add function to check lcore role

This function can be used to check the role of a specific lcore.
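
For instance, an application could skip service lcores when launching
workers, along the lines of this sketch (worker_main is an assumption,
and the helper is assumed to return 0 when the lcore has the given role
in this release):

    #include <rte_eal.h>
    #include <rte_lcore.h>

    unsigned int lcore_id;

    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
            /* leave lcores owned by the service cores alone */
            if (rte_lcore_has_role(lcore_id, ROLE_SERVICE) == 0)
                    continue;
            rte_eal_remote_launch(worker_main, NULL, lcore_id);
    }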

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
6 years agoeal/x86: use cpuid builtin
Sergio Gonzalez Monroy [Wed, 23 Aug 2017 15:00:27 +0000 (16:00 +0100)]
eal/x86: use cpuid builtin

GCC does have the __get_cpuid_count builtin which checks for maximum
supported leaf, but implementations differ between CLANG and GCC.

This change provides an implementation compatible with both GCC and
CLANG 3.4+.
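
The resulting idiom is roughly the following (a sketch rather than the
exact EAL code), shown here detecting AVX2 via CPUID leaf 7:

    #include <cpuid.h>

    static int
    cpu_has_avx2(void)
    {
            unsigned int eax, ebx, ecx, edx;

            /* Leaf 7 must be reported as supported before querying it;
             * __get_cpuid_max() behaves the same under GCC and Clang 3.4+. */
            if (__get_cpuid_max(0, NULL) < 7)
                    return 0;

            __cpuid_count(7, 0, eax, ebx, ecx, edx);
            return (ebx >> 5) & 1;          /* CPUID.(EAX=7,ECX=0):EBX bit 5 */
    }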

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
6 years agovhost: fix false-positive warning from clang 5
Bruce Richardson [Wed, 11 Oct 2017 11:28:17 +0000 (12:28 +0100)]
vhost: fix false-positive warning from clang 5

When compiling with clang extra warning flags, such as used by default with
meson, a warning is given in iotlb.c:

lib/librte_vhost/iotlb.c:318:6: warning:
variable 'socket' is used uninitialized whenever
'if' condition is false [-Wsometimes-uninitialized]

This is a false positive, as the socket value will be initialized by the
call to get_mempolicy in the case where the NUMA build-time flag is set,
and in cases where it is not set, "if (ret)" will always be true as ret is
initialized to -1 and never changed.

However, this is not immediately obvious, and is perhaps a little fragile,
as it will break if other code using ret is subsequently added above the
call to get_mempolicy by someone unaware of this subtle dependency.
Therefore, we can fix the warning and make the code more robust by
explicitly initializing socket to zero, and by moving the extra condition
check on the return from get_mempolicy() into the #ifdef.
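
In sketch form (not the exact diff; vq is the virtqueue pointer whose
NUMA node is being queried), the code becomes:

    #include <numaif.h>

    int socket = 0;         /* now always defined, with or without NUMA */

    #ifdef RTE_LIBRTE_VHOST_NUMA
    if (get_mempolicy(&socket, NULL, 0, vq,
                      MPOL_F_NODE | MPOL_F_ADDR) != 0)
            socket = 0;     /* fall back to socket 0 on error */
    #endif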

Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
6 years agotest/eventdev: add tests for eth Rx adapter APIs
Nikhil Rao [Tue, 10 Oct 2017 14:18:44 +0000 (19:48 +0530)]
test/eventdev: add tests for eth Rx adapter APIs

Add unit tests for rte_event_eth_rx_adapter_xxx() APIs

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: add eth Rx adapter implementation
Nikhil Rao [Tue, 10 Oct 2017 22:21:36 +0000 (03:51 +0530)]
eventdev: add eth Rx adapter implementation

The adapter implementation uses eventdev PMDs to configure the packet
transfer if HW support is available; if not, it uses an EAL service
function that reads packets from ethernet Rx queues and injects them
as events into the event device.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: add event type for eth Rx adapter
Nikhil Rao [Tue, 10 Oct 2017 22:21:35 +0000 (03:51 +0530)]
eventdev: add event type for eth Rx adapter

Add the RTE_EVENT_TYPE_ETH_RX_ADAPTER event type. Certain platforms (e.g.,
octeontx) need, in the event dequeue function, to identify events
injected from ethernet hardware into eventdev so that the DPDK mbuf can be
populated from the HW descriptor.

Events injected from ethernet hardware would use an event type of
RTE_EVENT_TYPE_ETHDEV and events injected from the rx adapter service
function would use an event type of RTE_EVENT_TYPE_ETH_RX_ADAPTER to
help the event dequeue function differentiate between these two event
sources.
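
On the application side, the dequeue loop can then tell the two sources
apart, as in this sketch (the ids and the process_* helpers are
illustrative):

    #include <rte_eventdev.h>

    struct rte_event ev;

    while (rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0) != 0) {
            switch (ev.event_type) {
            case RTE_EVENT_TYPE_ETH_RX_ADAPTER:
                    /* injected by the Rx adapter service: mbuf is ready */
                    process_mbuf(ev.mbuf);
                    break;
            case RTE_EVENT_TYPE_ETHDEV:
                    /* injected by ethernet HW: the PMD may still need to
                     * convert the HW descriptor into an mbuf */
                    process_hw_event(&ev);
                    break;
            default:
                    break;
            }
    }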

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: add eth Rx adapter API
Nikhil Rao [Tue, 10 Oct 2017 22:21:34 +0000 (03:51 +0530)]
eventdev: add eth Rx adapter API

Add common APIs for configuring packet transfer from ethernet Rx
queues to event devices across HW & SW packet transfer mechanisms.
A detailed description of the adapter is contained in the header's
comments.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoevent/sw: add eth Rx adapter capabilities function
Nikhil Rao [Tue, 10 Oct 2017 22:21:33 +0000 (03:51 +0530)]
event/sw: add eth Rx adapter capabilities function

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: add PMD callbacks for eth Rx adapter
Nikhil Rao [Tue, 10 Oct 2017 22:21:32 +0000 (03:51 +0530)]
eventdev: add PMD callbacks for eth Rx adapter

The PMD callbacks are used by the rte_event_eth_rx_xxx() APIs to
configure and control the ethernet receive adapter when packet transfer
from the ethdev to the eventdev is implemented in hardware.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: add capabilities API
Nikhil Rao [Tue, 10 Oct 2017 22:21:31 +0000 (03:51 +0530)]
eventdev: add capabilities API

The caps API allows an application to retrieve the capability information
needed to configure the ethernet Rx adapter for an eventdev and
ethdev pair.

For example, the ethdev/eventdev pairing may be such that all of the
ethdev Rx queues can only be connected to a single event queue; in
this case the application is required to pass in -1 as the queue id
when adding a receive queue to the adapter.
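
A sketch of how an application might honour this (the ids, the event
queue selection and the adapter setup around it are assumptions) is:

    #include <rte_eventdev.h>
    #include <rte_event_eth_rx_adapter.h>

    uint32_t caps = 0;
    struct rte_event_eth_rx_adapter_queue_conf qconf = {
            .ev.queue_id   = ev_queue_id,
            .ev.sched_type = RTE_SCHED_TYPE_ATOMIC,
    };

    rte_event_eth_rx_adapter_caps_get(evdev_id, eth_port_id, &caps);

    if (caps & RTE_EVENT_ETH_RX_ADAPTER_CAP_MULTI_EVENTQ)
            /* each Rx queue may target its own event queue */
            rte_event_eth_rx_adapter_queue_add(adapter_id, eth_port_id,
                                               rx_queue_id, &qconf);
    else
            /* all Rx queues share one event queue: pass -1 as the queue id */
            rte_event_eth_rx_adapter_queue_add(adapter_id, eth_port_id,
                                               -1, &qconf);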

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: extend port attribute get function
Gage Eads [Wed, 20 Sep 2017 15:21:02 +0000 (10:21 -0500)]
eventdev: extend port attribute get function

This commit adds the new_event_threshold port attribute, so the entire port
configuration structure passed to rte_event_port_setup can be queried.

Signed-off-by: Gage Eads <gage.eads@intel.com>
6 years agoeventdev: extend queue attribute get function
Gage Eads [Wed, 20 Sep 2017 15:21:01 +0000 (10:21 -0500)]
eventdev: extend queue attribute get function

This commit adds three new queue attributes, so that the entire queue
configuration structure passed to rte_event_queue_setup can be queried.

Signed-off-by: Gage Eads <gage.eads@intel.com>
6 years agoevent/sw: rename map file to standard name
Bruce Richardson [Thu, 14 Sep 2017 14:47:28 +0000 (15:47 +0100)]
event/sw: rename map file to standard name

Naming convention for event drivers is "rte_pmd_<name>_event_version.map"

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
6 years agoeventdev: bump library version
Harry van Haaren [Wed, 20 Sep 2017 13:36:03 +0000 (14:36 +0100)]
eventdev: bump library version

This commit bumps the library version to reflect the ABI change
caused by removing the individual rte_event_port_count, queue_count,
and other get functions. These functions are superseded by the
get-attribute style API, which allows fetching values without API/ABI
changes.
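
For example, code that previously called the removed count functions now
queries attributes instead (dev_id is illustrative):

    #include <rte_eventdev.h>

    uint32_t nb_ports = 0, nb_queues = 0;

    rte_event_dev_attr_get(dev_id, RTE_EVENT_DEV_ATTR_PORT_COUNT, &nb_ports);
    rte_event_dev_attr_get(dev_id, RTE_EVENT_DEV_ATTR_QUEUE_COUNT, &nb_queues);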

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
6 years agoeventdev: add device started attribute
Harry van Haaren [Wed, 20 Sep 2017 13:36:02 +0000 (14:36 +0100)]
eventdev: add device started attribute

This commit adds an attribute to the eventdev, allowing applications
to retrieve whether the eventdev is running or stopped. Note that no API
or ABI changes were required to add this attribute, and the code changes
are minimal.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: add queue attribute function
Harry van Haaren [Wed, 20 Sep 2017 13:36:01 +0000 (14:36 +0100)]
eventdev: add queue attribute function

This commit adds a generic queue attribute function. It also removes
the previous rte_event_queue_priority() and priority() functions, and
updates the map files and unit tests to use the new attr functions.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
6 years agoeventdev: add dev attribute get function
Harry van Haaren [Wed, 20 Sep 2017 13:36:00 +0000 (14:36 +0100)]
eventdev: add dev attribute get function

This commit adds a device attribute function, allowing flexible
fetching of device attributes, like port count or queue count.
The unit tests and .map file are updated to the new function.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: add port attribute function
Harry van Haaren [Wed, 20 Sep 2017 13:35:59 +0000 (14:35 +0100)]
eventdev: add port attribute function

This commit reworks the port functions to retrieve information
about the port, like the enq or deq depths. Note that "port count"
is a device attribute, and is added in a later patch for dev attributes.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
6 years agoeventdev: clarify usage of forward and release ops
Tim McDaniel [Wed, 6 Sep 2017 15:42:07 +0000 (10:42 -0500)]
eventdev: clarify usage of forward and release ops

Update doxygen to make it clear that RTE_EVENT_OP_FORWARD and
RTE_EVENT_OP_RELEASE must only be enqueued to the same port that the
original event was dequeued from.

Signed-off-by: Tim McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoeventdev: ease single-link queue config requirements
Gage Eads [Wed, 9 Aug 2017 19:58:04 +0000 (14:58 -0500)]
eventdev: ease single-link queue config requirements

Events sent through single-link queues are naturally in-order and
atomic, without reordering or atomic scheduling. Logically the
nb_atomic_flows and nb_atomic_order_sequences arguments don't apply to a
single link queue, but applications must set these (depending on the queue
config type) to bypass the is_valid_{ordered, atomic}_queue_conf() checks
in the eventdev layer.

This commit updates those is_valid_* functions to ignore queues with the
SINGLE_LINK flag, to simplify their configuration.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agovhost: distinguish master and slave requests
Maxime Coquelin [Tue, 10 Oct 2017 12:47:54 +0000 (14:47 +0200)]
vhost: distinguish master and slave requests

This patch adds a union in VhostUserMsg to distinguish between
master- and slave-initiated requests, instead of casting slave
requests as master requests.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: add user callbacks for socket open/close
Dariusz Stojaczyk [Wed, 30 Aug 2017 10:50:58 +0000 (12:50 +0200)]
vhost: add user callbacks for socket open/close

Added new callbacks to notify about socket connection status.
As destroy_device is used for virtqueue processing *pause* as well as
connection close, the user cannot distinguish between the two.

Consider the following scenario:
rte_vhost: received SET_VRING_BASE message,
           calling destroy_device() as usual

user:  end-user asks to remove the device (together with socket file),
       OK, device is not *in use* - that's NOT the behavior we want
       calling rte_vhost_driver_unregister() etc.

Instead of changing new_device/destroy_device callbacks and breaking
the ABI, a set of new functions new_connection/destroy_connection
has been added.
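
An application interested in socket state can register the two new
callbacks alongside the existing ones, roughly as follows (the app_*
callback names and the socket path are illustrative):

    #include <rte_vhost.h>

    static const struct vhost_device_ops ops = {
            .new_device         = app_new_device,
            .destroy_device     = app_destroy_device,      /* may only mean "pause" */
            .new_connection     = app_new_connection,      /* socket opened */
            .destroy_connection = app_destroy_connection,  /* socket really closed */
    };

    rte_vhost_driver_callback_register(socket_path, &ops);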

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
6 years agovhost: check poll error code
Kuba Kozak [Fri, 22 Sep 2017 12:17:40 +0000 (14:17 +0200)]
vhost: check poll error code

Add return value check for poll() call.

Coverity issue: 140740
Fixes: 59317cef249c ("vhost: allow many vhost-user ports")
Cc: stable@dpdk.org
Signed-off-by: Kuba Kozak <kubax.kozak@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio-user: fix TAP name string termination
Sebastian Basierski [Tue, 19 Sep 2017 11:41:04 +0000 (13:41 +0200)]
net/virtio-user: fix TAP name string termination

Fix a call to strncpy that passed a maximum size equal to the
destination array size.
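
A generic sketch of the fix (the buffer and source names are
illustrative): leave room for the terminating NUL instead of passing
the full destination size to strncpy():

    #include <net/if.h>
    #include <string.h>

    char ifname[IFNAMSIZ];

    strncpy(ifname, name, sizeof(ifname) - 1);
    ifname[sizeof(ifname) - 1] = '\0';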

Coverity issue: 140732
Fixes: e3b434818bbb ("net/virtio-user: support kernel vhost")
Cc: stable@dpdk.org
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: use pointer to replace memcpy
Zhiyong Yang [Fri, 11 Aug 2017 02:13:18 +0000 (10:13 +0800)]
net/virtio: use pointer to replace memcpy

Using a pointer instead of memcpy saves many cycles in the function
virtio_send_command.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: fix a typo
Jay Zhou [Tue, 22 Aug 2017 02:34:36 +0000 (10:34 +0800)]
net/virtio: fix a typo

Fixed a comment in struct virtionet_ctl, referring to the ring type

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: enable IOMMU support
Maxime Coquelin [Thu, 5 Oct 2017 08:36:27 +0000 (10:36 +0200)]
vhost: enable IOMMU support

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: invalidate vring in case of matching IOTLB invalidate
Maxime Coquelin [Thu, 5 Oct 2017 08:36:26 +0000 (10:36 +0200)]
vhost: invalidate vring in case of matching IOTLB invalidate

As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: postpone device creation until rings are mapped
Maxime Coquelin [Thu, 5 Oct 2017 08:36:25 +0000 (10:36 +0200)]
vhost: postpone device creation until rings are mapped

Translating the start addresses of the rings is not enough, we need to
be sure the whole ring is made available by the guest.

It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be safe against vring page
invalidations.

This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: translate ring addresses when IOMMU enabled
Maxime Coquelin [Thu, 5 Oct 2017 08:36:24 +0000 (10:36 +0200)]
vhost: translate ring addresses when IOMMU enabled

When IOMMU is enabled, the ring addresses set by the
VHOST_USER_SET_VRING_ADDR requests are guest's IO virtual addresses,
whereas Qemu virtual addresses when IOMMU is disabled.

When enabled and the required translation is not in the IOTLB cache,
an IOTLB miss request is sent, but being called by the vhost-user
socket handling thread, the function does not wait for the requested
IOTLB update.

The function will be called again on the next IOTLB update message
reception if matching the vring addresses.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: postpone rings addresses translation
Maxime Coquelin [Thu, 5 Oct 2017 08:36:23 +0000 (10:36 +0200)]
vhost: postpone rings addresses translation

This patch postpones ring address translations and checks, as
addresses sent by the master should not be interpreted as long as the
ring is not started and enabled[0].

When protocol features aren't negotiated, the ring is started in
enabled state, so the addresses translations are postponed to
vhost_user_set_vring_kick().
Otherwise, it is postponed to when ring is enabled, in
vhost_user_set_vring_enable().

[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: fix dereferencing invalid pointer after realloc
Maxime Coquelin [Thu, 5 Oct 2017 08:36:22 +0000 (10:36 +0200)]
vhost: fix dereferencing invalid pointer after realloc

numa_realloc() reallocates the virtio_net device structure and
updates the vhost_devices[] table with the new pointer if the rings
are allocated on a different NUMA node.

The problem is that vhost_user_msg_handler() still dereferences the old
pointer afterward.

This patch prevents this by fetching the dev pointer again from
vhost_devices[] after the messages have been handled.

Fixes: af295ad4698c ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: enable rings at the right time
Maxime Coquelin [Thu, 5 Oct 2017 08:36:21 +0000 (10:36 +0200)]
vhost: enable rings at the right time

When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but enabled through dedicated
VHOST_USER_SET_VRING_ENABLE request.

When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: use the guest IOVA to host VA helper
Maxime Coquelin [Thu, 5 Oct 2017 08:36:20 +0000 (10:36 +0200)]
vhost: use the guest IOVA to host VA helper

Replace rte_vhost_gpa_to_vva() calls with vhost_iova_to_vva(), which
requires to also pass the mapped len and the access permissions needed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: introduce guest IOVA to backend VA helper
Maxime Coquelin [Thu, 5 Oct 2017 08:36:19 +0000 (10:36 +0200)]
vhost: introduce guest IOVA to backend VA helper

This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.

When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.

When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: handle IOTLB update and invalidate requests
Maxime Coquelin [Thu, 5 Oct 2017 08:36:18 +0000 (10:36 +0200)]
vhost: handle IOTLB update and invalidate requests

The vhost-user device IOTLB protocol extension introduces the
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in the kernel, which in this case can
be either an IOTLB update or an invalidate message.

On IOTLB update, the virtqueues get notified of a new entry.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: initialize vrings IOTLB caches
Maxime Coquelin [Thu, 5 Oct 2017 08:36:17 +0000 (10:36 +0200)]
vhost: initialize vrings IOTLB caches

The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes the vring id as a parameter,
so that the IOTLB cache mempool name can be generated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: support IOTLB miss slave requests
Maxime Coquelin [Thu, 5 Oct 2017 08:36:16 +0000 (10:36 +0200)]
vhost: support IOTLB miss slave requests

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: add pending IOTLB miss request list and helpers
Maxime Coquelin [Thu, 5 Oct 2017 08:36:15 +0000 (10:36 +0200)]
vhost: add pending IOTLB miss request list and helpers

In order to be able to handle other ports or queues while waiting
for an IOTLB miss reply, a pending list is created so that the waiter
can return and restart later by sending the miss request again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: add IOTLB helper functions
Maxime Coquelin [Thu, 5 Oct 2017 08:36:14 +0000 (10:36 +0200)]
vhost: add IOTLB helper functions

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: add IOMMU-related macros for old kernels
Maxime Coquelin [Thu, 5 Oct 2017 08:36:13 +0000 (10:36 +0200)]
vhost: add IOMMU-related macros for old kernels

These defines and enums have been introduced in upstream kernel v4.8,
and backported to RHEL 7.4.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: support slave requests channel
Maxime Coquelin [Thu, 5 Oct 2017 08:36:12 +0000 (10:36 +0200)]
vhost: support slave requests channel

Currently, only QEMU sends requests and the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, such as IOTLB miss events when IOMMU is
supported.

This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: prepare for slave requests
Maxime Coquelin [Thu, 5 Oct 2017 08:36:11 +0000 (10:36 +0200)]
vhost: prepare for slave requests

send_vhost_message() is currently only used to send
replies, so it modifies message flags to prepare the
reply.

With the upcoming channel for backend-initiated requests,
this function can also be used to send requests.

This patch introduces a new send_vhost_reply() that
does the message flag modifications, and makes
send_vhost_message() generic.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: make error handling consistent in Rx path
Maxime Coquelin [Thu, 5 Oct 2017 08:36:10 +0000 (10:36 +0200)]
vhost: make error handling consistent in Rx path

In the non-mergeable receive case, when the copy_mbuf_to_desc()
call fails the packet is skipped, the corresponding used element
len field is set to the vnet header size, and it continues with the next
packet/desc. This could be a problem because the caller does not know why
it failed, and assumes the desc buffer is large enough.

In mergeable receive case, when copy_mbuf_to_desc_mergeable()
fails, packets burst is simply stopped.

This patch makes the non-mergeable error path behave like the
mergeable one, as it seems the safest way. Also, doing it this way
will simplify pending IOTLB miss request handling.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: revert workaround MQ fails to startup
Maxime Coquelin [Thu, 5 Oct 2017 08:36:09 +0000 (10:36 +0200)]
vhost: revert workaround MQ fails to startup

This reverts commit 04d81227960b ("vhost: workaround MQ fails to
startup").

As agreed when this workaround was introduced, it can be reverted
as Qemu v2.10 that fixes the issue is now out.

The reply-ack feature is required for vhost-user IOMMU support.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: fix untrusted scalar value
Daniel Mrzyglod [Fri, 22 Sep 2017 15:21:49 +0000 (17:21 +0200)]
net/virtio: fix untrusted scalar value

The unscrutinized value may be incorrectly assumed to be within a certain
range by later operations.

In vhost_user_read: an unscrutinized value from an untrusted source is used
in a trusted context - the value of sz_payload may be harmful, so we need
to limit it to the maximum payload size.

Coverity issue: 139601
Fixes: 6a84c37e3975 ("net/virtio-user: add vhost-user adapter layer")
Cc: stable@dpdk.org
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: fix Rx handler when checksum is requested
Olivier Matz [Thu, 7 Sep 2017 12:13:47 +0000 (14:13 +0200)]
net/virtio: fix Rx handler when checksum is requested

The simple Rx handler is selected even if Rx checksum offload is
requested by the application, but this handler does not support
offloads. This results in broken received packets (no checksum flag but
invalid checksum in the mbuf data).

Disable the simple Rx handler in that case.

Fixes: 96cb6711939e ("net/virtio: support Rx checksum offload")

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: keep Rx handler whatever the Tx queue config
Olivier Matz [Thu, 7 Sep 2017 12:13:46 +0000 (14:13 +0200)]
net/virtio: keep Rx handler whatever the Tx queue config

Split use_simple_rxtx into use_simple_rx and use_simple_tx,
and ensure that only use_simple_tx is updated when txq flags
forces to use the standard Tx handler.

This change is also useful for next commit (disable simple Rx
path when Rx checksum is requested).

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: remove SSE check
Olivier Matz [Thu, 7 Sep 2017 12:13:45 +0000 (14:13 +0200)]
net/virtio: remove SSE check

Since commit f27769f796a0 ("mk: require SSE4.2 support on all x86
platforms"), SSE4.2 is a requirement when compiling on x86 platforms.

We can remove this check in the virtio driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: rationalize setting of Rx/Tx handlers
Olivier Matz [Thu, 7 Sep 2017 12:13:44 +0000 (14:13 +0200)]
net/virtio: rationalize setting of Rx/Tx handlers

The selection of Rx/Tx handlers is done at several places,
group them in one function set_rxtx_funcs().

The update of hw->use_simple_rxtx is also rationalized:
- initialized to 1 (prefer simple path)
- in dev configure or rx/tx queue setup, if something prevents use of
  the simple path, change it to 0.
- in dev start, set the handlers according to hw->use_simple_rxtx.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: fix queue setup consistency
Olivier Matz [Thu, 7 Sep 2017 12:13:43 +0000 (14:13 +0200)]
net/virtio: fix queue setup consistency

In rx/tx queue setup functions, some code is executed only if
use_simple_rxtx == 1. The value of this variable can change depending on
the offload flags or sse support. If Rx queue setup is called before Tx
queue setup, it can result in an invalid configuration:

- dev_configure is called: use_simple_rxtx is initialized to 0
- rx queue setup is called: queues are initialized without simple path
  support
- tx queue setup is called: use_simple_rxtx switches to 1, and the simple
  Rx/Tx handlers are selected

Fix this by postponing a part of Rx/Tx queue initialization in
dev_start(), as it was the case in the initial implementation.

Fixes: 48cec290a3d2 ("net/virtio: move queue configure code to proper place")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: fix mbuf port for simple Rx function
Olivier Matz [Thu, 7 Sep 2017 12:13:42 +0000 (14:13 +0200)]
net/virtio: fix mbuf port for simple Rx function

The mbuf->port was not properly set for the first received
mbufs. Fix this by setting it in virtqueue_enqueue_recv_refill_simple(),
which is used to enqueue the first mbuf in the ring.

The function virtio_rxq_rearm_vec(), which is used to rearm the ring
with new mbufs, is correct and does not need to be updated.

Fixes: cab0461234e7 ("virtio: fill Rx avail ring with blank mbufs")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: fix log levels in configure
Olivier Matz [Thu, 7 Sep 2017 12:13:41 +0000 (14:13 +0200)]
net/virtio: fix log levels in configure

On error, we should log with error level.

Fixes: 9f4f2846ef76 ("virtio: support vlan filtering")
Fixes: 86d59b21468a ("net/virtio: support LRO")
Fixes: 96cb6711939e ("net/virtio: support Rx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agodoc: fix description of L4 Rx checksum offload
Olivier Matz [Thu, 7 Sep 2017 12:13:40 +0000 (14:13 +0200)]
doc: fix description of L4 Rx checksum offload

As described in API documentation, the field hw_ip_checksum
requests both L3 and L4 offload.

Fixes: dad1ec72a377 ("doc: document NIC features")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: revert not claiming IP checksum offload
Olivier Matz [Thu, 7 Sep 2017 12:13:39 +0000 (14:13 +0200)]
net/virtio: revert not claiming IP checksum offload

This reverts
commit 4dab342b7522 ("net/virtio: do not falsely claim to do IP checksum").

The description of rxmode->hw_ip_checksum is:

     hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. */

Despite its name, this field can be set by an application to enable L3
and L4 checksums. In case of virtio, only L4 checksum is supported and
L3 checksums flags will always be set to "unknown".

Fixes: 4dab342b7522 ("net/virtio: do not falsely claim to do IP checksum")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: revert not claiming LRO support
Olivier Matz [Thu, 7 Sep 2017 12:13:38 +0000 (14:13 +0200)]
net/virtio: revert not claiming LRO support

This reverts
commit 701a64622c26 ("net/virtio: do not claim to support LRO")

Setting rxmode->enable_lro is a way to tell the host that the guest is
ok to receive TSO packets. From the guest's point of view, it is like
enabling LRO on a physical driver.

Fixes: 701a64622c26 ("net/virtio: do not claim to support LRO")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio-user: send kick notify backend on init
Steven Luong [Tue, 1 Aug 2017 16:17:36 +0000 (09:17 -0700)]
net/virtio-user: send kick notify backend on init

According to the vhost-user spec [0], the client must start the ring
upon receiving a kick (that is, detecting that the file descriptor
is readable) on the descriptor specified by VHOST_USER_SET_VRING_KICK.

The code sends a kick to the rx queue. It is missing sending a
kick for the tx queue. This patch adds the missing code to
comply with the spec.

[0]: https://fossies.org/linux/qemu/docs/specs/vhost-user.txt

Signed-off-by: Steven Luong <sluong@cisco.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovhost: batch small guest memory copies
Tiwei Bie [Fri, 8 Sep 2017 12:50:46 +0000 (20:50 +0800)]
vhost: batch small guest memory copies

This patch adaptively batches the small guest memory copies.
By batching the small copies, the efficiency of executing the
memory LOAD instructions can be improved greatly, because the
memory LOAD latency can be effectively hidden by the pipeline.
We saw great performance boosts in small-packet PVP tests.

This patch improves the performance for small packets and
distinguishes packets by size. So although the performance
for big packets doesn't change, this makes it relatively easy to
do special optimizations for big packets too.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: replace magic number with PCI constant
Zhiyong Yang [Thu, 3 Aug 2017 01:21:50 +0000 (09:21 +0800)]
net/virtio: replace magic number with PCI constant

Use a macro instead of a magic number in order to enhance code
readability.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agonet/virtio: fix indent
Zhiyong Yang [Thu, 3 Aug 2017 01:21:49 +0000 (09:21 +0800)]
net/virtio: fix indent

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
6 years agovfio: refactor PCI BAR mapping
Jonas Pfefferle [Fri, 6 Oct 2017 14:40:20 +0000 (16:40 +0200)]
vfio: refactor PCI BAR mapping

Split pci_vfio_map_resource for primary and secondary processes.
Save all relevant mapping data in primary process to allow
the secondary process to perform mappings.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
6 years agovfio: fix sPAPR IOMMU DMA window size
Jonas Pfefferle [Tue, 8 Aug 2017 11:16:42 +0000 (13:16 +0200)]
vfio: fix sPAPR IOMMU DMA window size

DMA window size needs to be big enough to span all memory segment's
physical addresses. We do not need multiple levels of IOMMU tables
as we already span ~70TB of physical memory with 16MB hugepages.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
6 years agobus/dpaa: fix memory allocation during scan
Shreyansh Jain [Tue, 10 Oct 2017 09:34:58 +0000 (15:04 +0530)]
bus/dpaa: fix memory allocation during scan

With the IOVA auto detection changes, bus scan is performed before
memory initialization. DPAA bus scan must not use rte_malloc in
its path.

Fixes: cf408c22476c ("eal: auto detect IOVA mode")

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
6 years agodoc: add use of mlockall to programmers guide
Eelco Chaudron [Mon, 2 Oct 2017 10:01:40 +0000 (12:01 +0200)]
doc: add use of mlockall to programmers guide

When I was adding mlockall() to the testpmd application it was
suggested to add a reference to this use case of mlockall(). This patch
adds it.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
6 years agoapp/testpmd: avoid pages being swapped out
Eelco Chaudron [Fri, 29 Sep 2017 08:11:10 +0000 (10:11 +0200)]
app/testpmd: avoid pages being swapped out

Call the mlockall() function to attempt to lock all of the process
memory into physical RAM, preventing the kernel from paging any
of its memory to disk.

When using testpmd for performance testing, depending on the code path
taken, we see a couple of page faults in a row. These faults affect
the overall drop-rate of testpmd. On Linux the mlockall() call will
prefault all the pages of testpmd (and the DPDK libraries if linked
dynamically), even without LD_BIND_NOW.
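
The call itself is a one-liner placed early in main(); in sketch form
(the log handling shown here is illustrative):

    #include <errno.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <rte_log.h>

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            RTE_LOG(NOTICE, USER1, "mlockall() failed with error \"%s\"\n",
                    strerror(errno));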

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
6 years agoeal: copy raw strings taken from command line
Patrick MacArthur [Fri, 4 Aug 2017 18:53:57 +0000 (14:53 -0400)]
eal: copy raw strings taken from command line

Normally, command line argument strings are considered immutable, but
SPDK [1] and urdma [2] construct argv arrays to pass to rte_eal_init().
These strings are allocated using malloc() and freed after DPDK
initialization with free(). However, in the case of --file-prefix and
--huge-dir, DPDK takes the pointer to these strings in argv directly. If
a secondary process calls rte_eal_pci_probe() after rte_eal_init()
returns, as is done by SPDK, this causes a use-after-free error because
the strings have been freed by the calling code immediately after
rte_eal_init() returns.

This problem was observed when running SPDK example programs as a
secondary process and causes the secondary processes to fail:

Starting DPDK 16.11.1 initialization...
[ DPDK EAL parameters: identify -c 4 --file-prefix=spdk3260 --base-virtaddr=0x1000000000 --proc-type=auto ]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: SECONDARY
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   cannot connect to primary process!
EAL: Error - exiting with code: 1
Cause: Requested device 0000:81:00.0 cannot be used

Running strace shows that the file prefix has been zero'd out by the
time that the secondary process attempts to probe the NVMe device.

The use-after-free errors can be easily detected with valgrind:

==8489== Invalid read of size 1
==8489==    at 0x4C30D22: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x58DB955: vfprintf (vfprintf.c:1637)
==8489==    by 0x59A4685: __vsnprintf_chk (vsnprintf_chk.c:63)
==8489==    by 0x59A45E7: __snprintf_chk (snprintf_chk.c:34)
==8489==    by 0x1246AB: get_socket_path.constprop.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x124B09: vfio_mp_sync_connect_to_primary (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x123BE4: vfio_get_group_fd.part.1 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x124366: vfio_setup_device (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x126C8A: pci_vfio_map_resource (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x12B115: pci_probe_all_drivers.part.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x12B596: rte_eal_pci_probe (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x11D5B5: spdk_pci_enumerate (pci.c:147)
==8489==  Address 0x63f362e is 14 bytes inside a block of size 32 free'd
==8489==    at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x11E6FB: spdk_free_args (init.c:136)
==8489==    by 0x11EBF5: spdk_env_init (init.c:309)
==8489==    by 0x10D2AA: main (identify.c:976)
==8489==  Block was alloc'd at
==8489==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x11E7D7: _sprintf_alloc (init.c:76)
==8489==    by 0x11EA78: spdk_build_eal_cmdline (init.c:251)
==8489==    by 0x11EA78: spdk_env_init (init.c:282)
==8489==    by 0x10D2AA: main (identify.c:976)
==8489==

Fix this by using strdup() to create separate memory buffers for these
strings. Note that this patch will cause valgrind to report memory
leaks of these buffers as there is nowhere to free them. Using static
buffers is an option but would make these strings have a fixed maximum
length whereas there is currently no limit defined by the API.
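
In sketch form, the option parsing now duplicates the strings instead of
keeping pointers into the caller-owned argv[] (the option and field names
follow my reading of the EAL internals and should be treated as
illustrative):

    case OPT_FILE_PREFIX_NUM:
            internal_config.hugefile_prefix = strdup(optarg);
            break;
    case OPT_HUGE_DIR_NUM:
            internal_config.hugepage_dir = strdup(optarg);
            break;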

[1] http://spdk.io
[2] https://github.com/zrlio/urdma

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Patrick MacArthur <patrick@patrickmacarthur.net>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
6 years agomem: check mmap failure
Seth Howell [Mon, 28 Aug 2017 21:49:12 +0000 (14:49 -0700)]
mem: check mmap failure

If mmap fails, it will return the value MAP_FAILED. Checking for this
return code allows us to properly identify mmap failures and report
them as such to the calling function.
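
The pattern being enforced is the standard one (a sketch; size and the
logging are illustrative):

    #include <errno.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <rte_log.h>

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
            RTE_LOG(ERR, EAL, "mmap failed: %s\n", strerror(errno));
            return -1;
    }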

Signed-off-by: Seth Howell <seth.howell@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
6 years agomem: fix malloc element free in debug mode
Xueming Li [Sat, 9 Sep 2017 07:33:19 +0000 (15:33 +0800)]
mem: fix malloc element free in debug mode

malloc_elem_free() clears (sets to 0) the trailer cookie when
RTE_MALLOC_DEBUG is enabled. When joining a free neighbor element,
part of the joined memory was not being cleared because the length
of the trailer cookie in the middle was not accounted for.

This patch fixes the calculation of the free memory length to be cleared
in malloc_elem_free() by including the trailer cookie.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
6 years agomem: fix malloc debug config
Xueming Li [Sat, 9 Sep 2017 07:33:18 +0000 (15:33 +0800)]
mem: fix malloc debug config

This patch replaces broken macro RTE_LIBRTE_MALLOC_DEBUG with
RTE_MALLOC_DEBUG.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
6 years agoconfig: add option to enable asserts
Xueming Li [Thu, 24 Aug 2017 08:23:10 +0000 (16:23 +0800)]
config: add option to enable asserts

Currently, enabling assertions requires setting CONFIG_RTE_LOG_LEVEL to
RTE_LOG_DEBUG. CONFIG_RTE_LOG_LEVEL is the default log level of the control
path, while RTE_LOG_DP_LEVEL is the log level of the data path. It's a little
bit hard to understand, literally, that assertions are decided by the control
path LOG_LEVEL, especially for assertions used on the data path.

On the other hand, DPDK needs a switch to enable assertions without impacting
the log output level, assuming "--log-level" is not specified.

Assertions are an important API to balance DPDK high performance and
robustness. To promote assertion usage, it's valuable to decouple
assertions from CONFIG_RTE_LOG_LEVEL.

In one word, log is log, assertion is assertion, debug is hot pot :)

The rationale of this patch is to introduce a dedicated switch for
assertions: RTE_ENABLE_ASSERT
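
With this change, assertions can be turned on independently of the log
level, for example (a sketch; the option name follows this patch and the
default remains off):

    /* in the build configuration */
    CONFIG_RTE_ENABLE_ASSERT=y

RTE_ASSERT() then panics on a false condition regardless of
CONFIG_RTE_LOG_LEVEL:

    RTE_ASSERT(mbuf != NULL);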

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
6 years agotest: add check for AVX512F
Zhiyong Yang [Tue, 19 Sep 2017 02:25:31 +0000 (10:25 +0800)]
test: add check for AVX512F

The CPUs which support AVX512 have been released. Add support for
checking AVX512F instruction set.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
6 years agomempool/octeontx: fix icc build
Pablo de Lara [Mon, 9 Oct 2017 05:21:59 +0000 (06:21 +0100)]
mempool/octeontx: fix icc build

drivers/mempool/octeontx/octeontx_fpavf.c(789):
error #592: variable "fpa" is used before its value is set
        RTE_SET_USED(fpa);

Fixes: 1c842786fe6c ("mempool/octeontx: probe fpavf PCIe devices")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
6 years agoeal: remove Xen dom0 support
Jianfeng Tan [Thu, 14 Sep 2017 02:40:29 +0000 (02:40 +0000)]
eal: remove Xen dom0 support

We remove Xen-specific code in the EAL, including the option --xen-dom0,
memory initialization code, compile-time dependencies, etc.

Related documents are removed or updated, and the EAL library
version is bumped.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>