dpdk.git
7 years ago test/eventdev: add basic SW tests
Harry van Haaren [Thu, 30 Mar 2017 19:30:44 +0000 (20:30 +0100)]
test/eventdev: add basic SW tests

This commit adds basic enqueue and dequeue unit tests,
some negative invalid tests, and configuration.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
7 years ago test/eventdev: add SW test infrastructure
Harry van Haaren [Thu, 30 Mar 2017 19:30:43 +0000 (20:30 +0100)]
test/eventdev: add SW test infrastructure

Add the test infrastructure, create and destroy the test
instance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
7 years ago event/sw: support xstats
Bruce Richardson [Thu, 30 Mar 2017 19:30:42 +0000 (20:30 +0100)]
event/sw: support xstats

Add support for xstats to report out on the state of the eventdev.
Useful for debugging and for unit tests, as well as observability
at runtime and performance tuning of apps to work well with the
scheduler.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
7 years ago event/sw: add dump function for easier debugging
Bruce Richardson [Thu, 30 Mar 2017 19:30:41 +0000 (20:30 +0100)]
event/sw: add dump function for easier debugging

Resolve a segfault that occurred when the device was only partially
configured and rte_event_dev_dump() was called before start().

Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
7 years ago event/sw: add start stop and close functions
Bruce Richardson [Thu, 30 Mar 2017 19:30:40 +0000 (20:30 +0100)]
event/sw: add start stop and close functions

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago event/sw: add scheduling logic
Bruce Richardson [Thu, 30 Mar 2017 19:30:39 +0000 (20:30 +0100)]
event/sw: add scheduling logic

Add in the scheduling function which takes the events from the
producer queues and buffers them before scheduling them to consumer
queues. The scheduling logic includes support for atomic, reordered,
and parallel scheduling of flows.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
7 years ago event/sw: add worker core functions
Bruce Richardson [Thu, 30 Mar 2017 19:30:38 +0000 (20:30 +0100)]
event/sw: add worker core functions

Add the event enqueue, dequeue and release functions to the eventdev.
These also include tracking of stats for observability of the load on
the scheduler.
Internally in the enqueue function, the various types of enqueue
operations (forwarding an existing event, sending a new event,
dropping a previous event) are converted to a series of flags which
will be used by the scheduler code to perform the needed actions for
that event.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
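
The conversion from enqueue operation to scheduler flags described in
this commit can be pictured with a minimal sketch. The operation names
and flag bits below are hypothetical, chosen only for illustration; they
are not taken from the sw PMD source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enqueue operation types (not the DPDK definitions). */
enum sketch_op { SKETCH_OP_NEW, SKETCH_OP_FORWARD, SKETCH_OP_RELEASE };

/* Hypothetical flag bits the scheduler would act on. */
#define SKETCH_FLAG_VALID    (1u << 0) /* entry carries an event to schedule */
#define SKETCH_FLAG_COMPLETE (1u << 1) /* a previously dequeued event is done */

/* Map one enqueue operation onto the flags consumed by the scheduler. */
static inline uint8_t
sketch_op_to_flags(enum sketch_op op)
{
    switch (op) {
    case SKETCH_OP_NEW:     /* a new event enters the device */
        return SKETCH_FLAG_VALID;
    case SKETCH_OP_FORWARD: /* the old event completes, a new one follows */
        return SKETCH_FLAG_VALID | SKETCH_FLAG_COMPLETE;
    case SKETCH_OP_RELEASE: /* the old event is dropped, nothing follows */
        return SKETCH_FLAG_COMPLETE;
    }
    assert(0 && "unknown operation");
    return 0;
}
```

With an encoding like this the scheduler only needs to test bits and
does not have to switch on the operation type again.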
7 years ago event/sw: support linking queues to ports
Bruce Richardson [Thu, 30 Mar 2017 19:30:37 +0000 (20:30 +0100)]
event/sw: support linking queues to ports

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago event/sw: support event ports
Bruce Richardson [Thu, 30 Mar 2017 19:30:36 +0000 (20:30 +0100)]
event/sw: support event ports

Add in the data-structures for the ports used by workers to send
packets to/from the scheduler. Also add in the functions to
create/destroy those ports.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago event/sw: support event queues
Bruce Richardson [Thu, 30 Mar 2017 19:30:35 +0000 (20:30 +0100)]
event/sw: support event queues

Add in the data structures for the event queues, and the eventdev
functions to create and destroy those queues.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago event/sw: return default port/queue config
Bruce Richardson [Thu, 30 Mar 2017 19:30:34 +0000 (20:30 +0100)]
event/sw: return default port/queue config

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago event/sw: add configure function
Bruce Richardson [Thu, 30 Mar 2017 19:30:33 +0000 (20:30 +0100)]
event/sw: add configure function

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago event/sw: add device capabilities function
Bruce Richardson [Thu, 30 Mar 2017 19:30:32 +0000 (20:30 +0100)]
event/sw: add device capabilities function

Add in the info_get function to return details on the queues, flow,
prioritization capabilities, etc. that this device has.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago event/sw: add new software-only eventdev driver
Bruce Richardson [Thu, 30 Mar 2017 19:30:31 +0000 (20:30 +0100)]
event/sw: add new software-only eventdev driver

This adds the minimal changes to allow a SW eventdev implementation to
be compiled, linked and created at run time. The eventdev does nothing,
but can be created via vdev on commandline, e.g.

  sudo ./x86_64-native-linuxapp-gcc/app/test --vdev=event_sw0
  ...
  PMD: Creating eventdev sw device event_sw0, numa_node=0, sched_quanta=128
  RTE>>

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago test/eventdev: pass timeout ticks unsupported
Harry van Haaren [Thu, 30 Mar 2017 19:30:30 +0000 (20:30 +0100)]
test/eventdev: pass timeout ticks unsupported

This commit reworks the return value handling of the
timeout ticks test. This feature is not mandatory for
a PMD; the eventdev layer returns -ENOTSUP if the PMD
doesn't implement the function.

The test is modified to check if the return value is
-ENOTSUP, and return -ENOTSUP to the test framework,
which can handle "unsupported" tests since patch[1].

As such, this test will function correctly if the
patchset linked below is applied; it fails if the
patch is not applied and the PMD doesn't implement the
timeout ticks function.

Note it does not depend (as a compile time dependency)
on the patchset linked below.

[1] http://dpdk.org/dev/patchwork/patch/21979/

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago eventdev: improve docs of start function
Harry van Haaren [Thu, 30 Mar 2017 19:30:29 +0000 (20:30 +0100)]
eventdev: improve docs of start function

This commit documents two error return values for the
rte_event_dev_start() function.

-ESTALE  indicates not all ports are configured
-ENOLINK indicates that not all queues are linked to ports. If an
         application enqueues to such a queue it can lead to deadlock

Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
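
The two documented error returns can be illustrated with a minimal,
self-contained sketch. The structure and function names are
hypothetical and only model the checks; this is not the eventdev or
PMD implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, for illustration only. */
struct sketch_dev {
    size_t nb_ports;
    size_t nb_queues;
    const bool *port_configured; /* one entry per port */
    const bool *queue_linked;    /* one entry per queue */
};

/* Refuse to start unless every port is set up and every queue is linked. */
static int
sketch_dev_start(const struct sketch_dev *dev)
{
    size_t i;

    assert(dev != NULL);
    for (i = 0; i < dev->nb_ports; i++)
        if (!dev->port_configured[i])
            return -ESTALE;  /* not all ports are configured */
    for (i = 0; i < dev->nb_queues; i++)
        if (!dev->queue_linked[i])
            return -ENOLINK; /* an unlinked queue could deadlock enqueuers */
    return 0;
}
```

A caller can branch on the two codes to tell "finish port setup" apart
from "link the remaining queues".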
7 years ago doc: add eventdev library to release notes
Jerin Jacob [Fri, 31 Mar 2017 14:20:29 +0000 (19:50 +0530)]
doc: add eventdev library to release notes

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: add errno-style return values
Gage Eads [Mon, 3 Apr 2017 12:35:51 +0000 (18:05 +0530)]
eventdev: add errno-style return values

This commit adds rte_errno return values to rte_event_enqueue_burst() and
rte_event_dequeue_burst().

These return values allow user software to differentiate between an
invalid argument (such as an invalid queue_id or sched_type in an enqueued
event) and backpressure from the event device.

The port and device ID checks are placed in RTE_LIBRTE_EVENTDEV_DEBUG
header guards to avoid the performance hit in non-debug execution.

Signed-off-by: Gage Eads <gage.eads@intel.com>
7 years ago eventdev: add extended stats
Bruce Richardson [Fri, 10 Mar 2017 19:43:19 +0000 (19:43 +0000)]
eventdev: add extended stats

Add in APIs for extended stats so that eventdev implementations can report
out information on their internal state. The APIs are based on, but not
identical to, the equivalent ethdev functions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago test/eventdev: link all queues before start
Harry van Haaren [Fri, 10 Mar 2017 19:43:17 +0000 (19:43 +0000)]
test/eventdev: link all queues before start

The software eventdev can lock-up if not all queues are
linked to a port. For this reason, the software eventdev
fails to start if queues are not linked to anything.

This commit creates dummy links from all queues to port
0 in the eventdev setup function and start/stop test,
which would otherwise fail due to unlinked queues.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago eventdev: remove default queue overriding
Harry van Haaren [Fri, 10 Mar 2017 15:19:15 +0000 (15:19 +0000)]
eventdev: remove default queue overriding

PMDs that only do a specific type of scheduling cannot provide
CFG_ALL_TYPES, so the Eventdev infrastructure should not demand
that every PMD supports CFG_ALL_TYPES.

By not overriding the default configuration of the queue as
suggested by the PMD, the eventdev_common unit tests can pass
on all PMDs, regardless of their capabilities.

RTE_EVENT_QUEUE_CFG_DEFAULT is no longer used by the eventdev layer,
so it can be removed now. Applications should use CFG_ALL_TYPES
if they require enqueue of all types to a queue, or specify which
type of queue they require.

The CFG_DEFAULT value is changed to CFG_ALL_TYPES in event/skeleton,
to not break the compile.

A capability flag is added that indicates if the underlying PMD
supports creating queues of ALL_TYPES.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago test/eventdev: fix reconfigure values
Jerin Jacob [Fri, 3 Mar 2017 17:27:44 +0000 (22:57 +0530)]
test/eventdev: fix reconfigure values

Minimum value of nb_event_ports and/or nb_event_queues
should be one before reconfiguring the event device.

Fixes: f8f9d233ea0e ("test/eventdev: add unit tests")

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: return code in dequeue timeout conversion
Jerin Jacob [Fri, 3 Mar 2017 17:27:43 +0000 (22:57 +0530)]
eventdev: return code in dequeue timeout conversion

An eventdev driver may return an error on dequeue timeout tick conversion.
Change the PMD callback interface to allow for this.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: fix links map initialization for SW PMD
Gage Eads [Mon, 3 Apr 2017 12:35:50 +0000 (18:05 +0530)]
eventdev: fix links map initialization for SW PMD

This patch initializes the links_map array entries to
EVENT_QUEUE_SERVICE_PRIORITY_INVALID, as expected by
rte_event_port_links_get(). This is necessary for the sw eventdev PMD,
which does not initialize links_map when rte_event_port_setup() calls
rte_event_port_unlink().

Fixes: 4f0804bbdfb9 ("eventdev: implement the northbound APIs")

Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago eventdev: use generic device holder
Nipun Gupta [Fri, 3 Mar 2017 15:33:02 +0000 (21:03 +0530)]
eventdev: use generic device holder

rte_device is a generic device which is available to the applications
and EAL. This patch replaces rte_pci_device in 'struct rte_eventdev'
and in 'struct rte_event_dev_info' with common rte_device.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
7 years ago eventdev: improve API doc for timeout ticks
Harry van Haaren [Wed, 8 Mar 2017 10:35:34 +0000 (10:35 +0000)]
eventdev: improve API doc for timeout ticks

Improve the documentation of the return values of the
rte_event_dequeue_timeout_ticks() function, adding a
-ENOTSUP value for eventdevs that do not support waiting.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago eventdev: increase size of enq/deq conf variables
Harry van Haaren [Fri, 10 Mar 2017 19:43:16 +0000 (19:43 +0000)]
eventdev: increase size of enq/deq conf variables

Large port enqueue sizes were not supported because the value
was stored in a uint8_t. Using uint8_t to save space in
config APIs makes no sense, so increase the 3 instances of
uint8_t enqueue / dequeue depths to more appropriate types
(based on the context around them).

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
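
The limitation is easy to reproduce in isolation. The sketch below uses
hypothetical structure names, and the 16-bit width of the widened field
is an assumption made for illustration; the commit message only says
the types were increased.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "before" and "after" layouts of a port configuration. */
struct sketch_port_conf_narrow { uint8_t enqueue_depth; };
struct sketch_port_conf_wide { uint16_t enqueue_depth; };

/* Store a requested depth in the narrow field and read it back. */
static inline unsigned int
sketch_store_narrow(unsigned int depth)
{
    struct sketch_port_conf_narrow conf;

    conf.enqueue_depth = (uint8_t)depth; /* keeps only the low 8 bits */
    return conf.enqueue_depth;
}

/* Store a requested depth in the widened field and read it back. */
static inline unsigned int
sketch_store_wide(unsigned int depth)
{
    struct sketch_port_conf_wide conf;

    assert(depth <= UINT16_MAX);
    conf.enqueue_depth = (uint16_t)depth;
    return conf.enqueue_depth;
}
```

A requested depth of 4096 reads back as 0 from the narrow field and as
4096 from the wide one.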
7 years ago eventdev: amend timeout criteria comment for burst dequeue
Nipun Gupta [Fri, 10 Feb 2017 16:26:50 +0000 (21:56 +0530)]
eventdev: amend timeout criteria comment for burst dequeue

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: clarify some parameter descriptions
Gage Eads [Mon, 3 Apr 2017 12:35:49 +0000 (18:05 +0530)]
eventdev: clarify some parameter descriptions

This commit clarifies the usage of nb_links and nb_unlinks when passing
a NULL pointer as the queues argument.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago eventdev: amend comments for events limit and threshold
Nipun Gupta [Tue, 14 Feb 2017 12:42:41 +0000 (18:12 +0530)]
eventdev: amend comments for events limit and threshold

Updated the comments on 'nb_events_limit' of 'struct rte_event_dev_config'
and 'new_event_threshold' of 'struct rte_event_port_conf'.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: limit port link operation to configured queues
Jerin Jacob [Mon, 6 Feb 2017 05:29:37 +0000 (10:59 +0530)]
eventdev: limit port link operation to configured queues

On port_setup, the link_map is updated only
for configured number of event queues.
Limit the port_links_get scan only to configured number
of event queues. Also, limit the port link and unlink queue
validation to configured number of event queues.

Fixes: 4f0804bbdfb9 ("eventdev: implement the northbound APIs")

Reported-by: Nipun Gupta <nipun.gupta@nxp.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
7 years ago event/skeleton: support vdev uninit
Jerin Jacob [Mon, 6 Feb 2017 05:23:39 +0000 (10:53 +0530)]
event/skeleton: support vdev uninit

Removed global index based device name
generation, as vdev uninit needs the exact driver
name used in vdev init.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: support vdev uninit
Jerin Jacob [Mon, 6 Feb 2017 05:23:38 +0000 (10:53 +0530)]
eventdev: support vdev uninit

Added eventdev vdev uninit support to release the resources
allocated in eventdev vdev init.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: fix event driver name to eventdev lookup
Jerin Jacob [Mon, 6 Feb 2017 05:23:37 +0000 (10:53 +0530)]
eventdev: fix event driver name to eventdev lookup

- Removed uninitialized max_devs value
- Corrected dev assignment

Fixes: 4f0804bbdfb9 ("eventdev: implement the northbound APIs")

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
7 years ago eventdev: remove unneeded dependencies
Bruce Richardson [Tue, 31 Jan 2017 16:14:19 +0000 (16:14 +0000)]
eventdev: remove unneeded dependencies

Since eventdev uses event structures rather than working directly on
mbufs, there are no actual dependencies on the mbuf library. The
inclusion of an mbuf pointer element inside the event itself does not
require the inclusion of the mbuf header file. Similarly the pci
header is not needed, but following their removal, rte_memory.h is
needed for the definition of the __rte_cache_aligned macro.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago eventdev: update event port link and unlink callbacks
Nipun Gupta [Mon, 6 Feb 2017 19:04:37 +0000 (00:34 +0530)]
eventdev: update event port link and unlink callbacks

Added a pointer to the rte_eventdev type in the event port
link and unlink callbacks. This device shall be used by some
of the event drivers to fetch queue related information.

Also, update the skeleton eventdev driver with corresponding changes.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years ago test/eventdev: add unit tests
Jerin Jacob [Fri, 18 Nov 2016 05:45:02 +0000 (11:15 +0530)]
test/eventdev: add unit tests

This commit adds basic unit tests for the eventdev API.

commands to run the test app:
./build/app/test -c 2
RTE>>eventdev_common_autotest

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years ago event/skeleton: add skeleton eventdev driver
Jerin Jacob [Fri, 18 Nov 2016 05:45:01 +0000 (11:15 +0530)]
event/skeleton: add skeleton eventdev driver

The skeleton driver facilitates bootstrapping a new
eventdev driver and creates a platform to verify
the northbound eventdev common code.

The driver supports both VDEV and PCI based eventdev
devices.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years ago eventdev: implement PMD registration functions
Jerin Jacob [Tue, 6 Dec 2016 02:24:15 +0000 (07:54 +0530)]
eventdev: implement PMD registration functions

This patch adds infrastructure for registering the vdev or
the PCI based event device.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years ago eventdev: implement the northbound APIs
Jerin Jacob [Tue, 6 Dec 2016 01:51:46 +0000 (07:21 +0530)]
eventdev: implement the northbound APIs

This patch implements the northbound eventdev API interface using
the southbound driver interface.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years ago eventdev: define southbound driver interface
Jerin Jacob [Tue, 6 Dec 2016 01:25:30 +0000 (06:55 +0530)]
eventdev: define southbound driver interface

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years ago eventdev: introduce event driven programming model
Jerin Jacob [Fri, 18 Nov 2016 02:00:38 +0000 (07:30 +0530)]
eventdev: introduce event driven programming model

In a polling model, lcores poll ethdev ports and associated
rx queues directly to look for packets. In an event driven model,
by contrast, lcores call the scheduler that selects packets for
them based on programmer-specified criteria. The eventdev library
adds support for the event driven programming model, which offers
applications automatic multicore scaling, dynamic load balancing,
pipelining, packet ingress order maintenance and
synchronization services to simplify application packet processing.

By introducing the event driven programming model, DPDK can support
both polling and event driven programming models for packet processing,
and applications are free to choose whatever model
(or combination of the two) that best suits their needs.

This patch adds the eventdev specification header file.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
7 years ago examples/vhost: demonstrate the new generic APIs
Yuanhan Liu [Sat, 1 Apr 2017 07:23:00 +0000 (15:23 +0800)]
examples/vhost: demonstrate the new generic APIs

The DPDK vhost lib is now generic enough that it can be used to
implement any vhost-user driver.

For example, this patch implements a very simple vhost-user net driver,
mainly for demonstrating how to use those generic vhost APIs.

And when the --builtin-net-driver option is used, the example virtio-net
driver code will be invoked, instead of the one provided from the vhost
library.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years ago vhost: do not destroy device on repeat mem table message
Yuanhan Liu [Sat, 1 Apr 2017 07:22:59 +0000 (15:22 +0800)]
vhost: do not destroy device on repeat mem table message

It doesn't make any sense to invoke the destroy_device() callback
while handling the SET_MEM_TABLE message.

According to the vhost-user spec, it is the GET_VRING_BASE message that
indicates the end of a vhost device: destroy_device() should be invoked
from there (luckily, we already do that).

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: workaround the build dependency on mbuf header
Yuanhan Liu [Sat, 1 Apr 2017 07:22:58 +0000 (15:22 +0800)]
vhost: workaround the build dependency on mbuf header

The rte_mbuf struct is likely to be used only in the vhost-user net
driver. However, now that vhost-user is generic enough to be used for
implementing other drivers (such as vhost-user SCSI), those drivers
also have to include <rte_mbuf.h>; otherwise, the build breaks.

We can work around this by using a forward declaration, so that other
non-net drivers won't need to include <rte_mbuf.h>.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
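
The forward declaration technique can be shown in isolation. In the
minimal sketch below, sketch_mbuf stands in for the packet buffer type
and all other names are hypothetical; the point is only that a header
can pass pointers to a struct around without ever seeing its
definition.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough to declare pointers to the type. */
struct sketch_mbuf;

/* Hypothetical net-only entry point that takes packet buffers by pointer. */
typedef size_t (*sketch_rx_fn)(struct sketch_mbuf **pkts, size_t count);

/* Generic driver ops; compiles without the full struct definition. */
struct sketch_driver_ops {
    sketch_rx_fn rx; /* NULL for non-net drivers */
};

/* A non-net driver never dereferences sketch_mbuf, so it needs no header. */
static inline int
sketch_is_net_driver(const struct sketch_driver_ops *ops)
{
    assert(ops != NULL);
    return ops->rx != NULL;
}
```

Only code that dereferences the struct has to include the header that
defines it.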
7 years ago vhost: rename header file
Yuanhan Liu [Sat, 1 Apr 2017 07:22:57 +0000 (15:22 +0800)]
vhost: rename header file

Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: introduce API to start a specific driver
Yuanhan Liu [Sat, 1 Apr 2017 07:22:56 +0000 (15:22 +0800)]
vhost: introduce API to start a specific driver

We used to use rte_vhost_driver_session_start() to trigger the vhost-user
session. It takes no argument, thus it's a global trigger. And it could
be problematic.

The issue is that, currently, rte_vhost_driver_register(path, flags)
actually tries to put it into the session loop (by fdset_add). However,
a set of APIs is needed to set up a vhost-user driver properly:
  * rte_vhost_driver_register(path, flags);
  * rte_vhost_driver_set_features(path, features);
  * rte_vhost_driver_callback_register(path, vhost_device_ops);

If a new vhost-user driver is registered after the trigger (think OVS-DPDK
that could add a port dynamically from cmdline), the current code will
effectively start the session for the new driver just after the first
API rte_vhost_driver_register() is invoked, leaving later calls without
any effect at all.

To handle the case properly, this patch introduces a new API,
rte_vhost_driver_start(path), to trigger a specific vhost-user driver.
To do that, rte_vhost_driver_register(path, flags) is simplified
to create the socket only and lets rte_vhost_driver_start(path)
actually put it into the session loop.

Meanwhile, the rte_vhost_driver_session_start is removed: we could hide
the session thread internally (create the thread if it has not been
created). This would also simplify the application.

NOTE: the API order in the prog guide is slightly adjusted to show the
correct invocation order.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: export APIs for live migration support
Yuanhan Liu [Sat, 1 Apr 2017 07:22:55 +0000 (15:22 +0800)]
vhost: export APIs for live migration support

Export a few APIs for the vhost-user driver to log the guest memory writes,
which is a must for live migration support.

This patch basically moves vhost_log_write() and vhost_log_used_vring()
into vhost.h and then adds a wrapper (the public API) for them.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: add features changed callback
Yuanhan Liu [Sat, 1 Apr 2017 07:22:54 +0000 (15:22 +0800)]
vhost: add features changed callback

Features could be changed after the feature negotiation. For example,
VHOST_F_LOG_ALL will be set/cleared at the start/end of live migration,
respectively. Thus, we need a new callback to inform the application
of such a change.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: rename virtio-net to vhost
Yuanhan Liu [Sat, 1 Apr 2017 07:22:53 +0000 (15:22 +0800)]
vhost: rename virtio-net to vhost

Rename "virtio-net" to "vhost" in the API comments and vhost prog guide.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: rename device ops struct
Yuanhan Liu [Sat, 1 Apr 2017 07:22:52 +0000 (15:22 +0800)]
vhost: rename device ops struct

Rename "virtio_net_device_ops" to "vhost_device_ops", to not let it
be virtio-net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: do not include net specific headers
Yuanhan Liu [Sat, 1 Apr 2017 07:22:51 +0000 (15:22 +0800)]
vhost: do not include net specific headers

Include it internally, at vhost.h.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: drop the Rx and Tx queue macro
Yuanhan Liu [Sat, 1 Apr 2017 07:22:50 +0000 (15:22 +0800)]
vhost: drop the Rx and Tx queue macro

They are virtio-net specific and should be defined inside the virtio-net
driver.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: move the device ready check at proper place
Yuanhan Liu [Sat, 1 Apr 2017 07:22:49 +0000 (15:22 +0800)]
vhost: move the device ready check at proper place

Currently, we check vq->desc, vq->kickfd and vq->callfd to know whether
a virtio device is ready or not. However, we only do it when handling
the SET_VRING_KICK message, which could be wrong if a vhost-user frontend
sends SET_VRING_KICK first and SET_VRING_CALL later.

To work for all possible vhost-user frontend implementations, we could
move the ready check to the end of the vhost-user message handler.

Meanwhile, since we do the check more often than before, the "virtio
not ready" message is dropped, to not flood the screen.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
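
The ordering problem and its fix can be sketched in a few lines. The
types and message names are hypothetical stand-ins, and using a
negative value for "fd not set yet" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical virtqueue state: ring address plus two eventfds. */
struct sketch_vq {
    void *desc;
    int kickfd;
    int callfd;
};

enum sketch_msg {
    SKETCH_SET_VRING_ADDR,
    SKETCH_SET_VRING_KICK,
    SKETCH_SET_VRING_CALL
};

/* Ready only once all three pieces have arrived. */
static bool
sketch_vq_is_ready(const struct sketch_vq *vq)
{
    return vq->desc != NULL && vq->kickfd >= 0 && vq->callfd >= 0;
}

/* Apply one message, then run the ready check whatever its type was. */
static bool
sketch_handle_msg(struct sketch_vq *vq, enum sketch_msg msg,
                  void *desc, int fd)
{
    assert(vq != NULL);
    switch (msg) {
    case SKETCH_SET_VRING_ADDR:
        vq->desc = desc;
        break;
    case SKETCH_SET_VRING_KICK:
        vq->kickfd = fd;
        break;
    case SKETCH_SET_VRING_CALL:
        vq->callfd = fd;
        break;
    }
    return sketch_vq_is_ready(vq);
}
```

With the check at the end of the handler, the queue becomes ready on
whichever message arrives last, regardless of the order the frontend
sends them in.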
7 years ago vhost: export the number of vrings
Yuanhan Liu [Sat, 1 Apr 2017 07:22:48 +0000 (15:22 +0800)]
vhost: export the number of vrings

We used to use rte_vhost_get_queue_num() to tell how many vrings there
are. However, the return value is the number of "queue pairs", which is
very virtio-net specific. To make it generic, we should return the
number of vrings instead, and let the driver do the proper translation.
Say, a virtio-net driver could turn it into the number of queue pairs by
dividing by 2.

Meanwhile, mark rte_vhost_get_queue_num as deprecated.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
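
The translation left to the driver is a one-liner. The helper name is
hypothetical, and requiring an even vring count is this sketch's own
assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper a virtio-net driver might keep on its side. */
static inline uint16_t
sketch_vrings_to_queue_pairs(uint16_t nr_vring)
{
    /* A virtio-net queue pair is one Rx vring plus one Tx vring. */
    assert(nr_vring % 2 == 0);
    return nr_vring / 2;
}
```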
7 years ago vhost: turn queue pair to vring
Yuanhan Liu [Sat, 1 Apr 2017 07:22:47 +0000 (15:22 +0800)]
vhost: turn queue pair to vring

The queue pair is very virtio-net specific; other devices don't have
such a concept. To make it generic, we should log the number of vrings
instead of the number of queue pairs.

This patch just does a simple conversion; a later patch will export the
number of vrings to applications.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: export API to translate gpa to vva
Yuanhan Liu [Sat, 1 Apr 2017 07:22:46 +0000 (15:22 +0800)]
vhost: export API to translate gpa to vva

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: export vhost vring info
Yuanhan Liu [Sat, 1 Apr 2017 07:22:45 +0000 (15:22 +0800)]
vhost: export vhost vring info

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: introduce API to fetch negotiated features
Yuanhan Liu [Sat, 1 Apr 2017 07:22:44 +0000 (15:22 +0800)]
vhost: introduce API to fetch negotiated features

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: export guest memory regions
Yuanhan Liu [Sat, 1 Apr 2017 07:22:43 +0000 (15:22 +0800)]
vhost: export guest memory regions

Some vhost-user drivers may need this info to set up their own page tables
for GPA (guest physical addr) to HPA (host physical addr) translation.
SPDK (Storage Performance Development Kit) is one example.

Besides, by exporting this memory info, we could also export
gpa_to_vva() as an inline function, which helps performance.
Otherwise, it has to be referenced indirectly by a "vid".

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
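
What an inline translation over exported regions might look like is
sketched below. The structure layout, the field names and the "0 means
not found" convention are assumptions made for illustration; they are
not claimed to match the library's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory region. */
struct sketch_mem_region {
    uint64_t guest_phys_addr; /* start of the region in guest physical space */
    uint64_t host_user_addr;  /* where that region is mapped in this process */
    uint64_t size;
};

struct sketch_memory {
    size_t nregions;
    const struct sketch_mem_region *regions;
};

/* Translate a guest physical address; returns 0 if no region covers it. */
static inline uint64_t
sketch_gpa_to_vva(const struct sketch_memory *mem, uint64_t gpa)
{
    size_t i;

    assert(mem != NULL);
    for (i = 0; i < mem->nregions; i++) {
        const struct sketch_mem_region *r = &mem->regions[i];

        if (gpa >= r->guest_phys_addr &&
            gpa - r->guest_phys_addr < r->size)
            return gpa - r->guest_phys_addr + r->host_user_addr;
    }
    return 0;
}
```

Because the caller holds the region table directly, the lookup is a
short loop with no device lookup by "vid" on the data path.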
7 years ago vhost: make notify ops per vhost driver
Yuanhan Liu [Sat, 1 Apr 2017 07:22:42 +0000 (15:22 +0800)]
vhost: make notify ops per vhost driver

Assume there is an application that supports both vhost-user net and
vhost-user scsi: the callbacks should be different. Making notify
ops per vhost driver allows the application to define a different set
of callbacks for each driver.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago vhost: use new APIs to handle features
Yuanhan Liu [Sat, 1 Apr 2017 07:22:41 +0000 (15:22 +0800)]
vhost: use new APIs to handle features

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years ago net/vhost: remove feature related APIs
Yuanhan Liu [Sat, 1 Apr 2017 07:22:40 +0000 (15:22 +0800)]
net/vhost: remove feature related APIs

The rte_eth_vhost_feature_disable/enable/get APIs are just wrappers of
rte_vhost_feature_disable/enable/get. However, the latter are going to
be refactored: they are going to take an extra parameter (socket_file
path), to make them per-device.

Instead of changing those vhost-pmd APIs to adapt to the new vhost APIs,
we could simply remove them, and let vdev serve this purpose. After
all, vdev options are better for disabling/enabling some features.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years agovhost: introduce driver features related APIs
Yuanhan Liu [Sat, 1 Apr 2017 07:22:39 +0000 (15:22 +0800)]
vhost: introduce driver features related APIs

Introduce a few APIs to set/get/enable/disable driver features.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years agovhost: fix fd leaks for vhost-user server mode
Yuanhan Liu [Mon, 27 Mar 2017 08:52:15 +0000 (16:52 +0800)]
vhost: fix fd leaks for vhost-user server mode

A vhost-user server socket can have many connections, and thus many connfds.
However, we currently use a single int variable to store the connfd, meaning
it gets overwritten every time a new connection is created.

This is not as fatal as it sounds (the correct connfd is captured for the
event loop thread by fdset_add), but it may cause fd leaks if a user invokes
rte_vhost_driver_unregister before shutting down all connections: only the
most recent connfd is closed.

A simple way to reproduce the leak is to delete the OVS vhost-user port
while the connected VMs are still alive. (Note that using one socket per
VM is recommended, which again makes the issue less fatal than it sounds.)

Since we already use a struct "vhost_user_connection" to track all info
about one connection, the connfd clearly belongs there.
Then we can build a connection list inside the vhost_user_socket struct
to represent all connections belonging to that socket file.
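
The idea of tracking one connfd per connection can be sketched as follows. This is a simplified illustration with hypothetical names (conn_sketch, vsocket_sketch), not the actual vhost code; it uses a plain singly linked list and no locking.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-connection record holding its own connfd. */
struct conn_sketch {
	int connfd;
	struct conn_sketch *next;
};

/* Hypothetical socket object owning the list of its connections. */
struct vsocket_sketch {
	struct conn_sketch *conns;
};

/* Record a new connection; returns 0 on success, -1 on allocation failure. */
static int
add_conn(struct vsocket_sketch *s, int connfd)
{
	struct conn_sketch *c = malloc(sizeof(*c));

	if (c == NULL)
		return -1;
	c->connfd = connfd;
	c->next = s->conns;
	s->conns = c;
	return 0;
}

/* Close every tracked connfd, not only the most recent; returns the count. */
static int
close_all_conns(struct vsocket_sketch *s)
{
	int closed = 0;

	while (s->conns != NULL) {
		struct conn_sketch *c = s->conns;

		s->conns = c->next;
		close(c->connfd);
		free(c);
		closed++;
	}
	return closed;
}
```

Because each connection keeps its own descriptor, unregistering the socket can walk the list and close them all.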

Fixes: 164fd396788d ("vhost: fix unregistering in client mode")
Cc: stable@dpdk.org
Cc: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/virtio-user: support LSC
Jianfeng Tan [Fri, 31 Mar 2017 19:44:58 +0000 (19:44 +0000)]
net/virtio-user: support LSC

So far, virtio-user with vhost-user as the backend only supports
client mode. So when the vhost-user backend goes down, i.e., the unix socket
connection is broken, the connection cannot be re-established. In that
case we forcibly set the link state to down.

Note: virtio-user with vhost-kernel as the backend still cannot
support LSC, as we have not found a way to monitor the up/down events
of the backend tap device.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/virtio-user: support to report net status
Jianfeng Tan [Fri, 31 Mar 2017 19:44:57 +0000 (19:44 +0000)]
net/virtio-user: support to report net status

Originally, we did not report support of VIRTIO_NET_F_STATUS.
This feature is not reported by the vhost backend; instead, it
is added/removed by QEMU in the virtio PCI case.

We now report support for this feature so that the following patch
can depend on it to enable the LSC interrupt.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/virtio-user: support Rx interrupt
Jianfeng Tan [Fri, 31 Mar 2017 19:44:56 +0000 (19:44 +0000)]
net/virtio-user: support Rx interrupt

For rxq interrupt, the device (backend driver) notifies the driver
through callfd. Each virtqueue has a callfd. To stay compatible
with the existing framework, we give these callfds to the
interrupt thread, which listens on them for interrupts.

Before that, we need to allocate intr_handle and fill the callfds
into it so that the driver can use it to set up rxq interrupt mode.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
7 years agonet/virtio-user: move eventfd open/close into init/uninit
Jianfeng Tan [Fri, 31 Mar 2017 19:44:55 +0000 (19:44 +0000)]
net/virtio-user: move eventfd open/close into init/uninit

Originally, an eventfd is opened when initializing each vq, and closed
in virtio_user_stop_device().

To make it possible to initialize the intr_handle struct in init() in the
following patch, we move the open() of all eventfds into init() and the
close() into uninit().

Suggested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agoeal/linux: add interrupt type for vdev
Jianfeng Tan [Fri, 31 Mar 2017 19:44:54 +0000 (19:44 +0000)]
eal/linux: add interrupt type for vdev

A new interrupt type, RTE_INTR_HANDLE_VDEV, is added to support lsc and rxq
interrupt for vdev.

For lsc interrupt, in addition to the original EPOLLIN events, we also listen
for socket peer closed connection events (EPOLLRDHUP and EPOLLHUP).
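
A minimal sketch of such an event mask, using the standard Linux epoll API. The helper names are hypothetical and this is not the EAL interrupt code itself.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Register fd for readable data and for peer-closed notifications. */
static int
watch_lsc_fd(int epfd, int fd)
{
	struct epoll_event ev = {
		.events = EPOLLIN | EPOLLRDHUP | EPOLLHUP,
		.data.fd = fd,
	};

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Non-zero if the reported events indicate the peer closed the socket. */
static int
peer_closed(uint32_t events)
{
	return (events & (EPOLLRDHUP | EPOLLHUP)) != 0;
}
```

A handler would check peer_closed() on the events returned by epoll_wait() and, if set, treat it as a link-down condition.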

For rxq interrupt, add a precondition to avoid invoking any vfio and uio
code.

For intr_handle initialization, let each vdev driver do that.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
7 years agonet/virtio-user: support changing tap interface name
Wenfeng Liu [Tue, 28 Mar 2017 17:20:00 +0000 (17:20 +0000)]
net/virtio-user: support changing tap interface name

This patch adds a new option 'iface' to change the interface name of
tap device with vhost-kernel as backend.

Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: fix false sharing
Kevin Traynor [Thu, 23 Mar 2017 15:44:58 +0000 (15:44 +0000)]
vhost: fix false sharing

The broadcast_rarp field in the virtio_net struct is checked in the
dequeue datapath regardless of whether descriptors are available or not.

As it is checked with cmpset leading to a write, false sharing on the
virtio_net struct can happen between enqueue and dequeue datapaths
regardless of whether a RARP is requested. In OVS, the issue can cause
a uni-directional performance drop of up to 15%.

Fix that by only performing the cmpset if a read of broadcast_rarp
indicates that the cmpset is likely to succeed.
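
The read-before-cmpset pattern can be sketched with C11 atomics. This is an illustration only: the function name is hypothetical and the real code uses DPDK's own atomic helpers rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Returns true if this caller consumed a pending RARP request.
 * The flag is 1 when a RARP broadcast is requested, 0 otherwise.
 */
static bool
consume_rarp_request(atomic_int *broadcast_rarp)
{
	int expected = 1;

	/*
	 * Cheap read first: when no RARP is pending, the cache line stays
	 * in shared state and is not pulled away from the other datapath.
	 */
	if (atomic_load_explicit(broadcast_rarp, memory_order_relaxed) == 0)
		return false;

	/* Only now attempt the compare-and-set (1 -> 0), which writes. */
	return atomic_compare_exchange_strong(broadcast_rarp, &expected, 0);
}
```

In the common case the flag is 0, so the datapath performs only a load and never takes the cache line exclusively.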

Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agodoc: fix parameter of virtio-user for container
Yong Wang [Wed, 8 Mar 2017 07:45:53 +0000 (02:45 -0500)]
doc: fix parameter of virtio-user for container

Update the "Virtio_user for Container Networking" doc: add the
"--file-prefix" option to testpmd in host and container to avoid
a hugepage config file conflict.

Fixes: 50665deebda0 ("doc: add guide to use virtio-user for container networking")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agoapp/testpmd: print MTU in port info
Maxime Coquelin [Sun, 12 Mar 2017 16:34:06 +0000 (17:34 +0100)]
app/testpmd: print MTU in port info

This patch adds MTU display to "show port info" command.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/virtio: support MTU feature
Maxime Coquelin [Sun, 12 Mar 2017 16:34:04 +0000 (17:34 +0100)]
net/virtio: support MTU feature

This patch implements support for the Virtio MTU feature.
When negotiated, the host shares its maximum supported MTU,
which is used as the initial MTU and as the maximum MTU the
application can set.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/vhost: set MTU
Maxime Coquelin [Sun, 12 Mar 2017 16:34:03 +0000 (17:34 +0100)]
net/vhost: set MTU

This patch adds a call to rte_vhost_get_mtu() at device creation
time to fill the device's MTU property when available.

This makes the MTU value defined in QEMU cmdline accessible to the
application by calling rte_eth_dev_get_mtu().

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: add API to get MTU value
Maxime Coquelin [Sun, 12 Mar 2017 16:34:01 +0000 (17:34 +0100)]
vhost: add API to get MTU value

This patch implements the function for the application to
get the MTU value.

rte_vhost_get_mtu() fills the mtu parameter with the MTU value
set in QEMU and returns 0 if VIRTIO_NET_F_MTU has been negotiated;
otherwise it returns -ENOTSUP.

The function returns -EAGAIN if Virtio feature negotiation
has not happened yet.
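
The return-code contract described above can be modelled with a small mock. The device struct and function below are hypothetical stand-ins, not the library implementation, and the order of the two checks is an assumption. VIRTIO_NET_F_MTU is feature bit 3 in the virtio spec.

```c
#include <errno.h>
#include <stdint.h>

#define VIRTIO_NET_F_MTU_SKETCH 3 /* feature bit number from the virtio spec */

/* Hypothetical device state, just enough to model the three outcomes. */
struct dev_sketch {
	int negotiated;    /* non-zero once feature negotiation is done */
	uint64_t features; /* negotiated feature bits */
	uint16_t mtu;      /* MTU value received from QEMU */
};

static int
get_mtu_sketch(const struct dev_sketch *dev, uint16_t *mtu)
{
	if (!dev->negotiated)
		return -EAGAIN;  /* too early: negotiation not done yet */
	if (!(dev->features & (1ULL << VIRTIO_NET_F_MTU_SKETCH)))
		return -ENOTSUP; /* MTU feature was not negotiated */
	*mtu = dev->mtu;
	return 0;
}
```

A caller would typically retry later on -EAGAIN and fall back to a default MTU on -ENOTSUP.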

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: add new ready status flag
Maxime Coquelin [Sun, 12 Mar 2017 16:34:00 +0000 (17:34 +0100)]
vhost: add new ready status flag

This patch adds a new status flag indicating the Virtio device
is ready to operate.

This is required to be able to call rte_vhost_get_mtu() in the
.new_device() callback, as rte_vhost_get_mtu() requires that the
negotiation is done, but it is too early to rely on the running status
flag, which is set just after .new_device() returns.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: support MTU protocol feature
Maxime Coquelin [Sun, 12 Mar 2017 16:33:59 +0000 (17:33 +0100)]
vhost: support MTU protocol feature

This patch implements the vhost-user MTU protocol feature support.
When VIRTIO_NET_F_MTU is negotiated, QEMU notifies the vhost-user
backend of the configured MTU if the dedicated protocol feature is
supported.

The value can be used by the application to ensure consistency with
the value set by the user.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: enable virtio MTU feature
Maxime Coquelin [Sun, 12 Mar 2017 16:33:58 +0000 (17:33 +0100)]
vhost: enable virtio MTU feature

This patch enables the new VIRTIO_NET_F_MTU feature,
which makes it possible for the host to advise the guest
of its maximum supported MTU.

The MTU value is set via QEMU parameters, either via libvirt XML or
directly in the virtio-net device command line arguments.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: remove a hack on queue allocation
Yuanhan Liu [Thu, 2 Mar 2017 06:16:07 +0000 (14:16 +0800)]
vhost: remove a hack on queue allocation

We used to allocate queues based on the index from the SET_VRING_CALL
request: if the corresponding queue hasn't been allocated, allocate it.

Though this is practically right (it's the first per-vring request we
get from QEMU during vhost-user negotiation), it's not technically
right: the vhost-user spec does not document that it will always
be the first per-vring request. For example, SET_VRING_ADDR could also
be the first per-vring request.

Thus, queue allocation should not depend on SET_VRING_CALL.
Instead, we can catch all the per-vring messages at the entrance of the
request handler, and allocate a queue if it hasn't been allocated before.

That lets us remove a hack.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: fix max queues
Yuanhan Liu [Wed, 1 Mar 2017 10:41:59 +0000 (18:41 +0800)]
vhost: fix max queues

0x8000 is the max number of virtio-net queue pairs the virtio 1.0 spec
claims to support. For vhost-user, it's a different story: the max vring
index that can be passed by the vhost-user spec is 0xff, masked by
VHOST_USER_VRING_IDX_MASK.

That said, the max number of queue pairs vhost-user can support is 0x80.
If users ask for more, I think the vhost-user spec needs to be extended.
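
The arithmetic behind the 0x80 limit, as a small sketch. The helper name is hypothetical; the only inputs are the 0xff index mask quoted above and the fact that each queue pair uses two vrings (one Rx, one Tx).

```c
#include <stdint.h>

/* Number of queue pairs addressable with a given vring index mask. */
static inline uint32_t
max_queue_pairs_sketch(uint32_t vring_idx_mask)
{
	uint32_t nr_vrings = vring_idx_mask + 1; /* indices 0..mask */

	return nr_vrings / 2; /* one Rx + one Tx vring per pair */
}
```

With a mask of 0xff there are 256 addressable vrings, which gives 128 (0x80) queue pairs.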

Fixes: b09b198bfb5c ("vhost-user: announce queue number in message")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: fix multiple queue not enabled for old kernels
Yuanhan Liu [Wed, 1 Mar 2017 10:41:58 +0000 (18:41 +0800)]
vhost: fix multiple queue not enabled for old kernels

Some macros (say VIRTIO_NET_F_MQ) are needed for enabling multiple queues;
however, they were only introduced in kernel v3.8, meaning a build error
happens if we build DPDK vhost on platforms with older kernels.

71dfdbe66a66 ("vhost: fix build with kernel < 3.8") meant to fix it, but
in the wrong way: it completely disables the MQ feature for those kernels.
However, the MQ feature doesn't depend on the kernel at all (except for the
macro dependency stated above), so we can still enable the MQ feature
even if the host kernel has no such support.

The right fix is to define the macro if it's not defined.
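
A minimal sketch of that kind of fallback definition. In real code the kernel's virtio header would be included first; the helper function is hypothetical and only there to show the macro in use. The value 22 is the bit number the virtio spec assigns to VIRTIO_NET_F_MQ.

```c
#include <stdint.h>

/*
 * Older kernel headers may not provide this macro. Define it ourselves
 * instead of disabling the feature.
 */
#ifndef VIRTIO_NET_F_MQ
#define VIRTIO_NET_F_MQ 22 /* bit number assigned by the virtio spec */
#endif

/* Feature mask for multiqueue support, usable regardless of kernel version. */
static inline uint64_t
mq_feature_mask_sketch(void)
{
	return 1ULL << VIRTIO_NET_F_MQ;
}
```

The guard keeps the definition harmless on newer kernels, where the header already provides the same value.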

Fixes: 71dfdbe66a66 ("vhost: fix build with kernel < 3.8")
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
7 years agonet/virtio: disable LSC interrupt if MSIX not enabled
Matt Peters [Thu, 9 Mar 2017 20:28:02 +0000 (15:28 -0500)]
net/virtio: disable LSC interrupt if MSIX not enabled

The link state change interrupt can only be configured if the virtio device
supports MSIX.  Prior to this change, writing the vector to the PCI
config space overwrote the initial part of the MAC address, since
without MSIX the vector is not in the config space and its offset is
occupied by the MAC address.

This has been reproduced in VirtualBox (v5.0.30.r112061) on Windows 7.

Fixes: 954ea11540b6 ("virtio: do not report link state feature unless available")
Cc: stable@dpdk.org
Signed-off-by: Matt Peters <matt.peters@windriver.com>
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
7 years agonet/virtio-user: fix overflow
Wenfeng Liu [Tue, 14 Mar 2017 10:09:56 +0000 (10:09 +0000)]
net/virtio-user: fix overflow

virtio-user limits the queue number to 8 but does not check the queue
number supplied by the user against that limit. If a bigger queue
number (> 8) is given, there is an overflow issue. A sanity
check avoids it.
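
The kind of sanity check meant here can be sketched as below. The macro and function names are hypothetical; only the limit of 8 comes from the text.

```c
#include <errno.h>

#define MAX_QUEUE_PAIRS_SKETCH 8 /* limit stated above */

/* Reject a user-supplied queue count that would overflow fixed-size arrays. */
static int
check_queue_pairs_sketch(unsigned int requested)
{
	if (requested > MAX_QUEUE_PAIRS_SKETCH)
		return -EINVAL;
	return 0;
}
```

Validating the value once at device setup means the per-queue arrays sized for 8 entries are never indexed out of bounds later.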

Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
Cc: stable@dpdk.org
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/virtio-user: fix tapfds close
Wenfeng Liu [Mon, 13 Mar 2017 09:33:21 +0000 (09:33 +0000)]
net/virtio-user: fix tapfds close

A valid tap file descriptor is greater than or equal to zero,
not merely non-zero.
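
A sketch of the corrected check, with a hypothetical helper name. It assumes unused slots are marked with -1; descriptor 0 is valid and must be closed too.

```c
#include <unistd.h>

/* Close every valid descriptor in the array; returns how many were closed. */
static int
close_tapfds_sketch(int *tapfds, int n)
{
	int closed = 0;

	for (int i = 0; i < n; i++) {
		if (tapfds[i] >= 0) { /* 0 is a valid fd; only negatives are unused */
			close(tapfds[i]);
			tapfds[i] = -1;
			closed++;
		}
	}
	return closed;
}
```

Testing for non-zero instead would skip a tap device that happened to get descriptor 0 and would call close() on negative placeholders.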

Fixes: e3b434818bbb ("net/virtio-user: support kernel vhost")
Cc: stable@dpdk.org
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agovhost: change log levels in client mode
Ilya Maximets [Thu, 2 Mar 2017 09:39:34 +0000 (12:39 +0300)]
vhost: change log levels in client mode

Inability to connect to the socket is a normal situation
in client mode because, in the common case, the server isn't
started yet. RTE_LOG_WARNING should be suitable for
the case of some unusual errors.
A message about reconnection is not an error at all.

Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/vhost: remove include of numaif.h
Rami Rosen [Mon, 27 Feb 2017 04:54:11 +0000 (23:54 -0500)]
net/vhost: remove include of numaif.h

This patch removes the include of the numaif.h header from rte_eth_vhost.c.
Commit 586e39001317 ("vhost: export numa node") moved the invocation of
get_mempolicy() from rte_eth_vhost.c to librte_vhost. So there is no need
to include the numaif.h header anymore in rte_eth_vhost.c.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
7 years agonet/virtio: remove the redundant computing
Zhiyong Yang [Thu, 23 Feb 2017 07:11:42 +0000 (15:11 +0800)]
net/virtio: remove the redundant computing

This minor change removes a redundant computation and makes
the code easier to understand.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
7 years agovhost: try to shrink pfdset when fdset_add fails
Matthias Gatto [Tue, 21 Feb 2017 14:25:30 +0000 (15:25 +0100)]
vhost: try to shrink pfdset when fdset_add fails

fdset_add increments pfdset->num, but fdset_del doesn't decrement
pfdset->num, so if we call fdset_add then fdset_del in a loop without
calling fdset_shrink, we can easily exceed MAX_FDS with only a few
fds in use.

So my solution is simply to call fdset_shrink in fdset_add when it
exceeds MAX_FDS.

Because fdset_shrink and fdset_add both lock pfdset->fd_mutex, we can't
call fdset_shrink inside fdset_add, as that would cause a
deadlock. So this patch splits fdset_shrink in two: fdset_shrink and
fdset_shrink_nolock.
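
The locked/nolock split can be sketched as follows. This is a simplified model with a hypothetical struct and a tiny MAX_FDS, not the actual fd_man code: deleted slots are marked with -1 and only reclaimed when the set is compacted.

```c
#include <pthread.h>

#define MAX_FDS_SKETCH 4

struct fdset_sketch {
	int fd[MAX_FDS_SKETCH]; /* -1 marks a deleted slot */
	int num;                /* slots used, including deleted ones */
	pthread_mutex_t fd_mutex;
};

/* Compact the array. The caller must already hold fd_mutex. */
static void
fdset_shrink_nolock(struct fdset_sketch *s)
{
	int j = 0;

	for (int i = 0; i < s->num; i++)
		if (s->fd[i] != -1)
			s->fd[j++] = s->fd[i];
	s->num = j;
}

/* Public variant: takes the lock, then compacts. */
static void
fdset_shrink(struct fdset_sketch *s)
{
	pthread_mutex_lock(&s->fd_mutex);
	fdset_shrink_nolock(s);
	pthread_mutex_unlock(&s->fd_mutex);
}

static int
fdset_add(struct fdset_sketch *s, int fd)
{
	pthread_mutex_lock(&s->fd_mutex);
	/* Lock is held here, so only the nolock variant is safe to call. */
	if (s->num >= MAX_FDS_SKETCH)
		fdset_shrink_nolock(s);
	if (s->num >= MAX_FDS_SKETCH) {
		pthread_mutex_unlock(&s->fd_mutex);
		return -1; /* genuinely full */
	}
	s->fd[s->num++] = fd;
	pthread_mutex_unlock(&s->fd_mutex);
	return 0;
}

/* Mark the slot as deleted without decrementing num. */
static void
fdset_del(struct fdset_sketch *s, int fd)
{
	pthread_mutex_lock(&s->fd_mutex);
	for (int i = 0; i < s->num; i++)
		if (s->fd[i] == fd)
			s->fd[i] = -1;
	pthread_mutex_unlock(&s->fd_mutex);
}
```

Calling fdset_shrink() from inside fdset_add() would try to take a non-recursive mutex twice on the same thread, which is why the body is factored out into a variant that assumes the lock is held.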

Fixes: 59317cef249c ("vhost: allow many vhost-user ports")
Cc: stable@dpdk.org
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
7 years agonet/sfc/base: fix out of bounds read in VIs allocation
Andy Moreton [Tue, 4 Apr 2017 12:13:27 +0000 (13:13 +0100)]
net/sfc/base: fix out of bounds read in VIs allocation

Coverity issue: 1349662
Fixes: e7cd430c864f ("net/sfc/base: import SFN7xxx family support")
Cc: stable@dpdk.org
Signed-off-by: Andy Moreton <amoreton@solarflare.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
7 years agonet/sfc/base: fix potential buffer overflow in Tx queue init
Andy Moreton [Tue, 4 Apr 2017 12:13:26 +0000 (13:13 +0100)]
net/sfc/base: fix potential buffer overflow in Tx queue init

Improve error checking to avoid a caller overflowing the MCDI
request buffer if the requested TXQ size was excessively large.

Coverity issue: 1305527
Fixes: e7cd430c864f ("net/sfc/base: import SFN7xxx family support")
CC: stable@dpdk.org
Signed-off-by: Andy Moreton <amoreton@solarflare.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
7 years agonet/sfc/base: fix failure path in EF10 Tx queue PIO enable
Andy Moreton [Tue, 4 Apr 2017 12:13:25 +0000 (13:13 +0100)]
net/sfc/base: fix failure path in EF10 Tx queue PIO enable

Coverity issue: 1387551
Fixes: e7cd430c864f ("net/sfc/base: import SFN7xxx family support")
Cc: stable@dpdk.org
Signed-off-by: Andy Moreton <amoreton@solarflare.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
7 years agoapp/testpmd: add CLI to set TC min bandwidth
Bernard Iremonger [Sat, 1 Apr 2017 01:18:19 +0000 (09:18 +0800)]
app/testpmd: add CLI to set TC min bandwidth

Add a CLI in testpmd to test the TC min bandwidth
setting.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
7 years agonet/ixgbe: allocate TC bandwidth
Bernard Iremonger [Sat, 1 Apr 2017 01:18:18 +0000 (09:18 +0800)]
net/ixgbe: allocate TC bandwidth

ixgbe supports setting the relative bandwidth for the TCs.
It's a global setting for the PF and all the VFs of a
physical port.
This feature provides the API to set the bandwidth.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
7 years agonet/thunderx: wait to complete during link update
Andriy Berestovskyy [Fri, 31 Mar 2017 13:57:49 +0000 (15:57 +0200)]
net/thunderx: wait to complete during link update

Some DPDK applications/examples check link status at
startup. NICVF does not wait for the link, so those apps fail.

Wait up to 9 seconds for the link, as other PMDs do, in order
to fix those apps/examples.

Signed-off-by: Andriy Berestovskyy <andriy.berestovskyy@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
7 years agonet/i40e: fix VLAN promisc setting
Wenzhuo Lu [Sat, 1 Apr 2017 06:15:57 +0000 (14:15 +0800)]
net/i40e: fix VLAN promisc setting

After adding a VLAN filter, VLAN promiscuous mode is
disabled, but there is no path that re-enables it.
So add a check after deleting a VLAN filter: if there's
no VLAN filter left, enable VLAN promiscuous mode.

Fixes: 9f0645cd147c ("net/i40e: fix VLAN filter")

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
7 years agonet/tap: fix redirection rule after MAC change
Pascal Mazon [Fri, 31 Mar 2017 13:54:11 +0000 (15:54 +0200)]
net/tap: fix redirection rule after MAC change

This is necessary to ensure packets with the new MAC address as
destination get redirected to the tap device.

Also change the MAC address only if the current one is different from
the requested one.

Fixes: 2bc06869cd94 ("net/tap: add remote netdevice traffic capture")

Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com>
7 years agonet/tap: fix null MAC address at init
Pascal Mazon [Fri, 31 Mar 2017 13:54:10 +0000 (15:54 +0200)]
net/tap: fix null MAC address at init

Immediately after init (probing), the device MAC address is all zeroes.
It should be possible to get a correct MAC address that early,
without needing a dev_configure().

With this patch, a MAC address is set in eth_dev_tap_create()
explicitly. It either comes from the remote if any was configured, or is
randomly generated. In any case, the device MAC address is guaranteed to
be the correct one when the tap netdevice actually gets created in
tun_alloc().

Fixes: f76d46b4ff08 ("net/tap: add MAC address management")
Fixes: 2bc06869cd94 ("net/tap: add remote netdevice traffic capture")

Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com>
7 years agonet/tap: update netlink error code management
Pascal Mazon [Fri, 31 Mar 2017 13:54:09 +0000 (15:54 +0200)]
net/tap: update netlink error code management

Some errors received from the kernel are acceptable, such as a -ENOENT
for a rule deletion (the rule no longer existed in the
kernel). Make sure we consider return codes properly. For that,
nl_recv() has been simplified.

The qdisc_exists() function is no longer needed, as we can check whether
the kernel returned -EEXIST when requesting the qdisc creation. It's
simpler and faster.
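
The error-code policy described above can be sketched as a small predicate. The enum and function names are hypothetical, not the PMD's actual helpers; only the two tolerated codes come from the text.

```c
#include <errno.h>

/* Hypothetical operation kinds for the sketch. */
enum nl_op_sketch {
	NL_OP_CREATE,
	NL_OP_DELETE,
};

/* Non-zero if the kernel's reply means the desired state is in place. */
static int
nl_result_ok_sketch(enum nl_op_sketch op, int err)
{
	if (err == 0)
		return 1;
	if (op == NL_OP_DELETE && err == -ENOENT)
		return 1; /* already gone: deletion goal is met */
	if (op == NL_OP_CREATE && err == -EEXIST)
		return 1; /* already present: no separate existence query needed */
	return 0;
}
```

Treating -EEXIST on creation as success replaces a query-then-create sequence with a single request.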

Add a few messages for clarity when a netlink error occurs.

Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com>