Stefan Hajnoczi [Mon, 5 Feb 2018 12:16:00 +0000 (13:16 +0100)]
vhost: validate virtqueue size
Check the virtqueue size constraints so that invalid values don't cause
bugs later on in the code. For example, sometimes the virtqueue size is
stored as unsigned int and sometimes as uint16_t, so bad things happen
if it is ever larger than 65535.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
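A minimal sketch of the kind of check this commit describes, not the patch itself. It assumes the split-ring rule from the virtio specification that a queue size is a non-zero power of two no larger than 32768; vq_size_valid() and VQ_SIZE_MAX are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound for a split ring; hypothetical name. */
#define VQ_SIZE_MAX 32768u

/*
 * Validate an untrusted size while it is still held in a wide type,
 * before it is ever narrowed to uint16_t.
 */
static bool
vq_size_valid(unsigned int num)
{
	if (num == 0 || num > VQ_SIZE_MAX)
		return false;
	/* A power of two has exactly one bit set. */
	return (num & (num - 1)) == 0;
}
```

Checking before narrowing matters: 65536 stored in a uint16_t silently becomes 0.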
Stefan Hajnoczi [Mon, 5 Feb 2018 12:16:00 +0000 (13:16 +0100)]
vhost: fix message payload union in setting ring address
vhost_user_set_vring_addr() uses the msg->payload.addr union member, not
msg->payload.state. Luckily the offset of the 'index' field is
identical in both structs, so there was never any buggy behavior.
Fixes: 5cd690e4fda9 ("vhost: fix vring addresses not translated") Cc: stable@dpdk.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
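To illustrate why the wrong union member happened to work, here is a sketch using simplified copies of the two payload structs. The layouts are assumed to follow the Linux vhost definitions, and all *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the two payload layouts (assumed). */
struct vring_state_sketch {
	unsigned int index;
	unsigned int num;
};

struct vring_addr_sketch {
	unsigned int index;
	unsigned int flags;
	uint64_t desc_user_addr;
	uint64_t used_user_addr;
	uint64_t avail_user_addr;
	uint64_t log_guest_addr;
};

union payload_sketch {
	struct vring_state_sketch state;
	struct vring_addr_sketch addr;
};

/* True when 'index' sits at the same offset in both structs. */
static bool
index_offsets_match(void)
{
	return offsetof(struct vring_state_sketch, index) ==
	       offsetof(struct vring_addr_sketch, index);
}
```

Because both structs start with the same field, reading payload.state.index after the sender filled payload.addr.index yields the same value, which is why the bug was latent.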
Stefan Hajnoczi [Mon, 5 Feb 2018 12:16:00 +0000 (13:16 +0100)]
vhost: reject invalid log base mmap offset
If the log base mmap_offset is larger than mmap_size then it points
outside the mmap region. We must not write to memory outside the mmap
region, so validate mmap_offset in vhost_user_set_log_base().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
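A hedged sketch of such an offset check. The helper names are hypothetical, and treating an offset equal to the size as invalid is an assumption that goes slightly beyond the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An offset at or past the end of the mapping leaves nothing to write to. */
static bool
log_offset_valid(uint64_t mmap_size, uint64_t mmap_offset)
{
	return mmap_offset < mmap_size;
}

/* Bytes usable for the log after the offset, or 0 if the offset is bad. */
static uint64_t
log_usable_size(uint64_t mmap_size, uint64_t mmap_offset)
{
	if (!log_offset_valid(mmap_size, mmap_offset))
		return 0;
	return mmap_size - mmap_offset;
}
```

Validating first also keeps the subtraction from wrapping around when the offset is larger than the size.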
Stefan Hajnoczi [Mon, 5 Feb 2018 12:16:00 +0000 (13:16 +0100)]
vhost: clear out unused SCM_RIGHTS file descriptors
The number of file descriptors received is not stored by vhost_user.c.
vhost_user_set_mem_table() assumes that memory.nregions matches the
number of file descriptors received, but nothing guarantees this:
    for (i = 0; i < memory.nregions; i++)
        close(pmsg->fds[i]);
Another questionable code snippet is:
    case VHOST_USER_SET_LOG_FD:
        close(msg.fds[0]);
If not enough file descriptors were received then fds[] contains
uninitialized data from the stack (see read_fd_message()). This might
cause non-vhost file descriptors to be closed if the uninitialized data
happens to match.
Refactoring vhost_user.c to pass around and check the number of file
descriptors everywhere would make the code more complex. It is simpler
for read_fd_message() to set unused elements in fds[] to -1. This way
close(-1) is called and no harm is done.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
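The approach can be sketched as follows. This is an illustration under assumptions, not the actual read_fd_message() change; clear_unused_fds() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mark every slot that did not receive a descriptor as -1 so that a later
 * unconditional close() on it cannot hit an unrelated open descriptor.
 */
static void
clear_unused_fds(int *fds, size_t fd_max, size_t fd_received)
{
	assert(fds != NULL);
	for (size_t i = fd_received; i < fd_max; i++)
		fds[i] = -1;
}
```

close(-1) simply fails with EBADF, so callers that close more slots than were received do no harm.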
Stefan Hajnoczi [Mon, 5 Feb 2018 12:16:00 +0000 (13:16 +0100)]
vhost: validate untrusted memory regions number field
Check if memory.nregions is valid right away. This eliminates the
possibility of bugs when memory.nregions is used later on in
vhost_user_set_mem_table().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
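A sketch of validating the count before anything indexes the region array. The limit of 8 and all *_sketch names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_NREGIONS_MAX_SKETCH 8 /* assumed limit; hypothetical name */

struct region_sketch {
	uint64_t guest_addr;
	uint64_t size;
};

struct mem_table_sketch {
	uint32_t nregions; /* untrusted: comes straight from the socket */
	struct region_sketch regions[MEM_NREGIONS_MAX_SKETCH];
};

/* Returns 0 if the table may be processed, -1 if it must be rejected. */
static int
mem_table_check(const struct mem_table_sketch *m)
{
	assert(m != NULL);
	if (m->nregions > MEM_NREGIONS_MAX_SKETCH)
		return -1; /* reject before any loop uses regions[i] */
	return 0;
}
```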
Stefan Hajnoczi [Mon, 5 Feb 2018 12:16:00 +0000 (13:16 +0100)]
vhost: avoid enum fields in VhostUserMsg
The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.
The VhostUserMsg.request union contains enum fields. Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:
  6.7.2.2 Enumeration specifiers
  ...
  Each enumerated type shall be compatible with char, a signed integer
  type, or an unsigned integer type. The choice of type is
  implementation-defined, but shall be capable of representing the
  values of all the members of the enumeration.
Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:
    if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
If msg.request.master is signed then negative values pass this check!
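The effect can be reproduced with a small sketch. REQUEST_MAX_SKETCH stands in for VHOST_USER_MAX and both helper names are hypothetical; the point is only that an upper-bound check on a signed field accepts negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REQUEST_MAX_SKETCH 32 /* hypothetical stand-in for VHOST_USER_MAX */

/* Upper-bound-only check on a signed field: -1 is "less than max". */
static bool
request_ok_signed(int32_t request)
{
	return request < REQUEST_MAX_SKETCH;
}

/* Same check on a fixed-width unsigned field: -1 becomes 0xffffffff. */
static bool
request_ok_unsigned(uint32_t request)
{
	return request < REQUEST_MAX_SKETCH;
}
```

With a uint32_t field the same bit pattern is a huge positive number and is rejected, and the field width no longer depends on the compiler's choice of enum type.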
Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type. For
example, if we add a negative constant then its type changes to signed
int:
Stefan Hajnoczi [Mon, 5 Feb 2018 12:16:00 +0000 (13:16 +0100)]
vhost: add security model documentation
Input validation is not applied consistently in vhost_user.c. This
suggests that not everyone has the same security model in mind when
working on the code.
Make the security model explicit so that everyone can understand and
follow the same model when modifying the code.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: John McNamara <john.mcnamara@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Shahaf Shuler [Sun, 25 Feb 2018 07:28:37 +0000 (09:28 +0200)]
net/mlx5: fix tunnel offloads cap query
The query for the tunnel stateless offloads was wrongly implemented
for two reasons:
1. It used the device ID to query for the offloads.
2. It used a compilation flag for Verbs which no longer exists.
The main reason was the lack of a proper API from Verbs.
Fix the query to use the rdma-core API. The capability returned from
rdma-core refers to both the Tx and Rx sides.
Even though there are separate caps for GRE and VXLAN, the implementation
merges them into a single flag in order to simplify the checks on the data
path.
Nélio Laranjeiro [Wed, 14 Feb 2018 15:04:45 +0000 (16:04 +0100)]
net/mlx5: fix flow creation with a single target queue
Adding a pattern targeting a single queue wrongly behaves as if it were an
RSS request, and ends up creating several Verbs flow rules to match the RSS
configuration.
Several control operations implemented by these PMDs affect netdevices
through sysfs, itself subject to file system permission checks enforced by
the kernel, which limits their use for most purposes to applications
running with root privileges.
Since performing the same operations through ioctl() requires fewer
capabilities (only CAP_NET_ADMIN) and given the remaining operations are
already implemented this way, this patch standardizes on ioctl() and gets
rid of redundant code.
Thomas Monjalon [Thu, 29 Mar 2018 15:28:26 +0000 (17:28 +0200)]
mk: fix kernel modules build dependency
Some kernel modules may need some header files to be "installed"
in the build directory.
When running multiple threads of make, kernel modules can try to
be compiled before the lib headers are ready:
    make -j3
    kernel/linux/kni/kni_misc.c:19:37: fatal error:
    exec-env/rte_kni_common.h: No such file or directory
This error appeared recently after moving kernel modules in their
own directory.
Keith Wiles [Sat, 10 Mar 2018 16:24:28 +0000 (10:24 -0600)]
kvargs: fix syntax in comments
Use commas as separator, not semicolons.
Fixes: a8b97e3a1db0 ("devargs: use a comma instead of semicolon to separate key/values") Cc: stable@dpdk.org Signed-off-by: Keith Wiles <keith.wiles@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Radu Nicolau [Tue, 20 Feb 2018 12:05:57 +0000 (12:05 +0000)]
examples/exception_path: limit core count to 64
The application doesn't support more than 64 lcores because the command
line takes a coremask that is parsed as a 64-bit value, so the core count
limit was changed to reflect this limitation.
Coverity issue: 30688 Fixes: af75078fece3 ("first public release") Cc: stable@dpdk.org Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
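A sketch of why a 64-bit coremask caps the lcore count. The helpers below are hypothetical and are not the example application's parser; they only show that lcore 64 and above has no bit to occupy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LCORES_SKETCH 64 /* one bit per lcore in a uint64_t */

/* Parse a hexadecimal coremask; returns 0 on success, -1 on bad input. */
static int
parse_coremask(const char *arg, uint64_t *mask)
{
	char *end = NULL;
	unsigned long long val;

	assert(mask != NULL);
	if (arg == NULL || *arg == '\0')
		return -1;
	errno = 0;
	val = strtoull(arg, &end, 16);
	if (errno != 0 || *end != '\0')
		return -1; /* overflow or trailing garbage */
	*mask = (uint64_t)val;
	return 0;
}

/* Returns 1 if the lcore is selected, 0 otherwise. */
static int
lcore_in_mask(uint64_t mask, unsigned int lcore)
{
	if (lcore >= MAX_LCORES_SKETCH)
		return 0; /* no such bit in a 64-bit mask */
	return (int)((mask >> lcore) & 1);
}
```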
Use strcmp to compare device names, as the strncmp in the original code
causes find_vdev to return -EEXIST for names that are a prefix
of another. The creation of interfaces fails unpredictably depending
on the order of their creation. An easy way to hit this bug is to create
eth_vhost1 after eth_vhost11.
Fixes: dda987315ca2 ("vdev: make virtual bus use its device struct") Cc: stable@dpdk.org Signed-off-by: Nachiketa Prachanda <nprachan@vyatta.att-mail.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
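The difference can be sketched as below. The exact form of the original strncmp call is an assumption, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix-style comparison: only the first strlen(wanted) bytes are checked. */
static bool
name_match_prefix(const char *existing, const char *wanted)
{
	assert(existing != NULL && wanted != NULL);
	return strncmp(existing, wanted, strlen(wanted)) == 0;
}

/* Exact comparison: both strings must be identical up to the terminator. */
static bool
name_match_exact(const char *existing, const char *wanted)
{
	assert(existing != NULL && wanted != NULL);
	return strcmp(existing, wanted) == 0;
}
```

With eth_vhost11 already registered, the prefix comparison reports eth_vhost1 as a duplicate, while the exact comparison does not.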
Andrew Rybchenko [Tue, 20 Mar 2018 11:26:18 +0000 (11:26 +0000)]
net/null: fix library version in meson build
Fixes: efd5d1a8d8dd ("drivers/net: build some vdev PMDs with meson") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Andrew Rybchenko [Tue, 20 Mar 2018 11:26:19 +0000 (11:26 +0000)]
net/ring: fix library version in meson build
Fixes: efd5d1a8d8dd ("drivers/net: build some vdev PMDs with meson") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Andrew Rybchenko [Tue, 20 Mar 2018 11:26:16 +0000 (11:26 +0000)]
net/i40e: fix library version in meson build
Fixes: e940646b20fa ("drivers/net: build Intel NIC PMDs with meson") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Andrew Rybchenko [Tue, 20 Mar 2018 11:26:17 +0000 (11:26 +0000)]
net/ixgbe: fix library version in meson build
Fixes: e940646b20fa ("drivers/net: build Intel NIC PMDs with meson") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Thomas Monjalon [Tue, 20 Feb 2018 17:11:12 +0000 (18:11 +0100)]
doc: adapt features tables header height
The length of the longest header name is used to adjust the padding
of the header row automatically, instead of fixed length.
The previous length (10) was too short for vdev_netvsc.
Thomas Monjalon [Tue, 20 Feb 2018 17:00:34 +0000 (18:00 +0100)]
doc: reduce features tables column width
The font size of the header row is reduced in order to shrink
the column size of the tables.
It is required for the NICs features table which is too large to fit
in the page width.
Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Jianfeng Tan [Sun, 11 Feb 2018 01:04:51 +0000 (01:04 +0000)]
maintainers: update for vhost lib and PMD
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Zhiyong Yang <zhiyong.yang@intel.com> Acked-by: Yuanhan Liu <yliu@fridaylinux.org> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thomas Monjalon [Fri, 9 Mar 2018 20:56:06 +0000 (21:56 +0100)]
drivers: rename bbdev directory to baseband
The drivers directory contains some sub-directories
for each kind of device (or bus, mem):
net, crypto, event, raw
They are not suffixed with "dev" because it is obvious.
For consistency, the sub-directory drivers/bbdev/
is renamed to drivers/baseband/.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Anatoly Burakov [Tue, 13 Mar 2018 17:42:40 +0000 (17:42 +0000)]
eal: ignore IPC messages until init is complete
If we receive messages that don't have a callback registered for
them, and we haven't finished initialization yet, it can be reasonably
inferred that we shouldn't have gotten the message in the first
place. Therefore, send requester a special message telling them to
ignore response to this request, as if this process wasn't there.
Since it is not possible for primary process to receive any messages
during initialization, this change in practice only applies to
secondary processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Anatoly Burakov [Tue, 13 Mar 2018 17:42:37 +0000 (17:42 +0000)]
eal: do not hardcode socket filter value in IPC
Currently, filter value is hardcoded and disconnected from actual
value returned by eal_mp_socket_path(). Fix this to generate filter
value by deriving it from eal_mp_socket_path() instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Anatoly Burakov [Tue, 13 Mar 2018 17:42:35 +0000 (17:42 +0000)]
eal: add internal flag of init completed
Currently, primary process initialization is finalized by setting
the RTE_MAGIC value in the shared config. However, it is not
possible to check whether secondary process initialization has
completed. Add such a value to internal config.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Anatoly Burakov [Fri, 2 Mar 2018 08:41:37 +0000 (08:41 +0000)]
eal: fix race condition in IPC request
Unlocking the action list before sending message and locking it
again afterwards introduces a window where a response might
arrive before we have a chance to start waiting on a condition,
resulting in timeouts on valid messages.
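A minimal pthread sketch of the safe pattern, under assumptions: the lock stays held from before the send until the wait, and is only released atomically inside pthread_cond_wait(). The names are hypothetical, a thread stands in for the peer's reply, and real code would use pthread_cond_timedwait() to bound the wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct pending_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool reply_seen;
};

/* Reply path: runs in another thread when the response arrives. */
static void *
reply_thread(void *arg)
{
	struct pending_sketch *p = arg;

	pthread_mutex_lock(&p->lock); /* blocks until the requester waits */
	p->reply_seen = true;
	pthread_cond_signal(&p->cond);
	pthread_mutex_unlock(&p->lock);
	return NULL;
}

/* Request path: never drops the lock between sending and waiting. */
static bool
request_sync(struct pending_sketch *p)
{
	pthread_t tid;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	/* Stand-in for sending the message: the reply comes from a thread. */
	if (pthread_create(&tid, NULL, reply_thread, p) != 0) {
		pthread_mutex_unlock(&p->lock);
		return false;
	}
	while (!p->reply_seen)
		pthread_cond_wait(&p->cond, &p->lock);
	pthread_mutex_unlock(&p->lock);
	pthread_join(tid, NULL);
	return true;
}
```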
Bruce Richardson [Thu, 22 Feb 2018 17:20:33 +0000 (17:20 +0000)]
net/pcap: simplify dependency checking using meson
Rather than trying to use meson's built-in detection for libpcap, and
having to special-case cross-building, just check for the presence of
pcap.h and the pcap library.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch adds support for meter configuration profiles.
Benefits: simplified configuration procedure, improved performance.
Q1: What is the configuration profile and why does it make sense?
A1: The configuration profile represents the set of configuration
parameters for a given meter object, such as the rates and sizes for
the token buckets. The configuration profile concept makes sense when
many meter objects share the same configuration, which is the typical
usage model: thousands of traffic flows are each individually metered
according to just a few service levels (i.e. profiles).
Q2: How is the configuration profile improving the performance?
A2: The performance improvement is achieved by reducing the memory
footprint of a meter object, which results in better cache utilization
for the typical case when large arrays of meter objects are used. The
internal data structures stored for each meter object contain:
a) Constant fields: Low level translation of the configuration
parameters that does not change post-configuration. This is
really duplicated for all meters that use the same
configuration. This is the configuration profile data that is
moved away from the meter object. Current size (implementation
dependent): srTCM = 32 bytes, trTCM = 32 bytes.
b) Variable fields: Time stamps and running counters that change
during the on-going traffic metering process. Current size
(implementation dependent): srTCM = 24 bytes, trTCM = 32 bytes.
Therefore, by moving the constant fields to a separate profile
data structure shared by all the meters with the same
configuration, the size of the meter object is reduced by ~50%.
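The split can be sketched as two structs. Field names are illustrative assumptions rather than the library's definitions; the sizes are chosen to match the srTCM figures quoted above (32 constant bytes, 24 variable bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constant after configuration: one copy per service level. */
struct srtcm_profile_sketch {
	uint64_t cbs;                  /* committed burst size */
	uint64_t ebs;                  /* excess burst size */
	uint64_t cir_period;           /* token refill period */
	uint64_t cir_bytes_per_period; /* tokens added each period */
};

/* Changes while metering: one copy per metered flow. */
struct srtcm_state_sketch {
	uint64_t time; /* last update timestamp */
	uint64_t tc;   /* tokens in the committed bucket */
	uint64_t te;   /* tokens in the excess bucket */
};

/* Bytes used when every meter embeds its own copy of the constants. */
static size_t
footprint_embedded(size_t n_meters)
{
	return n_meters * (sizeof(struct srtcm_profile_sketch) +
			   sizeof(struct srtcm_state_sketch));
}

/* Bytes used when the constants live in a few shared profiles. */
static size_t
footprint_shared(size_t n_meters, size_t n_profiles)
{
	return n_meters * sizeof(struct srtcm_state_sketch) +
	       n_profiles * sizeof(struct srtcm_profile_sketch);
}
```

For 1000 meters and 4 profiles this gives 56000 bytes embedded versus 24128 bytes shared.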
Pablo de Lara [Wed, 14 Feb 2018 17:14:06 +0000 (17:14 +0000)]
doc: fix outdated link to IPsec white paper
Fixes: 924e84f87306 ("aesni_mb: add driver for multi buffer based crypto") Cc: stable@dpdk.org Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Shreyansh Jain [Mon, 5 Feb 2018 06:22:22 +0000 (11:52 +0530)]
doc: announce ethdev API change for preferred burst size
The rte_eth_rx_burst(..,nb_pkts) function has the semantic that if the
return value is smaller than requested, the application can consider it
the end of the packet stream. Some hardware can only support smaller burst
sizes, which need to be advertised. The same applies to Tx burst.
This patch adds a deprecation notice for the rte_eth_dev_info structure,
as new members for preferred Rx and Tx burst and ring sizes would be
added, impacting the size of the structure.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Zhiyong Yang <zhiyong.yang@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Andrew Rybchenko [Tue, 23 Jan 2018 13:23:04 +0000 (13:23 +0000)]
doc: announce mempool API changes
API/ABI changes are planned for 18.05 [1]:
* Allow customizing how mempool objects are stored in memory.
* Deprecate mempool XMEM API.
* Add mempool driver ops to get information from mempool driver and
dequeue contiguous blocks of objects if driver supports it.
Anatoly Burakov [Tue, 16 Jan 2018 17:53:40 +0000 (17:53 +0000)]
doc: announce EAL ABI change for NUMA node count
There will be a new function added in v18.05 that will return
number of detected sockets, which will change the ABI.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Jonas Pfefferle <pepperjo@japf.ch> Acked-by: Thomas Monjalon <thomas@monjalon.net>
doc: announce EAL API change to lcore role function
This is an API/ABI change notice for DPDK 18.05 announcing a change in
the meaning of the return values of the rte_lcore_has_role() function.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Matan Azrad [Wed, 14 Feb 2018 14:47:26 +0000 (14:47 +0000)]
net/failsafe: fix Rx interrupt reinstallation
Fail-safe dev_start() operation can be called by both the application
and the hot-plug alarm mechanism.
The installation of the Rx interrupt is triggered from dev_start() every
time it is called, whereas the Rx interrupt should be installed
only by the application calls.
So, each plug-in event causes a reinstallation, which leaks memory
and breaks the fail-safe Rx interrupt mechanism.
Trigger the Rx interrupt installation only when it does not exist.
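A sketch of an install-once guard of this kind. The struct and function are hypothetical, and the counter stands in for the real allocation and registration work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rx_intr_sketch {
	bool installed;
	unsigned int install_count; /* stand-in for allocated resources */
};

/* Safe to call from every dev_start(): installs at most once. */
static int
rx_intr_install_once(struct rx_intr_sketch *intr)
{
	assert(intr != NULL);
	if (intr->installed)
		return 0; /* hot-plug restart: nothing to do, nothing leaked */
	intr->install_count++;
	intr->installed = true;
	return 0;
}
```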
Ophir Munk [Wed, 14 Feb 2018 11:32:19 +0000 (11:32 +0000)]
net/tap: fix promiscuous rules double insertions
Running testpmd command "port stop all" followed by command "port start
all" may result in a TAP error:
PMD: Kernel refused TC filter rule creation (17): File exists
Root cause analysis: during the execution of "port start all" command
testpmd calls rte_eth_promiscuous_enable() while during the execution
of "port stop all" command testpmd does not call
rte_eth_promiscuous_disable().
As a result the TAP PMD is trying to add tc (traffic control command)
promiscuous rules to the remote netvsc device consecutively. From the
kernel point of view it is seen as an attempt to add the same rule more
than once. In recent kernels (e.g. version 4.13) this attempt is rejected
with a "File exists" error. In less recent kernels (e.g. version 4.4) the
same rule may have been successfully accepted twice, which is undesirable.
In the faulty code every tc promiscuous rule included a different
handle number parameter. If instead an identical handle number is
used for all tc promiscuous rules, all kernels will reject the second
identical rule with a "File exists" error, which is easy to identify and
to silently ignore.
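Ignoring the duplicate can be sketched as below. promisc_rule_result() is a hypothetical helper, and it assumes the kernel's "File exists" (17) is reported as EEXIST.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map the result of adding a promiscuous rule: an already existing rule
 * counts as success, anything else is returned as a negative errno.
 */
static int
promisc_rule_result(int kernel_err)
{
	if (kernel_err == 0 || kernel_err == EEXIST)
		return 0;
	return -kernel_err;
}
```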
Ciara Power [Tue, 13 Feb 2018 09:08:32 +0000 (09:08 +0000)]
doc: add maintainers section to the contributors guide
Add a maintainers section to the contributors guide to have a low tech
location to check/link to the current maintainers. This file is included
dynamically from the MAINTAINERS file in the root directory of the DPDK
source when the docs are built. This also allows us to link to the file
from other sections of the docs.
Signed-off-by: Ciara Power <ciara.power@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>