git.droids-corp.org - dpdk.git/log

Honnappa Nagarahalli [Fri, 26 Oct 2018 05:37:29 +0000 (00:37 -0500)]

hash: separate multi-writer from r/w concurrency

RW concurrency is also required in the single-writer, multiple-reader
use case. Hence, multi-writer should not be enabled by default when
RW concurrency is enabled.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
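
As a rough illustration of the separation described in the commit above, the sketch below derives the two properties from two independent flag bits. The flag names, bit values and struct are hypothetical and chosen only for this example; they are not taken from the rte_hash headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, named for illustration only. */
#define SKETCH_FLAG_MULTI_WRITER   (1u << 1)
#define SKETCH_FLAG_RW_CONCURRENCY (1u << 2)
#define SKETCH_FLAG_ALL \
	(SKETCH_FLAG_MULTI_WRITER | SKETCH_FLAG_RW_CONCURRENCY)

struct sketch_hash_cfg {
	bool multi_writer;   /* several writers may add keys concurrently */
	bool rw_concurrency; /* readers may run while a writer modifies */
};

/* Derive each property from its own flag; neither implies the other. */
static struct sketch_hash_cfg sketch_cfg_from_flags(uint32_t flags)
{
	struct sketch_hash_cfg cfg;

	assert((flags & ~SKETCH_FLAG_ALL) == 0); /* reject unknown bits */
	cfg.multi_writer = (flags & SKETCH_FLAG_MULTI_WRITER) != 0;
	cfg.rw_concurrency = (flags & SKETCH_FLAG_RW_CONCURRENCY) != 0;
	return cfg;
}
```

With this shape, a single-writer, multiple-reader table simply passes the RW concurrency bit alone.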

David Hunt [Wed, 17 Oct 2018 13:05:33 +0000 (14:05 +0100)]

examples/power: support meson/ninja build

Add meson.build in vm_power_manager and the guest_cli subdirectory.
Building can be achieved by going to the build directory, and using

meson configure -Dexamples=vm_power_manager,vm_power_manager/guest_cli

Then, when ninja is invoked, it will build dpdk-vm_power_manager and
dpdk-guest_cli.

Work still needs to be done on the meson build system to handle the case
where the target list of example apps is defined as 'all'. That will come
in a future patch.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

David Hunt [Wed, 17 Oct 2018 13:05:32 +0000 (14:05 +0100)]

examples/power: clean up verbose messages

Some messages appeared several times a second; remove them, as they are
unnecessary. Change other, less severe messages from INFO to DEBUG.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

David Hunt [Wed, 17 Oct 2018 13:05:31 +0000 (14:05 +0100)]

examples/power: add JSON string handling

Add JSON string handling to vm_power_manager for JSON strings received
through the fifo. The format of the JSON strings is detailed in the
next patch, which updates the vm_power_manager user guide.

This patch introduces a new dependency on Jansson, a C library for
encoding, decoding and manipulating JSON data. To compile the sample app
you now need to have libjansson4 and libjansson-dev installed (these may
be named slightly differently depending on your operating system).

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

David Hunt [Wed, 17 Oct 2018 13:05:30 +0000 (14:05 +0100)]

examples/power: increase allowed number of clients

Now that we're handling host policies, containers and virtual machines,
rename MAX_VMS to MAX_CLIENTS and increase its value from 4 to 64.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

David Hunt [Wed, 17 Oct 2018 13:05:29 +0000 (14:05 +0100)]

examples/power: add host channel to power manager

This patch adds a fifo channel to the vm_power_manager app through which
we can send commands and policies. It is intended for sending JSON strings.
The fifo is at /tmp/powermonitor/fifo

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

David Hunt [Wed, 17 Oct 2018 13:05:28 +0000 (14:05 +0100)]

examples/power: set core type in guest app

The changes here are minimal, as the guest app functionality is not
changing at all, but there is a new element in the channel_packet
struct that needs to have a default set (channel_packet->core_type).

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

David Hunt [Wed, 17 Oct 2018 13:05:27 +0000 (14:05 +0100)]

lib/power: add changes for host commands/policies

This patch does a couple of things:
  * Adds a new message type for removing policies (PKT_POLICY_REMOVE)
    Used when we want to remove a previously created policy.
  * Adds a core_type bool to the channel packet struct to specify whether
    the type of core we want to control is virtual or physical.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

David Hunt [Wed, 17 Oct 2018 13:05:26 +0000 (14:05 +0100)]

examples/power: allow number of VMs to be zero

Previously the vm_power_manager app required some VMs to be defined, so
the call to get_all_vm() always set the noVms variable. Now that we accept
policies from the host OS (without any VMs defined), it is valid to have
zero VMs. This patch initialises the relevant variables to zero in case
the call to get_all_vm() does not find any VMs and returns with the
variables uninitialised.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
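
The defensive initialisation described in the commit above can be sketched as follows. The enumeration function here is a hypothetical stand-in that, like the call discussed in the message, writes its outputs only when it finds something; the names and the two-vCPUs-per-VM figure are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a VM enumeration call: it writes the output
 * counters only when at least one VM is found.
 */
static void sketch_enumerate_vms(size_t found, int *num_vms, int *num_vcpus)
{
	assert(num_vms != NULL && num_vcpus != NULL);
	if (found == 0)
		return; /* outputs deliberately left untouched */
	*num_vms = (int)found;
	*num_vcpus = (int)found * 2; /* arbitrary: two vCPUs per VM */
}

/* Caller side: zero the counters first so "no VMs" reads as 0. */
static int sketch_count_vms(size_t found)
{
	int num_vms = 0;
	int num_vcpus = 0;

	sketch_enumerate_vms(found, &num_vms, &num_vcpus);
	(void)num_vcpus;
	return num_vms;
}
```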

David Hunt [Wed, 17 Oct 2018 13:05:25 +0000 (14:05 +0100)]

examples/power: add checks around hypervisor

Allow vm_power_manager to run without requiring qemu to be present
on the machine. This will be required for instances where the JSON
interface is used for commands and policies, without any VMs present.
A use case for this is a container environment.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Liang Ma [Fri, 19 Oct 2018 11:07:19 +0000 (12:07 +0100)]

examples/l3fwd-power: support traffic pattern aware control

Add support for the new traffic pattern aware power management API.

Example:
./l3fwd-power -l xxx -n 4 -w 0000:xx:00.0 -w 0000:xx:00.1 -- -p 0x3
-P --config="(0,0,xx),(1,0,xx)" --empty-poll="0,0,0" -l 14 -m 9 -h 1

Please refer to the l3fwd-power documentation for full parameter usage.

The options "l", "m" and "h" set the power index for the LOW, MED and
HIGH power states. They are only useful once empty-poll is enabled.

--empty-poll="training_flag, med_threshold, high_threshold"

The option training_flag is used to enable/disable training mode.

The option med_threshold sets the user-defined empty poll threshold
of the modest state.

The option high_threshold sets the user-defined empty poll threshold
of the busy state.

The default value of all three options is 0.

Once empty-poll is enabled, the system applies the default parameters if
no other command line options are provided.

If training mode is enabled, the user should ensure that no traffic
is allowed to pass through the system. When the training phase completes,
the application transitions to normal operation.

The system starts running in the modest power mode.
If the traffic goes above 70%, the system moves to the HIGH power state.
If the traffic drops below 30%, the system falls back to the modest
power state.

The example code uses the master thread to monitor worker thread busyness.
The default timer resolution is 10ms.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
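
The option format and the 70%/30% rule from the l3fwd-power commit above might look roughly like this in C. This is a sketch under assumptions: the type and function names are hypothetical, and how the application actually measures "traffic" as a percentage is not stated in the message, so busy_pct is simply taken as an input.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum sketch_state { SKETCH_MED, SKETCH_HIGH };

struct sketch_ep_opts {
	unsigned int training;   /* training_flag */
	uint32_t med_threshold;  /* 0 means "use the default" */
	uint32_t high_threshold; /* 0 means "use the default" */
};

/* Parse "training_flag,med_threshold,high_threshold"; 0 on success. */
static int sketch_parse_empty_poll(const char *arg, struct sketch_ep_opts *o)
{
	unsigned int t, m, h;

	assert(arg != NULL && o != NULL);
	if (sscanf(arg, "%u , %u , %u", &t, &m, &h) != 3)
		return -1;
	o->training = t;
	o->med_threshold = m;
	o->high_threshold = h;
	return 0;
}

/*
 * Hysteresis between the two active states: go HIGH above 70%, fall
 * back to MED below 30%, otherwise keep the current state.
 */
static enum sketch_state sketch_next_state(enum sketch_state cur,
					   unsigned int busy_pct)
{
	assert(busy_pct <= 100);
	if (busy_pct > 70)
		return SKETCH_HIGH;
	if (busy_pct < 30)
		return SKETCH_MED;
	return cur;
}
```

The gap between the two limits keeps a core from flapping between frequencies when load hovers around a single threshold.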

Liang Ma [Fri, 19 Oct 2018 11:07:18 +0000 (12:07 +0100)]

power: add traffic pattern aware power control

1. Abstract

For packet processing workloads such as DPDK, polling is continuous.
This means CPU cores always show 100% busy, independent of how much work
those cores are doing. Accurately determining how busy a core is matters
because, without that information:

   * There is no indication of overload conditions.

   * The user does not know how much real load is on a system, resulting
     in wasted energy as no power management is utilized.

Compared to the original l3fwd-power design, instead of going to sleep
after detecting an empty poll, the new mechanism just lowers the core
frequency. As a result, the application does not stop polling the device,
which leads to improved handling of bursts of traffic.

When the system becomes busy, the empty poll mechanism can also increase
the core frequency (including turbo) to make a best effort for intensive
traffic. This gives us more flexible and balanced traffic awareness than
the standard l3fwd-power application.

2. Proposed solution

The proposed solution focuses on how many empty polls are executed.
A low number of empty polls means the current core is busy processing
its workload, so a higher frequency is needed. A high number of empty
polls indicates the current core is not doing any real work, so we can
lower the frequency to save power.

In the current implementation, each core has 1 empty-poll counter, which
assumes 1 core is dedicated to 1 queue. This will need to be expanded in the
future to support multiple queues per core.

2.1 Power state definition:

LOW:  Not currently used, reserved for future use.

MED:  the frequency is used to process modest traffic workload.

HIGH: the frequency is used to process busy traffic workload.

2.2 There are two phases to establish the power management system:

a. Initialization/Training phase. The training phase is necessary
   in order to figure out the system polling baseline numbers from
   idle to busy. The highest poll count will be during idle, where
   all polls are empty. These poll counts will be different between
   systems due to the many possible processor micro-arch, cache
   and device configurations, hence the training phase.
   In the training phase, traffic is blocked so the training
   algorithm can average the empty-poll numbers for the LOW, MED and
   HIGH power states in order to create a baseline.
   The cores' counters are collected every 10ms, and the training
   phase takes 2 seconds.
   Training is disabled in the default configuration, in which case
   the default parameters are applied. The sample app can still
   trigger training if needed. Once the training phase has been
   executed on a system, the application can then be started with the
   relevant thresholds provided on the command line, allowing the
   application to start passing traffic immediately.

b. Normal phase. Traffic starts immediately based on the default
   thresholds, or based on the user supplied thresholds via the
   command line parameters. The run-time poll counts are compared with
   the baseline and the decision will be taken to move to MED power
   state or HIGH power state. The counters are calculated every 10ms.

3. Proposed  API

1.  rte_power_empty_poll_stat_init(struct ep_params **eptr,
uint8_t *freq_tlb, struct ep_policy *policy);
which is used to initialize the power management system.

2.  rte_power_empty_poll_stat_free(void);
which is used to free the resources held by the power management system.

3.  rte_power_empty_poll_stat_update(unsigned int lcore_id);
which is used to update specific core empty poll counter, not thread safe

4.  rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
which is used to update specific core valid poll counter, not thread safe

5.  rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core empty poll counter.

6.  rte_power_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core valid poll counter.

7.  rte_empty_poll_detection(struct rte_timer *tim, void *arg);
which is used to detect empty poll state changes then take action.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
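
The training arithmetic described in the commit above (counters sampled every 10ms over a 2 second phase, averaged into a baseline, then compared at run time) can be sketched as below. The names are hypothetical, and the conversion from an empty-poll count to a busyness percentage is an assumption made for illustration; the message does not say how the library derives its thresholds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INTERVAL_MS 10u
#define SKETCH_TRAINING_MS 2000u
#define SKETCH_TRAINING_SAMPLES (SKETCH_TRAINING_MS / SKETCH_INTERVAL_MS)

/* Average the per-interval empty-poll counts seen while idle. */
static uint64_t sketch_baseline(const uint64_t *samples, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	assert(samples != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / n;
}

/*
 * Assumed mapping: as many empty polls as the idle baseline means 0%
 * busy, and no empty polls at all means 100% busy.
 */
static unsigned int sketch_busy_pct(uint64_t empty_polls, uint64_t baseline)
{
	assert(baseline > 0);
	if (empty_polls >= baseline)
		return 0;
	return (unsigned int)(100 - (empty_polls * 100) / baseline);
}
```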

Yipeng Wang [Mon, 22 Oct 2018 18:39:48 +0000 (11:39 -0700)]

hash: use partial-key hashing

This commit changes the hashing mechanism to "partial-key
hashing" to calculate bucket index and signature of key.

This is proposed in Bin Fan, et al's paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically the idea is to use "xor" to
derive alternative bucket from current bucket index and
signature.

"Partial-key hashing" reduces the bucket memory
requirement from two cache lines to one cache line, which
improves memory efficiency and thus lookup speed.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
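
The xor derivation mentioned in the commit above can be illustrated with a few lines of C. This is a minimal sketch, assuming a power-of-two bucket count and a 16-bit signature taken from the upper hash bits; those choices and all names are assumptions for the example rather than details confirmed by the message.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_loc {
	uint32_t bucket; /* primary bucket index */
	uint16_t sig;    /* short signature stored next to the key */
};

/* Split one hash value into a primary bucket index and a signature. */
static struct sketch_loc sketch_primary(uint32_t hash, uint32_t bucket_mask)
{
	struct sketch_loc loc;

	/* The mask must be (number of buckets - 1) for a power of two. */
	assert(((bucket_mask + 1) & bucket_mask) == 0);
	loc.bucket = hash & bucket_mask;
	loc.sig = (uint16_t)(hash >> 16);
	return loc;
}

/* The other candidate bucket, derived from the current one by xor. */
static uint32_t sketch_alt_bucket(uint32_t bucket, uint16_t sig,
				  uint32_t bucket_mask)
{
	return (bucket ^ sig) & bucket_mask;
}
```

Because xor is its own inverse, applying the same function to the alternative bucket leads back to the original one, so only the signature needs to be stored to move an entry between its two buckets.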

Yipeng Wang [Mon, 22 Oct 2018 18:39:47 +0000 (11:39 -0700)]

test/hash: add extendable bucket

This commit changes the current rte_hash unit test to
test the extendable table feature and performance.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Yipeng Wang [Mon, 22 Oct 2018 18:39:46 +0000 (11:39 -0700)]

hash: add extendable bucket feature

In use cases where hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to hold extra
keys in linked lists when conflicts happen. This is a similar
concept to the extendable bucket hash table in the packet
framework.

This commit adds the extendable bucket feature. The user can turn
it on or off through the extra flag field at table
creation time.

An extendable bucket table is composed of buckets that can be
linked as lists to the main table. When extendable bucket
is enabled, the hash table load can always reach 100%.
In other words, the table can always accommodate the same
number of keys as the specified table size. This provides
a 100% table capacity guarantee.

Although keys ending up in the ext buckets may have longer
lookup times, they should be rare due to the cuckoo
algorithm.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
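
The chaining idea in the commit above can be shown with a toy bucket that overflows into a linked list. This is only a sketch: the slot count, the names and the on-demand calloc are assumptions for illustration and say nothing about how the library allocates or manages its extendable buckets.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_SLOTS 8

struct sketch_bucket {
	uint32_t keys[SKETCH_SLOTS];
	unsigned int used;
	struct sketch_bucket *next; /* overflow ("ext") bucket, if any */
};

/* Add a key, spilling into a chained bucket when this one is full. */
static int sketch_bucket_add(struct sketch_bucket *b, uint32_t key)
{
	assert(b != NULL);
	for (;;) {
		if (b->used < SKETCH_SLOTS) {
			b->keys[b->used++] = key;
			return 0;
		}
		if (b->next == NULL) {
			b->next = calloc(1, sizeof(*b->next));
			if (b->next == NULL)
				return -1;
		}
		b = b->next;
	}
}

/* Look a key up; keys in chained buckets cost extra pointer hops. */
static int sketch_bucket_has(const struct sketch_bucket *b, uint32_t key)
{
	unsigned int i;

	for (; b != NULL; b = b->next)
		for (i = 0; i < b->used; i++)
			if (b->keys[i] == key)
				return 1;
	return 0;
}
```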

Yipeng Wang [Mon, 22 Oct 2018 18:39:45 +0000 (11:39 -0700)]

hash: fix race condition in iterate

In rte_hash_iterate, the reader lock did not protect the
while loop which checks for an empty entry. This created a race
condition in which the entry could become empty by the time the
lock was taken, so a wrong key data value would be read out.

This commit reads out the position in the while condition,
which makes sure that the position will not be changed
to empty before entering the lock.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Yipeng Wang [Fri, 28 Sep 2018 14:11:09 +0000 (07:11 -0700)]

hash: remove unused constant

Since the depth-first search of cuckoo path is removed, we do not
need the macro anymore which specifies the depth of the cuckoo
search.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Yipeng Wang [Fri, 28 Sep 2018 14:11:08 +0000 (07:11 -0700)]

test/hash: add missing file in meson build

The test_hash_readwrite.c was not in the meson.build file. This
commit adds the missing test into the file.

Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Yipeng Wang [Fri, 28 Sep 2018 14:11:07 +0000 (07:11 -0700)]

test/hash: fix r/w test with non-consecutive cores

The multi-reader and multi-writer rte_hash unit test does not
work correctly with non-consecutive core IDs. This commit
fixes the issue.

Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Yipeng Wang [Fri, 28 Sep 2018 14:11:06 +0000 (07:11 -0700)]

test/hash: improve accuracy of perf test output

Edit the printf output emitted when an error happens to be more
accurate and informative.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Yipeng Wang [Fri, 28 Sep 2018 14:11:05 +0000 (07:11 -0700)]

test/hash: fix bucket size in perf test

The bucket size was changed from 4 to 8 but the corresponding
perf test was not changed accordingly.

In the test, the bucket size and number of buckets are used
to map to the underlying rte_hash structure. They are used
to test performance under two conditions: keys in primary
buckets only and keys in both primary and secondary buckets.

Although there is no functional issue with the bucket size set
to 4, it mismatches the underlying rte_hash structure,
which may affect code readability and future extension.

Fixes: 58017c98ed53 ("hash: add vectorized comparison")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Reshma Pattan [Tue, 16 Oct 2018 14:29:07 +0000 (15:29 +0100)]

net/softnic: support flow API VXLAN encap action

Added support for ethdev flow API VXLAN encap action.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

Cristian Dumitrescu [Fri, 12 Oct 2018 11:52:07 +0000 (12:52 +0100)]

net/softnic: support VXLAN encap

Add CLI support for VXLAN encap.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

Ferruh Yigit [Mon, 15 Oct 2018 14:53:18 +0000 (15:53 +0100)]

devtools: add git log checks for PHY

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

Anoob Joseph [Wed, 10 Oct 2018 13:01:22 +0000 (18:31 +0530)]

devtools: add git check exception for OCTEON TX

The 'TX' in OCTEON TX would cause a warning.
Adding an exception for that.

OCTEON TX is a registered product under Cavium

Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>

Thomas Monjalon [Thu, 18 Oct 2018 15:36:15 +0000 (17:36 +0200)]

devtools: fix alignment of Marvell build options

Really minor issue:
There were extra spaces making the alignment wrong.

Fixes: e95faac15110 ("crypto/mrvl: rename PMD to mvsam")
Fixes: 4ccc8d770d3b ("net/mvneta: add PMD skeleton")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Thomas Monjalon [Thu, 18 Oct 2018 16:09:16 +0000 (18:09 +0200)]

doc: add deprecated list in doxygen

The option GENERATE_DEPRECATEDLIST will create a page
"Deprecated List" in "Related Pages" menu.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Jerin Jacob [Mon, 27 Aug 2018 12:38:35 +0000 (18:08 +0530)]

mbuf: add IGMP packet type

Add support for IGMP packet type.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

Jerin Jacob [Sun, 26 Aug 2018 12:54:55 +0000 (18:24 +0530)]

mbuf: add MPLS packet type

Add support of MPLS packet type.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

Jerin Jacob [Sun, 26 Aug 2018 12:54:54 +0000 (18:24 +0530)]

mbuf: add FCoE packet type

Add support of FCoE packet type.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

Ferruh Yigit [Mon, 15 Oct 2018 14:50:56 +0000 (15:50 +0100)]

ring: add library version to meson build

Fixes: a3d6026711d0 ("ring: relax alignment constraint on ring structure")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>

Ferruh Yigit [Mon, 15 Oct 2018 14:50:55 +0000 (15:50 +0100)]

mbuf: fix library version on meson build

Fixes: d27a6261875d ("mbuf: remove control mbuf")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>

Ferruh Yigit [Mon, 15 Oct 2018 14:50:54 +0000 (15:50 +0100)]

doc: fix vhost library version in release notes

Fixes: 7c1290374621 ("vhost: rename device ops struct")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

Ferruh Yigit [Mon, 15 Oct 2018 14:50:53 +0000 (15:50 +0100)]

doc: remove shared libs with no API from release notes

The internal shared libraries shouldn't be part of release notes shared
library version section.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Ferruh Yigit [Mon, 15 Oct 2018 14:50:52 +0000 (15:50 +0100)]

doc: add missing shared library versions to release notes

Fixes: 857ed6c68cf2 ("member: implement main API")
Fixes: 56b6ef874f80 ("efd: new Elastic Flow Distributor library")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

Rami Rosen [Wed, 22 Aug 2018 14:45:45 +0000 (17:45 +0300)]

bpf: fix a typo

This trivial patch fixes a typo in rte_bpf_ethdev.h.

Fixes: a93ff62a8938 ("bpf: introduce basic Rx/Tx filters")
Cc: stable@dpdk.org
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Reshma Pattan [Tue, 25 Sep 2018 14:51:26 +0000 (15:51 +0100)]

latency: fix timestamp marking and latency calculation

The latency calculation logic is not correct for the case where
packets get dropped before TX. For dropped packets,
the timestamp is not cleared, so such packets still get
counted in the latency calculation of subsequent runs, which
results in inaccurate latency measurements.

Fix this issue as follows:

Before setting the timestamp in an mbuf, check that the mbuf does
not have a prior valid timestamp flag set, and after marking
the timestamp, set the mbuf flags to indicate that the timestamp is
valid.

Before calculating latency, check that the mbuf flags are set to
indicate that the timestamp is valid.

With the above logic it is guaranteed that correct timestamps
are used.

Fixes: 5cd3cac9ed ("latency: added new library for latency stats")
Cc: stable@dpdk.org
Reported-by: Bao-Long Tran <longtb5@viettel.com.vn>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Tested-by: Bao-Long Tran <longtb5@viettel.com.vn>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
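
The flag-guarded marking described in the latency commit above can be sketched with a toy packet structure. The struct, flag bit and function names below are hypothetical stand-ins, not the mbuf fields or flags used by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "timestamp is valid" flag bit. */
#define SKETCH_F_TIMESTAMP (1ull << 0)

struct sketch_pkt {
	uint64_t flags;
	uint64_t timestamp;
};

/* Mark a packet only if it does not already carry a valid timestamp. */
static void sketch_mark(struct sketch_pkt *p, uint64_t now)
{
	assert(p != NULL);
	if (p->flags & SKETCH_F_TIMESTAMP)
		return;
	p->timestamp = now;
	p->flags |= SKETCH_F_TIMESTAMP;
}

/* Store the latency and return 1 only when the timestamp is valid. */
static int sketch_latency(const struct sketch_pkt *p, uint64_t now,
			  uint64_t *latency)
{
	assert(p != NULL && latency != NULL);
	if (!(p->flags & SKETCH_F_TIMESTAMP))
		return 0;
	*latency = now - p->timestamp;
	return 1;
}
```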

Paul Luse [Fri, 21 Sep 2018 16:25:57 +0000 (12:25 -0400)]

bus/vdev: fix multi-process IPC buffer leak on scan

This patch fixes an issue caught with ASAN where a vdev_scan()
to a secondary bus was failing to free some memory.

The doxygen comment in EAL is fixed at the same time.

Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")
Fixes: 783b6e54971d ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Paul Luse <paul.e.luse@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Gaetan Rivet [Wed, 17 Oct 2018 14:30:13 +0000 (16:30 +0200)]

devargs: fix variadic parsing memory leak

rte_devargs_parsef will leak memory each time it is called.
The device string must be freed.

Fixes: a23bc2c4e01b ("devargs: add non-variadic parsing function")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
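
The kind of leak fixed by the devargs commit above typically arises when a variadic wrapper formats a temporary string and hands it to a non-variadic function. The sketch below shows that pattern with the temporary released on every path; both functions are hypothetical stand-ins written for this example, not the rte_devargs implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical non-variadic parser: here it only records the length. */
static int sketch_parse(size_t *out_len, const char *dev)
{
	*out_len = strlen(dev);
	return 0;
}

/* Variadic wrapper: format into a heap string, parse it, free it. */
static int sketch_parsef(size_t *out_len, const char *fmt, ...)
{
	va_list ap;
	char *dev;
	int len, ret;

	assert(out_len != NULL && fmt != NULL);

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap); /* measure first */
	va_end(ap);
	if (len < 0)
		return -1;

	dev = malloc((size_t)len + 1);
	if (dev == NULL)
		return -1;

	va_start(ap, fmt);
	vsnprintf(dev, (size_t)len + 1, fmt, ap);
	va_end(ap);

	ret = sketch_parse(out_len, dev);
	free(dev); /* the temporary string is released on every path */
	return ret;
}
```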

Keith Wiles [Fri, 5 Oct 2018 14:48:25 +0000 (09:48 -0500)]

eal: add macro for attribute weak

eal: add shorthand __rte_weak macro
qat: update code to use __rte_weak macro
avf: update code to use __rte_weak macro
fm10k: update code to use __rte_weak macro
i40e: update code to use __rte_weak macro
ixgbe: update code to use __rte_weak macro
mlx5: update code to use __rte_weak macro
virtio: update code to use __rte_weak macro
acl: update code to use __rte_weak macro
bpf: update code to use __rte_weak macro

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
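
For readers unfamiliar with weak symbols, the shorthand introduced by the commit above wraps the GCC/Clang weak attribute. The sketch below uses a hypothetical macro and function names to show the usual pattern: a weak default that a strong definition elsewhere can replace at link time.

```c
#include <assert.h>

/* Shorthand in the spirit of the commit; this macro name is made up. */
#define SKETCH_WEAK __attribute__((__weak__))

/*
 * Weak default. If another object file provides a strong definition
 * of the same symbol, the linker picks that one instead.
 */
SKETCH_WEAK int sketch_vector_rx_supported(void)
{
	return 0;
}

/* Callers do not need to know which definition was linked in. */
static int sketch_pick_burst(int scalar_burst, int vector_burst)
{
	assert(scalar_burst > 0 && vector_burst > 0);
	return sketch_vector_rx_supported() ? vector_burst : scalar_burst;
}
```

Drivers often use this shape to provide a stub for an optimized code path that is only built on some platforms.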

Stephen Hemminger [Wed, 25 Jul 2018 18:20:16 +0000 (11:20 -0700)]

eal/arm: remove profanity in comment

Update comment to describe the problem better without
risk of being offensive.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Stephen Hemminger [Wed, 10 Oct 2018 23:22:19 +0000 (16:22 -0700)]

eal/linux: eliminate cast of HPET thread signature

The cast of hpet_msb_inc is causing a warning in some compilations.
The cast is unnecessary: the function is used in only one place,
so just use the correct signature.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Stephen Hemminger [Tue, 23 Oct 2018 16:29:15 +0000 (09:29 -0700)]

eal: remove double space in init alert messages

rte_init_alert already adds a newline, don't do it twice.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Jeff Guo [Thu, 18 Oct 2018 06:27:15 +0000 (14:27 +0800)]

igb_uio: fix unexpected removal for hot-unplug

When a device is hot-unplugged, pci_remove will be invoked unexpectedly
before pci_release. This causes a kernel hang, which throws the
error "Trying to free already-free IRQ XXX". In addition,
if pci_remove runs before pci_release, the interrupt does not get a
chance to be disabled. This patch fixes the issue by adding a pci_release
call in pci_remove, which guarantees that all PCI cleanup is done before
PCI removal.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Shreyansh Jain [Wed, 24 Oct 2018 05:33:41 +0000 (05:33 +0000)]

raw/skeleton: fix memory leak on test failure

In skeleton_rawdev unit tests, a malloc'd memory was leaking in case
the next sequential test fails. This fix moves the free of the
malloc'd memory above the failing test.

Coverity issue: 260402
Fixes: 55ca1b0f2151 ("raw/skeleton: add test cases")
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Shreyansh Jain [Wed, 17 Oct 2018 10:10:38 +0000 (10:10 +0000)]

net/dpaa2: decrease link state log level

In case the link is down during initial link state check, messages for
link state check flood the console. Reducing the log level for these.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Shreyansh Jain [Wed, 17 Oct 2018 10:10:36 +0000 (10:10 +0000)]

bus/fslmc: ignore dpaax PA-VA table errors

Presence of PA-VA Table is transparent to the drivers. Ignoring the
return values from table update call.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Shreyansh Jain [Wed, 17 Oct 2018 10:10:34 +0000 (10:10 +0000)]

common/dpaax: reduce log level

DPAAX is a library used by various NXP drivers. In a non-NXP
environment, it starts spewing messages about the unavailability of
the necessary environment.

This patch reduces the log level for certain messages as well as
the overall log level. As this is a library, these messages are not
necessarily relevant at a higher log level, either.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Shreyansh Jain [Wed, 24 Oct 2018 05:44:09 +0000 (05:44 +0000)]

common/dpaax: fix nodes check

In case the memory for nodes cannot be allocated, there is no need
to check for the length. Also, `node_count` is an unsigned value
and cannot be less than 0.

Coverity issue: 323521
Fixes: 2f3d633aa593 ("common/dpaax: add library for PA/VA translation table")
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Shreyansh Jain [Wed, 17 Oct 2018 09:05:57 +0000 (09:05 +0000)]

common/dpaax: fix uninitialized PA-VA table case

There is a possibility that, either because of a missing device tree
entry or lack of memory, the PA-VA table might not be available. But, the
table being transparent, the callers don't necessarily check for its
initialization state. This check is now done explicitly during update and
translation calls.

Fixes: 2f3d633aa593 ("common/dpaax: add library for PA/VA translation table")
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Ferruh Yigit [Fri, 5 Oct 2018 11:12:41 +0000 (12:12 +0100)]

mk: use EXTRA_CFLAGS for pmdinfogen

Currently it is not possible to pass EXTRA_CFLAGS while building the
*.pmd.c file; add it.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

Akhil Goyal [Mon, 22 Oct 2018 07:12:10 +0000 (12:42 +0530)]

crypto/dpaa2_sec: support PDCP offload

PDCP session configuration for lookaside protocol offload
and data path is added.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>

Akhil Goyal [Mon, 22 Oct 2018 07:12:09 +0000 (12:42 +0530)]

crypto/dpaa2_sec: add sample PDCP descriptor APIs

The DPAA2 SEC platform can support lookaside protocol
offload for the PDCP protocol.

The relevant APIs for configuring the hardware for PDCP
are added for various modes and crypto algorithms.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: Horia Geanta Neag <horia.geanta@nxp.com>
Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com>
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>

Akhil Goyal [Tue, 16 Oct 2018 10:39:00 +0000 (10:39 +0000)]

security: support PDCP

Packet Data Convergence Protocol (PDCP) is added in rte_security
for 3GPP TS 36.323 for LTE.

The patchset provides the structure definitions for configuring the
PDCP sessions, and the relevant documentation is added.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>

Gagandeep Singh [Tue, 23 Oct 2018 11:54:00 +0000 (11:54 +0000)]

crypto/caam_jr: fix type redefinition

dma_addr_t is already defined in compat.h, so remove the local
definition from caam_jr_config.h.

Fixes: 64c0451f5bb9 ("crypto/caam_jr: add HW tuning options")
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>

Akhil Goyal [Tue, 23 Oct 2018 13:47:06 +0000 (13:47 +0000)]

drivers: fix build if security lib disabled

RTE_SECURITY is enabled by default. If it is disabled, dpaa2_sec,
dpaa_sec and caam_jr compilation fails.

This patch fixes compilation by disabling these drivers
when rte_security is not available.

Fixes: 1ee9569576f6 ("config: enable dpaaX drivers for generic ARMv8")
Fixes: 09e1e8d256b0 ("mk: fix dependencies of dpaaX drivers")
Fixes: af7c9b5e9ce7 ("crypto/caam_jr: introduce basic driver")
Cc: stable@dpdk.org
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>

Dariusz Stojaczyk [Wed, 24 Oct 2018 10:05:17 +0000 (12:05 +0200)]

ipc: fix undefined behavior in no-shconf mode

In no-shconf mode the rte_mp_request_sync() wasn't initializing
the `reply` parameter, which contained e.g. a number of sent
requests. Callers of rte_mp_request_sync() might check that
param afterwards and might read potentially uninitialized memory.

The no-shconf check that makes us return early (with rc = 0) was
placed before the `reply` initialization. Fix this by making the
`reply` initialization occur first.
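
A minimal sketch of the ordering described above. The types and names
(`struct reply_sketch`, `request_sync_sketch`) are hypothetical stand-ins
and do not reproduce the real rte_mp_request_sync() signature:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the reply structure; not the real rte_mp_reply. */
struct reply_sketch {
	int nb_sent;
	int nb_received;
	void *msgs;
};

/* Initialize the output parameter first, then take the early return. */
int
request_sync_sketch(struct reply_sketch *reply, bool no_shconf)
{
	reply->nb_sent = 0;
	reply->nb_received = 0;
	reply->msgs = NULL;

	if (no_shconf)
		return 0; /* the caller still sees a fully initialized reply */

	/* ... a real implementation would send the requests here ... */
	reply->nb_sent = 1;
	return 0;
}
```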

Fixes: 5848e3d2813c ("ipc: support --no-shconf mode")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

commit | commitdiff | tree

Thomas Monjalon [Tue, 23 Oct 2018 16:01:40 +0000 (18:01 +0200)]

kvargs: fix processing a null list

In the doxygen description of rte_kvargs_process(), it is said:
If *kvlist* is NULL function does nothing.
It has been added by mistake here instead of rte_kvargs_free().
Anyway, null list should be correctly handled in both functions.

Comments are fixed in both functions and NULL handling is added
to rte_kvargs_process().
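
The NULL handling can be sketched as follows. The list type and the
function names are simplified, hypothetical stand-ins; the real
rte_kvargs structures differ:

```c
#include <stddef.h>

/* Hypothetical, simplified list type; the real rte_kvargs layout differs. */
struct kvlist_sketch {
	unsigned int count;
	const char *keys[4];
};

typedef int (*handler_sketch_t)(const char *key, void *opaque);

/* Example handler: counts how many keys were visited. */
int
count_handler_sketch(const char *key, void *opaque)
{
	(void)key;
	++*(int *)opaque;
	return 0;
}

/* Process every key; a NULL list is treated as "nothing to do". */
int
kvlist_process_sketch(const struct kvlist_sketch *kvlist,
		handler_sketch_t handler, void *opaque)
{
	unsigned int i;

	if (kvlist == NULL)
		return 0;

	for (i = 0; i < kvlist->count; i++) {
		if (handler(kvlist->keys[i], opaque) < 0)
			return -1;
	}
	return 0;
}
```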

Fixes: c34af7424e09 ("kvargs: fix freeing behaviour for null")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

commit | commitdiff | tree

Anatoly Burakov [Mon, 22 Oct 2018 12:57:03 +0000 (13:57 +0100)]

mem: fix resource leak

Segment preallocation code allocates an array of structures on the
heap but does not free the memory afterwards. Fix it by freeing it
at the end of the function, and changing control flow to always go
through that code path.
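
The control-flow change follows the common single-exit cleanup pattern.
A sketch under assumed names (`prealloc_sketch` and `struct
memtype_sketch` are illustrative, not the actual EAL code):

```c
#include <stdlib.h>

/* Hypothetical element type for the temporary array. */
struct memtype_sketch {
	int socket_id;
	size_t page_sz;
};

/* Returns 0 on success, -1 on failure; the array is freed on every path. */
int
prealloc_sketch(unsigned int n_types, int fail_midway)
{
	struct memtype_sketch *types;
	int ret = -1;

	types = calloc(n_types, sizeof(*types));
	if (types == NULL)
		return -1;

	if (fail_midway)
		goto out; /* an early return here would leak `types` */

	/* ... use `types` to set up segment lists ... */
	ret = 0;
out:
	free(types);
	return ret;
}
```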

Coverity issue: 323524
Fixes: 1dd342d0fdc4 ("mem: improve segment list preallocation")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

commit | commitdiff | tree

Dan Gora [Wed, 17 Oct 2018 00:22:44 +0000 (21:22 -0300)]

test: fix build of external memory test

There was a compilation error in test_external_mem.c:

  CC test_external_mem.o
test_external_mem.c: In function ‘test_external_mem’:
test_external_mem.c:375:2: error: ‘for’ loop initial declarations are
                           only allowed in C99 mode

  for (int i = 0; i < n_pages; i++) {
  ^
test_external_mem.c:375:2: note: use option -std=c99 or -std=gnu99 to
                           compile your code
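
A form that also compiles outside C99 mode declares the loop counter
before the loop. The function below is an illustrative sketch, not the
actual test code:

```c
/* Sum page indices with the counter declared outside the for statement. */
int
sum_pages_sketch(int n_pages)
{
	int i; /* declared up front so no C99 loop declaration is needed */
	int sum = 0;

	for (i = 0; i < n_pages; i++)
		sum += i;
	return sum;
}
```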

Fixes: b270daa43b3d ("test: support external memory")
Signed-off-by: Dan Gora <dg@adax.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

commit | commitdiff | tree

Qi Zhang [Mon, 22 Oct 2018 06:15:16 +0000 (14:15 +0800)]

eal: fix bus name read for removal in multi-process

A crash may appear when removing some PCI devices because
dev->devargs is not always initialized. So use dev->bus instead of
dev->devargs->bus when building devargs string to remove a device.
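
A sketch of the idea with hypothetical, simplified structures (the real
rte_device, rte_bus and rte_devargs types carry more fields, and the
"bus:name" string format here is an assumption for illustration):

```c
#include <stddef.h>
#include <stdio.h>

struct bus_sketch { const char *name; };
struct devargs_sketch { struct bus_sketch *bus; };
struct device_sketch {
	const char *name;
	struct bus_sketch *bus;         /* set when the device is scanned */
	struct devargs_sketch *devargs; /* may be NULL */
};

/* Build the removal string from dev->bus, which is always set. */
int
build_remove_devargs_sketch(const struct device_sketch *dev,
		char *buf, size_t len)
{
	/* dev->devargs->bus->name would dereference NULL if devargs is unset */
	return snprintf(buf, len, "%s:%s", dev->bus->name, dev->name);
}
```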

Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

commit | commitdiff | tree

Qi Zhang [Mon, 22 Oct 2018 05:47:11 +0000 (13:47 +0800)]

bus/vdev: fix uninitialized device bus

The device bus should be initialized after the bus scan.
This did not happen when scanning vdev from a secondary process,
which caused a segmentation fault in rte_dev_probe when calling
dev->bus->xxx.

Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

commit | commitdiff | tree

Anatoly Burakov [Fri, 5 Oct 2018 08:29:44 +0000 (09:29 +0100)]

mem: improve segment list preallocation

Current code to preallocate segment lists is trying to do
everything in one go, and thus ends up being convoluted,
hard to understand, and, most importantly, does not scale beyond
initial assumptions about number of NUMA nodes and number of
page sizes, and therefore has issues on some configurations.

Instead of fixing these issues in the existing code, simply
rewrite it to be slightly less clever but much more logical, and
provide ample comments to explain exactly what is going on.

We cannot use the same approach for 32-bit code because the
limitations of the target dictate current socket-centric
approach rather than type-centric approach we use on 64-bit
target, so 32-bit code is left unmodified. FreeBSD doesn't
support NUMA so there's no complexity involved there, and thus
its code is much more readable and not worth changing.

Fixes: 1d406458db47 ("mem: make segment preallocation OS-specific")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

commit | commitdiff | tree

Anatoly Burakov [Thu, 4 Oct 2018 10:20:39 +0000 (11:20 +0100)]

eal: improve musl compatibility of thread log

Musl complains about pthread id being of wrong size, because on
musl, pthread_t is a struct pointer, not an unsigned int. Fix the
printing code by casting pthread id to unsigned pointer type and
adjusting the format specifier to be of appropriate size.
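
One portable way to print a thread id on Linux is to cast it to
uintptr_t and use the matching format macro. This is a sketch only: the
helper name is hypothetical, the actual patch may use a different
format specifier, and the cast assumes pthread_t is an integer or a
pointer (as on glibc and musl):

```c
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Format a thread id regardless of whether pthread_t is an int or pointer. */
int
format_thread_id_sketch(pthread_t id, char *buf, size_t len)
{
	return snprintf(buf, len, "thread 0x%" PRIxPTR, (uintptr_t)id);
}
```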

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

commit | commitdiff | tree

Anatoly Burakov [Thu, 4 Oct 2018 10:20:38 +0000 (11:20 +0100)]

eal: improve musl compatibility of string functions

Musl wraps various string functions such as strlcpy in order to
harden them. However, the fortify wrappers are included without
including the actual string functions being wrapped, which
throws missing definition compile errors. Fix by including
string.h in string functions header.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

commit | commitdiff | tree

Anatoly Burakov [Thu, 4 Oct 2018 10:20:37 +0000 (11:20 +0100)]

mem: improve musl compatibility

When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.

Bugzilla ID: 31

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

commit | commitdiff | tree

Anatoly Burakov [Thu, 4 Oct 2018 10:20:36 +0000 (11:20 +0100)]

eal/linux: improve musl compatibility

When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.

Bugzilla ID: 33

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

commit | commitdiff | tree

Anatoly Burakov [Thu, 4 Oct 2018 10:20:35 +0000 (11:20 +0100)]

fbarray: improve musl compatibility

When built against musl, fcntl.h doesn't silently get included.
Fix by including it explicitly.

Bugzilla ID: 34

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

commit | commitdiff | tree

Anatoly Burakov [Thu, 4 Oct 2018 10:20:34 +0000 (11:20 +0100)]

vfio: improve musl compatibility

Musl already has PAGE_SIZE defined, and our define clashed with it.
Rename our define to SYS_PAGE_SIZE.
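
A sketch of what the renamed define could look like. The sysconf()-based
definition, the `SYS_PAGE_MASK` companion and the helper function are
assumptions for illustration, not taken from this commit message:

```c
#include <unistd.h>

/* A distinct name avoids clashing with the libc's own PAGE_SIZE. */
#define SYS_PAGE_SIZE ((unsigned long)sysconf(_SC_PAGESIZE))
#define SYS_PAGE_MASK (~(SYS_PAGE_SIZE - 1))

/* Round an address down to the start of its page. */
unsigned long
page_align_down_sketch(unsigned long addr)
{
	return addr & SYS_PAGE_MASK;
}
```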

Bugzilla ID: 36

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

commit | commitdiff | tree

Anatoly Burakov [Thu, 4 Oct 2018 10:20:33 +0000 (11:20 +0100)]

mk: build with _GNU_SOURCE defined by default

We use _GNU_SOURCE all over the place, but often times we miss
defining it, resulting in broken builds on musl. Rather than
fixing every library's and driver's and application's makefile,
fix it by simply defining _GNU_SOURCE by default for all
builds.

Remove all usages of _GNU_SOURCE in source files and makefiles,
and also fixup a couple of instances of using __USE_GNU instead
of _GNU_SOURCE.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

commit | commitdiff | tree

Thomas Monjalon [Wed, 17 Oct 2018 23:42:52 +0000 (01:42 +0200)]

devargs: fix freeing during device removal

After calling unplug function of a bus, the device is expected
to be freed. It is too late for getting devargs to remove.
Anyway, the buses which implement unplug are already freeing
the devargs, except the PCI bus.
So the call to rte_devargs_remove() is removed from EAL and
added in PCI.

Fixes: 2effa126fbd8 ("devargs: simplify parameters of removal function")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

commit | commitdiff | tree

Raslan Darawsheh [Wed, 17 Oct 2018 15:22:11 +0000 (18:22 +0300)]

app/testpmd: set packet dump based on verbosity level

Changing the verbosity level now configures the Rx/Tx callbacks to dump
packets according to the verbosity value, as follows:
    1- dump only received packets:
       testpmd> set verbose 1
    2- dump only sent packets:
       testpmd> set verbose 2
    3- dump sent and received packets:
       testpmd> set verbose (any number > 2)
    4- disable dump
       testpmd> set verbose 0
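
The mapping above can be sketched as a small function. The type and
function names are hypothetical and do not reflect the testpmd
implementation:

```c
#include <stdbool.h>

struct dump_cfg_sketch {
	bool dump_rx;
	bool dump_tx;
};

/* Translate a verbosity level into which directions get dumped. */
struct dump_cfg_sketch
verbose_to_dump_cfg_sketch(unsigned int level)
{
	struct dump_cfg_sketch cfg = { false, false };

	if (level == 1)
		cfg.dump_rx = true;               /* received packets only */
	else if (level == 2)
		cfg.dump_tx = true;               /* sent packets only */
	else if (level > 2)
		cfg.dump_rx = cfg.dump_tx = true; /* both directions */
	return cfg;                           /* level 0: dumping disabled */
}
```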

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

commit | commitdiff | tree

Raslan Darawsheh [Wed, 17 Oct 2018 15:22:10 +0000 (18:22 +0300)]

app/testpmd: add packet dump callbacks

Add new Rx/Tx callback functions to be used for dumping packets.

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

commit | commitdiff | tree

Raslan Darawsheh [Wed, 17 Oct 2018 15:22:09 +0000 (18:22 +0300)]

app/testpmd: move dumping packets to a separate function

Verbose output for received/sent packets is needed in all of the
forwarding engines, so move it into a separate function.

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>

commit | commitdiff | tree

Phil Yang [Wed, 17 Oct 2018 01:36:30 +0000 (09:36 +0800)]

app/testpmd: fix physical port socket initialization

If the lcore list excludes the socket to which the physical device is
attached, port initialization fails. This also prevents testpmd from
running in a cross-NUMA scenario.

Fixes: dbfb8ec7094c ("app/testpmd: optimize mbuf pool allocation")
Cc: stable@dpdk.org
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

commit | commitdiff | tree

Raslan Darawsheh [Wed, 10 Oct 2018 07:01:41 +0000 (07:01 +0000)]

net/tap: fix reported number of Tx packets

When writev fails to send packets, the number of Tx packets is not
updated, but num_tx still is.

The value that should be returned is the actual number
of sent packets which is num_packets.

Fixes: 02f96a0a82d1 ("net/tap: add TUN/TAP device PMD")
CC: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>

commit | commitdiff | tree

Stephen Hemminger [Wed, 25 Jul 2018 18:20:19 +0000 (11:20 -0700)]

net/ixgbe: remove mild profanity

At the tail end of comment about barriers (I feel your pain);
remove mild profanity.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

commit | commitdiff | tree

Stephen Hemminger [Wed, 25 Jul 2018 18:20:17 +0000 (11:20 -0700)]

net/bnx2x: remove profanity

No need for profanity in comments.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

commit | commitdiff | tree

Jerin Jacob [Tue, 16 Oct 2018 13:16:43 +0000 (13:16 +0000)]

doc: clarify VLAN and QinQ Tx offload prerequisite

- Fix missing PKT_TX_VLAN mbuf.ol_flag and mbuf.vlan_tci
fields for Tx VLAN INSERT offload.

- Fix missing mbuf.vlan_tci_outer field for Tx QINQ INSERT offload.

- Rename deprecated PKT_TX_QINQ_PKT to PKT_TX_QINQ

Fixes: cba7f53b717d ("ethdev: introduce Tx queue offloads API")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

commit | commitdiff | tree

Nithin Dabilpuram [Tue, 16 Oct 2018 12:45:43 +0000 (12:45 +0000)]

mbuf: fix missing Tx outer UDP checksum flag name

Fix missing Tx outer UDP checksum flag name.

Fixes: df694a05bfff ("ethdev: add Tx offload outer UDP checksum definition")
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

commit | commitdiff | tree

Jerin Jacob [Tue, 16 Oct 2018 12:45:40 +0000 (12:45 +0000)]

mbuf: fix offload flag name and list

Fix missing PKT_TX* & PKT_RX* ol_flag name and fix ol_flag list.

Fixes: 6d18505efaa6 ("vhost: support UDP Fragmentation Offload")
Fixes: 829a1c2c41dc ("mbuf: extend flow director field")
Fixes: 63c0d74daaa9 ("mbuf: add Tx side tunneling type")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

commit | commitdiff | tree

Shahaf Shuler [Tue, 16 Oct 2018 06:05:17 +0000 (09:05 +0300)]

net/mlx5: fix build on Arm

On some Arm environments, the following compilation error is seen:

dpdk/drivers/net/mlx5/mlx5_flow_dv.c: In function
'flow_dv_translate_item_nvgre':
/tmp/dpdk/drivers/net/mlx5/mlx5_flow_dv.c:785:22: error: pointer targets
in initialization differ in signedness [-Werror=pointer-sign]
const char *tni_v = nvgre_v->tni;

The reason for this error is that nvgre_v->tni is defined as a byte array
of size 3B. However, the code in the function iterates up to the 4th byte
in order to also copy/set the field that follows it (flow_id).

Fix by pointing to this struct through a different pointer.
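
A sketch of the general idea, not the actual patch: start from the
enclosing struct instead of the 3-byte array, so that the 4th byte
(flow_id) is reached through a pointer of a matching type. The struct
below is a simplified, hypothetical NVGRE header:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical NVGRE header layout. */
struct nvgre_sketch {
	uint16_t c_k_s_rsvd0_ver;
	uint16_t protocol;
	uint8_t tni[3];
	uint8_t flow_id;
};

/* Copy tni (3 bytes) and the flow_id byte that follows it. */
void
copy_tni_flow_id_sketch(const struct nvgre_sketch *hdr, uint8_t out[4])
{
	/* derive the pointer from the struct, not from the tni array itself */
	const uint8_t *p = (const uint8_t *)hdr +
		offsetof(struct nvgre_sketch, tni);

	memcpy(out, p, 4);
}
```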

Fixes: fc2c498ccb94 ("net/mlx5: add Direct Verbs translate items")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:47 +0000 (14:40 +0200)]

vhost: enable postcopy protocol feature

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:46 +0000 (14:40 +0200)]

net/vhost: add parameter to enable postcopy

Introduce a new postcopy-support parameter to Vhost PMD that
passes the RTE_VHOST_USER_POSTCOPY_SUPPORT flag at vhost
device register time.

The flag should only be set if the application does not prefault guest
memory using, for example, the mlockall() syscall.

Default value is 0, meaning that postcopy support is disabled
unless specified explicitly.

Example to enable postcopy support for a given device:

--vdev 'net_vhost0,iface=/tmp/vhost-user1,postcopy-support=1'

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:45 +0000 (14:40 +0200)]

vhost: restrict postcopy live-migration enablement

Postcopy live-migration feature requires the application to
not populate the guest memory. As the vhost library cannot
prevent the application from doing that (e.g. by preventing the
application from calling mlockall()), the feature is disabled by
default.

The application should only enable the feature if it does not
force the guest memory to be populated.

In case the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled,
registration fails.

For the same reason, postcopy and dequeue zero copy features
are not compatible, so don't advertise postcopy support if
dequeue zero copy is requested.
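
The rules above can be sketched as a single check. The flag values and
names below are hypothetical placeholders for illustration; the real
flags are defined in rte_vhost.h:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones live in rte_vhost.h. */
#define FLAG_DEQUEUE_ZERO_COPY_SKETCH (1ULL << 0)
#define FLAG_POSTCOPY_SUPPORT_SKETCH  (1ULL << 1)

/*
 * Returns -1 if registration must fail, 0 otherwise.
 * *advertise_postcopy tells whether the feature may be advertised.
 */
int
check_postcopy_sketch(uint64_t flags, bool postcopy_compiled,
		bool *advertise_postcopy)
{
	*advertise_postcopy = false;

	if (!(flags & FLAG_POSTCOPY_SUPPORT_SKETCH))
		return 0;  /* disabled by default */
	if (!postcopy_compiled)
		return -1; /* requested but not built in */
	if (flags & FLAG_DEQUEUE_ZERO_COPY_SKETCH)
		return 0;  /* incompatible: do not advertise */

	*advertise_postcopy = true;
	return 0;
}
```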

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:44 +0000 (14:40 +0200)]

vhost: support postcopy end request

The master sends this message before stopping handling
userfaults, so that the backend closes the userfaultfd.

The master waits for the slave to acknowledge the request
with an empty 64-bit payload for synchronization purposes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:43 +0000 (14:40 +0200)]

vhost: send userfault range addresses back to Qemu

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:42 +0000 (14:40 +0200)]

vhost: avoid useless VhostUserMemory copy

The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could directly be referenced.

This is not very important, but next, we'll need to update the
payload and send it back to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:41 +0000 (14:40 +0200)]

vhost: register new regions with userfaultfd

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:40 +0000 (14:40 +0200)]

vhost: support postcopy listen message

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:39 +0000 (14:40 +0200)]

vhost: introduce postcopy advise message

This patch opens a userfaultfd and sends it back to Qemu's
VHOST_USER_POSTCOPY_ADVISE request.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:38 +0000 (14:40 +0200)]

vhost: add config flag for postcopy

The postcopy live-migration feature relies on userfaultfd,
which was only introduced in kernel v4.3.

This patch introduces a new define to allow building vhost
library on kernels not supporting userfaultfd.

With the legacy build system, the user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.

With the Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if userfaultfd kernel header is
present.
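
Code depending on the feature would typically be guarded by the define.
A sketch, with a hypothetical helper name:

```c
#include <stdbool.h>

/* Report whether postcopy support was compiled in. */
bool
postcopy_compiled_sketch(void)
{
#ifdef RTE_LIBRTE_VHOST_POSTCOPY
	return true;  /* userfaultfd-dependent code is available */
#else
	return false; /* built without userfaultfd support */
#endif
}
```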

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:37 +0000 (14:40 +0200)]

vhost: enable fds passing in vhost-user messages

Passing userfault fds to Qemu will be required for postcopy
live-migration feature.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:36 +0000 (14:40 +0200)]

vhost: pass socket fd to message handling callbacks

This is not used for now, but will be needed for the
special handling of VHOST_USER_SET_MEM_TABLE message
once postcopy is supported.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:35 +0000 (14:40 +0200)]

vhost: add number of fds to vhost-user messages

As soon as some ancillary data (fds) is received, it is copied
without checking its length.

This patch adds the number of fds received to the message,
which is set in read_vhost_message().
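
A sketch of receiving fds with an explicit count and bound, using the
standard recvmsg()/SCM_RIGHTS interface. The function name, its
parameters and `MAX_FDS_SKETCH` are hypothetical; the real
read_vhost_message() path differs:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_FDS_SKETCH 8

/*
 * Receive up to `buflen` bytes plus any passed fds.
 * Returns the number of bytes read (or <= 0 on error/EOF);
 * *fd_num is set to the number of fds received.
 */
ssize_t
read_fd_message_sketch(int sockfd, void *buf, size_t buflen,
		int *fds, int max_fds, int *fd_num)
{
	struct iovec iov = { .iov_base = buf, .iov_len = buflen };
	union {
		char buf[CMSG_SPACE(MAX_FDS_SKETCH * sizeof(int))];
		struct cmsghdr align; /* forces suitable alignment */
	} control;
	struct msghdr msgh;
	struct cmsghdr *cmsg;
	ssize_t ret;

	*fd_num = 0;
	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = control.buf;
	msgh.msg_controllen = sizeof(control.buf);

	ret = recvmsg(sockfd, &msgh, 0);
	if (ret <= 0)
		return ret;
	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
			cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
				cmsg->cmsg_type == SCM_RIGHTS) {
			int n = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);

			if (n > max_fds)
				return -1; /* would overflow the caller's array */
			memcpy(fds, CMSG_DATA(cmsg), n * sizeof(int));
			*fd_num = n; /* record how many fds came with the message */
			break;
		}
	}
	return ret;
}
```

For brevity the sketch does not close fds that it rejects; production
code would need to do so.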

This is preliminary work to support sending fds to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:34 +0000 (14:40 +0200)]

vhost: define postcopy protocol flag

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:33 +0000 (14:40 +0200)]

vhost: fix error handling when mem table gets updated

When the memory table gets updated, the ring addresses need
to be translated again. If it fails, we need to exit cleanly
by unmapping memory regions.

Fixes: d5022533c20a ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:32 +0000 (14:40 +0200)]

vhost: fix payload size of reply

QEMU doesn't expect any payload for the reply of
VHOST_USER_SET_LOG_BASE request, so don't send any.
Note that the Vhost-user specification isn't clear about
it and would need to be fixed.

Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:31 +0000 (14:40 +0200)]

vhost: clarify reply-ack in case a reply was already sent

For messages that require a reply, a second ack should not be
sent when reply-ack protocol feature is negotiated, even if
the corresponding flag is set in the message.

The code is compliant with the spec but it isn't clear it is,
so this patch adds a comment to make it explicit.
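
The rule can be sketched as a predicate. The function and parameter
names are hypothetical and do not mirror the vhost code:

```c
#include <stdbool.h>

/*
 * Decide whether a separate ack must follow the handling of a message.
 * If a reply was already sent, no second ack is sent, even when the
 * message carries the need-reply flag.
 */
bool
should_send_ack_sketch(bool reply_ack_negotiated, bool need_reply_flag,
		bool reply_already_sent)
{
	if (reply_already_sent)
		return false;
	return reply_ack_negotiated && need_reply_flag;
}
```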

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

commit | commitdiff | tree

Maxime Coquelin [Fri, 12 Oct 2018 12:40:30 +0000 (14:40 +0200)]

vhost: fix return code of messages requiring replies

VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.
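
A sketch of the convention. The enum values, the message struct and the
function names are assumptions made for illustration; only the
VH_RESULT_OK and VH_RESULT_REPLY names come from the text above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; VH_RESULT_ERR is a hypothetical addition. */
enum vh_result_sketch {
	VH_RESULT_ERR = -1,
	VH_RESULT_OK = 0,
	VH_RESULT_REPLY = 1,
};

/* Hypothetical, simplified message. */
struct msg_sketch {
	uint64_t payload;
	uint32_t size;
};

/* A handler whose request requires a reply fills the payload and says so. */
enum vh_result_sketch
get_protocol_features_sketch(struct msg_sketch *msg)
{
	msg->payload = 0x1; /* placeholder feature bits */
	msg->size = sizeof(msg->payload);
	return VH_RESULT_REPLY; /* not VH_RESULT_OK: the payload must go back */
}

/* The dispatcher sends the message back only for VH_RESULT_REPLY. */
bool
must_send_reply_sketch(enum vh_result_sketch res)
{
	return res == VH_RESULT_REPLY;
}
```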

Fixes: 0bff510b5ea6 ("vhost: unify message handling function signature")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

DPDK repo used for reviews