Add Makefiles, meson files, and empty source files for compression PMD.
Handle cases for building either symmetric crypto PMD
or compression PMD or both and the common files both depend on.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
David Hunt [Fri, 13 Jul 2018 14:23:01 +0000 (15:23 +0100)]
examples/vm_power: add options to guest app
Add new command line arguments to the guest app to make
testing and validation of the policy usage easier.
These arguments are mainly around setting up the power
management policy that is sent from the guest vm to
the vm_power_manager in the host.
New command line parameters:
-n or --vm-name
sets the name of the vm to be used by the host OS.
-b or --busy-hours
sets the list of hours that are predicted to be busy
-q or --quiet-hours
sets the list of hours that are predicted to be quiet
-l or --vcpu-list
sets the list of vcpus to monitor
-p or --port-list
sets the list of ports to monitor when using a
workload policy.
-o or --policy
sets the default policy type
TIME
WORKLOAD
TRAFFIC
BRANCH_RATIO
The format of the hours or list parameters is a comma-separated
list of integers, which can take the form of
a. x e.g. --vcpu-list=1
b. x,y e.g. --quiet-hours=3,4
c. x-y e.g. --busy-hours=9-12
d. combination of above (e.g. --busy-hours=4,5-7,9)
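The list format above can be sketched as a small parser. This is only an
illustration: parse_int_list is a hypothetical helper, not the function
used by the guest app, and it assumes non-negative values and ascending
ranges.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the parser used by the guest app): expand a
 * comma-separated list such as "4,5-7,9" into individual integers.
 * Returns the number of values written to out, or -1 on malformed input
 * or when more than max values would be produced.
 */
static int
parse_int_list(const char *str, int *out, int max)
{
	int count = 0;

	if (str == NULL || out == NULL || *str == '\0')
		return -1;

	while (*str != '\0') {
		char *end;
		long lo, hi;

		errno = 0;
		lo = strtol(str, &end, 10);
		if (end == str || errno != 0 || lo < 0)
			return -1;
		hi = lo;
		if (*end == '-') {		/* form "x-y" */
			str = end + 1;
			hi = strtol(str, &end, 10);
			if (end == str || errno != 0 || hi < lo)
				return -1;
		}
		if (hi > INT_MAX)
			return -1;
		for (long v = lo; v <= hi; v++) {
			if (count >= max)
				return -1;
			out[count++] = (int)v;
		}
		if (*end == ',') {
			end++;
			if (*end == '\0')	/* trailing comma */
				return -1;
		} else if (*end != '\0') {
			return -1;
		}
		str = end;
	}
	return count;
}
```

With this sketch, "4,5-7,9" expands to the five values 4, 5, 6, 7 and 9,
while inputs such as "7-5" or "1," are rejected.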
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
David Hunt [Fri, 13 Jul 2018 14:23:00 +0000 (15:23 +0100)]
examples/vm_power: add branch ratio policy type
Add the capability for the vm_power_manager to receive
a policy of type BRANCH_RATIO. This will add any vcpus
in the policy to the oob monitoring thread.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
David Hunt [Fri, 13 Jul 2018 14:22:57 +0000 (15:22 +0100)]
examples/vm_power: allow greater than 64 cores
To facilitate more info per core, change the global_cpu_mask
from a uint64_t to an array. This also removes the limit of
64 cores, allocating the array at run-time based on the number of
cores found in the system.
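A minimal sketch of the idea, assuming a hypothetical per-core record
(struct core_state and both helper names are illustrative and do not come
from the patch): a bit in a 64-bit mask becomes one array entry, and the
array is sized from the core count at run-time.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-core record; field names are illustrative only. */
struct core_state {
	uint8_t enabled;	/* replaces one bit of the old 64-bit mask */
	uint8_t monitored;	/* room for extra per-core information */
};

/* Allocate one zeroed entry per core; the caller frees the result. */
static struct core_state *
core_state_alloc(unsigned int core_count)
{
	if (core_count == 0)
		return NULL;
	return calloc(core_count, sizeof(struct core_state));
}

/* Mark a core as enabled; returns -1 if the id is out of range. */
static int
core_state_enable(struct core_state *cores, unsigned int core_count,
		unsigned int core_id)
{
	if (cores == NULL || core_id >= core_count)
		return -1;
	cores[core_id].enabled = 1;
	return 0;
}
```

Because the bound is the array length rather than the width of an integer,
a core id such as 100 is valid on a 128-core system.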
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
David Hunt [Fri, 13 Jul 2018 14:22:56 +0000 (15:22 +0100)]
examples/vm_power: add oob monitoring functions
This patch introduces the out-of-band (oob) core monitoring
functions.
The functions are similar to the channel manager functions.
There are functions to add and remove cores from the
list of cores being monitored. There are functions to initialise
the monitor setup, run the monitor thread, and exit the monitor.
The monitor thread runs in its own lcore, and is separate
functionality to the channel monitor which is epoll based.
This thread is timer based. It loops through all monitored cores,
calculates the branch ratio, scales up or down the core, then
sleeps for an interval (~250 us).
The method it uses to read the branch counters is a pread on the
/dev/cpu/x/msr file, so the 'msr' kernel module needs to be loaded.
Also, since the msr.h file has been made unavailable in recent
kernels, we have #defines for the relevant MSRs included in the
code.
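Reading an MSR through the msr device can be sketched as follows. The
helper name msr_read is hypothetical and the MSR numbers themselves are
left to the caller; this is not the code from the patch, only the general
pread technique it describes.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* exposes pread() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read one 64-bit MSR on the given core through the msr character device.
 * Returns 0 on success, -1 if the device cannot be opened or read
 * (for example when the 'msr' kernel module is not loaded).
 */
static int
msr_read(unsigned int core, uint32_t msr, uint64_t *value)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%u/msr", core);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The MSR number is used as the file offset. */
	n = pread(fd, value, sizeof(*value), (off_t)msr);
	close(fd);
	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

Opening the device normally requires root privileges, so a caller should
treat a -1 return as "monitoring unavailable" rather than as a fatal error.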
The makefile has a switch for x86 and non-x86 platforms,
and compiles stub functions for non-x86 platforms.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
David Hunt [Fri, 13 Jul 2018 14:22:55 +0000 (15:22 +0100)]
examples/vm_power: add core list parameter
Add in the '-l' command line parameter (also --core-list),
so the user can now pass --core-list=4,6,8-10 and it will
expand out to 4,6,8,9,10 using the parse function provided
in parse.c (parse_set).
This list of cores is then used to enable out-of-band monitoring
to scale up and down these cores based on the ratio of branch
hits versus branch misses. The ratio will be low when a poll
loop is spinning with no packets being received, so the frequency
will be scaled down.
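The scaling decision can be sketched as below. This is an assumption-laden
illustration: the ratio is taken to be branch misses as a percentage of
branch hits over one sampling interval, threshold_pct is a made-up tunable,
and treating an interval with no hits as idle is a choice of this sketch.
None of the names come from the patch.

```c
#include <stdint.h>

enum freq_action { FREQ_SCALE_DOWN, FREQ_SCALE_UP };

/*
 * Assumption: the ratio is branch misses as a percentage of branch hits
 * over one sampling interval, and threshold_pct is a tunable value that
 * does not come from the commit text.
 */
static enum freq_action
branch_ratio_decide(uint64_t hits_delta, uint64_t misses_delta,
		double threshold_pct)
{
	double ratio;

	if (hits_delta == 0)
		return FREQ_SCALE_DOWN;	/* sketch choice: treat as idle */
	ratio = (double)misses_delta * 100.0 / (double)hits_delta;
	return ratio < threshold_pct ? FREQ_SCALE_DOWN : FREQ_SCALE_UP;
}
```

A tight, well-predicted poll loop yields very few misses per hit, so the
ratio falls below the threshold and the core is scaled down.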
Also, as part of this change, we introduce a core_info struct
which keeps information on each core in the system, and whether
we're doing out of band monitoring on them.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
Thomas Monjalon [Fri, 20 Jul 2018 11:34:52 +0000 (13:34 +0200)]
devtools: fix checkpatch for filename with space
If the patch filename or the temporary file path has a space
in its name, the script checkpatches.sh does not work.
The variables for the filenames must be enclosed in quotes
in order to preserve spaces.
Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Clearing vfio_group_fd does not need to involve any IPC.
Also, the current IPC implementation for SOCKET_CLR_GROUP is not
correct: rte_vfio_clear_group on a secondary process will always fail,
which prevents a device from being detached correctly on a secondary process.
The patch simply removes all IPC related code in
rte_vfio_clear_group.
The subroutine to unmap VFIO resources is shared by secondary and
primary processes, and it does not work on a secondary process. For a
secondary process, it is not necessary to close the interrupt
handler, set PCI bus mastering, or remove vfio_res from
vfio_res_list, so the patch adds a dedicated function to handle
the situation when a device is unmapped on a secondary process.
When memcmp is used to compare two PCI addresses, sizeof(struct rte_pci_addr)
is 8 because the struct is 4-byte aligned, while only 7 bytes of struct rte_pci_addr
are valid. Comparing the 8th byte therefore gives unexpected results, which
happens when repeatedly attaching and detaching a device.
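One way to avoid depending on the padding byte is to compare the fields
individually. The struct below only mirrors the layout described above
(7 meaningful bytes padded to 8); it is an illustrative stand-in, and
pci_addr_equal is a hypothetical helper, not necessarily what the patch
uses.

```c
#include <stdint.h>

/*
 * Illustrative layout matching the description: 7 meaningful bytes,
 * padded to 8 by the compiler. Not the DPDK definition itself.
 */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Compare only the meaningful fields; returns non-zero when equal. */
static int
pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
		a->devid == b->devid && a->function == b->function;
}
```

Unlike memcmp over sizeof(struct pci_addr), the result here does not depend
on whatever value the trailing padding byte happens to hold.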
If hotplug adds an already plugged PCI device, it will
cause rte_pci_device->device.name to be corrupted due to an unexpected
rte_devargs_remove. Also, trying to hotplug remove an already
unplugged device will cause a segmentation fault due to an unexpected
bus->unplug on an rte_device whose driver is NULL.
The patch fixes these issues.
Thomas Monjalon [Wed, 18 Jul 2018 21:26:58 +0000 (23:26 +0200)]
devtools: fix symbol check for filename with space
If the patch filename or the temporary file path has a space
in its name, the script check-symbol-change.sh does not work.
The variables for the filenames must be enclosed in quotes
in order to preserve spaces.
Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Technically, single file segments codepath will never get
triggered when using in-memory mode, because EAL prohibits
mixing these two options at initialization time. However,
code analyzers do not know that, and some will complain
about either using uninitialized variables, or trying to
do operations on an already closed descriptor.
Fix this by assuring the compiler or code analyzer that
in-memory mode code never gets triggered when using
single-file segments mode.
Previously, we were skipping erasing pad because we were
expecting it to be freed when we were merging adjacent
segments. However, if there were no adjacent segments to
merge, we would've skipped erasing the pad, leaving non-zero
memory in our free space.
Fix this by including pad in the erasing unconditionally.
Fixes: e43a9f52b7ff ("malloc: fix pad erasing")
Cc: stable@dpdk.org
Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Currently, we need runtime dir to put all of our runtime info in,
including the DPDK shared config. However, we use the shared
config to determine our proc type, and this happens earlier than
we actually create the config dir and thus can know where to
place the config file.
Fix this by moving runtime dir creation right after the EAL
arguments parsing, but before proc type autodetection. Also,
previously we were creating the config file unconditionally,
even if we specified no_shconf - fix it by only creating
the config file if no_shconf is not set.
Fixes: adf1d867361c ("eal: move runtime config file to new location")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
The original code did not align any addresses that were requested as
page-aligned, but were different because addr_is_hint was set.
The fix by Dariusz referenced below introduced an issue where all unaligned
addresses were left as unaligned.
This patch is a partial revert of
commit 7fa7216ed48d ("mem: fix alignment of requested virtual areas")
and implements a proper fix for this issue, by asking for alignment in all
but the following two cases:
1) page size is equal to system page size, or
2) we got an aligned requested address, and will not accept a different one
This ensures that alignment is performed in all cases, except for those we
can guarantee that the address will not need alignment.
Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
Fixes: 7fa7216ed48d ("mem: fix alignment of requested virtual areas")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Pablo de Lara [Mon, 16 Jul 2018 06:26:27 +0000 (07:26 +0100)]
devargs: fix build with gcc 4.7
Fixed possible out-of-bounds issue:
lib/librte_eal/common/eal_common_devargs.c:
In function ‘rte_devargs_layers_parse’:
lib/librte_eal/common/eal_common_devargs.c:121:7:
error: array subscript is above array bounds
Bugzilla ID: 71
Fixes: 338327d731e6 ("devargs: add function to parse device layers")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Neil Horman [Wed, 27 Jun 2018 18:01:01 +0000 (14:01 -0400)]
devtools: add checks for ABI symbol addition
Recently, some additional patches were added to allow for programmatic
marking of C symbols as experimental. The addition of these markers is
dependent on the manual addition of exported symbols to the EXPERIMENTAL
section of the corresponding libraries version map file. The consensus
on review is that, in addition to mandating the addition of symbols to
the EXPERIMENTAL version in the map, we need a mechanism to enforce our
documented process of mandating that addition when they are introduced.
To that end, I am proposing this change. It is an addition to the
checkpatches script, which scan incoming patches for additions and
removals of symbols to the map file, and warns the user appropriately.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
app/testpmd: fix typo in setting Tx offload command
udp_cksum is duplicated, second one should be tcp_cksum
Fixes: c73a9071877a ("app/testpmd: add commands to test new offload API")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
If the "--disable-crc-strip" testpmd parameter is issued, it removes the
DEV_RX_OFFLOAD_CRC_STRIP flag.
With the introduction of the new DEV_RX_OFFLOAD_KEEP_CRC offload flag, this
flag should also be set when this parameter is issued.
Fixes: 70815c9ecadd ("ethdev: add new offload flag to keep CRC")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
A device iterator allows iterating over a set of devices.
This set is defined by the two descriptions offered,
* rte_bus
* rte_class
Only one description can be provided, or both. It is not allowed to
provide no description at all.
Each layer of abstraction then performs a filter based on the
description provided. This filtering allows iterating on their internal
set of devices, stopping when a match is valid and returning the current
iteration context.
This context allows starting the next iteration from the same point and
going forward.
This abstraction has existed since the infancy of DPDK.
It needs to be fleshed out however, to allow a generic
description of devices properties and capabilities.
A device class is the northbound interface of the device, intended
for applications to know what it can be used for.
With the current implementation, we are not checking the queue_id range
and stat_idx range in the stats mapping function. This patch adds
checks for the queue_id and stat_idx ranges.
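The kind of check described can be sketched as follows. STAT_CNTRS and
stats_mapping_check are hypothetical names chosen for this illustration,
and the limit of 16 counters is an assumed value, not one taken from the
patch.

```c
#include <errno.h>
#include <stdint.h>

#define STAT_CNTRS 16 /* hypothetical number of per-queue stat counters */

/*
 * Validate a (queue_id, stat_idx) pair before recording the mapping.
 * nb_queues is the number of configured queues for the direction used.
 */
static int
stats_mapping_check(uint16_t queue_id, uint16_t nb_queues, uint8_t stat_idx)
{
	if (queue_id >= nb_queues)
		return -EINVAL;
	if (stat_idx >= STAT_CNTRS)
		return -EINVAL;
	return 0;
}
```

Rejecting out-of-range values up front keeps a bad index from being passed
down to the driver callback.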
Fixes: 5de201df892 ("ethdev: add stats per queue")
Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
The driver supports Hyper-V networking directly like
virtio for KVM or vmxnet3 for VMware.
This code is based off of the FreeBSD driver. The file and variable
names are kept the same to help with understanding (with most of the
BSD style warts removed).
This version supports the latest NetVSP 6.1 version and
older versions.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
This patch adds support for an additional bus type Virtual Machine BUS
(VMBUS) on Microsoft Hyper-V in Windows 10, Windows Server 2016
and Azure. Most of this code was extracted from FreeBSD and some of
this is from earlier code donated by Brocade.
Only Linux is supported at present, but the code is split
to allow future FreeBSD and Windows support.
The bus support relies on the uio_hv_generic driver from Linux
kernel 4.16. Multiple queue support requires additional sysfs
interfaces which are in kernel 5.0 (a.k.a 4.17).
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Dan Gora [Mon, 18 Jun 2018 23:35:34 +0000 (16:35 -0700)]
mbuf: add accessor function for private data area
Add an inline accessor function to return the starting address of
the private data area in the supplied mbuf.
This allows applications to easily access the private data area between
the struct rte_mbuf and the data buffer in the specified mbuf without
creating private macros or accessor functions.
No checks are made to ensure that a private data area actually exists
in the buffer.
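The layout this relies on can be sketched with a stand-in structure.
struct fake_mbuf and fake_mbuf_to_priv are hypothetical names for this
illustration; the real struct rte_mbuf is much larger, and this is not the
accessor added by the patch, only the pointer arithmetic it implies.

```c
#include <stdint.h>

/* Stand-in for struct rte_mbuf; the real structure is far larger. */
struct fake_mbuf {
	void *buf_addr;
	uint16_t data_off;
	uint16_t priv_size;
};

/*
 * Return the address immediately after the mbuf header. No check is
 * made that priv_size is non-zero, mirroring the caveat above.
 */
static inline void *
fake_mbuf_to_priv(struct fake_mbuf *m)
{
	return (char *)m + sizeof(struct fake_mbuf);
}
```

An application that places its own metadata right after the header can
then cast the returned pointer to its private type.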
Signed-off-by: Dan Gora <dg@adax.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.
The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes,
each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
of an on-going write of a DoorBell over a given UAR page.
The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.
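The two rules quoted above can be sketched as a split write under a lock.
Everything here is illustrative: uar_lock and doorbell_write64_split are
hypothetical names, a pthread mutex stands in for whatever primitive the
PMD actually uses, and a real MMIO write would need the platform's I/O
write barrier rather than the plain C11 fence shown.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical per-UAR-page lock; the commit text does not say which
 * primitive the PMD uses.
 */
static pthread_mutex_t uar_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Write a 64-bit doorbell value as two 32-bit stores, lower address
 * first, while holding the lock so no other doorbell on the same UAR
 * page can start in between.
 */
static void
doorbell_write64_split(volatile uint32_t *db, uint64_t value)
{
	uint32_t halves[2];

	memcpy(halves, &value, sizeof(halves)); /* keep in-memory byte order */
	pthread_mutex_lock(&uar_lock);
	db[0] = halves[0];
	atomic_thread_fence(memory_order_seq_cst); /* order the two stores */
	db[1] = halves[1];
	pthread_mutex_unlock(&uar_lock);
}
```

On platforms with a guaranteed atomic 64-bit store, the lock and the split
are unnecessary and a single write is preferable, as the quoted text notes.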
The flow counter support introduced by
commit 9a761de8ea14 ("net/mlx5: flow counter support") was intended to
work only with MLNX_OFED_4.3, as the upstream rdma-core
libraries lacked such support.
In rdma-core v19, support for flow counters was added, but with
different user APIs, hence causing compilation issues in the PMD.
This patch fixes the compilation errors by forcing the flow counters
to be enabled only with MLNX_OFED APIs.
Once the MLNX_OFED and rdma-core APIs are aligned, a proper patch to
support the new API will be submitted.
Fixes: 9a761de8ea14 ("net/mlx5: flow counter support")
Cc: stable@dpdk.org
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
The RSS level requires adding a bit to the hash_fields, which is already
provided in this API. For tunnels, it is necessary to request the
queue to compute the checksum on the innermost headers; this last one should
always be activated.
Previous work introduced Verbs priorities, whereas the PMD
translates Flow priorities into Verbs priorities. Rename this to better
reflect what the PMD has to translate.
Drop queues are essentially used in flows due to the Verbs API; the
information on whether the fate of the flow is a drop is already present
in the flow. Due to this, drop queues can be fully mapped on regular
queues.
This starts a series to rework the flow engine in mlx5 to easily support
flow conversion to Verbs or TC. This is necessary to handle both regular
flows and representor flows.
As the full file needs to be cleaned up to rewrite all items/actions
processing, this patch starts by disabling the regular code and only lets the
PMD start in isolated mode.
Prior to this patch, all port representors detected on a given device were
probed and Ethernet devices instantiated for each of them.
This patch adds support for the standard "representor" parameter, which
implies that port representors are not probed by default anymore, except
for the list provided through device arguments.
(Patch based on prior work from Yuanhan Liu)
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
net/mlx5: probe port representors in natural order
Port representors are probed in whatever unspecified order
ibv_get_device_list() returns them.
This is counterintuitive to users since DPDK port IDs assignment almost
never follows the same sequence as representor IDs. Additionally, the
master device does not necessarily inherit the lowest DPDK port ID.
The current PCI probing method is not aware of Verbs port representors,
which appear as standard Verbs devices bound to the same PCI address and
cannot be distinguished.
The problem is that more often than not, the wrong Verbs device is used,
resulting in unexpected traffic.
This patch makes the driver discard representors to only use the master
device. If unable to identify it (e.g. kernel drivers not recent enough),
either:
- There is only one matching device which isn't identified as a
representor, in that case use it.
- Otherwise log an error and do not probe the device.
(Patch based on prior work from Yuanhan Liu)
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
Since commit "net/mlx5: drop useless support for several Verbs ports"
removed an inner loop, mlx5_dev_spawn() is left with an unnecessary indent
level.
This patch eliminates a block, moves its local variables to function scope,
and re-indents its contents (diff best viewed with --ignore-all-space).
No functional impact.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
net/mlx5: drop useless support for several Verbs ports
Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.
While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.
Pablo de Lara [Fri, 13 Jul 2018 04:51:03 +0000 (05:51 +0100)]
test/power: fix 32-bit build
Compilation issue:
test/test/test_power_acpi_cpufreq.c:556:31:
error: format ‘%lx’ expects argument of type ‘long unsigned int’,
but argument 2 has type ‘uint64_t {aka long long unsigned int}’
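The portable way to print a uint64_t is the PRIx64 macro from inttypes.h,
which expands to the right length modifier on both 32-bit and 64-bit
builds. The helper name format_u64_hex below is hypothetical and only
illustrates the pattern; it is not the code from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as hex without assuming that uint64_t is
 * 'unsigned long'. Returns what snprintf returns.
 */
static int
format_u64_hex(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "0x%" PRIx64, value);
}
```

The same macro family (PRIu64, PRId64) covers decimal output, so no casts
to long or long long are needed at the call site.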
Fixes: 39e38d583075 ("test/power: add unit test for get capabilities API")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.
To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.
First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files and lock any directories).
Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.
Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.
This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.
Currently, this option acts as a strict superset of the --no-shconf and
--huge-unlink options.
Unlink hugepages after creating them, to honor the hugepage-unlink mode.
We cannot resize non-existing files, so make single file segments
explicitly unsupported.
IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no-shconf mode, IPC will be useless, so
do not enable it in the first place. In the interests of API usage
convenience, we will still allow registering callbacks, but obviously
they won't ever be triggered.
When using --no-shconf option, the expectation is that no multiprocess
will be supported as no shared files are created. However, fbarray still
creates some shared files that prevent multiple processes with the same
prefix from starting.
Fix this by avoiding creating shared files whenever the --no-shconf option is
specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.
As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice and
update release notes to indicate the changes.
Anatoly Burakov [Tue, 26 Jun 2018 10:53:18 +0000 (11:53 +0100)]
doc: add IPC callback limitations
For asynchronous requests, user callback may be triggered either from
IPC thread or from interrupt thread. Because of this, delivery of
other interrupt-based events such as alarms may not be possible inside
the asynchronous IPC request callback handler. Document this
limitation.
Anatoly Burakov [Tue, 26 Jun 2018 10:53:17 +0000 (11:53 +0100)]
ipc: remove thread for async requests
Previously, we were using two IPC threads - one to handle messages
and synchronous requests, and another to handle asynchronous requests.
To handle replies for an async request, rte_mp_handle woke up the
rte_mp_handle_async thread to process them through a pthread_cond variable.
Change it to handle asynchronous messages within the main IPC thread.
To handle timeout events, for each async request which is sent,
we set an alarm for it. If its reply is received before the timeout,
we will cancel the alarm when we handle the reply; otherwise, the
alarm will invoke async_reply_handle() as the alarm callback.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Jianfeng Tan [Tue, 26 Jun 2018 10:53:16 +0000 (11:53 +0100)]
eal: bring forward init of interrupt handling
Next commit will make asynchronous IPC requests rely on alarm API,
which in turn relies on interrupts to work. Therefore, move the EAL
interrupt initialization before IPC initialization to avoid breaking
IPC in the next commit.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Anatoly Burakov [Tue, 26 Jun 2018 10:53:15 +0000 (11:53 +0100)]
eal/bsd: support alarm API
Implement EAL alarm API support for FreeBSD. The implementation
is largely identical to that of the Linux version, with one key
difference.
The alarm API is a little Linux-centric in that it is expecting
the alarm API to manage alarm timeouts without involvement of the
interrupt thread. This works on Linux because in Linux, there's a
timerfd API which allows waiting for timer events on an fd.
On FreeBSD, however, there are no timerfd's, and timer events are
set up directly in kevent. There is no way to pass information from
the alarm API to the interrupt thread, so we also add a little
back-channel magic to get the soonest alarm timeout from the alarm API.
Anatoly Burakov [Tue, 26 Jun 2018 10:53:14 +0000 (11:53 +0100)]
eal/bsd: add interrupt thread
Add interrupt thread to FreeBSD. It is largely a copy-paste from
Linuxapp interrupt thread, except for a few key differences:
* Use kevent instead of epoll
* Do not recreate the event queue on adding/removing interrupt
sources, add/remove them to/from the queue on the fly instead
* No support for UIO/VFIO handles