In some environments, the PCI domain can be larger than 16 bits.
For example, a PCI device passed through in Azure gets a synthetic domain
id which is internally generated based on a GUID. The PCI standard does
not restrict the domain to 16 bits.
This change breaks the ABI for APIs that expose the PCI address structure.
The printf format for PCI remains unchanged, so that on most
systems (with only 16 bit domain) the output format is unchanged
and is 4 characters wide. For example: 0000:00:01.0
Only on systems with higher bits set will the domain take up more
space; example: 12000:00:01.0
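The formatting behaviour described above can be sketched as follows. This is a minimal illustration, not the actual EAL code: struct pci_addr and pci_addr_format() are hypothetical names, and only the widened 32-bit domain and the 4-digit minimum width are taken from this message.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the PCI address structure; only the
 * 32-bit domain field is taken from the commit message. */
struct pci_addr {
	uint32_t domain;   /* widened from 16 to 32 bits */
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Format "domain:bus:devid.function" and return snprintf()'s result.
 * "%.4" is a minimum digit count, so wider domains use more digits. */
static int
pci_addr_format(const struct pci_addr *a, char *buf, size_t len)
{
	assert(a != NULL && buf != NULL);
	return snprintf(buf, len,
			"%.4" PRIx32 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8,
			a->domain, a->bus, a->devid, a->function);
}
```

With domain 0 the domain part stays 4 characters wide; with domain 0x12000 it grows to 5 characters, so callers should not assume a fixed-length string.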
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Ferruh Yigit [Fri, 9 Jun 2017 18:36:06 +0000 (19:36 +0100)]
ethdev: use device name from device structure
Device name resides in two different locations, in rte_device->name and
in ethernet device private data.
For now, the copy in the ethernet device private data is required for
multi process support: the name is how a secondary process finds the
primary process's device.
But in the ethdev library some eth_dev->data->name usage can be
converted to rte_device->name.
This patch updates ethdev to use rte_device->name when possible.
Ferruh Yigit [Fri, 9 Jun 2017 18:36:05 +0000 (19:36 +0100)]
drivers/net: use device name from device structure
Device name resides in two different locations, in rte_device->name and
in ethernet device private data.
For now, the copy in the ethernet device private data is required for
multi process support: the name is how a secondary process finds the
primary process's device.
But for drivers there is no reason to use the copy in the ethernet
device private data.
This patch updates PMDs to use only rte_device->name.
Qi Zhang [Tue, 13 Jun 2017 03:07:05 +0000 (23:07 -0400)]
ethdev: add fuzzy match in flow API
Add new meta pattern item RTE_FLOW_ITEM_TYPE_FUZZY to the flow API.
This is for devices that support a fuzzy match option.
Usually a fuzzy match is fast, but at the cost of accuracy.
For example, Signature Match only matches a pattern's hash value, but it
is possible that two different patterns have the same hash value.
The matching accuracy level can be configured by the threshold subfield.
A driver can divide the threshold range and map it to the different
accuracy levels that the device supports.
Jianfeng Tan [Mon, 26 Jun 2017 06:49:46 +0000 (06:49 +0000)]
eal: fix config file path when checking process
When the primary process is started with the --file-prefix option,
rte_eal_primary_proc_alive() uses the wrong config file path to
check whether the primary process is alive.
Fix it by calling the helper function that returns the config file path.
Fixes: dd3e00138d74 ("eal: check if primary process is alive") Cc: stable@dpdk.org Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Jianfeng Tan [Mon, 3 Jul 2017 06:37:31 +0000 (06:37 +0000)]
ethdev: fix secondary process crash on unused virtio
Suppose we have 2 virtio devices for a VM, with only the first one,
virtio0, bound to igb_uio. Start a primary DPDK process driving
only virtio0. Then start a secondary DPDK process: it segfaults
in eth_virtio_dev_init() because hw is NULL when trying
to initialize the second virtio device.
	if (!hw->virtio_user_dev) {
We could add a precheck to return an error when hw is NULL. But the
root cause is that virtio devices which are not driven by the primary
process are not excluded by the secondary eal probe function.
To support legacy virtio devices bound to no kernel driver, we
removed RTE_PCI_DRV_NEED_MAPPING in
commit 962cf902e6eb ("pci: export device mapping functions").
At the boot of the primary process, the ether dev is allocated in the
rte_eth_devices array and rte_eth_dev_data is allocated in the
rte_eth_dev_data array; then the probe function fails and the ether dev
is released. However, the entry in the rte_eth_dev_data array is not
cleared. When we then start a secondary process and try to attach the
virtio device that is not used in the primary process, the dev_private
(or hw) field in rte_eth_dev_data is NULL.
To make the device attach fail, we need to clear the name field when we
release any ether device in the primary, so that the loop below in
rte_eth_dev_attach_secondary() will not find a matching name.
	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
		if (strcmp(rte_eth_dev_data[i].name, name) == 0)
			break;
	}
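The effect of clearing the name on release can be sketched as below. This is an illustration under assumptions, not the ethdev code: struct port_data, port_release() and port_find_by_name() are hypothetical names, and MAX_PORTS stands in for RTE_MAX_ETHPORTS.

```c
#include <assert.h>
#include <string.h>

#define MAX_PORTS 4      /* stand-in for RTE_MAX_ETHPORTS */
#define NAME_LEN 64      /* hypothetical name buffer size */

/* Hypothetical, reduced version of the shared per-port data. */
struct port_data {
	char name[NAME_LEN];
	void *dev_private;
};

static struct port_data port_data[MAX_PORTS];

/* Primary side: releasing a port wipes its name so the slot
 * no longer matches any lookup by name. */
static void
port_release(unsigned int i)
{
	assert(i < MAX_PORTS);
	memset(port_data[i].name, 0, sizeof(port_data[i].name));
	port_data[i].dev_private = NULL;
}

/* Secondary side: same loop shape as in the commit message; returns
 * the slot index, or -1 when no port carries that name. */
static int
port_find_by_name(const char *name)
{
	unsigned int i;

	for (i = 0; i < MAX_PORTS; i++) {
		if (strcmp(port_data[i].name, name) == 0)
			break;
	}
	return i == MAX_PORTS ? -1 : (int)i;
}
```

Once a slot has been released, a lookup for its old name returns -1, so the secondary never reaches the NULL dev_private pointer.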
Fixes: 6d890f8ab512 ("net/virtio: fix multiple process support") Cc: stable@dpdk.org Reported-by: Reshma Pattan <reshma.pattan@intel.com> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
mem: do not advertise physical address when no hugepages
When populating a mempool with a virtual memory area, the mempool
library expects to be able to get the physical address of each page.
When started with --no-huge, the physical addresses may not be available
because the pages are not locked in memory. It sometimes returns
RTE_BAD_PHYS_ADDR, which makes the mempool_populate() function fail.
This was working before the commit cdc242f260e7 ("eal/linux: support
running as unprivileged user"), because rte_mem_virt2phy() was returning
0 instead of RTE_BAD_PHYS_ADDR, which was seen as a valid physical
address.
Since --no-huge is a debug function that breaks the support of physical
drivers, always set physical addresses to RTE_BAD_PHYS_ADDR in memzones
or in rte_mem_virt2phy(), and ensure that mempool won't complain in that
case.
Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Jan Blunck <jblunck@infradead.org>
Jianbo Liu [Tue, 4 Jul 2017 10:23:59 +0000 (18:23 +0800)]
examples/l3fwd: rename file for sequential hash lookup
The l3fwd_em_sse.h is enabled by NO_HASH_LOOKUP_MULTI.
Renaming it because it's only for sequential hash lookup,
and doesn't include any x86 SSE instructions.
Moved the definition of GCC_VERSION from lib/librte_table/rte_lru.h
to lib/librte_eal/common/include/rte_common.h.
Tested compilation on:
* arm64 with gcc
* x86 with gcc and clang
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Reviewed-by: Jan Viktorin <viktorin@rehivetech.com> Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
armv8-a has an optional CRYPTO extension which adds the
AES, PMULL, SHA1 and SHA2 capabilities. -march=armv8-a+crypto
enables code generation for the ARMv8-A architecture together
with the optional CRYPTO extensions.
Added the following flags to detect the corresponding
capability at compile time.
* RTE_MACHINE_CPUFLAG_AES
* RTE_MACHINE_CPUFLAG_PMULL
* RTE_MACHINE_CPUFLAG_SHA1
* RTE_MACHINE_CPUFLAG_SHA2
At run-time, the following flags can be used to detect the
capabilities.
* RTE_CPUFLAG_AES
* RTE_CPUFLAG_PMULL
* RTE_CPUFLAG_SHA1
* RTE_CPUFLAG_SHA2
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
Since this example is for x86_64 platforms only, and since SSE4 is now a
mandatory requirement, we can remove the ifdefs checking for that
instruction set level, and the fallbacks if it is not present.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Bruce Richardson [Tue, 20 Jun 2017 15:23:00 +0000 (16:23 +0100)]
hash: remove checks for SSE
Since SSE4 is now part of the minimum requirements for DPDK, we don't need
a fallback case to handle selection of algorithm when SSE4 is unavailable.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Bruce Richardson [Tue, 20 Jun 2017 15:22:59 +0000 (16:22 +0100)]
eal: remove unneeded conditionals for SSE headers
Our x86 baseline is to have support for SSE4.2, so there is no
point in conditions around the inclusion of SSE1 - SSE4 headers.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Bruce Richardson [Tue, 20 Jun 2017 15:22:56 +0000 (16:22 +0100)]
mk: require SSE4.2 support on all x86 platforms
Increase the default baseline from "core2" architecture to "corei7". This
means that all builds will have SSE4.2 support included, and we can remove
special case manipulation of CFLAGS for the same. Naturally, this does mean
that some machines that previously could run DPDK now can't do so, but
hardware with SSE4.2 has been around for almost a decade now, so this
should not be a major problem.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tiwei Bie [Sun, 4 Jun 2017 05:53:24 +0000 (13:53 +0800)]
contigmem: do not zero pages during each mmap
Don't zero the pages during each mmap. Instead, only zero the pages
when they are not already mmapped. Otherwise, the multi-process
support will be broken, as the pages will be zeroed when secondary
processes map the memory. Besides, track the open and mmap operations
on the cdev, and prevent the module from being unloaded when it is
still in use.
Fixes: 82f931805506 ("contigmem: zero all pages during mmap") Cc: stable@dpdk.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Jan Blunck [Fri, 30 Jun 2017 18:19:31 +0000 (20:19 +0200)]
bus: add method to find device
This new method allows buses to expose their devices in a controlled
manner. A comparison function is provided by the user to discriminate
between devices, using arbitrary data as identifier.
It is possible to start an iteration from a specific point, in order to
continue a search.
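A lookup of this kind can be sketched as follows. The types and names (struct device, struct bus, dev_cmp_t, bus_find_device) are hypothetical, and treating the start point as exclusive is an assumption of this sketch rather than something this message states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal device and bus types for illustration. */
struct device {
	const char *name;
	struct device *next;
};

struct bus {
	struct device *head;
};

/* Comparison callback: returns 0 on match, like strcmp(). */
typedef int (*dev_cmp_t)(const struct device *dev, const void *data);

/* Return the first matching device after 'start', or search from the
 * beginning of the list when 'start' is NULL. */
static struct device *
bus_find_device(const struct bus *bus, const struct device *start,
		dev_cmp_t cmp, const void *data)
{
	struct device *dev;

	assert(bus != NULL && cmp != NULL);
	dev = start ? start->next : bus->head;
	for (; dev != NULL; dev = dev->next) {
		if (cmp(dev, data) == 0)
			return dev;
	}
	return NULL;
}

/* Example callback: match on a name prefix passed as 'data'. */
static int
cmp_name_prefix(const struct device *dev, const void *data)
{
	const char *prefix = data;

	return strncmp(dev->name, prefix, strlen(prefix));
}
```

Passing the previous result back as the start point continues the search, which lets a caller walk all matches without the bus exposing its internal list.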
Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Jerin Jacob [Mon, 5 Jun 2017 08:58:39 +0000 (14:28 +0530)]
eal/arm32: add empty pause function
The patch does not provide any functional change for ARM32
with respect to existing rte_pause() definition.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Jan Viktorin <viktorin@rehivetech.com> Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Moved all common defines from defconfig_arm64-armv8a-linuxapp-gcc
to common_armv8a_linuxapp.
Created new config arm64-armv8a-linuxapp-clang which adds the
clang support to armv8a.
Now defconfigs arm64-armv8a-linuxapp-gcc/clang contain only the
CONFIG_RTE_TOOLCHAIN* defines and all other common defines are
inherited from common_armv8a_linuxapp.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Jianbo Liu <jianbo.liu@linaro.org> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Fixed warning -Wasm-operand-widths seen with armv8a
clang compilation.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Fixed warning -Wunknown-warning-option seen with
armv8a clang compilation.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Replaced usage of %a0 in inline assembly with [%x0]
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Compile the armv8a CRC32 support only if the machine
has the CRC extensions, i.e. if RTE_MACHINE_CPUFLAG_CRC32
is defined.
Removed the .arch assembly directives as these are no
longer necessary.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Instead of simply busy-waiting for slave in rte_eal_wait_lcore()
do rte_pause(). This will give power savings.
This also fixes warning -Wempty-body seen with armv8a clang
compilation.
Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
* Moved all x86 related lru defines to rte_lru_x86.h while
retaining all common defines in rte_lru.h
* Verified the changes with table_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
* Removed setting CONFIG_RTE_SCHED_VECTOR=n from armv8a config
so that the setting from common_base is taken as the default
setting for armv8a
* Verified the changes with sched_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com> Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
RongQiang Xie [Thu, 15 Jun 2017 10:20:48 +0000 (18:20 +0800)]
app/testpmd: fix comments for bonding commands
The comments in cmd_add_bonding_slave_parsed() and
cmd_remove_bonding_slave_parsed() both read 'Set the primary slave for
a bonded device'. Fix them to read 'add the slave for a bonded device'
and 'remove the slave from a bonded device' respectively.
In some places, the log2() function is used even though it works on
floating-point values. This introduces a dependency on the math lib,
which is usually unnecessary because we want an integer log2.
Add a new helper to do this job and fix the nfp driver.
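An integer-only log2 can be sketched as below. The name log2_u32_ceil() and the choice to round up are assumptions for illustration; this message does not give the helper's name or rounding behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ceil(log2(v)) using integer operations only,
 * so no libm dependency. Returns 0 for v == 0 and v == 1. */
static uint32_t
log2_u32_ceil(uint32_t v)
{
	uint32_t bits = 0;

	if (v <= 1)
		return 0;
	v--;                 /* so exact powers of two are not rounded up */
	while (v != 0) {
		v >>= 1;
		bits++;
	}
	assert(bits <= 32);  /* a 32-bit input never needs more bits */
	return bits;
}
```

For example, 1024 maps to 10 and 1025 maps to 11, without any floating-point conversion.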
Nikhil Rao [Mon, 1 Aug 2016 05:49:48 +0000 (11:19 +0530)]
ethdev: fix a typo in global API introduction
This patch fixes a typo in the eth device API doc: device
configuration that is not stored between calls to
rte_eth_dev_start/stop() should be restored before a call to
rte_eth_dev_start(), not after it.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>
Rami Rosen [Sat, 17 Jun 2017 20:13:45 +0000 (23:13 +0300)]
doc: fix a typo in sample apps guide
This patch fixes a trivial typo in the sample apps guide.
commit 35b09d76f89e ("doc: use corelist instead of coremask") replaced
the usage of coremask (-c) with corelist (-l).
As a result of that commit, we have
./build/ipv4_multicast -l 0-3 -n 3 -- -p 0x3 -q 1
in the sample app guide, while the explanation immediately following
says:
In this command:
• The -c option enables cores 0, 1, 2 and 3
This patch fixes the explanation to have "-l" instead of "-c".
Fixes: 35b09d76f89e ("doc: use corelist instead of coremask") Cc: stable@dpdk.org Signed-off-by: Rami Rosen <rami.rosen@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>
Change the rte_eth_dev_callback_process function to return int,
and add a void *ret_param parameter.
The new parameter is used by ixgbe and i40e instead of abusing
the user data of the callback.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Daniel Verkamp [Fri, 2 Jun 2017 20:12:13 +0000 (13:12 -0700)]
ring: use aligned memzone allocation
rte_memzone_reserve() provides cache line alignment, but
struct rte_ring may require more than cache line alignment: on x86-64,
it needs 128-byte alignment due to PROD_ALIGN and CONS_ALIGN, which are
128 bytes, but cache line size is 64 bytes.
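The alignment mismatch can be illustrated with plain C11, without any DPDK calls. struct ring_sketch and ring_sketch_alloc() are hypothetical; only the 128-byte and 64-byte figures come from this message, and allocating with the structure's own alignment is what the title suggests rather than a copy of the actual fix.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64          /* x86-64 cache line size */
#define PROD_CONS_ALIGN 128    /* value the commit message gives */

/* Hypothetical, reduced ring layout: producer and consumer state
 * each start on their own 128-byte boundary. */
struct ring_sketch {
	alignas(PROD_CONS_ALIGN) uint32_t prod_head;
	alignas(PROD_CONS_ALIGN) uint32_t cons_head;
};

/* Allocate using the structure's own alignment requirement rather
 * than assuming cache-line alignment is enough. */
static struct ring_sketch *
ring_sketch_alloc(void)
{
	struct ring_sketch *r;

	r = aligned_alloc(alignof(struct ring_sketch), sizeof(*r));
	assert(r == NULL ||
	       (uintptr_t)r % alignof(struct ring_sketch) == 0);
	return r;
}
```

Here alignof(struct ring_sketch) is 128, twice CACHE_LINE, so an allocator that only guarantees 64-byte alignment can return an address the structure cannot legally live at.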
kni: allocate no more mbuf than empty slots in queue
In kni_allocate_mbufs(), we always attempt to add max_burst (32) mbufs
to alloc_q, which leads to an excessive number of rte_pktmbuf_free()
calls when alloc_q is contended at high packet rates (e.g. 10Gig data).
When the alloc_q fifo can only accommodate very few (or zero) mbufs,
create only as many as needed and add them to the fifo.
With this patch, we could stop the random network stalls in KNI at higher
packet rates (e.g. 1G or 10G data between vEth0 and PMD) that sufficiently
exhaust alloc_q under the above condition. I tested the i40e PMD for this
purpose on ppc64le.
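The sizing rule can be sketched as below. struct fifo_idx, FIFO_LEN and both function names are hypothetical; the sketch assumes a power-of-two ring that keeps one slot empty, which may differ from the real KNI fifo.

```c
#include <assert.h>

#define FIFO_LEN 1024        /* hypothetical queue size, power of two */
#define MAX_BURST 32         /* max_burst from the commit message */

/* Hypothetical single-producer/single-consumer index pair; one slot
 * is kept empty to tell "full" from "empty". */
struct fifo_idx {
	unsigned int write;
	unsigned int read;
};

/* Number of entries that can still be enqueued. */
static unsigned int
fifo_free_count(const struct fifo_idx *f)
{
	return (f->read - f->write - 1) & (FIFO_LEN - 1);
}

/* Number of mbufs worth allocating: never more than will fit. */
static unsigned int
alloc_burst_size(const struct fifo_idx *f)
{
	unsigned int free_slots = fifo_free_count(f);

	assert(free_slots < FIFO_LEN);
	return free_slots < MAX_BURST ? free_slots : MAX_BURST;
}
```

With an empty queue the burst stays at 32; with only 3 free slots it shrinks to 3, and with a full queue it is 0, so no mbuf is allocated just to be freed again.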
Vasily Philipov [Wed, 28 Jun 2017 12:25:12 +0000 (15:25 +0300)]
mbuf: fix debug checks for headroom and tailroom
rte_pktmbuf_headroom() and rte_pktmbuf_tailroom() should be usable
with any segment, not only with header segments, so is_header should
be 0 when the sanity check is called inside them.
Jerin Jacob [Tue, 27 Jun 2017 11:57:51 +0000 (17:27 +0530)]
mbuf: reduce pktmbuf init cycles
There is no need for initializing the complete
packet buffer with zero as the packet data area will be
overwritten by the NIC Rx HW anyway.
testpmd configures the packet mempool with around 180k buffers of
2176B each. In the existing scheme, the init routine needs to memset
roughly 370MB, whereas the proposed scheme requires only about 22MB
on a 128B cache-aligned system.
Useful in running DPDK in HW simulators/emulators,
where millions of cycles have an impact on boot time.
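The figures above can be checked with a short calculation. The helper names are hypothetical, and the 128 bytes cleared per buffer in the proposed scheme is an assumption inferred from the ~22MB figure.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024u * 1024u)

/* Bytes cleared at init for 'n' buffers when 'per_buf' bytes of each
 * buffer are zeroed. */
static uint64_t
init_memset_bytes(uint64_t n, uint64_t per_buf)
{
	assert(per_buf == 0 || n <= UINT64_MAX / per_buf);
	return n * per_buf;
}

/* Convert bytes to MiB, rounded to the nearest whole MiB. */
static uint64_t
bytes_to_mib(uint64_t bytes)
{
	return (bytes + MIB / 2) / MIB;
}
```

180000 buffers of 2176 bytes come to 391,680,000 bytes (about 374 MiB), while 180000 times 128 bytes is 23,040,000 bytes (about 22 MiB), roughly a 17x reduction in memory touched at init.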
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Ilya Maximets [Thu, 29 Jun 2017 05:59:19 +0000 (08:59 +0300)]
mem: balanced allocation of hugepages
Currently EAL allocates hugepages one by one, paying no attention
to which NUMA node each allocation comes from.
Such behaviour leads to allocation failure if the number of hugepages
available to the application is limited by cgroups or hugetlbfs and
memory is requested from more than just the first socket.
Example:
# 90 x 1GB hugepages available in a system
cgcreate -g hugetlb:/test
# Limit to 32GB of hugepages
cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
# Request 4GB from each of 2 sockets
cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
EAL: 32 not 90 hugepages of size 1024 MB allocated
EAL: Not enough memory available on socket 1!
Requested: 4096MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory
This happens because all allocated pages are
on socket 0.
Fix this issue by setting mempolicy MPOL_PREFERRED for each hugepage
to one of the requested nodes using the following scheme:
1) Allocate essential hugepages:
1.1) Allocate as many hugepages from numa N to
only fit requested memory for this numa.
1.2) repeat 1.1 for all numa nodes.
2) Try to map all remaining free hugepages in a round-robin
fashion.
3) Sort pages and choose the most suitable.
In this case all essential memory will be allocated and all remaining
pages will be fairly distributed between all requested nodes.
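Steps 1 and 2 of the scheme above can be modelled as a simple counting exercise. This sketch only simulates how many pages end up on each node; it does not call set_mempolicy() or libnuma, step 3 is not modelled, and balance_pages() and MAX_NODES are hypothetical names.

```c
#include <assert.h>

#define MAX_NODES 8   /* hypothetical upper bound for this sketch */

/* Distribute 'avail' hugepages over 'n' NUMA nodes.
 * want[i]: pages needed on node i; got[i]: pages mapped on node i. */
static void
balance_pages(const unsigned int *want, unsigned int *got,
	      unsigned int n, unsigned int avail)
{
	unsigned int i, requested = 0;

	assert(n > 0 && n <= MAX_NODES);

	/* Step 1: essential pages, node by node. */
	for (i = 0; i < n; i++) {
		got[i] = want[i] < avail ? want[i] : avail;
		avail -= got[i];
		if (want[i] > 0)
			requested++;
	}

	/* Step 2: spread what is left round-robin over the nodes
	 * that were actually requested. */
	for (i = 0; avail > 0 && requested > 0; i = (i + 1) % n) {
		if (want[i] == 0)
			continue;
		got[i]++;
		avail--;
	}
}
```

With 32 pages available and 4 pages wanted on each of 2 sockets, this model ends with 16 pages per socket, instead of all 32 on socket 0 as in the failing example.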
A new config option, RTE_EAL_NUMA_AWARE_HUGEPAGES, is introduced and
enabled by default for linuxapp except armv7 and dpaa2.
Enabling this option adds libnuma as a dependency for EAL.
Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Flag dev_started should be cleared after dev_stop() function call
because the flag is checked inside the dev_stop() function.
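The ordering issue can be sketched as below. struct dev_sketch, drv_stop() and dev_stop() are hypothetical names, and the early return in the driver callback is an assumption about how the flag is checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced device structure. */
struct dev_sketch {
	int dev_started;   /* 1 while the device is running */
	int hw_running;    /* stands in for driver/hardware state */
};

/* Driver callback: does nothing when the device is not started. */
static void
drv_stop(struct dev_sketch *dev)
{
	if (!dev->dev_started)
		return;
	dev->hw_running = 0;
}

/* Library-level stop: call the driver first, clear the flag after. */
static void
dev_stop(struct dev_sketch *dev)
{
	assert(dev != NULL);
	drv_stop(dev);
	dev->dev_started = 0;
}
```

If the flag were cleared before the driver call, drv_stop() would return early and leave hw_running set.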
Fixes: d11b0f30df88 ("cryptodev: introduce API and framework for crypto devices") Cc: stable@dpdk.org Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>