The virtual queue ring size and the number of slots actually usable
are separate parameters. In the most common environment (QEMU)
the virtual queue ring size is 256, but in some environments the
ring may be much larger.
The ring size comes from the host and the driver must use the
actual size passed.
The number of descriptors can be either zero to use the whole
available ring, or some value smaller. This is used to limit
the number of mbufs allocated for the receive ring. If more
descriptors are requested than are available, the size is silently
truncated to the ring size.
Note: the ring size (from host) must be a power of two, but
the number of descriptors used can be any size from 1 to the
size of the virtual ring.
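The rule above can be sketched as a small helper. This is only an illustration under stated assumptions: usable_descriptors() and is_power_of_two() are hypothetical names, not the driver's functions, and returning 0 for an invalid ring size is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true when n is a non-zero power of two. */
static int is_power_of_two(uint16_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: number of slots the driver will actually use.
 * ring_size comes from the host; nb_desc is the caller's request.
 * Returns 0 when the host ring size is not a power of two.
 */
static uint16_t usable_descriptors(uint16_t ring_size, uint16_t nb_desc)
{
	if (!is_power_of_two(ring_size))
		return 0;
	/* 0 means "use the whole ring"; oversized requests are truncated. */
	if (nb_desc == 0 || nb_desc > ring_size)
		return ring_size;
	return nb_desc;
}
```

With a 256-entry ring, a request of 0 or 1024 yields 256 slots, while a request of 100 yields 100.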
Fixes: d78deadae4dc ("virtio: fix ring size negotiation") Reported-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Malloc was moved to the EAL and a dummy malloc library was left
so as not to break apps that had a librte_malloc.so dependency.
Note that the dummy library will be removed in the next release.
When building a combined library, all objects are copied to the same
directory before creating the library itself.
There are a few issues:
- CONFIG_RTE_LIBRTE_MALLOC is not a valid option anymore resulting
in wrong syntax and a compilation failure. Fix it by replacing it
with CONFIG_RTE_LIBRTE_EAL.
- As we kept a dummy library, there are now two objects with the
same name. This means that the proper rte_malloc.o object in eal gets
overwritten by an empty rte_malloc.o object from the dummy malloc lib.
Fix it by changing the name of rte_malloc.o object in the dummy
library.
- Update the copyright year.
This problem was discovered when passing an invalid PCI id to the
blacklist API in devargs.
Any failure in rte_devargs_add would cause a core dump because
it would call rte_log() before the EAL log environment was
initialized. Rather than trying to log, just remove the messages
and leave it up to the caller to check the return value.
Most of the other failure possibilities are when malloc() fails, and if
that happens any logging that used malloc() would also fail.
This failure was not caught by the standalone tests for devargs
because the tests are run after calling rte_eal_init (which is not
how devargs is intended to be used).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Clean up the code in bonding that checks ports.
* Use standard rte_eth_dev_is_valid_port
* Change name of driver string to avoid variable namespace conflicts
* Get rid of unnecessary string comparison stuff. A simple pointer
check is enough here.
* Get rid of unnecessary assignment of driver_name, it is already
done by common code.
* Don't generate unnecessary log messages on error.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Declan Doherty <declan.doherty@intel.com>
The function rte_eth_dev_is_valid_port is a good way to have all
drivers use the same function, and it solves several hotplug-related
bugs caused by drivers not checking the attached flag.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Change the log level of startup messages. Anything that is
just normal activity (like getting virtual areas) is changed
to debug level. Anything that is a failure should be NOTICE
or ERR severity.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
i40evf: fix RSS with less Rx queues than Tx queues
The i40e VF driver uses num_queue_pairs in the vf structure to construct
the queue index lookup table. When nb_rx_queue is less than nb_tx_queue,
num_queue_pairs is equal to nb_tx_queue. The table then contains invalid
queue indexes, and the application cannot poll packets on those queues.
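A minimal sketch of the idea, assuming the fix is to derive the lookup table from the Rx queue count rather than from num_queue_pairs. fill_rss_lut() is a hypothetical helper, not the driver's code, and the round-robin fill is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill an RSS lookup table so that every entry
 * refers to a configured Rx queue. Using the Tx queue count here would
 * produce indexes >= nb_rx_queues, which no Rx queue serves.
 */
static void fill_rss_lut(uint8_t *lut, size_t lut_size, uint16_t nb_rx_queues)
{
	size_t i;

	if (nb_rx_queues == 0)
		return; /* nothing to distribute to */
	for (i = 0; i < lut_size; i++)
		lut[i] = (uint8_t)(i % nb_rx_queues);
}
```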
This patch also moves the inline function i40e_align_floor from
i40e_ethdev.c to i40e_ethdev.h.
Test report: http://dpdk.org/ml/archives/dev/2015-July/021838.html
Due to the NIC's firmware update, the input set of the SCTP flow is changed
to source IP, destination IP, source port, destination port and
Verification-Tag. This patch adds the sport and dport to the programming
packet of flow director.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com> Tested-by: Marvin Liu <yong.liu@intel.com>
Queues were freed in the clear function, which is called from the stop
function. Split clearing and freeing into separate functions to
move queue freeing from the stop function to the close function.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com> Acked-by: Helin Zhang <helin.zhang@intel.com>
Thomas Monjalon [Sat, 18 Jul 2015 18:35:57 +0000 (20:35 +0200)]
pci: fix detach and uninit naming
There are close and detach functions in ethdev.
To keep a consistent naming, PCI functions called by ethdev detach
must be named "detach" instead of "close".
Also fix comments which mix close and uninit names.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Pablo de Lara [Fri, 17 Jul 2015 09:17:58 +0000 (10:17 +0100)]
hash: fix build for non-x86 arch
Hash library uses optimized compare functions that use
x86 intrinsics, therefore non-x86 systems could not build
the library. In that case, the compare function is set
to the generic memcmp.
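The fallback can be sketched roughly as follows. All names are hypothetical, the architecture test uses compiler-predefined macros as a stand-in for the build configuration, and the "optimized" variant is only a placeholder for an intrinsics-based compare.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*key_cmp_fn)(const void *a, const void *b, size_t len);

/* Generic compare, usable on every architecture. */
static int generic_cmp(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);
}

#if defined(__x86_64__) || defined(__i386__)
/* Placeholder for an x86 intrinsics-based compare of 16-byte keys. */
static int x86_cmp16(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);
}
#endif

/* Hypothetical selector: optimized compare on x86, memcmp elsewhere. */
static key_cmp_fn select_key_cmp(size_t key_len)
{
#if defined(__x86_64__) || defined(__i386__)
	if (key_len == 16)
		return x86_cmp16;
#else
	(void)key_len;
#endif
	return generic_cmp;
}
```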
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation") Reported-by: Zhigang Lu <zlu@ezchip.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Zhigang Lu <zlu@ezchip.com>
The sample should call the API to unregister the vhost driver when it
exits or quits. The socket file is then removed (by calling unlink), so
the vhost sample works correctly when started a second time.
Test report: http://dpdk.org/ml/archives/dev/2015-July/020896.html
Right now the scheduler hierarchy is encoded as a bitfield
that is visible as part of the ABI. This creates a barrier
limiting future expansion of the hierarchy.
As a transitional step, hide the actual layout of the hierarchy
and mark the exposed structure as deprecated. This will allow for
expansion in a later release.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The setup messages should be at DEBUG level since they are not
important for normal operation of the system. The messages about
problems should be at NOTICE or ERR level.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add functions to support ethtool ops:
- get_reg_length
- get_regs
- get_eeprom_length
- get_eeprom
- set_eeprom
Signed-off-by: Liang-Min Larry Wang <liang-min.wang@intel.com> Acked-by: Andrew Harvey <agh@cisco.com> Acked-by: David Harton <dharton@cisco.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
to enable reading device parameters (register and
eeprom) based upon an ethtool-like data parameter specification.
Signed-off-by: Liang-Min Larry Wang <liang-min.wang@intel.com> Acked-by: Andrew Harvey <agh@cisco.com> Acked-by: David Harton <dharton@cisco.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The new API, rte_eth_dev_default_mac_addr_set, uses the
existing dev_op, mac_addr_set, to enable setting the MAC
address from the ethdev level.
Signed-off-by: Liang-Min Larry Wang <liang-min.wang@intel.com> Acked-by: Andrew Harvey <agh@cisco.com> Acked-by: David Harton <dharton@cisco.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Fix return value, using the macro input instead of -EINVAL.
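The pattern can be illustrated with a small sketch. The macro, the port table and the two callers below are hypothetical stand-ins, not the ethdev code; the point is only that the macro returns its second argument rather than a hard-coded -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical table size */

/* Hypothetical attach state: ports 0 and 2 are attached. */
static const int port_attached[MAX_PORTS] = { 1, 0, 1, 0 };

static int port_is_valid(uint8_t port_id)
{
	return port_id < MAX_PORTS && port_attached[port_id];
}

/* Return the caller-supplied value, not a fixed error code. */
#define VALID_PORT_OR_ERR_RET(port_id, retval) do { \
	if (!port_is_valid(port_id)) \
		return (retval); \
} while (0)

static int get_mtu(uint8_t port_id)
{
	VALID_PORT_OR_ERR_RET(port_id, -ENODEV);
	return 1500;
}

static int count_queues(uint8_t port_id)
{
	VALID_PORT_OR_ERR_RET(port_id, 0);
	return 4;
}
```

Each caller thus decides what an invalid port means for its own return convention.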
Fixes: 75acd57ad025 ("ethdev: introduce valid port helper") Signed-off-by: Liang-Min Larry Wang <liang-min.wang@intel.com> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Pablo de Lara [Wed, 15 Jul 2015 15:13:07 +0000 (16:13 +0100)]
kni: fix build on SLES 12
SLES 12 has kernel 3.12, which originally does not have skb_set_hash,
but SUSE has added that function to the kernel it ships.
Therefore, the function is not redeclared when compiling on this OS.
Reported-by: Sotiris Salloumis <sotiris.salloumis@ericsson.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Pablo de Lara [Thu, 16 Jul 2015 09:00:54 +0000 (10:00 +0100)]
hash: fix build without SSE4.1
_mm_test_all_zeros is not available on CPUs without SSE4.1,
so DPDK would not build for them.
This patch adds an alternative for this, using _mm_cmpeq_epi32 and
_mm_movemask_epi8.
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation") Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Pablo de Lara [Wed, 15 Jul 2015 12:40:42 +0000 (13:40 +0100)]
hash: fix build with gcc 4.4 and 4.5
gcc 4.4 and 4.5 throw the following error:
rte_cuckoo_hash.c:145: error: flexible array member in otherwise empty struct.
This is due to the flexible array being declared with an empty length; the
declaration of the array has been changed to use size 0.
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation") Reported-by: Olga Shern <olgas@mellanox.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The comment for TX offload flags stated that those flags started at bit
55 and then were added to the right of that, leaving 8 bits reserved for
generic mbuf (i.e. non-offload) use. This comment may not have been
clear as 5 of the 8 flags which were reserved have now been used for TX
offloads.
This patch:
* updates the description so that it now reflects reality that
only three flags are available for generic mbuf use
* reserves the final generic flag so that it can't be taken over for TX
offload in future
* clarifies the comment for TX flags to indicate that they should be
counting downwards not upwards.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Implement rte_memzone_free which, as its name implies, would free a
memzone.
Currently memzones are tracked in an array and cannot be freed.
To be able to reuse the same array to track memzones, we have to
change how we keep track of reserved memzones.
With this patch, any memzone with addr NULL is not used, so we also need
to change how we look for the next free memzone entry.
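A minimal sketch of this bookkeeping, assuming a fixed array in which a NULL addr marks an unused entry. The struct and function names are hypothetical and do not mirror the EAL implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES 4 /* hypothetical table size */

struct zone {
	void *addr;  /* NULL means the entry is unused */
	size_t len;
};

static struct zone zones[MAX_ZONES];

/* Scan for the first unused entry instead of keeping a running count. */
static struct zone *next_free_zone(void)
{
	size_t i;

	for (i = 0; i < MAX_ZONES; i++)
		if (zones[i].addr == NULL)
			return &zones[i];
	return NULL; /* table full */
}

/* Freeing marks the entry unused so the scan can hand it out again. */
static void zone_free(struct zone *z)
{
	z->addr = NULL;
	z->len = 0;
}
```

Because freed entries can appear anywhere in the array, a simple "number of zones in use" index is no longer enough to locate the next slot.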
Some unit tests are not relevant anymore. This is the case for those malloc
UTs that checked corner cases when allocating MALLOC_MEMZONE_SIZE
chunks, and for those memzone UTs relying on specific free
memsegs of the reserved memzone.
Other UTs just need to be updated, for example, to calculate the maximum free
block size available.
In the current memory hierarchy, memsegs are groups of physically
contiguous hugepages, memzones are slices of memsegs and malloc further
slices memzones into smaller memory chunks.
This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones would call malloc internally for memory allocation while
maintaining their ABI.
During initialization malloc sets all available memory as part of the heaps.
CONFIG_RTE_MALLOC_MEMZONE_SIZE was used to specify the default memory
block size to expand the heap. The option is not used/relevant anymore,
so we remove it.
Remove free_memseg field from internal mem config structure as it is
not used anymore.
Also remove code in ivshmem that was setting up free_memseg on init.
It would then be possible to free memzones and therefore any other structure
based on memzones, i.e. mempools.
Move malloc inside eal and create a new section in MAINTAINERS file for
Memory Allocation in EAL.
Create a dummy malloc library to avoid breaking applications that have
librte_malloc in their DT_NEEDED entries.
This is the first step towards using malloc to allocate memory directly
from memsegs. Thus, memzones would allocate memory through malloc,
which allows memzones to be freed.
As unified packet types are used instead, those old bit masks and
the relevant macros for packet type indication need to be removed.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.
Signed-off-by: Helin Zhang <helin.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
examples: replace some offload flags with packet type
To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI.
Signed-off-by: Helin Zhang <helin.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Thomas Monjalon [Wed, 15 Jul 2015 16:05:18 +0000 (18:05 +0200)]
mlx4: replace some offload flags with packet type
The workaround for Tx tunnel offloading can now be replaced with packet
type flag checking.
The ol_flags for IPv4/IPv6 and tunnel Rx offloading are replaced with
packet type flags.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
There are only 6 bit flags in ol_flags for indicating packet
types, which is not enough to describe all the possible packet
types hardware can recognize. For example, i40e hardware can
recognize more than 150 packet types. The unified packet type is
composed of L2 type, L3 type, L4 type, tunnel type, inner L2 type,
inner L3 type and inner L4 type fields, and is stored in
the 32-bit 'packet_type' field of 'struct rte_mbuf'.
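One way to picture such a composite value is sketched below. It assumes each of the seven fields occupies 4 bits, starting with the L2 type in the lowest bits; that layout and all names here are assumptions made for illustration, not definitions taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field order, lowest bits first (assumed layout). */
enum ptype_field {
	PT_L2 = 0, PT_L3, PT_L4, PT_TUNNEL,
	PT_INNER_L2, PT_INNER_L3, PT_INNER_L4
};

#define PT_FIELD_BITS 4u   /* assumed width of every field */
#define PT_FIELD_MASK 0xFu

/* Store a 4-bit value into one field of the 32-bit packet type. */
static uint32_t ptype_set(uint32_t ptype, enum ptype_field f, uint32_t value)
{
	unsigned int shift = (unsigned int)f * PT_FIELD_BITS;

	ptype &= ~(PT_FIELD_MASK << shift);
	return ptype | ((value & PT_FIELD_MASK) << shift);
}

/* Read one field back out of the 32-bit packet type. */
static uint32_t ptype_get(uint32_t ptype, enum ptype_field f)
{
	return (ptype >> ((unsigned int)f * PT_FIELD_BITS)) & PT_FIELD_MASK;
}
```

Seven 4-bit fields use 28 bits, which fits in the 32-bit field and leaves far more combinations than six single-bit flags.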
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI.
In order to unify the packet type, the field of 'packet_type' in
'struct rte_mbuf' needs to be extended from 16 to 32 bits.
Accordingly, some fields in 'struct rte_mbuf' are re-organized to support
this change for Vector PMD.
As 'struct rte_kni_mbuf' for KNI must map exactly onto
'struct rte_mbuf', it should be modified accordingly.
In ixgbe PMD driver, corresponding changes are added for the mbuf changes,
especially the bit masks of packet type for 'ol_flags' are replaced by
unified packet type. In addition, more packet types (UDP, TCP and SCTP)
are supported in vectorized ixgbe PMD.
To avoid breaking ABI compatibility, all the changes would be enabled by
RTE_NEXT_ABI.
Note that a performance drop of around 2% (64B) was observed when doing
IO forwarding on 4 ports (1 port per 82599 card) on the same SNB core.
After the code rework in the commit below, the logic expects the hugepage_sz
field to always be set (i.e. non-zero).
When using --no-huge, this field was left unset, defaulting to zero.
Set hugepage_sz to RTE_PGSIZE_4K when using --no-huge.
Fixes: b3dfffd962ecd ("mem: allow multiple page sizes to be requested") Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Fix spelling and grammar errors. Re-organize sections for better explanation
in the documentation. Add a section describing compilation of CXGBE with DPDK.
Add a note describing that CXGBE currently only supports binding to PF4.
When using vfio, the probe fails for BAR > 0 after the
commit-id 90a1633b2 (eal/linux: allow to map BARs with MSI-X tables).
While debugging further, found that the BAR region offset and size read from
vfio are u64, but are assigned to uint32_t variables. This results in the u64
value getting truncated to 0 and passing wrong offset and size to mmap for
subsequent BAR regions.
The fix is to use unsigned long for the offset and size.
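The truncation can be reproduced in isolation. The sketch below assumes that region offsets are formed by shifting the BAR index left by 40 bits; that shift value and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define REGION_OFFSET_SHIFT 40 /* assumed index-to-offset shift */

/* 64-bit offset of a BAR region, as reported through a u64 field. */
static uint64_t region_offset(unsigned int bar_index)
{
	return (uint64_t)bar_index << REGION_OFFSET_SHIFT;
}

/* What a uint32_t variable holds after the assignment: the low 32 bits. */
static uint32_t truncated_offset(unsigned int bar_index)
{
	return (uint32_t)region_offset(bar_index);
}
```

For BAR 0 both values are 0, which is why only BAR > 0 fails: every later region's offset collapses to 0 once the upper bits are dropped.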
This is based on a patch by Alejandro Lucero <alejandro.lucero@netronome.com>
posted at:
http://dpdk.org/ml/archives/dev/2015-June/020201.html
and updated with the diff from the post below to fix 32-bit compilation:
http://dpdk.org/ml/archives/dev/2015-July/020963.html
This patch fixes a vfio initialization issue introduced by the patch below.
The root cause is that VFIO_PRESENT is inaccessible at the eal common level.
To fix it, remove pci_map/unmap_device from the common code, then implement
them in the linux and bsd code.
Fixes: 35b3313e322b ("pci: merge mapping functions for linux and bsd") Reported-by: Michael Qiu <michael.qiu@intel.com> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Klaus Degner [Mon, 13 Jul 2015 14:54:22 +0000 (16:54 +0200)]
pcap: add Rx and Tx byte counters
Added RX and TX bytes counter support to the PCAP statistics.
Added TX counter support for pcap dumper and interface functions.
Renamed RX and TX packet counters for consistency.
Signed-off-by: Klaus Degner <kd@allegro-packets.com> Tested-by: John McNamara <john.mcnamara@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>
Bruce Richardson [Mon, 13 Jul 2015 16:38:45 +0000 (17:38 +0100)]
hash: rename unused field
The cuckoo hash has a fixed number of entries per bucket, so the
configuration parameter for this is unused. We change this field in the
parameters struct to "reserved" to indicate that there is now no such
parameter value, while at the same time keeping ABI consistency.
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation") Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This commit adds support for the TILE-Gx platform, as well as the TILE
CPU architecture. This architecture port is fairly simple due to its
reliance on generics for most arch stuff.
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com>
mempool: allow config override on element alignment
On TILE-Gx and TILE-Mx platforms, the buffers fed into the hardware
buffer manager require a 128-byte alignment. With this change, we
allow configuration based override of the element alignment, and
default to RTE_CACHE_LINE_SIZE if left unspecified.
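A rough sketch of such an override, using hypothetical macro names: ELT_ALIGN stands for the configurable alignment and CACHE_LINE_SIZE stands in for RTE_CACHE_LINE_SIZE. The alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* stand-in for RTE_CACHE_LINE_SIZE */

/* Hypothetical config knob: a platform may predefine ELT_ALIGN (e.g. 128). */
#ifndef ELT_ALIGN
#define ELT_ALIGN CACHE_LINE_SIZE
#endif

/* Round size up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Space reserved for one element under the configured alignment. */
static size_t elt_total_size(size_t elt_size)
{
	return align_up(elt_size, ELT_ALIGN);
}
```

Building with -DELT_ALIGN=128 would make every element a multiple of 128 bytes without touching the code.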
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch extends the memzone allocator to remove the restriction
that prevented callers from specifying multiple page sizes in the
flags argument.
In doing so, we also sanitize the free segment matching logic to get
rid of architecture specific disjunctions (2MB vs 1GB on x86, and 16MB
vs 16GB on PPC), thereby allowing for a broader range of hugepages on
architectures that support it.
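The matching rule can be sketched as a bitmask test. The flag names and values below are hypothetical; the point is that a segment matches when its page size is any one of the sizes requested, with no per-architecture pairing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page-size request flags, one bit per size. */
#define ZONE_2MB   0x1u
#define ZONE_1GB   0x2u
#define ZONE_16MB  0x4u
#define ZONE_16GB  0x8u
#define ZONE_SIZE_MASK (ZONE_2MB | ZONE_1GB | ZONE_16MB | ZONE_16GB)

/* Map a segment's page size to its flag bit (0 if not a known size). */
static unsigned int page_size_to_flag(uint64_t page_sz)
{
	switch (page_sz) {
	case 2ULL << 20:  return ZONE_2MB;
	case 1ULL << 30:  return ZONE_1GB;
	case 16ULL << 20: return ZONE_16MB;
	case 16ULL << 30: return ZONE_16GB;
	default:          return 0;
	}
}

/* A segment matches if no size was requested or its size is requested. */
static int segment_matches(uint64_t seg_page_sz, unsigned int flags)
{
	unsigned int requested = flags & ZONE_SIZE_MASK;

	if (requested == 0)
		return 1;
	return (page_size_to_flag(seg_page_sz) & requested) != 0;
}
```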
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com>
The definitions of rte_memzone_reserve_aligned() and
rte_memzone_reserve_bounded() were identical with the exception of the
bound argument passed into rte_memzone_reserve_thread_safe().
This patch removes this replication of code by unifying it into
rte_memzone_reserve_thread_safe(), which is then called by all three
variants of rte_memzone_reserve().
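The shape of this refactoring, in a stripped-down form: three thin wrappers forward to one common function and differ only in the arguments they pass. The names and the recorded-request struct are hypothetical stand-ins, not the memzone code.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_ALIGN 64u /* hypothetical default alignment */

/* Records the last request so the forwarding can be observed. */
struct request {
	size_t len;
	unsigned int align;
	unsigned int bound; /* 0 means "no boundary constraint" */
};

static struct request last_request;

/* Single common implementation; all locking would live here. */
static int reserve_common(size_t len, unsigned int align, unsigned int bound)
{
	last_request.len = len;
	last_request.align = align;
	last_request.bound = bound;
	return 0;
}

static int reserve(size_t len)
{
	return reserve_common(len, DEFAULT_ALIGN, 0);
}

static int reserve_aligned(size_t len, unsigned int align)
{
	return reserve_common(len, align, 0);
}

static int reserve_bounded(size_t len, unsigned int align, unsigned int bound)
{
	return reserve_common(len, align, bound);
}
```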
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com>
The library name is now being pinned to "dpdk" instead of intel_dpdk,
powerpc_dpdk, etc. As a result, we no longer need this config item.
This patch removes it.
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Zhigang Lu [Thu, 9 Jul 2015 08:25:12 +0000 (16:25 +0800)]
eal: allow empty compile time cpu flags
When RTE_COMPILE_TIME_CPUFLAGS is empty, the rte_cpu_check_supported()
code fails to build with a "comparison is always false due to limited range
of data type" error. This is because the compile_time_flags[] array is empty.
Assigning the array dimension to a local variable apparently solves this.
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
app/test: restrict x86 cpu flags checks to x86 builds
The original code mistakenly defaulted to X86 when RTE_ARCH_PPC_64 was
left undefined. This did not accommodate other non-PPC/non-X86
architectures. This patch fixes this issue.
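The corrected structure might look like the sketch below: each architecture gets an explicit branch, and anything else falls through to a branch with no architecture-specific checks. Compiler-predefined macros are used here as stand-ins for the RTE_ARCH_* build options, and the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Report which set of CPU flag checks would be compiled in. */
static const char *cpu_flag_test_arch(void)
{
#if defined(__powerpc64__)
	return "ppc64";
#elif defined(__x86_64__) || defined(__i386__)
	return "x86";
#else
	/* Other architectures: no x86 or PPC flag checks are built. */
	return "none";
#endif
}
```

The earlier form, an #ifdef for PPC with x86 in the #else branch, is what sent every other architecture down the x86 path.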
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>