Markus Theil [Tue, 5 Sep 2017 12:04:06 +0000 (14:04 +0200)]
igb_uio: add MSI IRQ mode
This patch adds MSI IRQ mode in a way, that should
also work on older kernel versions. The base for my patch
was an attempt to do this in cf705bc36c which was later
reverted in d8ee82745a. Compilation was tested on Linux 3.2,
4.10 and 4.12.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Markus Theil [Tue, 5 Sep 2017 12:04:03 +0000 (14:04 +0200)]
igb_uio: fix MSI-X IRQ assignment with new IRQ function
The patch which introduced the usage of pci_alloc_irq_vectors
came after the patch which switched to non-threaded ISR (f0d1896fa1),
but did not use non-threaded ISR, if pci_alloc_irq_vectors
is used.
Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X") Cc: stable@dpdk.org Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Markus Theil [Tue, 5 Sep 2017 12:04:02 +0000 (14:04 +0200)]
igb_uio: fix IRQ disable on recent kernels
igb_uio already allocates irqs using pci_alloc_irq_vectors on
recent kernels >= 4.8. The interrupt disable code was not
using the corresponding pci_free_irq_vectors, but the also
deprecated pci_disable_msix, before this fix.
Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X") Cc: stable@dpdk.org Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Markus Theil [Tue, 5 Sep 2017 12:04:01 +0000 (14:04 +0200)]
igb_uio: refactor IRQ enable/disable into own functions
Interrupt setup code in igb_uio has to deal with multiple
types of interrupts and kernel versions. This patch moves
the setup and teardown code into own functions, to make
it more readable.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Tested-by: Markus Theil <markus.theil@tu-ilmenau.de> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Santosh Shukla [Sun, 1 Oct 2017 09:29:02 +0000 (14:59 +0530)]
mempool: notify memory area to pool
HW pool manager e.g. Octeontx SoC demands s/w to program start and end
address of pool. Currently, there is no such api in external mempool.
Introducing rte_mempool_ops_register_memory_area api which will let HW(pool
manager) to know when common layer selects hugepage:
For each hugepage - Notify its start/end address to HW pool manager.
Santosh Shukla [Sun, 1 Oct 2017 09:29:01 +0000 (14:59 +0530)]
mempool: introduce block size alignment flag
Some mempool hw like octeontx/fpa block, demands block size
(/total_elem_sz) aligned object start address.
Introducing an MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag.
If this flag is set:
- Align object start address(vaddr) to a multiple of total_elt_sz.
- Allocate one additional object. Additional object is needed to make
sure that requested 'n' object gets correctly populated.
Example:
- Let's say that we get 'x' size of memory chunk from memzone.
- And application has requested 'n' object from mempool.
- Ideally, we start using objects at start address 0 to...(x-block_sz)
for n obj.
- Not necessarily first object address i.e. 0 is aligned to block_sz.
- So we derive 'offset' value for block_sz alignment purpose i.e..'off'.
- That 'off' makes sure that start address of object is blk_sz aligned.
- Calculating 'off' may end up sacrificing first block_sz area of
memzone area x. So total number of the object which can fit in the
pool area is n-1, Which is incorrect behavior.
Therefore we request one additional object (/block_sz area) from memzone
when MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag is set.
Santosh Shukla [Sun, 1 Oct 2017 09:29:00 +0000 (14:59 +0530)]
mempool: detect physical contiguous objects
The memory area containing all the objects must be physically
contiguous.
Introducing MEMPOOL_F_CAPA_PHYS_CONTIG flag for such use-case.
The flag useful to detect whether pool area has sufficient space
to fit all objects. If not then return -ENOSPC.
This way, we make sure that all object within a pool is contiguous.
Santosh Shukla [Sun, 1 Oct 2017 09:28:59 +0000 (14:58 +0530)]
mempool: get capabilities
Allow the mempool driver to advertise his pool capabilities.
For that pupose, an api(rte_mempool_ops_get_capabilities)
and ->get_capabilities() handler has been introduced.
- Upon ->get_capabilities() call, mempool driver will advertise
his capabilities to mempool flags param.
Santosh Shukla [Fri, 6 Oct 2017 07:45:30 +0000 (13:15 +0530)]
ethdev: get supported mempool per port
Now that dpdk supports more than one mempool drivers and
each mempool driver works best for specific PMD, example:
- sw ring based mempool for Intel PMD drivers.
- dpaa2 HW mempool manager for dpaa2 PMD driver.
- fpa HW mempool manager for Octeontx PMD driver.
Application would like to know the best mempool handle
for any port.
Introducing rte_eth_dev_pool_ops_supported() API,
which allows PMD driver to advertise
his supported pool capability to the application.
Supported pools are categorized in below priority:-
- Best mempool handle for this port (Highest priority '0')
- Port supports this mempool handle (Priority '1')
Santosh Shukla [Fri, 6 Oct 2017 07:45:29 +0000 (13:15 +0530)]
eal: allow user to override default mempool driver
DPDK has support for both sw and hw mempool and
currently user is limited to use ring_mp_mc pool.
In case user want to use other pool handle,
need to update config RTE_MEMPOOL_OPS_DEFAULT, then
build and run with desired pool handle.
Introducing eal option to override default pool handle.
Now user can override the RTE_MEMPOOL_OPS_DEFAULT by passing
pool handle to eal `--mbuf-pool-ops-name=""`.
Santosh Shukla [Fri, 6 Oct 2017 11:03:43 +0000 (16:33 +0530)]
eal: auto detect IOVA mode
iova autodetection depends on rte_bus_scan result. Result of bus scan will
have updated device_list and each device in that list has its '.kdev' state
updated. That kdrv state used to detect iova mapping mode for that device.
_device_parse() has dependency on rt_bus_scan so,
Below calls moved up in the eal initialization order:
- eal_option_device_parse
- rte_bus_scan
And based on the result of rte_bus_scan_iommu_class - select iova
mapping mode.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Santosh Shukla [Fri, 6 Oct 2017 11:03:41 +0000 (16:33 +0530)]
bus: get IOMMU class
API(rte_bus_get_iommu_class) helps to automatically detect and select
appropriate iova mapping scheme for iommu capable device on that bus.
Algorithm for iova scheme selection for bus:
0. Iterate through bus_list.
1. Collect each bus iova mode value and update into 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode ia _pa.
Santosh Shukla [Fri, 6 Oct 2017 11:03:40 +0000 (16:33 +0530)]
pci: get IOMMU class on Linux
Get iommu class of PCI device on the bus and returns preferred iova
mapping mode for that bus.
Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
Flag used when driver needs to operate in iova=va mode.
Algorithm for iova scheme selection for PCI bus:
0. If no device bound then return with RTE_IOVA_DC mapping mode,
else goto 1).
1. Look for device attached to vfio kdrv and has .drv_flag set
to RTE_PCI_DRV_IOVA_AS_VA.
2. Look for any device attached to UIO class of driver.
3. Check for vfio-noiommu mode enabled.
If 2) & 3) is false and 1) is true then select
mapping scheme as RTE_IOVA_VA. Otherwise use default
mapping scheme (RTE_IOVA_PA).
Zhiyong Yang [Fri, 29 Sep 2017 07:17:27 +0000 (15:17 +0800)]
mbuf: increase port initialization value
In order to support more than 256 virtual ports, the field "port"
in rte_mbuf has been increased to 16 bits. The initialization/reset
value of the field "port" should be changed from 0xff to 0xffff
accordingly.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Zhiyong Yang [Fri, 29 Sep 2017 07:17:24 +0000 (15:17 +0800)]
ethdev: increase port id range
Extend port_id definition from uint8_t to uint16_t in lib and drivers
data structures, specifically rte_eth_dev_data. Modify the APIs,
drivers and app using port_id at the same time.
Fix some checkpatch issues from the original code and remove some
unnecessary cast operations.
release_17_11 and deprecation docs have been updated in this patch.
Zhiyong Yang [Fri, 29 Sep 2017 07:17:23 +0000 (15:17 +0800)]
net/bonding: remove old ABI
There are two bonding APIs using ABI versioning, and both have
port_id as parameter. Since we are already breaking ABI, no need
to keep older versions of APIs.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Before this patch, the devices discovered from VFIO layer were
being added in the device list in the order received from directory
scan. This causes an issue in case devices are reordered.
This patch makes all the devices inserted in the device list in
sorted order according to their name.
net/dpaa2: fix the Tx handling of non HW pool bufs
The current code is sending 8 packet in each internal loop.
In some of the conditions, mbuf is being allocated or freed.
In case of error, the code is returning without taking care of
such buffer. It is better to send already prepared buffer and err
for the current failure only.
Haiying Wang [Sat, 16 Sep 2017 10:52:18 +0000 (16:22 +0530)]
bus/fslmc: support up to 32 frames in one volatile dequeue
QMan5.0 supports up to 32 frames in one volatile dequeue
command. For the older Qman versions which only support
up to 16 frames, the highest bit in NUMF will be ignored.
Signed-off-by: Haiying Wang <haiying.wang@nxp.com> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC cperf_test_verify.o
cperf_test_verify.c: In function ‘cperf_verify_op’:
cperf_test_verify.c:382:5: error: ‘auth’ may be used uninitialized
in this function
[-Werror=maybe-uninitialized]
if (auth == 1) {
^
cperf_test_verify.c:371:5: error: ‘cipher’ may be used uninitialized
in this function
[-Werror=maybe-uninitialized]
if (cipher == 1) {
^
cperf_test_verify.c:384:11: error: ‘auth_offset’ may be used
uninitialized in this function
[-Werror=maybe-uninitialized]
res += memcmp(data + auth_offset,
^~~~~~~~~~~~~~~~~~~~~~~~~~
vector->digest.data,
~~~~~~~~~~~~~~~~~~~~
options->digest_sz);
~~~~~~~~~~~~~~~~~~~
cperf_test_verify.c:377:11: error: ‘cipher_offset’ may be used
uninitialized in this function
[-Werror=maybe-uninitialized]
res += memcmp(data + cipher_offset,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
vector->plaintext.data,
~~~~~~~~~~~~~~~~~~~~~~~
options->test_buffer_size);
~~~~~~~~~~~~~~~~~~~~~~~~~~
There is no default case in the switch statement, so if options->op_type
is an unknown value, the function will use uninitialized values. Fix it
by adding a default.
Fixes: f8be1786b1b8 ("app/crypto-perf: introduce performance test application") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC rte_metrics.o
rte_metrics.c: In function
‘rte_metrics_reg_names’: rte_metrics.c:153:22: error: ‘entry’
may be used uninitialized in this function
[-Werror=maybe-uninitialized]
entry->idx_next_set = 0;
~~~~~~~~~~~~~~~~~~~~^~~
This is a false positive, gcc is not able to see that we always go in
the loop at least once, initializing entry.
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC cmdline_parse.o
cmdline_parse.c: In function ‘match_inst’:
cmdline_parse.c:227:5: error: ‘token_p’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
if (token_p) {
^
This is a false positive, gcc is not able to see that we always go in
the loop at least once, initializing token_p.
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC eal_pci_uio.o
eal_pci_uio.c: In function ‘pci_get_uio_de ’:
eal_pci_uio.c:221:9: error: ‘uio_num’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
return uio_num;
^~~~~~~
This is a false positive: gcc is not able to see that when e != NULL,
uio_num is always set. Fix the warning by initializing uio_num to -1,
and by the way, change the type to int.
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC qede_rxtx.o
qede_rxtx.c: In function ‘qede_start_queues’:
qede_rxtx.c:797:9: error: ‘rc’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
return rc;
^~
If there is no Rx or Tx queue, rc will not be initialized. Fix it
by initializing rc to -1.
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC rte_pmd_bnxt.o
rte_pmd_bnxt.c: In function ‘rte_pmd_bnxt_set_all_queues_drop_en’:
rte_pmd_bnxt.c:116:6: error: ‘rc’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
int rc;
^~
This can happen if both bp->nr_vnics and bp->pf.active_vfs are 0.
Fix it by initializing rc to -EINVAL.
Fixes: 49947a13ba9e ("net/bnxt: support Tx loopback, set VF MAC and queues drop") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Yongseok Koh [Thu, 5 Oct 2017 21:37:29 +0000 (14:37 -0700)]
net/mlx5: fix overflow of Rx SW ring
If vectorized Rx burst is short of mbufs in replenishment, Rx SW ring can
overflow as the Rx burst handles 4 packets in a loop. This is because the
function fills SW ring and its mbufs first and checks validity of
each completion later. So, there should be some buffer slots at the tail of
the ring to protect mbufs which are already owned by application.
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86") Cc: stable@dpdk.org Reported-by: Martin Weiser <martin.weiser@allegro-packets.com> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Matan Azrad [Tue, 3 Oct 2017 14:55:56 +0000 (14:55 +0000)]
net/tap: allow RSS flow action
One of the main identified use cases for the tap PMD is to be used in
combination with the fail-safe PMD as a fallback for a physical device.
Fail-safe is very strict about making sure its current configuration is
properly applied to all slave devices, they get rejected otherwise in
order to maintain a consistent state.
The problem is that tap's RSS support is currently limited to the
default (non-Toeplitz) balancing performed by the kernel on all Rx
queues. While proper RSS support emulation in the tap PMD is a work in
progress, the lack of rte_flow counterpart prevents validation of the
above use case in the meantime.
Given that unlike most PMDs, tap is more about convenience than
performance, support for the RSS action can be temporarily faked with
a minimum amount of code and mostly correct behavior by treating it
like a QUEUE action. Traffic is directed to the first queue of the set.
Beilei Xing [Thu, 5 Oct 2017 08:14:57 +0000 (16:14 +0800)]
net/i40e: enable cloud filter for GTP-C and GTP-U
This patch sets TEID of GTP-C and GTP-U as filter type
by replacing existed filter types inner_mac and TUNNEL_KEY.
This configuration will be set when adding GTP-C or
GTP-U filter rules, and it will be invalid only by
NIC core reset.
Beilei Xing [Thu, 5 Oct 2017 08:14:54 +0000 (16:14 +0800)]
net/i40e: finish integration FDIR with generic flow API
rte_eth_fdir_* structures are still used in FDIR functions.
This patch adds i40e private FDIR related structures and
functions to finish integration FDIR with generic flow API.
Ajit Khaparde [Thu, 5 Oct 2017 15:06:44 +0000 (10:06 -0500)]
net/bnxt: fix number of MAC addresses for VMDq
We were hardcoding the max MAC addresses to 32, while the HW
can support more than that. This was restricting the number of VMDQ
pools that we could support. Use the value obtained from FW instead.
Add new commands to manipulate with dynamic flow type to
pctype mapping table in i40e PMD.
Commands allow to print table, modify it and reset to default value.
net/i40e: add dynamic mapping of SW flow types to HW pctypes
Implement dynamic mapping of software flow types to hardware pctypes.
This allows to add new flow types and pctypes for DDP without changing
API of the driver. The mapping table is located in private
data area for particular network adapter and can be individually
modified with set of appropriate functions.
Shahaf Shuler [Wed, 4 Oct 2017 08:18:00 +0000 (11:18 +0300)]
ethdev: add mbuf fast free Tx offload
PMDs which expose this offload cap supports optimization for fast release
of mbufs following successful Tx.
Such optimization requires that per queue, all mbufs come from the same
mempool and has refcnt = 1.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Shahaf Shuler [Wed, 4 Oct 2017 08:17:59 +0000 (11:17 +0300)]
ethdev: introduce Tx queue offloads API
Introduce a new API to configure Tx offloads.
In the new API, offloads are divided into per-port and per-queue
offloads. The PMD reports capability for each of them.
Offloads are enabled using the existing DEV_TX_OFFLOAD_* flags.
To enable per-port offload, the offload should be set on both device
configuration and queue configuration. To enable per-queue offload, the
offloads can be set only on queue configuration.
In addition the Tx offloads will be disabled by default and be
enabled per application needs. This will much simplify PMD management of
the different offloads.
Applications should set the ETH_TXQ_FLAGS_IGNORE flag on txq_flags
field in order to move to the new API.
The old Tx offloads API is kept for the meanwhile, in order to enable a
smooth transition for PMDs and application to the new API.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Shahaf Shuler [Wed, 4 Oct 2017 08:17:58 +0000 (11:17 +0300)]
ethdev: introduce Rx queue offloads API
Introduce a new API to configure Rx offloads.
In the new API, offloads are divided into per-port and per-queue
offloads. The PMD reports capability for each of them.
Offloads are enabled using the existing DEV_RX_OFFLOAD_* flags.
To enable per-port offload, the offload should be set on both device
configuration and queue configuration. To enable per-queue offload, the
offloads can be set only on queue configuration.
Applications should set the ignore_offload_bitfield bit on rxmode
structure in order to move to the new API.
The old Rx offloads API is kept for the meanwhile, in order to enable a
smooth transition for PMDs and application to the new API.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
At present, creating bonding devices using --vdev is broken for PMD like
mlx5 as it is neither UIO nor VFIO based and hence PMD driver is unknown
to find_port_id_by_pci_addr(), as below.
PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port value
(<PCI ID>) specified
EAL: Failed to parse slave ports for bonded device net_bonding0
This patch fixes parsing PCI ID from bonding device params by verifying
it in RTE PCI bus, rather than checking dev->kdrv.