git.droids-corp.org - dpdk.git/log

ethdev: remove missed packets from error counter

Comment for "ierrors" counter says that it counts erroneous received
packets. But for some reason "imissed" counter is added to "ierrors"
counter in most drivers.
It is a mistake, because missed packets are obviously not received.
This patch fixes it.

Fixes: 70bdb18657da ("ethdev: add Rx error counters for missed, badcrc and badlen packets")
Fixes: 6bfe648406b5 ("i40e: add Rx error statistics")
Fixes: 856505d303f4 ("cxgbe: add port statistics")

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

maintainers: sort examples

Keep sorting examples and fix l2fwd-cat path.

Fixes: ab129e9065a5 ("examples/ptpclient: add minimal PTP client")
Fixes: f6baccbc2b3b ("examples/l2fwd-cat: add sample application for PQoS CAT and CDP")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

examples/l2fwd-cat: add sample application for PQoS CAT and CDP

This patch implements PQoS as a sample application.
PQoS allows management of the CPUs last level cache,
which can be useful for DPDK to ensure quality of service.
The sample app links against the existing 01.org PQoS library
(https://github.com/01org/intel-cmt-cat).

White paper demonstrating example use case "Increasing Platform Determinism
with Platform Quality of Service for the Data Plane Development Kit"
(http://www.intel.com/content/www/us/en/communications/increasing-platform-determinism-pqos-dpdk-white-paper.html)

Signed-off-by: Wojciech Andralojc <wojciechx.andralojc@intel.com>
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
Signed-off-by: Marcel D Cornu <marcel.d.cornu@intel.com>

examples/l3fwd: fix exact match performance

It seems that for the most use cases, previous hash_multi_lookup provides
better performance, and more, sequential lookup can cause significant
performance drop.

This patch sets previously optional hash_multi_lookup method as default.
It also provides some minor optimizations such as queue drain only on used
tx ports.

Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
Fixes: dc81ebbacaeb ("lpm: extend IPv4 next hop field")
Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

examples/l3fwd: fix build with hash multi-lookup

l3fwd does not compile with HASH_MULTI_LOOKUP.
2 issues:
* in 64d395 mask0 changed type from xmm_t to rte_xmm_t
-> use x field from rte_xmm_t
* in dc81eb dst_port parameter changed to uint32_t
-> change uint16_t dst_port to uin32_t dsp_port

Fixes: dc81ebbacaeb ("lpm: extend IPv4 next hop field")
Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")

Signed-off-by: Maciej Czekaj <maciej.czekaj@caviumnetworks.com>

lpm: fix pipeline apps

Updated ip_pipeline app is using new changes from LPM library
(Increased number of next hops and added new config structure
for LPM IPv4).

Fixes: f1f7261838b3 ("lpm: add a new config structure for IPv4")

Signed-off-by: Michal Kobylinski <michalx.kobylinski@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

vhost: remove unnecessary memset when enqueueing

We have to reset the virtio net hdr at virtio_enqueue_offload()
before, due to all mbufs share a single virtio_hdr structure:

struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};

foreach (mbuf) {
virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);

copy net hdr and mbuf to desc buf
}

However, after the vhost rxtx refactor, the code looks like:

copy_mbuf_to_desc(mbuf)
{
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0}

virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);

copy net hdr and mbuf to desc buf
}

foreach (mbuf) {
copy_mbuf_to_desc(mbuf);
}

Therefore, the memset at virtio_enqueue_offload() is not necessary
any more; remove it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>

mk: fix linker script when re-building

The linker script is generated by simply finding all libraries in
RTE_OUTPUT/lib.

The issue shows up when re-building the DPDK, hence already having a
linker script in that directory, resulting in the linker script
including itself.

That does not play well with the linker.

Simply filtering the linker script from all the found libraries solves
the problem.

Fixes: 948fd64befc3 ("mk: replace the combined library with a linker script")

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>

version: 16.04-rc1

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

doc: add packet framework release notes

This patch updates the release notes with the features that
have been added to ip_pipeline application.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

pci: fix ioport support for uio_pci_generic on x86

uio_pci_generic does not offer the same sysfs helpers as igb_uio.
In this case, ioport number can only be retrieved by parsing /proc/ioports.

Fixes: 756ce64b1ecd ("eal: introduce PCI ioport API")

Reported-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Signed-off-by: David Marchand <david.marchand@6wind.com>

pci: separate ioport handlers per UIO driver

Prepare for fixes on x86 by separating igb_uio and uio_pci_generic cases.

Signed-off-by: David Marchand <david.marchand@6wind.com>

pci: align ioport special case for x86 in read/write/unmap

Commit b8eb345378bd ("pci: ignore devices already managed in Linux when
mapping x86 ioport") did not update other parts of the ioport api.

The application is not supposed to call these read/write/unmap ioport
functions if map call failed but I prefer aligning the code for the sake
of consistency.

Signed-off-by: David Marchand <david.marchand@6wind.com>

pci: align ioport unmap error handling to ioport map

Same idea as commit bd80d4730aca ("pci: rework ioport map error handling").

Signed-off-by: David Marchand <david.marchand@6wind.com>

bonding: fix crash when no slave device

If a bonded device is created when there are no slave devices
there is a loop in bond_ethdev_promiscuous_enable() which results
in a segmentation fault.

The solution is to initialise the current_primary_port to an
invalid port value when the bonded port is created.

Fixes: 2efb58cbab6e ("bond: new link bonding library")

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

bonding: do not activate slave twice

The current code for detecting link during slave addition can cause a
slave interface to be activated twice -- once during slave_configure()
and again at the end of __eth_bond_slave_add_lock_free(). This will
either cause the active slave count to be incorrect or will cause the
802.3ad activation function to panic. Ensure that the interface is not
activated more than once.

Fixes: 46fb43683679 ("bond: add mode 4")

Signed-off-by: Eric Kinzie <ekinzie@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Declan Doherty <declan.doherty@intel.com>

bonding: fix active slaves with no primary

If the link state of a slave is "up" when added, it is added to the list
of active slaves but, even if it is the only slave, is not selected as
the primary interface. Generally, handling of link state interrupts
selects an interface to be primary, but only if the active count is zero.
This change avoids the situation where there are active slaves but
no primary.

Fixes: 2efb58cbab6e ("bond: new link bonding library")

Signed-off-by: Eric Kinzie <ekinzie@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Declan Doherty <declan.doherty@intel.com>

bonding: do not ignore multicast in mode 4

The bonding PMD in mode 4 puts all enslaved interfaces into promiscuous
mode in order to receive LACPDUs and must filter unwanted packets
after the traffic has been "collected". Allow broadcast and multicast
through so that ARP and IPv6 neighbor discovery continue to work.

Fixes: 46fb43683679 ("bond: add mode 4")

Signed-off-by: Eric Kinzie <ekinzie@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Declan Doherty <declan.doherty@intel.com>

bonding: copy entire config structure in mode 4

Copy all needed fields from the mode8023ad_private structure in
bond_mode_8023ad_conf_get(). This help ensure that a subsequent call
to rte_eth_bond_8023ad_setup() is not passed uninitialized data that
would result in either incorrect behavior or a failed sanity check.

Fixes: 46fb43683679 ("bond: add mode 4")

Signed-off-by: Eric Kinzie <ekinzie@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Declan Doherty <declan.doherty@intel.com>

bonding: fix detach of slave devices

Ensure that a bonded slave device is not detached,
until it is removed from the bonded device.

Fixes: 2efb58cbab6e ("bond: new link bonding library")
Fixes: a45b288ef21a ("bond: support link status polling")
Fixes: 494adb7f63f2 ("ethdev: add device fields from PCI layer")
Fixes: b1fb53a39d88 ("ethdev: remove some PCI specific handling")

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>

bonding: fix detach of bonded device

Check that the bonded device has no slaves before detaching it.

Fixes: 8d30fe7fa737 ("bonding: support port hotplug")

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>

null: remove duplicate fields in internal struct

1- remove duplicate nb_rx/tx_queues fields from internals
2- remove duplicate numa_node field from internals

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Nicolás Pernas Maradei <nicolas.pernas.maradei@emutex.com>

ring: free rings when detaching device

When a device is created with "CREATE" as action, new rings are
allocated for it, then it is a good practice to free them when the
rte_ethdev_dettach method is invoked by the application.

Rings are not freeded when "ATTACH" is used or when the device is
created by means of the rte_eth_from_rings function.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

ring: clean up driver

Rename nb_rx/tx_queues fields in internals struct to max_rx/tx_queues
Updated fields required to keep max queue numbers configured. For current
queue number requirements data->nb_rx/tx_queues fields used.

Some checkpatch corrections and code clenaup.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

pcap: reduce duplication

1- Remove duplicate nb_rx/tx_queues fields from internals
2- Move duplicate code into a common function

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Nicolás Pernas Maradei <nicolas.pernas.maradei@emutex.com>

pcap: fix captured frame length

The actual captured length is header.caplen, whereas header.len is
the original length on the wire.

Fixes: 4c173302c307 ("pcap: add new driver")

Signed-off-by: Dror Birkman <dror.birkman@lightcyber.com>
Acked-by: Nicolás Pernas Maradei <nicolas.pernas.maradei@emutex.com>

af_packet: make the device detachable

Allow dynamic deallocation of af_packet device through proper
API functions. To achieve this:
* set device flag to RTE_ETH_DEV_DETACHABLE
* implement rte_pmd_af_packet_devuninit() and expose it
through rte_driver.uninit()
* copy device name to ethdev->data to make discoverable with
rte_eth_dev_allocated()
Moreover, make af_packet init function static, as there is no
reason to keep it public.

Signed-off-by: Wojciech Zmuda <woz@semihalf.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

vmxnet3: support setting MAC address

Allow overriding the base mac address of the device.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Remy Horton <remy.horton@intel.com>

vmxnet3: fix VLAN filtering

During an MTU change, the adapter is restarted. If hardware VLAN offload
is in use, this existing filter table would also be cleared. Instead,
setup the shadow table once during device initialization and just update
during restart.

vmxnet3_dev_vlan_offload_set(dev, mask) was incorrectly treating the
mask parameter as the bitmask for vlan_strip and vlan_filter, whereas
the mask indicates only what has changed - the values for
vlan_stripping and vlan_filter needs to be taken from dev_conf.rxmode.

Fixes: f003fc383487 ("vmxnet3: enable vlan filtering")

Signed-off-by: Charles (Chas) Williams <ciwillia@brocade.com>
Signed-off-by: Nachiketa Prachanda <nprachan@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Remy Horton <remy.horton@intel.com>

vmxnet3: support jumbo frames

Add support for linking multi-segment buffers together to
handle Jumbo packets. The vmxnet3 API supports having header
and body buffer types. What this patch does is fill the primary
ring completely with header buffers and the secondary ring
with body buffers. This allows for non-jumbo frames to only
use one mbuf (from primary ring); and jumbo frames will have
first mbuf from primary ring and following mbufs from other
ring.

This could be optimized in future if the DPDK had API
to supply different sized mbufs (two pools) into driver.

Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Release note addition:
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

vmxnet3: announce offload capabilities

Signed-off-by: Yong Wang <yongwang@vmware.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

vmxnet3: support TSO

This commit adds vmxnet3 TSO support.

Verified with test-pmd (set fwd csum) that both tso and
non-tso pkts can be successfully transmitted and all
segmentes for a tso pkt are correct on the receiver side.

Signed-off-by: Yong Wang <yongwang@vmware.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

vmxnet3: add Tx L4 checksum offload

Support TCP/UDP checksum offload.

Signed-off-by: Yong Wang <yongwang@vmware.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

vmxnet3: rework Tx

Clean up txNumDeferred usage.

Signed-off-by: Yong Wang <yongwang@vmware.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

vmxnet3: restore Tx data ring support

Tx data ring support was removed in a previous change that
added multi-seg transmit. This change adds it back.

According to the original commit (2e849373), 64B pkt
rate with l2fwd improved by ~20% on an Ivy Bridge
server at which point we start to hit some bottleneck
on the rx side.

I also re-did the same test on a different setup (Haswell
processor, ~2.3GHz clock rate) on top of the master
and still observed ~17% performance gains.

Fixes: 7ba5de417e3c ("vmxnet3: support multi-segment transmit")

Signed-off-by: Yong Wang <yongwang@vmware.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

vmxnet3: clean up typos and unused code

Signed-off-by: Yong Wang <yongwang@vmware.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

vmxnet3: remove redundant function names in log

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

virtio: remove redundant function names in log

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

virtio: optimize Tx enqueue

All the error checks in virtqueue_enqueue_xmit are already done
by the caller. Therefore they can be removed to improve performance.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>

virtio: use any layout on Tx

Virtio supports a feature that allows sender to put transmit
header prepended to data. It requires that the mbuf be writeable, correct
alignment, and the feature has been negotiatied. If all this works out,
then it will be the optimum way to transmit a single segment packet.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>

virtio: use indirect ring elements

The virtio ring in QEMU/KVM is usually limited to 256 entries
and the normal way that virtio driver was queuing mbufs required
nsegs + 1 ring elements. By using the indirect ring element feature
if available, each packet will take only one ring slot even for
multi-segment packets.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>

virtio: remove broadcast packets from multicast statistics

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Applied with coding standards fixes:
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

virtio: fix descriptors pointing to the same buffer

The virtio_net_hdr desc all pointed to the same buffer. It doesn't cause
issue because in the simple TX mode we don't use the header. This patch
makes the header desc point to different buffer.

Fixes: b4ae9c505f2e ("virtio: optimize ring layout")

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

virtio: fix crash in statistics functions

This initialisation of nb_rx_queues and nb_tx_queues has been removed
from eth_virtio_dev_init.

The nb_rx_queues and nb_tx_queues were being initialised in
eth_virtio_dev_init before the tx_queues and rx_queues arrays were
allocated.

The arrays are allocated when the ethdev port is configured and the
nb_tx_queues and nb_rx_queues are initialised.

If any of the following functions were called before the ethdev
port was configured there was a segmentation fault because
rx_queues and tx_queues were NULL:

rte_eth_stats_get
rte_eth_stats_reset
rte_eth_xstats_get
rte_eth_xstats_reset

Fixes: 823ad647950a ("virtio: support multiple queues")

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

virtio: fix restart

Fix the issue that virtio device cannot be started after stopped.

The field, hw->started, should be changed by virtio_dev_start/stop instead
of virtio_dev_close.

Fixes: a85786dc816f ("virtio: fix states handling during initialization")

Reported-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>

szedata2: support promiscuous and allmulticast modes

add functions for enabling/disabling promiscuous, allmulticast modes

Signed-off-by: Matej Vido <vido@cesnet.cz>

szedata2: support link state operations

Mmap PCI resource file and add inline functions for reading from and
writing to PCI resource address space.
Add description of IBUF and OBUF address space.
Add configuration option for setting which firmware type will be used.
Right address space values for IBUFs and OBUFs offsets are used
according to configuration option CONFIG_RTE_LIBRTE_PMD_SZEDATA2_AS.
Setting link up/down and getting info about link status is done through
mmapped PCI resource address space.

Signed-off-by: Matej Vido <vido@cesnet.cz>

szedata2: change to physical device type

PMD was of type PMD_VDEV which means that PCI device is not recognised
automatically during EAL initialization, but it has to be created by
EAL option --vdev.
Now, PMD is of type PMD_PDEV which means that PCI device is probed
and recognised during EAL initialization automatically.
Path to szedata2 device file is matched with device and the count
of available RX and TX DMA channels is found out during device
initialization.
Initialization, starting and stopping of queues is changed to better
correspond with Ethernet device API model. Function callbacks
(rx|tx)_queue_(start|stop) are added. Unnecessary items are removed
from ethernet device private data structure.

Signed-off-by: Matej Vido <vido@cesnet.cz>

nfp: fix Tx queue reset

When using start-stop functionality the per queue fields need to
be properly reset.

Fixes: b812daadad0d ("nfp: add Rx and Tx")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

nfp: fix how Tx checksum is advertised to firmware

Even with tx checksum offload available, do not set the flag by default.

Fixes: b812daadad0d ("nfp: add Rx and Tx")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

nfp: fix variable type in Tx checksum offload

The mbuf ol_flags field was changed to uin64_t with DPDK version 1.8

Fixes: b812daadad0d ("nfp: add Rx and Tx")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

nfp: fix non-x86 build

The file sys/io.h was included but it can be unavailable in some
non-x86 toolchains.
As others system includes in the file nfp_net.c, it seems useless,
so the easy fix is to remove them.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>

mlx5: fix Rx checksum offload in non L3/L4 packets

Change rxq_cq_to_ol_flags() to set checksum flags according to packet type,
so for non L3/L4 packets the mbuf chksum_bad flags will not be set.

Fixes: 67fa62bc672d ("mlx5: support checksum offload")

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>

mlx5: add VLAN filtering for broadcast and IPv6 multicast

Unlike promiscuous and allmulticast flows, those should remain
VLAN-specific.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: remove redundant debug message

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: manage all special flow types at once

This commit adds helpers to remove redundant code.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: check port is configured as ethernet device

If the port link layer is not Ethernet, notify the user.

Signed-off-by: Or Ami <ora@mellanox.com>

mlx5: fix possible crash during initialization

RSS configuration should not be freed when priv is NULL.

Fixes: 2f97422e7759 ("mlx5: support RSS hash update and get")

Signed-off-by: Or Ami <ora@mellanox.com>

mlx: use aligned memory to register regions

The first and last memory pool elements are usually cache-aligned but not
page-aligned, particularly when using huge pages.

Hardware performance can be improved significantly by registering memory
regions starting and ending on page boundaries.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: free buffers immediately after completion

This lowers the amount of cache misses.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

mlx5: avoid lkey retrieval for inlined packets

Improves performance as the lkey is not needed by hardware in this case.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

mlx5: process offload flags only when requested

Improve performance by processing offloads only when requested by the
application.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

mlx5: remove one indirection level from Rx/Tx

Avoid dereferencing pointers twice to get to fast Verbs functions by
storing them directly in RX/TX queue structures.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>

mlx5: reorder Rx/Tx queue structure

Remove padding and move important fields to the beginning for better
performance.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

mlx5: prefetch next Tx mbuf header and data

This change improves performance noticeably.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

mlx5: support Rx VLAN stripping

Allows HW to strip the 802.1Q header from incoming frames and report it
through the mbuf structure.

This feature requires MLNX_OFED >= 3.2.

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: support flow director

Add support for flow director filters (RTE_FDIR_MODE_PERFECT and
RTE_FDIR_MODE_PERFECT_MAC_VLAN modes).

This feature requires MLNX_OFED >= 3.2.

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Raslan Darawsheh <rdarawsheh@asaltech.com>

mlx5: make flow steering rule generator more generic

Upcoming flow director support will reuse this function to generate filter
rules.

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: add special flows for broadcast and IPv6 multicast

Until now, broadcast frames were handled like unicast. Moving the related
flow to the special flows table frees up the related unicast MAC entry.

The same method is used to handle IPv6 multicast frames.

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: refactor special flows handling

Merge redundant code by adding a static initialization table to manage
promiscuous and allmulticast (special) flows.

New function priv_rehash_flows() implements the logic to enable/disable
relevant flows in one place from any context.

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: fix header generation in parallel builds

Fixes: 771fa900b73a ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx5: support setting primary MAC address

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx4: support setting primary MAC address

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

mlx4: ensure number of Rx queues is a power of 2

In the documentation it is specified that the hardware only supports a
number of RX queues if it is a power of 2.

Since ibv_exp_create_qp may not return an error when the number of
queues is unsupported by hardware, sanitize the value in dev_configure.

Signed-off-by: Robin Jarry <robin.jarry@6wind.com>

mlx4: fix unneeded function error with clang 3.6

When compiling with clang 3.6, the mlx4 driver gives the following error
message about an unneeded function.

  CC mlx4.o
.../drivers/net/mlx4/mlx4.c:136:20: fatal error: function
      'wr_id_t_check' is not needed and will not be emitted
[-Wunneeded-internal-declaration]
static inline void wr_id_t_check(void)
                   ^
1 error generated.

The function is to compile-time check the size of wr_id_t, so use
the standard DPDK BUILD_BUG_ON macro to do so in the init function
instead.

Fixes: 7fae69eeff13 ("mlx4: new poll mode driver")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

fm10k: enable FTAG based forwarding

This patch enables reading sglort (global resource tag) info into the
mbuf for RX and inserting an FTAG (Fabric Tag) at the beginning of the
packet for TX. The vlan_tci_outer field selected from rte_mbuf structure
for sglort is not used in fm10k now.
In FTAG based forwarding mode, the switch will forward packets according
to glort info in FTAG rather than mac and vlan table.

To activate this feature, user needs to pass a devargs parameter to eal
for fm10k device like "-w 0000:84:00.0,enable_ftag=1". Currently this
feature is supported only on PF, because FM10K_PFVTCTL register is
read-only for VF.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>

fm10k/base: remove unused struct element

Remove the unused element request_lport_map in struct fm10k_mac_ops.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: minor cleanups

Some cleanups to better reflect the code that was actually pushed out to
the upstream Linux community.

Among the above cleanups, a few macros such as FM10K_RXINT_TIMER_SHIFT are
removed, but they are needed in dpdk/fm10k, so we have to put all these
necessary macros into fm10k_osdep.h.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: move constants on right of binary operators

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: fix TLV structures alignment

Per comments from an upstream kernel patch, and looking at how TLV
LE_STRUCT code works, we actually want these structures to be 4byte
aligned, not 1byte aligned.

In practice, 1byte alignment has worked so far because all our
structures end up being a multiple of 4. But if a future TLV
structure were added that had a u8 or similar sticking on the end things
would break. Fix this by using 4byte alignment which will prevent the
TLV LE_STRUCT code from breaking. Update the comment explaining that we
need 4byte alignment of our structures.

Fixes: 925c862cbc21 ("fm10k/base: pack TLV overlay structures")

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: improve comments

The comment for fm10k_iov_msg_lport_state_pf was changed during
review of kernel driver, and the new wording is slightly clearer.
Re-write the comment in base code based on this new wording.

Fix a number of mailbox comment issues with function header comments,
lower-case acronyms (i.e. FIFO, TLV), incorrect function names in
DEBUGFUNC(), duplicate comments and a stubbed-out header comment for
fm10k_sm_mbx_init.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: expand VID to VLAN ID in comments

The vid variable name is shorthand for VLAN ID, so we should use this in
comments explaining what is happening.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: allow removal of slot appropriate check

The Linux Kernel provides the OS a call "pcie_get_minimum_link" which
can crawl the PCIe tree and determine the actual minimum link speed of a
device which is a more general check than provided by
is_slot_appropriate. Thus, the kernel driver does not use or want the
is_slot_appropriate function call. Add a NO_IS_SLOT_APPROPRIATE_CHECK
definition which can be defined to remove the code.
If left undefined (the default) then the code will all be active and no
driver changes should be necessary.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: use memcpy for MAC address copy

Use memcpy instead of copying MAC address byte-by-byte.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: remove CamelCase

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: add bit macro

Using the BIT macro can simplify the bit-shifting operation and make the
code look clean. Similar to how this is handled in the i40e base code,
define a macro for it in DPDK, so it can be used here too.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: remove useless else

"else" is not generally useful after a break or return.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: wrap long lines

Recommended line length maximum is 80 characters

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: document ITR scale workaround

Add comments which properly explain the undocumented use of bits in
TDLEN register prior to VF initializing it to the correct value. Note
that the mechanism is entirely software-defined and explain its purpose
to help reduce confusion in the future.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: fix max queues on VF initialization failure

VF drivers must detect how many queues are available. Previously, the
driver assumed that each VF has at minimum 1 queue. This assumption is
incorrect, since it is possible that the PF has not yet assigned the
queues to the VF by the time the VF checks.

To resolve this, we added a check first to ensure that the first queue
is, in fact, owned by the VF at init_hw_vf time.
However, the code flow did not reset hw->mac.max_queues to 0.
In some cases, such as during reinit flows, we call init_hw_vf
without clearing the previous value of hw->mac.max_queues. Due to this,
when init_hw_vf errors out, if its error code is not properly handled
the VF driver may still believe it has queues which no longer belong to
it. Fix this by clearing the hw->mac.max_queues on exit due to errors.

Fixes: 8b8264bdb90d ("fm10k/base: check VF has a queue")

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: use bit shift for ITR scale

Use bitshift instead of a divisor, because this is faster, and
eliminates any need for a '0' check. In our case, this even works
out because default Gen3 will be 0.

Because of this, we are also able to remove the check for non-zero value
in the VF code path since that will already be the default Gen3 case.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: clean up namespace pollution

Make functions that are only referenced locally static.

Wrap fm10k_msg_data fm10k_iov_msg_data_pf[] in the new ifndef
NO_DEFAULT_SRIOV_MSG_HANDLERS so that drivers with custom SR-IOV
message handlers can strip it.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k/base: fix typecast

Since the resultant data type of the mac_update.mac_upper field is u16,
it does not make sense to typecast u8 variables to u32 first.

Fixes: 7223d200c227 ("fm10k: add base driver")

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k: use default mailbox message handler for PF

The new share code makes fm10k_msg_update_pvid_pf function static, so we
can not refer to it now in fm10k_ethdev.c. The registered PF handler is
almost the same as the default PF handler, removing it has no impact on
mailbox.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k: handle error flags in vector Rx

Using SSE instructions to parse error flags in HW Rx descriptor,
then set corresponding bits of mbuf.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>

fm10k: optimize mbuf freeing in non-vector Tx

When the TX function tries to free a bunch of mbufs, it will free
them one by one. This change will scan the free list and merge the
requests in case they belongs to same pool, then free once, which
will reduce cycles on freeing mbufs.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>

fm10k: fix switch manager high CPU usage

fm10k switch core uses source MAC + VID + SGLORT to do
look up in MAC table. If no match, an exception interrupt
will be sent to the switch manager. Too much of this kind
of exception interrupts cause switch manager side high CPU
usage.
To reproduce this issue, one DPDK testpmd runs on a server
with one fm10k NIC, mac forwards test traffic from one of
fm10k ports to another port. The CPU usage for the switch
manager will go up to about 20% for test traffic rate at
10G bps, comparing to near 0% for no test traffic.

This patch fixes this issue. A default SGLORT is assigned
to each TX queue. This default value works for non-VMDq mode
and current VMDq example. For advanced VMDq usage, e.g.
different source MAC address for different TX queue, FTAG
forwarding function could be used to change this default
SGLORT value.

Fixes: 9ae6068c86da ("fm10k: add dev start/stop")

Signed-off-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

fm10k: enable broadcast loopback suppression

In FM10K, a single PCIe port can derive out a few logical ports,
like SRIOV PF/VF devices, VMDQ objects. To better manage them, FM10K
silicon assigns a Unique GLORT ID to each logical port.

When a logical port sends a broadcast packet, the silicon will flood
it to all logical ports, including the one that sent the broadcast packet.
To prevent this, silicon has an rxq register to store the glort id of
the logical port that queue binds to.

FM10K has a switch core inside, which has a loopback suppression
mechanism in the switch level. Switch level loopback suppression mostly
works for the ether port traffic.

This patch assigns a SGLORT for each RX queue, and enables PCIe port
level loopback suppression.

Signed-off-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>

examples/l3fwd-power: fix memory leak for non-IP packets

Previous l3fwd-power only processes IP and IPv6 packets, other
packets' mbufs are not freed, and this causes a memory leak.
This patch fixes this issue.

Fixes: 3c0184cc0c60 ("examples: replace some offload flags with packet type")

Signed-off-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Acked-by: Michael Qiu <michael.qiu@intel.com>

fm10k: make default VID available in initialization

When the PF establishes a connection with Switch Manager(SM), it receives
a logical port range from SM, and registers certain logical ports from
that range. Then a default VID will be sent back from the SM.

This whole transaction - finishing with the default VID being set -
needs to be completed before dev_init returns. If not, the interrupt
setting will subsequently be changed in dev_start according to the RX
queue number, and that can cause this transaction to fail.

Signed-off-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Acked-by: Michael Qiu <michael.qiu@intel.com>