Helin Zhang [Tue, 3 Nov 2015 16:07:54 +0000 (00:07 +0800)]
i40e: configure input fields for RSS or flow director
The default input set of packet fields is loaded from firmware and
cannot be modified, even when users want different fields for RSS or
flow director. This patch adds the flexibility to select which packet
fields are used for hash calculation or flow director.
Helin Zhang [Tue, 3 Nov 2015 15:40:28 +0000 (23:40 +0800)]
i40e: enlarge the number of supported queues
It enlarges the number of supported queues to the hardware-allowed
maximum. There was a software limitation of 64 queues per physical
port, which is not reasonable.
Only seen since IPv6 RSS support was added, confusion about the purpose of
the hash_rxq_type_from_n() function caused it to return invalid hash RX
queue types.
Refactor the function for its intended purpose, rename it
hash_rxq_type_from_pos(), and update its comment with a better description.
Fixes: a76133214d88 ("mlx5: use separate indirection table for default hash Rx queue")
Reported-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The following error occurs when CONFIG_RTE_LIBRTE_MLX5_DEBUG=y:
drivers/net/mlx5/mlx5.c:381:4: error: ISO C forbids braced-groups within expressions
RTE_MIN() uses the non-standard ({ ... }) syntax to declare variables within
parentheses, which is rejected by -pedantic.
Since the RSS_INDIRECTION_TABLE_SIZE check is meant to go away as soon as
DPDK supports larger/variable indirection tables, put it in a separate
condition.
Fixes: 634efbc2c8c0 ("mlx5: support RETA query and update")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
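For context, a minimal sketch of the GCC braced-group ("statement
expression") pattern that -pedantic rejects, next to a standard-compliant
form; these macros are illustrative, not the actual DPDK definitions:

    /* GNU extension: a ({ ... }) block is an expression, letting the
     * macro declare temporaries and evaluate each argument only once.
     * -pedantic rejects it with "ISO C forbids braced-groups within
     * expressions". */
    #define MIN_GNU(a, b) ({ \
            typeof(a) _a = (a); \
            typeof(b) _b = (b); \
            _a < _b ? _a : _b; })

    /* Standard C form: compiles cleanly under -pedantic, but
     * evaluates its arguments twice. */
    #define MIN_STD(a, b) ((a) < (b) ? (a) : (b))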
- remove pci_dev, pci_drv, rte_bond_pmd and pci_id_table
- handle numa_node for vdevs
- handle RTE_ETH_DEV_INTR_LSC for vdevs
- rename the function valid_bonded_device to check_for_bonded_device
- remove branches on pci_dev
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
Ravi Kerur [Wed, 23 Sep 2015 21:16:17 +0000 (14:16 -0700)]
ethdev: clean port id retrieval when attaching
Removed the following functions:
> rte_eth_dev_save
> rte_eth_dev_get_changed_port
Added 2 new functions:
> rte_eth_dev_get_port_by_name
> rte_eth_dev_get_port_by_addr
Compiled on Linux for the following targets:
> x86_64-native-linuxapp-gcc
> x86_64-native-linuxapp-clang
> x86_x32-native-linuxapp-gcc
Compiled on FreeBSD for the following targets:
> x86_64-native-bsdapp-clang
> x86_64-native-bsdapp-gcc
Tested on Linux/FreeBSD:
> port attach eth_null
> port start all
> port stop all
> port close all
> port detach 0
> port attach eth_null
> port start all
> port stop all
> port close all
> port detach 0
Successful run of checkpatch.pl on the diffs
Successful validate_abi on Linux for the targets above
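As a rough usage sketch of the new lookup helper (the attach call and
uint8_t port IDs reflect this DPDK era; treat exact signatures as an
assumption):

    #include <rte_ethdev.h>

    /* Attach a null vdev, then resolve its port id from its name. */
    static int attach_and_lookup(void)
    {
            uint8_t port_id;

            if (rte_eth_dev_attach("eth_null", &port_id) != 0)
                    return -1;
            if (rte_eth_dev_get_port_by_name("eth_null", &port_id) != 0)
                    return -1;
            return port_id;
    }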
Ivan Boule [Thu, 29 Oct 2015 08:46:15 +0000 (09:46 +0100)]
virtio: fix size of MAC address array
Make the virtio PMD allocate the array of unicast MAC addresses with
the maximum number of entries (VIRTIO_MAX_MAC_ADDRS) that it exports.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Didier Pallard [Thu, 29 Oct 2015 08:47:53 +0000 (09:47 +0100)]
ixgbe: remove useless fields in checksum offload
According to Table 7-38 "Valid Fields by Offload Option"
of the Intel 82599 10 GbE Controller Datasheet,
the L4LEN field is not needed for L4 checksum computation by the hardware.
So remove l4_len from tx_offload_mask in the ixgbe_set_xmit_ctx
function used to build the context transmitted to the hardware.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
ConnectX-4 adapters do not have a constant indirection table size, which is
set at runtime from the number of RX queues. The maximum size is retrieved
using a hardware query and is normally 512.
Since the current RETA API cannot handle a variable size, any query/update
command causes it to be silently updated to RSS_INDIRECTION_TABLE_SIZE
entries regardless of the original size.
Also, due to the underlying type of the configuration structure, the maximum
size is limited to RSS_INDIRECTION_TABLE_SIZE (currently 128, at most 256
entries).
A port stop/start must be done to apply the new RETA configuration.
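For reference, a hedged sketch of an update through the generic RETA API
discussed here (the 128-entry size mirrors RSS_INDIRECTION_TABLE_SIZE;
the helper name is illustrative):

    #include <string.h>
    #include <rte_ethdev.h>

    #define RETA_SIZE 128 /* matches RSS_INDIRECTION_TABLE_SIZE */

    /* Spread RETA entries round-robin over the RX queues. */
    static int reta_round_robin(uint8_t port_id, uint16_t nb_rxq)
    {
            struct rte_eth_rss_reta_entry64 reta[RETA_SIZE / RTE_RETA_GROUP_SIZE];
            unsigned i;

            memset(reta, 0, sizeof(reta));
            for (i = 0; i != RETA_SIZE; ++i) {
                    reta[i / RTE_RETA_GROUP_SIZE].mask |=
                            1ULL << (i % RTE_RETA_GROUP_SIZE);
                    reta[i / RTE_RETA_GROUP_SIZE].reta[i % RTE_RETA_GROUP_SIZE] =
                            i % nb_rxq;
            }
            /* Stop/start the port afterwards for the table to apply. */
            return rte_eth_dev_rss_reta_update(port_id, reta, RETA_SIZE);
    }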
Helin Zhang [Tue, 13 Oct 2015 07:19:50 +0000 (15:19 +0800)]
i40evf: support AQ based RSS config
It supports configuring the RSS hash key and lookup table either
through the admin queue or by writing registers directly, as X722
supports AQ-based configuration.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Helin Zhang [Tue, 13 Oct 2015 07:19:49 +0000 (15:19 +0800)]
i40e: support AQ based RSS config
It supports configuring the RSS hash key and lookup table either
through the admin queue or by writing registers directly, as X722
supports AQ-based configuration.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Helin Zhang [Tue, 13 Oct 2015 07:19:48 +0000 (15:19 +0800)]
i40e: support X722 and its A0 hardware
In order to give users early access to X722 and its A0 hardware,
new device IDs are added, and compilation of their support in the
base driver is enabled.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Bruce Richardson [Wed, 30 Sep 2015 12:12:20 +0000 (13:12 +0100)]
ring: store memzone pointer
Add a new field to the rte_ring structure to store a pointer to the
memzone containing the ring. For rings created using rte_ring_create(),
the field is set automatically.
This new field allows users of the ring to query the NUMA node a ring is
allocated on, or to get the physical address of the ring, if needed.
The rte_ring structure also maintains ABI compatibility: the structure
members after the new one are cache-line aligned, so there was space
to add it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
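A short sketch of what the new field enables (memzone field names as
commonly found in this era; treat them as an assumption):

    #include <stdio.h>
    #include <inttypes.h>
    #include <rte_ring.h>
    #include <rte_memzone.h>
    #include <rte_lcore.h>

    static void ring_whereabouts(void)
    {
            struct rte_ring *r = rte_ring_create("demo_ring", 1024,
                                                 rte_socket_id(), 0);

            if (r == NULL || r->memzone == NULL)
                    return;
            /* NUMA node the ring memory lives on, and its physical
             * address, e.g. for handing the ring to hardware. */
            printf("socket %d, phys 0x%" PRIx64 "\n",
                   (int)r->memzone->socket_id,
                   (uint64_t)r->memzone->phys_addr);
    }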
Bruce Richardson [Wed, 30 Sep 2015 12:12:19 +0000 (13:12 +0100)]
ring: enhance device setup from rings
The ring ethdev creation function creates an ethdev, but does not
actually set it up for use. Even if it's just a single ring, the user
still needs to create a mempool, call rte_eth_dev_configure, then call
rx and tx setup functions before the ethdev can be used.
This patch changes things so that the ethdev is fully set up after the
call to create it. The above-mentioned functions can still be
called - as will be the case, for instance, if the NIC is created via
command-line parameters - but they are no longer essential.
The function now also sets rte_errno appropriately on error, so the
caller can get a better indication of why a call may have failed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
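A hedged sketch of the simplified flow (the rte_eth_from_rings()
parameters shown follow this era's rte_eth_ring.h; the loopback setup is
just an example):

    #include <rte_ethdev.h>
    #include <rte_eth_ring.h>
    #include <rte_lcore.h>

    /* One ring serves as both RX and TX queue: packets sent on the
     * port loop straight back to its receive side. */
    static int loopback_port(struct rte_ring *r)
    {
            struct rte_ring *rxq[] = { r };
            struct rte_ring *txq[] = { r };
            int port = rte_eth_from_rings("ring_lo", rxq, 1, txq, 1,
                                          rte_socket_id());

            if (port < 0)
                    return -1; /* rte_errno explains the failure */
            /* With this patch the ethdev is already configured, so
             * configure/queue-setup calls are optional here. */
            return rte_eth_dev_start(port) == 0 ? port : -1;
    }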
Maryam Tahhan [Tue, 20 Oct 2015 10:34:18 +0000 (11:34 +0100)]
ethdev: do not deprecate imissed counter
Remove the deprecation tag and notice for imissed, as it is a generic
register that accounts for packets dropped by the HW because no mbufs
were available (RX queues full). imissed is different from ierrors and
can help with general debugging.
This patch removes the MAC local fault count and MAC remote fault
count from RX errors. The MAC fault count registers count faults,
not packets, and hence should not be added to packet counters.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Update the strings used for presenting stats to adhere
to the scheme previously presented. Update the xstats_get()
function to handle queue information only if xstats() is not
implemented in the PMD, giving the PMD the flexibility needed
to expose its own extended queue stats.
Add an extended statistics section to the poll mode driver
chapter of the programmer's guide. This section describes
how the stat strings are formatted, and how client code can
use them to gather information about each stat.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Maryam Tahhan <maryam.tahhan@intel.com>
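A minimal sketch of reading the formatted stat strings through the
xstats API (the struct layout and the NULL-to-count convention of this
era are assumptions):

    #include <stdio.h>
    #include <stdlib.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>

    static void dump_xstats(uint8_t port_id)
    {
            /* First call learns how many entries the PMD exposes. */
            int n = rte_eth_xstats_get(port_id, NULL, 0);
            struct rte_eth_xstats *xs;
            int i;

            if (n <= 0)
                    return;
            xs = calloc(n, sizeof(*xs));
            if (xs == NULL)
                    return;
            n = rte_eth_xstats_get(port_id, xs, n);
            for (i = 0; i < n; i++)
                    printf("%s: %" PRIu64 "\n", xs[i].name, xs[i].value);
            free(xs);
    }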
Marcel Apfelbaum [Thu, 15 Oct 2015 11:08:39 +0000 (14:08 +0300)]
vhost: enable virtio 1.0
Make vhost-user virtio 1.0 compatible by adding it to the
supported features and keeping the header length
the same as for mergeable RX buffers.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Huawei Xie [Thu, 29 Oct 2015 14:53:26 +0000 (22:53 +0800)]
virtio: add vector Rx
With a fixed avail ring, we don't need to get the descriptor index
from the avail ring: the virtio driver only has to deal with the desc
ring. This patch uses vector instructions to accelerate processing of
the desc ring.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Huawei Xie [Thu, 29 Oct 2015 14:53:24 +0000 (22:53 +0800)]
virtio: optimize ring layout
In a DPDK-based switching environment, vhost usually runs on a dedicated
core while virtio processing in the guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor
When vhost fetches the avail ring, it needs to fetch the modified cache
line from the virtio core, which is a heavy cost on current CPU
implementations.
The idea of this optimization is:
allocate a fixed descriptor for each entry of the avail ring, so the avail
ring stays the same during the run.
This removes the L1 cache line transfer from the virtio core to the vhost
core for the avail ring. (Note we couldn't avoid the cache transfer for the
descriptors themselves.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, which further accelerates the
processing.
This is the layout for the avail ring (take 256 ring entries for example),
with each entry pointing to the descriptor with the same index.
avail
idx
+
|
+----+----+---+-------------+------+
| 0 | 1 | 2 | ... | 254 | 255 | avail ring
+-+--+-+--+-+-+---------+---+--+---+
| | | | | |
| | | | | |
v v v | v v
+-+--+-+--+-+-+---------+---+--+---+
| 0 | 1 | 2 | ... | 254 | 255 | desc ring
+----+----+---+-------------+------+
|
|
+----+----+---+-------------+------+
| 0 | 1 | 2 | | 254 | 255 | used ring
+----+----+---+-------------+------+
|
+
This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.
++
||
||
+-----+-----+-----+--------------+------+------+------+
| 0 | 1 | ... | 127 || 128 | 129 | ... | 255 | avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| | | || | | |
v v v || v v v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... | 255 || 128 | 129 | ... | 255 | desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| | | || | | |
v v v || v v v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 0 | 1 | ... | 127 || 0 | 1 | ... | 127 | desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
||
||
++
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
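A sketch of the core idea, filling the avail ring once with fixed
descriptor indexes (structure names follow the standard virtio ring
layout from virtio_ring.h; the hook point is illustrative):

    /* Done once at queue setup: entry i of the avail ring always
     * points to descriptor i, so the ring never changes afterwards
     * and vhost stops pulling a dirty cache line for it. */
    static void vq_fix_avail_ring(struct vring *vr, uint16_t size)
    {
            uint16_t i;

            for (i = 0; i < size; i++)
                    vr->avail->ring[i] = i;
    }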
The vector RX function processes 4 packets at a time. When the RX
ring wraps to the tail and the number of remaining descriptors is not a
multiple of 4, SW would overwrite memory that does not belong to it and
cause a crash. The fix allocates 4 additional HW/SW ring entries at the
tail to avoid the overwrite.
Vector TX manages the TX queue in a different way, so different
functions are needed to reset the TX queue and release mbufs
in it. Therefore, introduce 2 function pointers to do such operations.
This patch adds the functions below:
1. Add function fm10k_rxq_rearm to re-allocate mbufs for used descriptors
in the RX HW ring.
2. Add 2 functions that use SSE instructions to parse RX descriptors
and fill pkt_type and ol_flags in the mbuf.
3. Add function fm10k_recv_raw_pkts_vec to parse raw packets, including
possibly chained packets.
4. Add function fm10k_recv_pkts_vec to receive single-mbuf packets.
Add a condition check in the rx_queue_setup function. If the number of
RX descriptors can't satisfy the vPMD requirement, record that in a
variable; otherwise, call fm10k_rxq_vec_setup to initialize vector RX.
Add the ability for the upper layer to query RX/TX queue information.
Add new fields to rte_eth_dev_info to represent the min/max/alignment
limits on the number of RX/TX descriptors per queue for the device.
Add new structures:
struct rte_eth_rxq_info
struct rte_eth_txq_info
new functions:
rte_eth_rx_queue_info_get
rte_eth_tx_queue_info_get
into the rte_ethdev API.
Extra free space is left in the queue info structures,
so extra fields can be added later without ABI breakage.
Add new fields:
rx_desc_lim
tx_desc_lim
into rte_eth_dev_info.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
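A brief sketch of the new query API (field names follow the structures
introduced above; the usage is illustrative):

    #include <stdio.h>
    #include <rte_ethdev.h>

    static void show_rx_limits(uint8_t port_id, uint16_t queue_id)
    {
            struct rte_eth_dev_info dev_info;
            struct rte_eth_rxq_info qinfo;

            /* Per-device descriptor limits from the new fields. */
            rte_eth_dev_info_get(port_id, &dev_info);
            printf("rx desc: min %u max %u align %u\n",
                   dev_info.rx_desc_lim.nb_min,
                   dev_info.rx_desc_lim.nb_max,
                   dev_info.rx_desc_lim.nb_align);
            /* Per-queue runtime information. */
            if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) == 0)
                    printf("rxq %u: %u descriptors\n",
                           queue_id, qinfo.nb_desc);
    }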
According to the XL710 datasheet:
RX QLEN restrictions: When the PXE_MODE flag in the GLLAN_RCTL_0
register is cleared, the QLEN must be a whole multiple of 32
descriptors.
TX QLEN restrictions: When the PXE_MODE flag in the GLLAN_RCTL_0
register is cleared, the QLEN must be a whole multiple of 32
descriptors.
So make sure that for both RX and TX queues the number of HW descriptors
is a multiple of 32.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
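For illustration only, a caller-side way to satisfy this restriction
using the stock RTE_ALIGN_CEIL helper (the helper name below is
hypothetical, and the driver itself may simply validate and reject
instead of rounding):

    #include <stdint.h>
    #include <rte_common.h>

    /* Round a requested queue length up to a whole multiple of 32
     * descriptors, as required when PXE_MODE is cleared. */
    static uint16_t qlen_fixup(uint16_t requested)
    {
            return RTE_ALIGN_CEIL(requested, 32);
    }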
Tomasz Kulasek [Fri, 30 Oct 2015 14:25:52 +0000 (15:25 +0100)]
null: add virtual RSS configuration
This implementation allows setting and reading the RSS configuration of
the null device, and is used to validate correct propagation of the
values to the slaves in the unit tests for dynamic RSS configuration
of bonding.
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Tomasz Kulasek [Fri, 30 Oct 2015 14:25:48 +0000 (15:25 +0100)]
bonding: support RSS dynamic configuration
The bonding device implements independent management of RSS settings. It
stores its own copies of the settings, i.e. RETA, RSS hash function and
RSS key. This is required to ensure consistency.
1) The RSS hash function set for the bonding device is the maximal set of
RSS hash functions supported by all bonded devices. That means that to have
RSS support for bonding, all slaves should be RSS-capable.
2) The RSS key is propagated to the slaves "as is".
3) The RETA for bonding is an internal table managed by the bonding API, and
is used as a pattern to set up the slaves. Its size is the GCD of all the
slave RETA sizes, so it can easily be used as a pattern providing the
expected behavior, even if the slave RETA sizes differ (see the GCD sketch
below).
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
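Point 3 relies on a greatest common divisor; a small helper of the kind
that could derive the bonding RETA size from the slave RETA sizes
(illustrative, not the bonding driver's actual code):

    #include <stdint.h>

    static uint16_t gcd16(uint16_t a, uint16_t b)
    {
            while (b != 0) {
                    uint16_t t = b;

                    b = a % b;
                    a = t;
            }
            return a;
    }

    /* RETA size for the bond: GCD over all slave RETA sizes, so the
     * pattern repeats cleanly on every slave. */
    static uint16_t bond_reta_size(const uint16_t sizes[], int n)
    {
            uint16_t g = sizes[0];
            int i;

            for (i = 1; i < n; i++)
                    g = gcd16(g, sizes[i]);
            return g;
    }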
Shaopeng He [Sat, 31 Oct 2015 02:44:42 +0000 (10:44 +0800)]
fm10k: support VMDQ in MAC/VLAN filter
This patch does the following for the fm10k MAC/VLAN filter:
- Add separate functions for VMDQ and main VSI to change
MAC filter.
- Disable modification to VLAN filter in VMDQ mode.
- In device close phase, delete logic ports to remove all
MAC/VLAN filters belonging to those ports.
Signed-off-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Michael Qiu <michael.qiu@intel.com>
Jingjing Wu [Sat, 31 Oct 2015 15:57:25 +0000 (23:57 +0800)]
i40e: support DCB mode
This patch enables the DCB feature on Intel XL710/X710 NICs. It includes:
- Receive queue classification based on traffic class
- Round robin ETS scheduling (RX and TX)
- Priority flow control
Adrien Mazarguil [Fri, 30 Oct 2015 18:55:14 +0000 (19:55 +0100)]
app/testpmd: fix missing init in RSS hash show command
The "show port X rss-hash" command sometimes displays garbage instead of the
expected RSS hash key because the maximum key length is undefined. When the
requested key is too large to fit in the buffer,
rte_eth_dev_rss_hash_conf_get() does not update it.
Nelio Laranjeiro [Fri, 30 Oct 2015 18:55:13 +0000 (19:55 +0100)]
app/testpmd: add missing type to RSS hash commands
DPDK uses a structure to get or set a new hash key (see
eth_rte_rss_hash_conf). rss_hf field from this structure is used in
rss_hash_get_conf to retrieve the hash key and in rss_hash_update uses
it to verify the key exists before trying to update it.
Nelio Laranjeiro [Fri, 30 Oct 2015 18:55:12 +0000 (19:55 +0100)]
mlx5: use one RSS hash key per flow type
DPDK expects to have an RSS hash key per flow type (IPv4, IPv6, UDPv4,
etc.). To handle this, the PMD must keep a table of hash keys to be able
to reconfigure the queues at each start/stop call.
Nelio Laranjeiro [Fri, 30 Oct 2015 18:55:11 +0000 (19:55 +0100)]
mlx5: support RSS hash update and get
This is a first implementation of rss_hash_update and rss_hash_conf_get.
These functions are still limited in functionality but can be used to
change the RSS hash key. For now, the PMD does not handle an indirection
table for each kind of flow (IPv4, IPv6, etc.); the same RSS hash key is
used for all protocols. This explains why rss_hash_conf_get returns the
RSS hash key for all DPDK-supported protocols and why the hash key is
set for all of them too.
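A minimal sketch of the generic API these mlx5 callbacks sit behind (the
40-byte key length is typical for this class of NIC and is an assumption
here):

    #include <rte_ethdev.h>

    static int set_rss_key(uint8_t port_id, uint8_t key[40])
    {
            struct rte_eth_rss_conf conf = {
                    .rss_key = key,
                    .rss_key_len = 40,
                    .rss_hf = ETH_RSS_IP, /* flow types to affect */
            };

            return rte_eth_dev_rss_hash_update(port_id, &conf);
    }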
Olga Shern [Fri, 30 Oct 2015 18:55:10 +0000 (19:55 +0100)]
mlx5: use alternate method to configure promisc and allmulti modes
Promiscuous and allmulticast modes were historically enabled by adding
specific flows with types IBV_FLOW_ATTR_ALL_DEFAULT or
IBV_EXP_FLOW_ATTR_MC_DEFAULT to each hash RX queue, but this method is
deprecated.
- Promiscuous mode is now enabled by omitting destination MAC addresses from
basic flow specifications.
- Allmulticast mode is now enabled by using flow specifications that match
the broadcast bit in destination MAC addresses.
Olga Shern [Fri, 30 Oct 2015 18:55:09 +0000 (19:55 +0100)]
mlx5: define specific flow steering rules for each hash Rx QP
All hash RX QPs currently use the same flow steering rule (L2 MAC filtering)
regardless of their type (TCP, UDP, IPv4, IPv6), which prevents them from
being dispatched properly. This is fixed by adding flow information to the
hash RX queue initialization data and generating specific flow steering
rules for each of them.