Shijith Thotton [Sat, 25 Mar 2017 06:24:24 +0000 (11:54 +0530)]
net/liquidio: add APIs to allocate and free IQ
Instruction queue (IQ) is used to send control and data packets to
device from host. IQ 0 is used to send device configuration commands
during initialization and later re-allocated as per application
requirement.
Wenzhuo Lu [Fri, 24 Mar 2017 02:51:03 +0000 (10:51 +0800)]
net/ixgbe: fix TC bandwidth setting
4 and 8 TCs are supported on ixgbe. By default there're
8 TCs. So when initializing the device, the bandwidth for
8 TCs is set.
When changing the TC number, it's only considered setting
the bandwidth for 4 TCs. If the user change the number
from 4 to 8, the TCs' bandwidth is not right.
Pascal Mazon [Wed, 22 Mar 2017 08:40:01 +0000 (09:40 +0100)]
net/tap: add link status notification
As tap is a virtual device, there's no physical way a link can be cut.
However, it has an associated kernel netdevice and possibly a remote
netdevice too. These netdevices link status may change outside of the
DPDK scope, through an external command such as:
ip link set dev tapX down
This commit implements link status notification through netlink.
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com> Acked-by: Keith Wiles <keith.wiles@intel.com>
Pascal Mazon [Wed, 22 Mar 2017 08:40:00 +0000 (09:40 +0100)]
net/tap: improve link update
Reflect device link status according to the state of the tap netdevice
and the remote netdevice (if any). If both are UP and RUNNING, then the
device link status is set to ETH_LINK_UP, otherwise ETH_LINK_DOWN.
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com> Acked-by: Keith Wiles <keith.wiles@intel.com>
Pascal Mazon [Thu, 23 Mar 2017 08:42:11 +0000 (09:42 +0100)]
net/tap: add remote netdevice traffic capture
By default, a tap netdevice is of no use when not fed by a separate
process. The ability to automatically feed it from another netdevice
allows applications to capture any kind of traffic normally destined to
the kernel stack.
This patch implements this ability through a new optional "remote"
parameter.
Packets matching filtering rules created with the flow API are matched
on the remote device and redirected to the tap PMD, where the relevant
action will be performed.
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com> Acked-by: Olga Shern <olgas@mellanox.com> Acked-by: Keith Wiles <keith.wiles@intel.com>
Pascal Mazon [Thu, 16 Mar 2017 08:59:21 +0000 (09:59 +0100)]
net/tap: support segmented mbufs
Support for segmented packets (scatter/gather) is mandatory for most
purposes, regardless of the MTU size. Tx packets are often the result of
mbuf concatenation, and an mbuf is not necessarily large enough for Rx
packets to fit in a single one.
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com> Acked-by: Keith Wiles <keith.wiles@intel.com>
Jerin Jacob [Mon, 20 Mar 2017 14:10:40 +0000 (19:40 +0530)]
net/thunderx: sync mailbox definitions with Linux PF driver
- bgx_link_status mbox definition was changed in Linux
commit 1cc702591bae ("net: thunderx: Add ethtool support")
- NIC_MBOX_MSG_RES_BIT related changes were never part of Linux PF driver
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Pascal Mazon [Thu, 23 Mar 2017 08:33:57 +0000 (09:33 +0100)]
net/tap: add basic flow API patterns and actions
Supported flow rules are now mapped to TC rules on the tap netdevice.
The netlink message used for creating the TC rule is stored in struct
rte_flow. That way, by simply changing a metadata in it, we can require
for the rule deletion without further parsing.
Supported items:
- eth: src and dst (with variable masks), and eth_type (0xffff mask).
- vlan: vid, pcp, tpid, but not eid.
- ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
- udp/tcp: src and dst port (0xffff) mask.
Supported actions:
- DROP
- QUEUE
- PASSTHRU
It is generally not possible to provide a "last" item. However, if the
"last" item, once masked, is identical to the masked spec, then it is
supported.
Only IPv4/6 and MAC addresses can use a variable mask. All other
items need a full mask (exact match).
Support for VLAN requires kernel headers >= 4.9, checked using
auto-config.sh.
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com> Acked-by: Olga Shern <olgas@mellanox.com> Acked-by: Keith Wiles <keith.wiles@intel.com>
Pascal Mazon [Thu, 23 Mar 2017 08:33:56 +0000 (09:33 +0100)]
net/tap: add netlink back-end for flow API
Each kernel netdevice may have queueing disciplines set for it, which
determine how to handle the packet (mostly on egress). That's part of
the TC (Traffic Control) mechanism.
Through TC, it is possible to set filter rules that match specific
packets, and act according to what is in the rule. This is a perfect
candidate to implement the flow API for the tap PMD, as it has an
associated kernel netdevice automatically.
Each flow API rule will be translated into its TC counterpart.
To leverage TC, it is necessary to communicate with the kernel using
netlink. This patch introduces a library to help that communication.
Inside netlink.c, functions are generic for any netlink messaging.
Inside tcmsgs.c, functions are specific to deal with TC rules.
Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com> Acked-by: Olga Shern <olgas@mellanox.com> Acked-by: Keith Wiles <keith.wiles@intel.com>
Yongseok Koh [Tue, 21 Mar 2017 17:50:51 +0000 (10:50 -0700)]
net/mlx5: fix reusing Rx/Tx queues
When configuring Rx/Tx queue, if queue already exists, it is reused. But if
the queue size is changed, it must be resized to not access/overwrite
invalid memory.
Fixes: 2e22920b85d9 ("mlx5: support non-scattered Tx and Rx") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Jingjing Wu [Wed, 22 Mar 2017 09:24:59 +0000 (17:24 +0800)]
net/i40e/base: new AQ commands for cloud filter
Add new admin queue function and extended fields for cloud filter:
- Add admin queue function for Replace filter command (Opcode: 0x025F)
- Define big buffer for extended general fields in Add/Remove
Cloud filters command
Signed-off-by: Laura Stroe <laura.stroe@intel.com> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Jingjing Wu [Wed, 22 Mar 2017 09:24:58 +0000 (17:24 +0800)]
net/i40e/base: add VF offload flags
This patch adds:
- ENCAP offload negotiation flag. Use the existing ENCAP_CSUM offload
flag to negotiate GSO_UDP_TUNNEL_CSUM capability and create new ENCAP
flag for negotiating offloads for encapsulated packets
- RX_ENCAP_CSUM offload negotiation flag for VF to negotiate RX
checksum capability for tunnelled packet types.
Jingjing Wu [Wed, 22 Mar 2017 09:24:57 +0000 (17:24 +0800)]
net/i40e/base: reduce wait time for adminq command
When sending an adminq command, we wait for the command to complete in
a loop. This loop waits for an entire millisecond, when in practice the
adminq command is processed often much faster.
Change the loop to use i40e_usec_delay instead, and wait for 50 usecs
each time instead. This appears to be about the minimum time required,
based on some manual observation and testing.
The primary benefit of this change is reducing latency of various
operations in the PF driver, especially when related to having a large
number of VFs enabled.
Jingjing Wu [Wed, 22 Mar 2017 09:24:55 +0000 (17:24 +0800)]
net/i40e/base: fix potential out of bound array access
This is fix for klocwork issue where dcbcfg->numapps could
be greater than size of array (i.e dcbcfg->app[I40E_DCBX_MAX_APPS]).
The fix makes sure the array is not accessed past size of array
(i.e. I40E_DCBX_MAX_APPS).
Keith Wiles [Tue, 21 Mar 2017 15:12:12 +0000 (10:12 -0500)]
net/bonding: reduce slave starvation on Rx poll
When polling the bonded ports for RX packets the old driver would
always start with the first slave in the list. If the requested
number of packets is filled on the first port in a two port config
then the second port could be starved or have larger number of
missed packet errors.
The code attempts to start with a different slave each time RX poll
is done to help eliminate starvation of slave ports. The effect of
the previous code was much lower performance for two slaves in the
bond then just the one slave.
The performance drop was detected when the application can not poll
the rings of Rx packets fast enough and the packets per second for
two or more ports was at the threshold throughput of the application.
At this threshold the slaves would see very little or no drops in
the case of one slave. Then enable the second slave you would see
a large drop rate on the two slave bond and reduction in throughput.
Signed-off-by: Keith Wiles <keith.wiles@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com>
Alejandro Lucero [Tue, 21 Mar 2017 10:43:20 +0000 (10:43 +0000)]
net/nfp: fix packet/data length conversion
Chained mbufs hold data_len as the length of that particular mbuf
and pkt_len as the full packet length including all the chained
mbufs. It is not clear from the mbuf definition if pkt_len should
be set for all the mbufs in a chain, but code there for handling
mbufs suggests just the first mbuf requires to have pkt_len set.
NFP PMD was assuming pkt_len is set in all the chained mbufs and
unit tests for gather dma were building mbufs with pkt_len always
set. This patch gets rid of that assumption.
Rasesh Mody [Sat, 18 Mar 2017 06:53:31 +0000 (23:53 -0700)]
net/qede/base: fix first VF index calculation
When a server doesn't support ARI, VF offsets begin at a much higher
number. As a result, ecore miscalculates the first_vf_in_pf and
initialization fails since base driver incorrectly learns there are
no SBs for its VF [as its VFs are out of range].
Rasesh Mody [Sat, 18 Mar 2017 06:50:21 +0000 (23:50 -0700)]
net/qede/base: fix numbering L2 VF queues
There are some constellations where Due to lack of resource allocation
in MFW, There would be an insufficient number of L2 queues for all the
VFs.
This introduces a new feature ECORE_VF_L2_QUE which correctly numbers
the number of VF queues. Notice it might be larger than the actual
number of VFs in configuration space, in which case its the ecore
client responsibility not to try activating that many.
As part of the fix, also correct the nubmering of the VF queues. As
their numbering is dependent on the SBs of the PF, which might only be
partially used by L2 [as half would be assigned for RDMA which doesn't
require L2 queues], we make the numbering consecutive with that of the
L2 queues only.
Fixes: ec94dbc57362 ("qede: add base driver") Signed-off-by: Rasesh Mody <rasesh.mody@cavium.com>
Rasesh Mody [Sat, 18 Mar 2017 06:50:19 +0000 (23:50 -0700)]
net/qede/base: fix printout
Prints in ecore_get_dev_info showed only chip revision,
and did that as number instead of letter.
I.e., BB A0 --> BB0, BB B0 --> BB1, AH A0 --> AH0, AH A1 --> AH0.
Correct the printing scheme into
{AH, BB} {A, B}{0, 1}
Fixes: ec94dbc57362 ("qede: add base driver") Signed-off-by: Rasesh Mody <rasesh.mody@cavium.com>
Rasesh Mody [Sat, 18 Mar 2017 06:50:18 +0000 (23:50 -0700)]
net/qede/base: fix TM block ILT initialization
Fix Timer(TM) block Internal Lookup Table(or ILT for logical to
physical address translation) initialization for SRIOV's coexistence
with other protocols.
Fixes: ec94dbc57362 ("qede: add base driver") Signed-off-by: Rasesh Mody <rasesh.mody@cavium.com>
Rasesh Mody [Sat, 18 Mar 2017 06:48:36 +0000 (23:48 -0700)]
net/qede/base: fix incorrect typecasting of flag
dcbx-update-flag is incorrectly converted to boolean before assigining
it to ramrod data, fix this typecasting. Also, added more debug
messages in the dcbx code paths.
Andrew Rybchenko [Mon, 20 Mar 2017 10:15:07 +0000 (10:15 +0000)]
net/sfc: use different callbacks for event queues
Use different sets of libefx EvQ callbacks for management,
transmit and receive event queue. It makes event handling
more robust against unexpected events.
Also it is required for alternative datapath support.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Yongseok Koh [Wed, 15 Mar 2017 23:55:44 +0000 (16:55 -0700)]
net/mlx5: add enhanced multi-packet send for ConnectX-5
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx
descriptor can carry multiple packets either by including pointers of
packets or by inlining packets. Inlining packet data can be helpful to
better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid
mode - mixing inlined packets and pointers in a descriptor. This feature is
enabled by default if supported by HW.
Ivan Malov [Thu, 16 Mar 2017 11:01:35 +0000 (11:01 +0000)]
net/sfc: add callback to retrieve FW version
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andrew Lee <alee@solarflare.com>
Ivan Malov [Thu, 16 Mar 2017 11:01:34 +0000 (11:01 +0000)]
net/sfc/base: add advanced function to extract FW version
Some libefx-based drivers might need this functionality to
indicate DPCPU FW IDs as part of FW version info to assist
experienced users.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andrew Lee <alee@solarflare.com> Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Wenzhuo Lu [Thu, 16 Mar 2017 03:09:45 +0000 (11:09 +0800)]
net/i40e: fix broadcast promiscuous mode setting
When setting up the VSIs, MAC filter is used for
receiving MAC broadcast packets.
We should follow it to implement the broadcast
promiscuous mode setting.
Fixes: 61fff9b4c68b ("net/i40e: set VF broadcast mode from PF") Cc: stable@dpdk.org Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Chas Williams [Wed, 15 Mar 2017 12:35:10 +0000 (08:35 -0400)]
net/vmxnet3: fix queue size changes
If the user reconfigures the queue size, then the previously allocated
memzone may potentially be too small. Release the memzone when a queue
is released and allocate a new one each time a queue is setup.
While here convert to rte_eth_dma_zone_reserve() which does basically
the same things as the private function.
Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver implementation") Cc: stable@dpdk.org Signed-off-by: Chas Williams <ciwillia@brocade.com> Acked-by: Jan Blunck <jblunck@infradead.org> Acked-by: Shrikrishna Khare <skhare@vmware.com>
Pascal Mazon [Wed, 15 Mar 2017 14:48:17 +0000 (15:48 +0100)]
net/tap: add multicast addresses management
A tap netdevice actually receives every packet, without any filtering
whatsoever. There is no need for any multicast address registration
to receive multicast packets.
Pascal Mazon [Wed, 15 Mar 2017 14:48:15 +0000 (15:48 +0100)]
net/tap: add MTU management
The MTU is assigned to the tap netdevice according to the argument, but
packet transmission and reception just write/read on an fd with the
default limit being the socket buffer size.
As a new rte_eth_dev_data is allocated during tap device init, ensure it
is set again dev->data->mtu.
Once the actual netdevice is created via tun_alloc(), make sure to apply
the desired MTU to the netdevice.
Pascal Mazon [Wed, 15 Mar 2017 14:48:12 +0000 (15:48 +0100)]
net/tap: remove NO-ARP setting
There is no reason not to support ARP on a tap netdevice. Remove
IFF_NOARP flags.
Focus on IFF_UP when a link status change is required.
Fixes: f457b472b1f2 ("net/tap: add link up and down operations") Signed-off-by: Pascal Mazon <pascal.mazon@6wind.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ivan Malov [Thu, 9 Mar 2017 17:23:03 +0000 (17:23 +0000)]
net/sfc: add VFs to the table of PCI IDs for supported NICs
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andrew Lee <alee@solarflare.com>
Ivan Malov [Thu, 9 Mar 2017 17:23:02 +0000 (17:23 +0000)]
net/sfc: support MCDI proxy
The patch is to add support for MCDI proxy which comes in
useful, particularly, while running over VF: few commands
will normally fail with EPERM, but in some cases the host
driver (i.e. running over the corresponding PF, typically,
within a hypervisor) may set itself as a proxy to conduct
authorization for the commands coming from VFs; these are
forwarded to the corresponding access control application
which may decline or approve authorization by replying to
the requests; all in all, the guest driver has to process
the replies forwarded back by the firmware MC in order to
give up gracefully (by setting return code which could be
understood by 'libefx') or re-issue the original commands
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Ivan Malov [Thu, 9 Mar 2017 17:23:01 +0000 (17:23 +0000)]
net/sfc: avoid failure on port start if Rx mode is rejected
If Rx mode is unacceptable, in particular, when promiscuous
or all-multicast filters are not allowed while running over
PCI function which is not a member of appropriate privilege
groups, the driver has to cope with the failures gracefully
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andrew Lee <alee@solarflare.com>
Ivan Malov [Thu, 9 Mar 2017 17:23:00 +0000 (17:23 +0000)]
net/sfc: poll MAC stats if periodic DMA is not supported
If periodic DMA statistics feature is absent (particularly,
while running over VF), the PMD must provide an ability to
cope with it using explicit update requests which are kept
restrained according to 'stats_update_period_ms' parameter
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Ivan Malov [Thu, 9 Mar 2017 17:22:59 +0000 (17:22 +0000)]
net/sfc: add kvarg control for MAC statistics update period
The patch is to make MAC statistics update interval tunable
by means of 'stats_update_period_ms' kvarg parameter making
it possible to use values different from 1000 ms in case of
SFN8xxx boards provided that firmware version is 6.2.1.1033
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andrew Lee <alee@solarflare.com>