dpdk.git
John Daley [Wed, 4 Apr 2018 23:54:52 +0000 (16:54 -0700)]
net/enic: support UDP RSS on 1400 series adapters

Recent models support IPv4/IPv6 UDP RSS. There is no control bit to
enable UDP RSS alone. Instead, the NIC enables/disables TCP and UDP
RSS together.

Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>
Hyong Youb Kim [Wed, 4 Apr 2018 23:54:50 +0000 (16:54 -0700)]
net/enic: do not flush descriptor cache when opening vNIC

The firmware on new hardware models flushes the global descriptor
cache by default. Use CMD_OPENF_IG_DESCCACHE to avoid cache
flushing. This flag has no effect on older models.

Suggested-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:51 +0000 (08:36 -0400)]
net/axgbe: support meson build

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:50 +0000 (08:36 -0400)]
net/axgbe: add workaround for ethernet training

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:49 +0000 (08:36 -0400)]
net/axgbe: support 32-bit build mode

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:48 +0000 (08:36 -0400)]
net/axgbe: support generic Rx/Tx stats

This patch adds support for the port statistics API defined
for Ethernet PMDs.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:47 +0000 (08:36 -0400)]
net/axgbe: support promiscuous mode

This patch enables promiscuous and multicast support for AXGBE PMD.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:46 +0000 (08:36 -0400)]
net/axgbe: add configure flow control while link adjustment

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:45 +0000 (08:36 -0400)]
net/axgbe: add link status update

Added support to update device link status atomically.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:44 +0000 (08:36 -0400)]
doc: add guide for AMD axgbe Ethernet PMD

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:43 +0000 (08:36 -0400)]
net/axgbe: add Rx/Tx data path

Add a scalar implementation of the Rx data path.
Add scalar and vector implementations of the Tx data path.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:42 +0000 (08:36 -0400)]
net/axgbe: add DMA programming and start/stop

This patch adds support for programming DMA and for the DPDK device
start and stop APIs.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:41 +0000 (08:36 -0400)]
net/axgbe: add Rx/Tx setup

Add support for the data path setup APIs defined for PMDs.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:40 +0000 (08:36 -0400)]
net/axgbe: add interrupt handler for autonegotiation

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:39 +0000 (08:36 -0400)]
net/axgbe: add phy programming APIs

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:38 +0000 (08:36 -0400)]
net/axgbe: add phy init and related APIs

Add device PHY initialization, read/write, and other
maintenance APIs for use within the PMD.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:37 +0000 (08:36 -0400)]
net/axgbe: add structs for MAC init and reset

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:36 +0000 (08:36 -0400)]
net/axgbe: add phy register map and helper macros

Added phy related register definitions.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:35 +0000 (08:36 -0400)]
net/axgbe: add register map and related macros

Added DMA and MAC related register definitions.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Ravi Kumar [Fri, 6 Apr 2018 12:36:34 +0000 (08:36 -0400)]
net/axgbe: add minimal init and uninit support

Add ethernet poll mode driver for AMD 10G devices embedded in
AMD EPYC™ EMBEDDED 3000 family processors.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Pavan Nikhilesh [Fri, 6 Apr 2018 11:30:31 +0000 (17:00 +0530)]
net/tap: fix memcpy with incorrect size

Fix the sizeof expression used to get the MAC address size: it was
applied to a pointer rather than to the address structure.

Found while compiling with arm64 clang.
drivers/net/tap/rte_eth_tap.c:1410:40: error: argument to 'sizeof' in
    'memcpy' call is the same pointer type 'struct ether_addr *' as the
    destination; expected 'struct ether_addr' or an explicit length
    [-Werror,-Wsizeof-pointer-memaccess]
       rte_memcpy(&pmd->eth_addr, mac_addr, sizeof(mac_addr));
                  ~~~~~~~~~~~~~~            ^~~~~~~~~~~~~~~~
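
The warning above can be reproduced and fixed in a few lines. This is a
minimal sketch, not the tap PMD's code: the mac_addr structure and the
copy_mac() helper are hypothetical stand-ins. The point is only that
sizeof must be applied to the pointed-to object, not to the pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 6-byte Ethernet MAC address structure. */
struct mac_addr {
	uint8_t bytes[6];
};

/*
 * Copy a full MAC address.
 * sizeof(*src) is the size of the structure (6 bytes), whereas
 * sizeof(src) would be the size of the pointer, which is what
 * -Wsizeof-pointer-memaccess complains about.
 */
static void copy_mac(struct mac_addr *dst, const struct mac_addr *src)
{
	memcpy(dst, src, sizeof(*src));
}
```

Passing sizeof(struct mac_addr) explicitly would work just as well; the
compiler message suggests either form.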

Fixes: bcab6c1d27fa ("net/tap: allow user MAC to be passed as args")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Zhiyong Yang <zhiyong.yang@intel.com>
Alejandro Lucero [Thu, 15 Mar 2018 14:30:37 +0000 (14:30 +0000)]
net/nfp: support new HW offloads API

In the upcoming 18.05 release the old HW offload API will be removed.
This patch adds support for the new HW offload API only.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Alejandro Lucero [Thu, 5 Apr 2018 14:42:47 +0000 (15:42 +0100)]
net/nfp: remove files

The new CPP interface makes the NSPU interface obsolete. These files
are no longer needed.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Alejandro Lucero [Thu, 5 Apr 2018 14:42:46 +0000 (15:42 +0100)]
doc: update NFP guide

The new CPP interface changes the way firmware upload is managed by
the PMD. It also supports different firmware file names, so that
specific firmware applications can be used per card.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Alejandro Lucero [Thu, 5 Apr 2018 14:42:45 +0000 (15:42 +0100)]
net/nfp: use new CPP interface

PF PMD support was based on the NSPU interface. This patch changes the
PMD to use the new CPP user space interface, which gives more
flexibility for adding new functionality.

This change affects initialization only; the datapath is the same as
before.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Alejandro Lucero [Thu, 5 Apr 2018 14:42:44 +0000 (15:42 +0100)]
net/nfp: support CPP

CPP refers to the internal NFP Command Push Pull bus. This patch allows
CPP commands to be created from user space, giving access to any
part of the chip.

This CPP interface is the base for other functionality such as
mutexes for accessing specific chip components, chip resource management,
firmware upload, or using the NSP, an embedded ARM processor which can
perform tasks on demand.

Previously, the NSP was the only way for the PMD to do things in the chip,
with an NSPU interface used for commands like firmware upload or
port link configuration. The CPP interface supersedes NSPU, but it is
still possible to use the NSP through CPP.

The CPP interface adds great flexibility for things like extended
stats or firmware debugging.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Matej Vido [Wed, 4 Apr 2018 13:46:35 +0000 (15:46 +0200)]
net/szedata2: implement dynamic logging

Signed-off-by: Matej Vido <vido@cesnet.cz>
Matej Vido [Wed, 4 Apr 2018 13:45:47 +0000 (15:45 +0200)]
net/szedata2: convert license headers to SPDX tags

Signed-off-by: Matej Vido <vido@cesnet.cz>
Matej Vido [Wed, 4 Apr 2018 13:42:21 +0000 (15:42 +0200)]
net/szedata2: fix format string for PCI address

The fscanf() function requires the SCN macros, but the PRI macros were
wrongly used.
Also use variables of the correct size for the values read.
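
As an illustration of the SCN/PRI distinction, here is a minimal sketch
that parses a PCI address with the SCN macros. It is not the szedata2
code: the pci_addr structure, its field widths, and parse_pci_addr() are
assumptions made for the example, and sscanf() is used instead of
fscanf() so that the sketch needs no file.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical PCI address holder; field widths are assumptions. */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Parse "dddd:bb:dd.f" (hexadecimal fields).
 * Each SCNx* macro matches the exact width of the variable it fills,
 * so an 8-bit field is never written through a wider conversion.
 * Returns 0 on success, -1 if the string does not match.
 */
static int parse_pci_addr(const char *s, struct pci_addr *out)
{
	int n = sscanf(s, "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8,
		       &out->domain, &out->bus, &out->devid, &out->function);

	return n == 4 ? 0 : -1;
}
```

The PRI macros are the printf() counterparts; for 8-bit and 16-bit types
they generally expand to different length modifiers than the SCN ones,
which is why mixing them up can corrupt neighbouring variables.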

Fixes: 83556fd2c0fc ("szedata2: change to physical device type")
Cc: stable@dpdk.org
Signed-off-by: Matej Vido <vido@cesnet.cz>
Matej Vido [Wed, 4 Apr 2018 13:42:20 +0000 (15:42 +0200)]
net/szedata2: add stat of mbuf allocation failures

Signed-off-by: Matej Vido <vido@cesnet.cz>
Matej Vido [Wed, 4 Apr 2018 13:42:19 +0000 (15:42 +0200)]
net/szedata2: use dynamically allocated queues

Previously the queues were part of the private data structure of the
Ethernet device.
Now the queues are allocated at setup, so NUMA-aware allocation is
possible.

Signed-off-by: Matej Vido <vido@cesnet.cz>
Matej Vido [Wed, 4 Apr 2018 13:42:18 +0000 (15:42 +0200)]
net/szedata2: fix total stats

Counters from all queues have to be summed up for the total stats,
even when there are not enough per-queue stats counters for every queue.
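
A minimal sketch of the idea, under stated assumptions: the port_stats
structure, the collect_stats() helper and the QUEUE_STAT_CNTRS limit of 4
are hypothetical and only mimic the shape of the problem. Per-queue slots
are filled while they last, but the total always covers every queue.

```c
#include <stdint.h>

/* Hypothetical number of per-queue counter slots. */
#define QUEUE_STAT_CNTRS 4

struct port_stats {
	uint64_t ipackets;                     /* total over all queues */
	uint64_t q_ipackets[QUEUE_STAT_CNTRS]; /* per-queue, limited slots */
};

/* Fill per-queue slots where available; always add to the total. */
static void collect_stats(const uint64_t *rx_per_queue,
			  unsigned int nb_queues, struct port_stats *stats)
{
	unsigned int i;

	stats->ipackets = 0;
	for (i = 0; i < QUEUE_STAT_CNTRS; i++)
		stats->q_ipackets[i] = 0;

	for (i = 0; i < nb_queues; i++) {
		if (i < QUEUE_STAT_CNTRS)
			stats->q_ipackets[i] = rx_per_queue[i];
		/* The total must not stop at the per-queue slot limit. */
		stats->ipackets += rx_per_queue[i];
	}
}
```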

Fixes: 83556fd2c0fc ("szedata2: change to physical device type")
Cc: stable@dpdk.org
Signed-off-by: Matej Vido <vido@cesnet.cz>
Ferruh Yigit [Thu, 22 Mar 2018 18:13:24 +0000 (18:13 +0000)]
net/bonding: switch to new offloading API

Switch to new ethdev offloading API.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
Chas Williams [Fri, 23 Mar 2018 17:05:32 +0000 (13:05 -0400)]
net/bonding: clear started state if start fails

There are several error paths where the bonding device may not start.
Clear dev_started before we return if we take one of these paths.

Fixes: 2efb58cbab6e ("bond: new link bonding library")
Cc: stable@dpdk.org
Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
Chas Williams [Tue, 3 Apr 2018 16:01:22 +0000 (12:01 -0400)]
net/bonding: fix setting VLAN ID on slave ports

The pos returned is just the offset of the slab.  Use it to offset the
positions of the bits set in the slab.
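
The relation between pos and the slab bits can be sketched as follows.
This is an illustration only, assuming a 64-bit slab; slab_to_ids() and
SLAB_BITS are hypothetical names and the bonding PMD's loop may differ.

```c
#include <stdint.h>

/* Assumption for this sketch: one slab holds 64 bits. */
#define SLAB_BITS 64

/*
 * Convert a (pos, slab) pair into the IDs it represents.
 * pos is the bit offset of the slab start, so the ID of bit i
 * within the slab is pos + i, not pos alone.
 * Returns the number of IDs written to ids[].
 */
static unsigned int slab_to_ids(uint32_t pos, uint64_t slab,
				uint16_t *ids, unsigned int max_ids)
{
	unsigned int i, n = 0;

	for (i = 0; i < SLAB_BITS && n < max_ids; i++) {
		if (slab & (UINT64_C(1) << i))
			ids[n++] = (uint16_t)(pos + i);
	}
	return n;
}
```

For example, a slab at pos 64 with bits 0 and 5 set stands for IDs 64
and 69.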

Fixes: c771e4ef38 ("net/bonding: enable slave VLAN filter")
Cc: stable@dpdk.org
Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
Stephen Hemminger [Thu, 5 Apr 2018 15:12:28 +0000 (08:12 -0700)]
net/octeontx: fix uninitialized speed variable

This is a fix for Coverity defect 268319 about an uninitialized speed
in an error case. Also drop an unnecessary assignment.

Coverity issue: 268319
Fixes: 4fac7c0a147e ("net/octeontx: add link update")
CC: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Santosh Shukla [Mon, 2 Apr 2018 16:05:33 +0000 (16:05 +0000)]
net/octeontx: remove redundant driver name update

Cc: stable@dpdk.org
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
Andrew Rybchenko [Tue, 20 Mar 2018 11:26:26 +0000 (11:26 +0000)]
ethdev: fix library version in meson build

Fixes: 653e038efc9b ("ethdev: remove versioning of filter control function")

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Shagun Agrawal [Wed, 4 Apr 2018 03:53:37 +0000 (09:23 +0530)]
net/cxgbe: update to Rx/Tx offload API

Update to new Rx/Tx offload API. Always set CRC stripping during
configuration, since it can't be disabled.

Signed-off-by: Shagun Agrawal <shaguna@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Shagun Agrawal [Wed, 4 Apr 2018 03:53:36 +0000 (09:23 +0530)]
net/cxgbe: add option to keep outer VLAN tag in QinQ

Add devargs option to keep outer VLAN tag in Q-in-Q packets.

Signed-off-by: Shagun Agrawal <shaguna@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Kumar Sanghvi [Wed, 4 Apr 2018 03:53:35 +0000 (09:23 +0530)]
doc: add VF in CXGBE guide

Add documentation on running DPDK on SR-IOV virtual functions for
Chelsio NICs.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Matej Vido [Tue, 3 Apr 2018 15:06:21 +0000 (17:06 +0200)]
net/szedata2: convert to new offload API

The offload API is currently used only to set up the correct receive
function for scattered packets.
Use the offloads member instead of the bitfield and advertise the correct
capabilities.

Signed-off-by: Matej Vido <vido@cesnet.cz>
Scott Branden [Mon, 2 Apr 2018 22:34:32 +0000 (15:34 -0700)]
net/bnxt: convert to SPDX license tag

Update the license header on bnxt files to be the standard
BSD-3-Clause license used for the rest of DPDK,
bringing the files into compliance with the DPDK licensing policy.

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Pavan Nikhilesh [Thu, 5 Apr 2018 13:23:32 +0000 (18:53 +0530)]
net/octeontx: use the new offload APIs

Use the new Rx/Tx offload APIs and remove the old style offloads.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Yongseok Koh [Mon, 12 Mar 2018 17:05:45 +0000 (10:05 -0700)]
net/mlx5: remove excessive data prefetch

In Enhanced Multi-Packet Send (eMPW), the entire packet data is prefetched
to LLC if it isn't inlined. Even though this helps reduce jitter when HW
fetches data by DMA, it can thrash the LLC by evicting useful data.
If the queue size is large and there are many queues, the prefetch might
not be effective. Also, if the application runs on a node remote from the
PCIe link, it may not help and can even hurt.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Bin Huang [Fri, 30 Mar 2018 05:13:38 +0000 (13:13 +0800)]
net/mlx5: add packet type index for TCP ack

According to CQE format:
- l4_hdr_type:
     0 - None
     1 - TCP header was present in the packet
     2 - UDP header was present in the packet
     3 - TCP header was present in the packet with Empty
         TCP ACK indication. (TCP packet <ACK> flag is set,
         and packet carries no data)
     4 - TCP header was present in the packet with TCP ACK indication.
         (TCP packet <ACK> flag is set, and packet carries data).

A packet should be identified as a TCP packet if l4_hdr_type is 1, 3 or 4.
Add the corresponding TCP ACK indexes to the ptype table.
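
The classification rule above can be written down as a small helper.
This is a sketch only: the enum and function names are hypothetical and
this is not how the mlx5 ptype table is built; the numeric values are the
ones listed in the CQE format description.

```c
#include <stdbool.h>
#include <stdint.h>

/* l4_hdr_type values as listed above; the names are hypothetical. */
enum l4_hdr_type {
	L4_HDR_NONE = 0,
	L4_HDR_TCP = 1,
	L4_HDR_UDP = 2,
	L4_HDR_TCP_EMPTY_ACK = 3, /* ACK flag set, no payload */
	L4_HDR_TCP_ACK = 4,       /* ACK flag set, with payload */
};

/* A packet is TCP for plain TCP and for both ACK indications. */
static bool l4_is_tcp(uint8_t l4_hdr_type)
{
	return l4_hdr_type == L4_HDR_TCP ||
	       l4_hdr_type == L4_HDR_TCP_EMPTY_ACK ||
	       l4_hdr_type == L4_HDR_TCP_ACK;
}
```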

Previous discussion:
https://www.mail-archive.com/users@dpdk.org/msg02980.html

Signed-off-by: Bin Huang <bin.huang@hxt-semitech.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Bruce Richardson [Thu, 29 Mar 2018 13:36:20 +0000 (14:36 +0100)]
net/mlx: fix warnings for unused compiler arguments

When linking the mlx glue code libraries using CC, the linker arguments in
LDFLAGS are not prefixed with -Wl. [The EXTRA_LDFLAGS are though.] This
leads to warning messages on build:

clang-5.0: warning: argument unused during compilation: '-e xport-dynamic'

Fix this by checking for $LINK_USING_CC in the Makefiles and prefixing the
LDFLAGS appropriately if set.

Fixes: 27cea11686ff ("net/mlx4: spawn rdma-core dependency plug-in")
Fixes: 59b91bec12c6 ("net/mlx5: spawn rdma-core dependency plug-in")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Rami Rosen [Wed, 28 Mar 2018 01:07:09 +0000 (21:07 -0400)]
net/mlx4: fix a typo in header file

This patch fixes a trivial typo in mlx4 header file.

Fixes: 3d555728c933 ("net/mlx4: separate Rx/Tx definitions")
Cc: stable@dpdk.org
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Tiwei Bie [Fri, 9 Mar 2018 00:32:16 +0000 (08:32 +0800)]
net/virtio: move to new offloads API

Ethdev offloads API has changed since:

commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Wei Zhao [Tue, 3 Apr 2018 06:09:36 +0000 (14:09 +0800)]
net/i40e: fix flow RSS TCI use

The VLAN TCI configured from testpmd is stored in big endian; it must be
converted to little endian before use.
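
A minimal sketch of such a conversion, assuming a POSIX system: ntohs()
stands in for whatever byte-order helper the driver uses, and the
tci_vlan_id() and tci_priority() helpers are hypothetical. On a
little-endian host, ntohs() performs the byte swap described above.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * tci_be holds the 16-bit VLAN TCI in big-endian (network) order.
 * Convert to host order first; masking or shifting the raw value
 * would pick the wrong bits on a little-endian CPU.
 */
static uint16_t tci_vlan_id(uint16_t tci_be)
{
	return ntohs(tci_be) & 0x0fff;        /* low 12 bits: VLAN ID */
}

static uint8_t tci_priority(uint16_t tci_be)
{
	return (uint8_t)(ntohs(tci_be) >> 13); /* top 3 bits: priority */
}
```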

Fixes: ecad87d22383 ("net/i40e: move RSS to flow API")
Cc: stable@dpdk.org
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Tested-by: Yuan Peng <yuan.peng@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Tue, 3 Apr 2018 02:54:56 +0000 (10:54 +0800)]
net/e1000: convert to new Tx offloads API

Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Tx offloads API.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Tue, 3 Apr 2018 02:54:55 +0000 (10:54 +0800)]
net/e1000: convert to new Rx offloads API

Ethdev Rx offloads API has changed since:
commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
This commit supports the new Rx offloads API.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Thu, 22 Mar 2018 03:41:03 +0000 (11:41 +0800)]
net/ixgbe: convert to new Tx offloads API

Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Tx offloads API.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Thu, 22 Mar 2018 03:41:02 +0000 (11:41 +0800)]
net/ixgbe: convert to new Rx offloads API

Ethdev Rx offloads API has changed since:
commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
This commit supports the new Rx offloads API.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Thu, 22 Mar 2018 03:41:01 +0000 (11:41 +0800)]
net/ixgbe: support VLAN strip per queue offloading in VF

VLAN strip is a per queue offloading in VF. With this patch
it can be enabled or disabled on any Rx queue in VF.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Thu, 22 Mar 2018 03:41:00 +0000 (11:41 +0800)]
net/ixgbe: support VLAN strip per queue offloading in PF

VLAN strip is a per queue offloading in PF. With this patch
it can be enabled or disabled on any Rx queue in PF.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Yanglong Wu [Fri, 30 Mar 2018 08:22:12 +0000 (16:22 +0800)]
net/i40e: convert to new Tx offloads API

Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Tx offloads API.

Signed-off-by: Yanglong Wu <yanglong.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Yanglong Wu [Fri, 30 Mar 2018 08:22:11 +0000 (16:22 +0800)]
net/i40e: convert to new Rx offloads API

Ethdev Rx offloads API has changed since:
commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
This commit supports the new Rx offloads API.

Signed-off-by: Yanglong Wu <yanglong.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wenzhuo Lu [Thu, 1 Mar 2018 06:41:46 +0000 (14:41 +0800)]
net/avf: convert to new Rx and Tx offload API

Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Rx and Tx offloads API.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Wei Dai [Wed, 28 Mar 2018 08:00:37 +0000 (16:00 +0800)]
net/fm10k: convert to new Tx offloads API

Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Rx and Tx offloads API.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Wei Dai [Wed, 28 Mar 2018 08:00:36 +0000 (16:00 +0800)]
net/fm10k: convert to new Rx offloads API

Ethdev Rx offloads API has changed since:
commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
This commit supports the new Rx offloads API.

Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Zhiyong Yang [Fri, 30 Mar 2018 08:31:50 +0000 (16:31 +0800)]
net/virtio-user: fix port id type

virtio-user port_id range should be increased from 8 bits to 16 bits.

Fixes: f8244c6399d9 ("ethdev: increase port id range")
Cc: stable@dpdk.org
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Zhihong Wang [Mon, 2 Apr 2018 11:46:56 +0000 (19:46 +0800)]
vhost: add APIs for live migration

This patch adds APIs to enable live migration for non-builtin data paths.

On the source side, last_avail/used_idx from the device need to be set
into the virtio_net structure, and the log_base and log_size from the
virtio_net structure need to be set into the device.

On the destination side, last_avail/used_idx need to be read from the
virtio_net structure and set into the device.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Zhihong Wang [Mon, 2 Apr 2018 11:46:55 +0000 (19:46 +0800)]
vhost: adapt library for selective datapath

This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Zhihong Wang [Mon, 2 Apr 2018 11:46:54 +0000 (19:46 +0800)]
vhost: add APIs for datapath configuration

This patch adds APIs for datapath configuration.

The device ID (did) of the vhost-user socket can be set to identify the
backend device; in this case each vhost-user socket can have only one
connection. The did is set to -1 by default when the software datapath
is used.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Zhihong Wang [Mon, 2 Apr 2018 11:46:53 +0000 (19:46 +0800)]
vhost: support selective datapath

This patch set introduces support for selective datapath in the DPDK
vhost-user lib. vDPA stands for vhost Data Path Acceleration. The idea is
to let virtio ring compatible devices serve the virtio driver directly to
enable datapath acceleration.

A set of device ops is defined for device specific operations:

     a. get_queue_num: Called to get supported queue number of the device.

     b. get_features: Called to get supported features of the device.

     c. get_protocol_features: Called to get supported protocol features of
        the device.

     d. dev_conf: Called to configure the actual device when the virtio
        device becomes ready.

     e. dev_close: Called to close the actual device when the virtio device
        is stopped.

     f. set_vring_state: Called to change the state of the vring in the
        actual device when vring state changes.

     g. set_features: Called to set the negotiated features to device.

     h. migration_done: Called to allow the device to respond to RARP
        sending.

     i. get_vfio_group_fd: Called to get the VFIO group fd of the device.

     j. get_vfio_device_fd: Called to get the VFIO device fd of the device.

     k. get_notify_area: Called to get the notify area info of the queue.
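
The list of ops above maps naturally onto a table of function pointers.
The following is a simplified sketch, not the vhost library's
definition: the example_vdpa_ops structure, every signature in it, and
the two helper functions are assumptions made for illustration. Only the
op names come from the list above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table; all signatures are assumptions. */
struct example_vdpa_ops {
	int (*get_queue_num)(int did, uint32_t *queue_num);
	int (*get_features)(int did, uint64_t *features);
	int (*get_protocol_features)(int did, uint64_t *features);
	int (*dev_conf)(int vid);
	int (*dev_close)(int vid);
	int (*set_vring_state)(int vid, int vring, int state);
	int (*set_features)(int vid);
	int (*migration_done)(int vid);
	int (*get_vfio_group_fd)(int vid);
	int (*get_vfio_device_fd)(int vid);
	int (*get_notify_area)(int vid, int qid,
			       uint64_t *offset, uint64_t *size);
};

/* Example backend callback: reports a fixed queue count. */
static int example_get_queue_num(int did, uint32_t *queue_num)
{
	(void)did;
	*queue_num = 4;
	return 0;
}

/* Caller side: invoke an op only if the backend provides it. */
static int query_queue_num(const struct example_vdpa_ops *ops, int did,
			   uint32_t *queue_num)
{
	if (ops == NULL || ops->get_queue_num == NULL)
		return -1;
	return ops->get_queue_num(did, queue_num);
}
```

A backend would fill in only the ops it implements, for instance
{ .get_queue_num = example_get_queue_num }, and the library side would
check each pointer before calling it at the corresponding stage.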

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Zhihong Wang [Mon, 2 Apr 2018 11:46:52 +0000 (19:46 +0800)]
vhost: export vhost feature definitions

This patch exports vhost-user protocol features to support device driver
development.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Ferruh Yigit [Fri, 13 Apr 2018 21:20:59 +0000 (22:20 +0100)]
doc: reduce initial offload API rework scope to drivers

Do the switch to the new ethdev offloading API in two steps.

In v18.05 the target is implementing the new ethdev-PMD offload interface,
which means converting all PMDs to the new offloading API.

The next target is removing the old ethdev offload API.
It will affect applications and will force them to implement the new
offloading API.

Fixes: 3004d3454192 ("doc: update deprecation of ethdev offload API")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Shreyansh Jain [Thu, 12 Apr 2018 12:33:58 +0000 (18:03 +0530)]
hash: fix comment for lookup

rte_hash_lookup_with_hash() has a wrong comment for its 'sig' param.

Fixes: 1a9f648be291 ("hash: fix for multi-process apps")
Cc: stable@dpdk.org
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Allain Legacy [Mon, 19 Mar 2018 14:25:23 +0000 (09:25 -0500)]
ip_frag: fix double free of chained mbufs

The first mbuf and the last mbuf to be visited in the preceding loop
are not set to NULL in the fragmentation table.  This creates the
possibility of a double free when the fragmentation table is later freed
with rte_ip_frag_table_destroy().
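
The ownership rule behind this fix can be sketched without mbufs. This
is an illustration under stated assumptions: frag_entry,
take_fragments() and destroy_entry() are hypothetical, and plain
malloc() buffers stand in for mbufs. Once fragments are handed off,
every slot is cleared, including the first and the last, so a later
destroy finds nothing left to free.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_FRAGS 4

/* Hypothetical table entry that owns the buffers stored in its slots. */
struct frag_entry {
	void *frags[MAX_FRAGS];
};

/*
 * Hand all fragments to the caller and clear every slot,
 * so the table no longer owns any of them.
 */
static void take_fragments(struct frag_entry *e, void *out[MAX_FRAGS])
{
	size_t i;

	for (i = 0; i < MAX_FRAGS; i++) {
		out[i] = e->frags[i];
		e->frags[i] = NULL;
	}
}

/* Free whatever the table still owns; returns the number freed. */
static unsigned int destroy_entry(struct frag_entry *e)
{
	unsigned int n = 0;
	size_t i;

	for (i = 0; i < MAX_FRAGS; i++) {
		if (e->frags[i] != NULL) {
			free(e->frags[i]);
			e->frags[i] = NULL;
			n++;
		}
	}
	return n;
}
```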

Fixes: 95908f52393d ("ip_frag: free mbufs on reassembly table destroy")
Cc: stable@dpdk.org
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Gowrishankar Muthukrishnan [Fri, 13 Apr 2018 11:22:30 +0000 (16:52 +0530)]
bus/fslmc: fix 64-bit format specifiers

Instead of llX, use the C99 standard "PRIu64" in the format specifier.
The former breaks compilation on ppc64le.
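
A minimal sketch of the portable form, with a hypothetical
format_region() helper that is not taken from the fslmc bus code. The
PRI macros from <inttypes.h> expand to the right length modifier for
uint64_t on each platform, so the same format string compiles cleanly
everywhere.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a 64-bit address and length without assuming that
 * uint64_t is "unsigned long long" on the target platform.
 * Returns the snprintf() result.
 */
static int format_region(char *buf, size_t len, uint64_t addr, uint64_t size)
{
	return snprintf(buf, len, "addr=0x%" PRIx64 " len=%" PRIu64,
			addr, size);
}
```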

Fixes: c2c167fdb3 ("bus/fslmc: support memory event callbacks for VFIO")

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Jeff Guo [Fri, 13 Apr 2018 08:30:40 +0000 (16:30 +0800)]
app/testpmd: enable device hotplug monitoring

Use testpmd as an example to show how an application uses the device
event APIs to monitor hotplug events, including both hot removal
and hot insertion events.

testpmd first enables hotplug with the command below,

E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug

then testpmd starts the device event monitor by calling the new API
(rte_dev_event_monitor_start) and registers the user's callback by calling
the API (rte_dev_event_callback_register). When a device is hot-inserted
or hot-removed, the device event monitor detects the event
and calls the user's callbacks, so the user can process the event in the
callback accordingly.

This patch only shows the event monitoring; device attach/detach is
not involved here and will be added by another hotplug patch set.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Jeff Guo [Fri, 13 Apr 2018 08:30:39 +0000 (16:30 +0800)]
eal/linux: add uevent parse and process

To handle the uevents detected on the kernel side, add uevent parse
and process functions that translate a uevent into a device event
which the user has subscribed to monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Jeff Guo [Fri, 13 Apr 2018 08:30:38 +0000 (16:30 +0800)]
eal: add device event monitor framework

This patch adds a general device event monitor framework at the
EAL device layer, for device hotplug awareness and the actions taken
accordingly. It could also be expanded to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to
enable/disable the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users register or unregister callbacks through the newly added
APIs. Callbacks can be device specific, or apply to all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Take the hotplug case as an example: on device hotplug insertion or
removal, we get notified by the kernel and then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device from
the bus. Fail-safe and live migration could benefit from this.
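
The register-then-dispatch flow described above can be modelled in a few
lines. This is a self-contained mock, not the EAL implementation: every
name in it (dev_event_type, event_callback_register, dispatch_event,
count_event) is hypothetical, and the convention that a NULL device name
means "all devices" is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef void (*dev_event_cb)(const char *device_name,
			     enum dev_event_type event, void *cb_arg);

#define MAX_CALLBACKS 8

struct cb_entry {
	const char *device_name; /* NULL: callback applies to all devices */
	dev_event_cb cb;
	void *cb_arg;
};

static struct cb_entry cb_list[MAX_CALLBACKS];
static size_t cb_count;

/* Register a callback for one device, or for all if name is NULL. */
static int event_callback_register(const char *device_name,
				   dev_event_cb cb, void *cb_arg)
{
	if (cb == NULL || cb_count == MAX_CALLBACKS)
		return -1;
	cb_list[cb_count].device_name = device_name;
	cb_list[cb_count].cb = cb;
	cb_list[cb_count].cb_arg = cb_arg;
	cb_count++;
	return 0;
}

/* Called by the monitor when an event for device_name is detected. */
static void dispatch_event(const char *device_name,
			   enum dev_event_type event)
{
	size_t i;

	for (i = 0; i < cb_count; i++) {
		const struct cb_entry *e = &cb_list[i];

		if (e->device_name == NULL ||
		    strcmp(e->device_name, device_name) == 0)
			e->cb(device_name, event, e->cb_arg);
	}
}

/* Sample user callback: counts events in the int pointed to by cb_arg. */
static void count_event(const char *device_name,
			enum dev_event_type event, void *cb_arg)
{
	(void)device_name;
	(void)event;
	++*(int *)cb_arg;
}
```

In a real application the callback would decide, per event type, whether
to attach or detach the device.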

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Jeff Guo [Fri, 13 Apr 2018 08:30:37 +0000 (16:30 +0800)]
eal: add device event handle in interrupt thread

Add a new interrupt handle type, RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitoring.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Anatoly Burakov [Tue, 10 Apr 2018 10:23:30 +0000 (11:23 +0100)]
vfio: fix device hotplug when several devices per group

We only need to perform DMA mapping for the first device in the first group.
At the time of mapping, we haven't yet added the device into the group,
so the count is expected to be zero.

Fixes: 810bfa64c673 ("vfio: fix index for tracking devices in a group")
Fixes: a9c349e3a100 ("vfio: fix device unplug when several devices per group")
Fixes: 94c0776b1bad ("vfio: support hotplug")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
6 years agovfio: export some internal functions
Hemant Agrawal [Thu, 12 Apr 2018 06:23:37 +0000 (11:53 +0530)]
vfio: export some internal functions

This patch moves some of the internal vfio functions from
eal_vfio.h to rte_vfio.h for common uses with "rte_" prefix.

This patch also changes the FSLMC bus usages from the internal
VFIO functions to external ones with "rte_" prefix.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
6 years agodoc: add VFIO API in doxygen
Hemant Agrawal [Thu, 12 Apr 2018 06:23:36 +0000 (11:53 +0530)]
doc: add VFIO API in doxygen

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
6 years agomem: set fd to -1 for anonymous mmap
Neil Horman [Thu, 12 Apr 2018 11:16:40 +0000 (07:16 -0400)]
mem: set fd to -1 for anonymous mmap

https://dpdk.org/tracker/show_bug.cgi?id=18

The bug report above indicates that several mmap call sites in the
[linux|bsd]app eal code pass an fd other than -1 while using
MAP_ANONYMOUS. While probably not a huge deal, the man page does say
the fd should be -1 for portability, as some implementations don't
ignore fd as they should for MAP_ANONYMOUS.
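
For reference, a minimal sketch of the portable call shape, with fd
set to -1 and offset set to 0. The wrapper names anon_map and
anon_unmap are hypothetical and are not taken from the EAL code.

```c
#define _DEFAULT_SOURCE	/* exposes MAP_ANONYMOUS under strict -std modes on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of zero-filled anonymous memory; NULL on failure. */
void *anon_map(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS,
		       -1,	/* no backing file: pass -1 for portability */
		       0);

	return p == MAP_FAILED ? NULL : p;
}

/* Release a mapping obtained from anon_map(). */
int anon_unmap(void *p, size_t len)
{
	return munmap(p, len);
}
```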

Suggested-by: Solal Pirelli <solal.pirelli@gmail.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
6 years agobus/fslmc: configure separate portal for Ethernet Rx
Nipun Gupta [Mon, 9 Apr 2018 10:22:51 +0000 (15:52 +0530)]
bus/fslmc: configure separate portal for Ethernet Rx

In case of Receive from Ethernet we add a new pull request (prefetch)
but do not fetch the results from that pull request until next
dequeue operation. This keeps the portal in busy mode.

This patch updates the portals bifurcation to have separate portals
to receive packets for Ethernet and all other devices to use a
common portal.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
6 years agonet/dpaa2: fix xstats
Hemant Agrawal [Mon, 9 Apr 2018 10:22:50 +0000 (15:52 +0530)]
net/dpaa2: fix xstats

Fixes: 1d6329b2fc1f ("net/dpaa2: support extra stats")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
6 years agonet/dpaa: update checksum for external pool obj
Akhil Goyal [Mon, 9 Apr 2018 10:22:49 +0000 (15:52 +0530)]
net/dpaa: update checksum for external pool obj

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
6 years agobus/dpaa: fix resource leak
Hemant Agrawal [Mon, 9 Apr 2018 10:22:48 +0000 (15:52 +0530)]
bus/dpaa: fix resource leak

Coverity issue: 268337
Fixes: 1459585888b5 ("bus/dpaa: fix memory allocation during scan")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
6 years agonet/dpaa: fix oob access
Hemant Agrawal [Mon, 9 Apr 2018 10:22:47 +0000 (15:52 +0530)]
net/dpaa: fix oob access

Coverity issue: 268318
Fixes: b21ed3e2a16d ("net/dpaa: support extended statistics")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
6 years agonet/dpaa: fix array overrun
Hemant Agrawal [Mon, 9 Apr 2018 10:22:46 +0000 (15:52 +0530)]
net/dpaa: fix array overrun

Coverity issue: 268342
Fixes: 62f53995caaf ("net/dpaa: add frame count based tail drop with CGR")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
6 years agobus/dpaa: fix unchecked return value
Sunil Kumar Kori [Mon, 9 Apr 2018 10:22:45 +0000 (15:52 +0530)]
bus/dpaa: fix unchecked return value

Coverity issue: 268323
Fixes: 5d944582d028 ("bus/dpaa: check portal presence in the caller function")
Cc: stable@dpdk.org
Signed-off-by: Sunil Kumar Kori <sunil.kori@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
6 years agobus/dpaa: fix resource leak
Sunil Kumar Kori [Mon, 9 Apr 2018 10:22:44 +0000 (15:52 +0530)]
bus/dpaa: fix resource leak

Coverity issue: 268332
Fixes: 9d32ef0f5d61 ("bus/dpaa: support creating dynamic HW portal")
Cc: stable@dpdk.org
Signed-off-by: Sunil Kumar Kori <sunil.kori@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
6 years agombuf: remove control mbuf
Olivier Matz [Tue, 3 Apr 2018 13:39:13 +0000 (15:39 +0200)]
mbuf: remove control mbuf

The rte_ctrlmbuf structure is not used by any example application
in dpdk. Remove it, as announced on the mailing list.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
6 years agoigb_uio: bind error if PCIe bridge
Darren Edamura [Thu, 29 Mar 2018 16:37:35 +0000 (09:37 -0700)]
igb_uio: bind error if PCIe bridge

The probe function should exit immediately if a PCIe bridge is detected.

Signed-off-by: Darren Edamura <darren.edamura@broadcom.com>
Signed-off-by: Rahul Gupta <rahul.gupta@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
6 years agoeal: fix ARM build with clang
Pavan Nikhilesh [Wed, 11 Apr 2018 17:01:50 +0000 (22:31 +0530)]
eal: fix ARM build with clang

Use __atomic_exchange_n instead of __atomic_exchange_(2/4/8).

The error was:
include/generic/rte_atomic.h:215:9: error:
implicit declaration of function '__atomic_exchange_2'
is invalid in C99
include/generic/rte_atomic.h:494:9: error:
implicit declaration of function '__atomic_exchange_4'
is invalid in C99
include/generic/rte_atomic.h:772:9: error:
implicit declaration of function '__atomic_exchange_8'
is invalid in C99
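
A sketch of the generic form: __atomic_exchange_n is type-generic, so
the same builtin covers 16-, 32- and 64-bit operands without naming a
size-suffixed variant. The wrapper names below are hypothetical, and
the use of __ATOMIC_SEQ_CST is an assumption of this sketch.

```c
#include <stdint.h>

/* Atomically store val into *dst and return the previous value. */
uint16_t exchange16(volatile uint16_t *dst, uint16_t val)
{
	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
}

uint32_t exchange32(volatile uint32_t *dst, uint32_t val)
{
	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
}

uint64_t exchange64(volatile uint64_t *dst, uint64_t val)
{
	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
}
```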

Fixes: ff2863570fcc ("eal: introduce atomic exchange operation")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
6 years agomem: prevent preallocated pages from being freed
Anatoly Burakov [Wed, 11 Apr 2018 12:30:45 +0000 (13:30 +0100)]
mem: prevent preallocated pages from being freed

It is common sense to expect a DPDK process not to deallocate any
pages that were preallocated by "-m" or "--socket-mem" flags - yet,
currently, the DPDK memory subsystem will do exactly that once it
finds that the pages are unused.

Fix this by marking pages as unfreeable, and preventing malloc from
ever trying to free them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agomalloc: enable validation before new page allocation
Anatoly Burakov [Wed, 11 Apr 2018 12:30:44 +0000 (13:30 +0100)]
malloc: enable validation before new page allocation

Before allocating a new page, give a chance to the user to
allow or deny allocation via callbacks.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agomem: add validator callback
Anatoly Burakov [Wed, 11 Apr 2018 12:30:43 +0000 (13:30 +0100)]
mem: add validator callback

This API enables an application to register for notifications
on page allocations that are about to happen, giving the application
a chance to allow or deny the allocation when the resulting total
memory utilization would be above the specified limit on the
specified socket.
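
The decision flow can be modelled as below. This is a simplified
sketch with hypothetical names and a fixed socket count, not the EAL
API itself: the validator is consulted only when the new total would
exceed the registered limit, and a negative return denies the
allocation.

```c
#include <stddef.h>

typedef int (*alloc_validator)(int socket_id, size_t limit,
			       size_t new_total);

struct socket_policy {
	size_t used;			/* bytes currently accounted */
	size_t limit;			/* threshold that triggers the callback */
	alloc_validator validator;	/* NULL if none registered */
};

#define MAX_SOCKETS 4
struct socket_policy policies[MAX_SOCKETS];

int validator_register(int socket_id, size_t limit, alloc_validator cb)
{
	if (socket_id < 0 || socket_id >= MAX_SOCKETS || cb == NULL)
		return -1;
	policies[socket_id].limit = limit;
	policies[socket_id].validator = cb;
	return 0;
}

/* Account len bytes on the socket if allowed; return -1 if denied. */
int try_alloc_pages(int socket_id, size_t len)
{
	struct socket_policy *p;
	size_t new_total;

	if (socket_id < 0 || socket_id >= MAX_SOCKETS)
		return -1;
	p = &policies[socket_id];
	new_total = p->used + len;
	if (p->validator != NULL && new_total > p->limit &&
	    p->validator(socket_id, p->limit, new_total) < 0)
		return -1;	/* the application said no */
	p->used = new_total;
	return 0;
}

/* Example validator: refuse anything that goes over the limit. */
int deny_over_limit(int socket_id, size_t limit, size_t new_total)
{
	(void)socket_id;
	(void)limit;
	(void)new_total;
	return -1;
}
```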

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agoeal: enable non-legacy memory mode
Anatoly Burakov [Wed, 11 Apr 2018 12:30:42 +0000 (13:30 +0100)]
eal: enable non-legacy memory mode

Now that every other piece of the puzzle is in place, enable non-legacy
init mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agobus/fslmc: support memory event callbacks for VFIO
Anatoly Burakov [Wed, 11 Apr 2018 12:30:41 +0000 (13:30 +0100)]
bus/fslmc: support memory event callbacks for VFIO

VFIO needs to map and unmap segments for DMA whenever they
become available or unavailable, so register a callback for
memory events, and provide map/unmap functions.

Remove unneeded check for number of segments, as in non-legacy
mode this now becomes a valid scenario.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agobus/fslmc: move VFIO DMA map into bus probe
Anatoly Burakov [Wed, 11 Apr 2018 12:30:40 +0000 (13:30 +0100)]
bus/fslmc: move VFIO DMA map into bus probe

The fslmc bus needs to map all allocated memory for VFIO before
device probe. This bus doesn't support hotplug, so at the time
of this call, all devices that could be present are present.
This will also be the place where we install the VFIO callback,
although this change will come in the next patch.

Since rte_fslmc_vfio_dmamap() is now only called at bus probe,
there is no longer any need to check if DMA mappings have been
already done.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agovfio: support memory event callbacks
Anatoly Burakov [Wed, 11 Apr 2018 12:30:39 +0000 (13:30 +0100)]
vfio: support memory event callbacks

Enable callbacks on first device attach, disable callbacks
on last device detach.

PPC64 IOMMU does memseg walk, which will cause a deadlock on
trying to do it inside a callback, so provide a local,
thread-unsafe copy of memseg walk.

PPC64 IOMMU also may remap the entire memory map for DMA while
adding new elements to it, so change user map list lock to a
recursive lock. That way, we can safely enter rte_vfio_dma_map(),
lock the user map list, enter DMA mapping function and lock the
list again (for reading previously existing maps).
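
To illustrate why a recursive lock helps here, the sketch below
re-enters a lock from a function that already holds it. It is a
simplified model with hypothetical names, built on the compiler's
__atomic builtins, and it takes a caller-supplied thread id, which is
an assumption of the sketch rather than how the patch identifies the
owner.

```c
struct rec_lock {
	int owner;	/* id of the holder, or -1 when free */
	int count;	/* nesting depth, touched only by the holder */
};

void rec_lock_acquire(struct rec_lock *l, int self)
{
	if (__atomic_load_n(&l->owner, __ATOMIC_RELAXED) != self) {
		int expected = -1;

		/* Spin until the lock is free and we become the owner. */
		while (!__atomic_compare_exchange_n(&l->owner, &expected,
						    self, 0,
						    __ATOMIC_ACQUIRE,
						    __ATOMIC_RELAXED))
			expected = -1;	/* a failed CAS overwrote it */
	}
	l->count++;
}

void rec_lock_release(struct rec_lock *l)
{
	if (--l->count == 0)
		__atomic_store_n(&l->owner, -1, __ATOMIC_RELEASE);
}

struct user_maps {
	struct rec_lock lock;
	int n_maps;
};

/* Reader path: takes the lock on its own. */
int count_maps(struct user_maps *m, int self)
{
	int n;

	rec_lock_acquire(&m->lock, self);
	n = m->n_maps;
	rec_lock_release(&m->lock);
	return n;
}

/* Writer path: holds the lock, then calls the reader path. */
int add_map(struct user_maps *m, int self)
{
	int n;

	rec_lock_acquire(&m->lock, self);
	n = count_maps(m, self) + 1;	/* nested acquire by the holder */
	m->n_maps = n;
	rec_lock_release(&m->lock);
	return n;
}
```

With a non-recursive lock, the nested acquire in add_map() would spin
forever on a lock the caller already holds.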

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agomalloc: enable callbacks on alloc/free and mp sync
Anatoly Burakov [Wed, 11 Apr 2018 12:30:38 +0000 (13:30 +0100)]
malloc: enable callbacks on alloc/free and mp sync

Callbacks will be triggered just after allocation and just
before deallocation, to ensure that memory address space
referenced in the callback is always valid by the time
callback is called.
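
The ordering can be shown with a small model: the callback runs after
the region becomes valid on allocation, and before it becomes invalid
on free. All names here are hypothetical, and the "mapped" flag merely
stands in for the address space being valid.

```c
#include <stddef.h>

enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

struct region {
	int mapped;	/* 1 while the address space is valid */
	size_t len;
};

typedef void (*mem_event_cb)(enum mem_event ev, const struct region *r,
			     void *arg);

void region_alloc(struct region *r, size_t len, mem_event_cb cb, void *arg)
{
	r->mapped = 1;
	r->len = len;
	cb(MEM_EVENT_ALLOC, r, arg);	/* just after allocation */
}

void region_free(struct region *r, mem_event_cb cb, void *arg)
{
	cb(MEM_EVENT_FREE, r, arg);	/* just before deallocation */
	r->mapped = 0;
	r->len = 0;
}

/* Example callback: record whether the region was valid when called. */
void record_mapped(enum mem_event ev, const struct region *r, void *arg)
{
	int *seen = arg;

	seen[ev] = r->mapped;
}
```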

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agomalloc: support callbacks on memory events
Anatoly Burakov [Wed, 11 Apr 2018 12:30:37 +0000 (13:30 +0100)]
malloc: support callbacks on memory events

Each process will have its own callbacks. Callbacks will indicate
whether it's allocation or deallocation that's happened, and will
also provide the start VA and length of the allocated block.

Since memory hotplug isn't supported on FreeBSD and in legacy mem
mode, it will not be possible to register them in either.

Callbacks are called whenever something happens to the memory map of
current process, therefore at those times memory hotplug subsystem
is write-locked, which leads to deadlocks on attempt to use these
functions. Document the limitation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
6 years agomalloc: support multiprocess memory hotplug
Anatoly Burakov [Wed, 11 Apr 2018 12:30:36 +0000 (13:30 +0100)]
malloc: support multiprocess memory hotplug

This enables multiprocess synchronization for memory hotplug
requests at runtime (as opposed to initialization).

Basic workflow is the following. Primary process always does initial
mapping and unmapping, and secondary processes always follow primary
page map. Only one allocation request can be active at any one time.

When primary allocates memory, it ensures that all other processes
have allocated the same set of hugepages successfully, otherwise
any allocations made are rolled back, and heap is freed back.
Heap is locked throughout the process, and there is also a global
memory hotplug lock, so no race conditions can happen.
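
The all-or-nothing behaviour of a primary allocation can be sketched
as follows. This is a single-process model with hypothetical names:
each "secondary" is a struct rather than a real process, the sync
requests are plain function calls, and locking is left out.

```c
#include <stddef.h>

/* Stand-in for one secondary process in this model. */
struct secondary {
	int can_map;		/* 0 simulates a failed sync request */
	int mapped_pages;
};

struct primary {
	int heap_pages;		/* pages visible to the heap */
};

int secondary_sync_alloc(struct secondary *s, int n_pages)
{
	if (!s->can_map)
		return -1;
	s->mapped_pages += n_pages;
	return 0;
}

void secondary_sync_free(struct secondary *s, int n_pages)
{
	s->mapped_pages -= n_pages;
}

/*
 * Add n_pages to the heap only if every secondary maps them too.
 * On any failure, undo the mappings already made and report -1.
 */
int primary_alloc(struct primary *p, struct secondary *secs,
		  size_t n_secs, int n_pages)
{
	size_t i, done;

	for (done = 0; done < n_secs; done++) {
		if (secondary_sync_alloc(&secs[done], n_pages) < 0)
			break;
	}
	if (done < n_secs) {
		for (i = 0; i < done; i++)	/* roll back */
			secondary_sync_free(&secs[i], n_pages);
		return -1;
	}
	p->heap_pages += n_pages;	/* now part of the heap */
	return 0;
}
```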

When primary frees memory, it frees the heap, deallocates affected
pages, and notifies other processes of deallocations. Since heap is
freed from that memory chunk, the area basically becomes invisible
to other processes even if they happen to fail to unmap that
specific set of pages, so it's completely safe to ignore results of
sync requests.

When secondary allocates memory, it does not do so by itself.
Instead, it sends a request to primary process to try and allocate
pages of specified size and on specified socket, such that a
specified heap allocation request could complete. Primary process
then sends all secondaries (including the requestor) a separate
notification of allocated pages, and expects all secondary
processes to report success before considering pages as "allocated".

Only after primary process ensures that all memory has been
successfully allocated in all secondary processes will it respond
positively to the initial request, and let secondary proceed with
the allocation. Since the heap now has memory that can satisfy
allocation request, and it was locked all this time (so no other
allocations could take place), secondary process will be able to
allocate memory from the heap.

When secondary frees memory, it hides pages to be deallocated from
the heap. Then, it sends a deallocation request to primary process,
so that it deallocates pages itself, and then sends a separate sync
request to all other processes (including the requestor) to unmap
the same pages. This way, even if secondary fails to notify other
processes of this deallocation, that memory will become invisible
to other processes, and will not be allocated from again.

So, to summarize: address space will only become part of the heap
if primary process can ensure that all other processes have
allocated this memory successfully. If anything goes wrong, the
worst thing that could happen is that a page will "leak" and will
be available to neither DPDK nor the system, as some process
will still hold onto it. It's not an actual leak, as we can account
for the page - it's just that none of the processes will be able
to use this page for anything useful, until it gets allocated from
by the primary.

Due to underlying DPDK IPC implementation being single-threaded,
some asynchronous magic had to be done, as we need to complete
several requests before we can definitively allow secondary process
to use allocated memory (namely, it has to be present in all other
secondary processes before it can be used). Additionally, only
one allocation request is allowed to be submitted at once.

Memory allocation requests are only allowed when there are no
secondary processes currently initializing. To enforce that,
a shared rwlock is used, that is set to read lock on init (so that
several secondaries could initialize concurrently), and write lock
on making allocation requests (so that either secondary init will
have to wait, or allocation request will have to wait until all
processes have initialized).

Any other function that wishes to iterate over memory or prevent
allocations should be using memory hotplug lock.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>