Andrew Rybchenko [Tue, 29 Nov 2016 16:19:03 +0000 (16:19 +0000)]
net/sfc: add init on attach
The setup and configuration of the PMD is not performance sensitive,
but is not thread safe either. It is possible that the multiple
read/writes during PMD setup and configuration could be corrupted
in a multi-thread environment. Since this is not performance
sensitive, the developer can choose to add their own layer to provide
thread-safe setup and configuration. It is expected that, in most
applications, the initial configuration of the network ports would be
done by a single thread at startup.
In the case of exception on the event queue, the event queue and
corresponding Rx/Tx queue should be restarted in the Rx/Tx queue
polling context. These operations require access to the device
control which should be serialized. The device level lock will do
the job.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Andy Moreton <amoreton@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Tue, 29 Nov 2016 16:18:58 +0000 (16:18 +0000)]
net/sfc/base: import NVRAM support
Provide API to work with NIC non-volatile memory. It is used
to update firmware, configure NIC including bootrom parameters,
manage licenses, store PCI Vital Product Data etc.
EFSYS_OPT_NVRAM should be enabled to use it.
From Solarflare Communications Inc.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Tue, 29 Nov 2016 16:18:57 +0000 (16:18 +0000)]
net/sfc/base: import Rx packed stream mode
In packed stream mode, large buffers are provided to the NIC
into which many packets can be delivered. This reduces the
number of queue refills needed compared to delivering every
packet into a separate buffer.
EFSYS_OPT_RX_PACKED_STREAM should be enabled to use it.
From Solarflare Communications Inc.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Tue, 29 Nov 2016 16:18:50 +0000 (16:18 +0000)]
net/sfc/base: import MAC statistics
MAC statistics are either periodically (if supported/requested)
or on-demand written to provided DMA-mapped memory.
If periodic update is not supported (e.g. for EF10 virtual
functions), it is the driver's responsibility to handle it.
EFSYS_OPT_MAC_STATS should be enabled to use it.
From Solarflare Communications Inc.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Tue, 29 Nov 2016 16:18:40 +0000 (16:18 +0000)]
net/sfc/base: import MCDI proxy authorization
MCDI proxy authorization may be used if privileged PCI
function (physical function) would like to intercept and
authorize MCDI requests done by unprivileged (e.g. virtual)
PCI function. It may be used to control unprivileged
function Rx mode (e.g. promiscuous, all-multicast), MTU
and default MAC address change requests etc.
Current libefx support is limited to client-side which
is required to work when function requests need to be
authorized.
Server side support required to request and do the
authorization is not implemented yet.
EFSYS_OPT_MCDI_PROXY_AUTH should be enabled to use it.
From Solarflare Communications Inc.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Tue, 29 Nov 2016 16:18:38 +0000 (16:18 +0000)]
net/sfc/base: import MCDI implementation
Implement an interface to talk to the NIC management CPU. Provide
helpers to fill in MCDI requests, execute them and process the
received response.
An MCDI request is prepared in either PCI BAR mapped memory
(SFN5xxx/SFN6xxx) or DMA-mapped memory (SFN7xxx/SFN8xxx), and the
doorbell (a memory-mapped register) is rung to execute it.
Events about MCDI completion are delivered to house-keeping
event queue, but usage of these events is optional and MCDI
buffer may be simply polled waiting for completion
indication set.
From Solarflare Communications Inc.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Tue, 29 Nov 2016 16:18:36 +0000 (16:18 +0000)]
net/sfc/base: import filters support
Filtering capabilities depend on the NIC family and the firmware
variant in use. The provided API allows getting supported filter types
(in priority order), adding/deleting individual filters and
restoring the entire filter table after, for example, a NIC management
CPU reboot.
Rx filters allow redirecting a matching flow to a specified Rx queue.
Tx filters allow controlling generated traffic (e.g. to implement
virtual function anti-spoofing control).
EFSYS_OPT_FILTER should be enabled to use it. It is required
for SFN7xxx and SFN8xxx adapter families support.
From Solarflare Communications Inc.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Andrew Rybchenko [Tue, 29 Nov 2016 16:18:34 +0000 (16:18 +0000)]
net/sfc/base: import libefx base
libefx is a platform-independent library to implement drivers
for Solarflare network adapters. It provides unified adapter
family independent interface (if possible).
Driver must provide efsys.h header which defines options
(EFSYS_OPT_*) to be used and macros/functions to allocate
memory, read/write DMA-mapped memory, read/write PCI BAR
space, locks, barriers etc.
efx.h and efx_types.h provide external interfaces intended
to be used by drivers. Other header files are internal.
From Solarflare Communications Inc.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Jingjing Wu [Wed, 30 Nov 2016 02:02:25 +0000 (10:02 +0800)]
net/i40evf: fix casting between structs
Casting from structs which lay out data in typed members
to structs which have flat memory buffers, will cause
problems if the alignment of the former isn't as expected.
This patch removes the casting between structs.
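The hazard described above can be illustrated with a minimal sketch. It assumes one common way to avoid such casts, copying each field explicitly with memcpy(); the commit message does not say which replacement the patch actually uses, and both struct layouts below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical message with typed members (not the real i40evf layout). */
struct typed_msg {
	uint16_t opcode;
	uint32_t value;
};

/* Hypothetical flat buffer of the kind a mailbox interface might carry. */
struct flat_msg {
	uint8_t buf[16];
};

/*
 * Serialize field by field instead of casting (struct flat_msg *)&typed,
 * so padding and alignment of struct typed_msg never leak into the
 * flat layout.
 */
static void
typed_to_flat(const struct typed_msg *in, struct flat_msg *out)
{
	memset(out, 0, sizeof(*out));
	memcpy(&out->buf[0], &in->opcode, sizeof(in->opcode));
	memcpy(&out->buf[2], &in->value, sizeof(in->value));
}

/* Reverse direction: read each field back from its fixed byte offset. */
static void
flat_to_typed(const struct flat_msg *in, struct typed_msg *out)
{
	memcpy(&out->opcode, &in->buf[0], sizeof(out->opcode));
	memcpy(&out->value, &in->buf[2], sizeof(out->value));
}
```

The byte offsets (0 and 2) are fixed by the sketch itself, which is the point: the flat layout no longer depends on how the compiler pads the typed struct.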
Wenzhuo Lu [Wed, 23 Nov 2016 17:22:56 +0000 (12:22 -0500)]
net/e1000/base: add workaround for possible stalled packet
This works around a possible stalled packet issue, which may occur due to
clock recovery from the PCH being too slow, when the LAN is transitioning
from K1 at 1G link speed.
Wenzhuo Lu [Wed, 23 Nov 2016 17:22:52 +0000 (12:22 -0500)]
net/e1000/base: clear ULP configuration register on ULP exit
There are some client PHY Ultra Low Power (ULP) register bits that are
configured by the Manageability Engine (ME) FW.
The driver must ensure that these bits are cleared on exit from ULP.
Ordinarily the ME FW would do that, but there are cases in which the
FW is not present, and the driver must handle that.
Wenzhuo Lu [Wed, 23 Nov 2016 17:22:47 +0000 (12:22 -0500)]
net/e1000/base: retry to get HW mailbox lock
The driver shouldn't give up if it fails to get the hardware mailbox lock.
This can happen in a situation where the PF-VF communication channel is
heavily loaded and causes complete communications failure between the PF
and VF drivers.
Add a counter and a delay. The driver will now retry ten times,
waiting one millisecond between retries.
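A minimal sketch of the retry policy described above. The callback types, the stub lock and the function names are hypothetical and are not the e1000 base code; only the count (ten) and the delay (one millisecond) come from the commit message. Whether the real driver counts the first attempt as one of the ten is not stated, so the sketch simply makes at most ten attempts.

```c
#include <stdbool.h>

#define MBX_LOCK_ATTEMPTS 10 /* "retry ten times" */
#define MBX_LOCK_DELAY_MS 1  /* "one millisecond between retries" */

/* Hypothetical callbacks supplied by the caller. */
typedef bool (*try_lock_fn)(void *ctx);
typedef void (*delay_ms_fn)(unsigned int ms);

/* Returns 0 once the lock is obtained, -1 if every attempt failed. */
static int
obtain_lock_with_retry(try_lock_fn try_lock, void *ctx, delay_ms_fn delay_ms)
{
	int attempt;

	for (attempt = 0; attempt < MBX_LOCK_ATTEMPTS; attempt++) {
		if (try_lock(ctx))
			return 0;
		delay_ms(MBX_LOCK_DELAY_MS);
	}
	return -1;
}

/* Demonstration stub: the lock stays busy for the first busy_polls calls. */
struct fake_lock {
	int busy_polls;
	int calls;
};

static bool
fake_try_lock(void *ctx)
{
	struct fake_lock *l = ctx;

	l->calls++;
	if (l->busy_polls > 0) {
		l->busy_polls--;
		return false;
	}
	return true;
}

/* Demonstration stub: a real driver would sleep or busy-wait here. */
static void
no_delay(unsigned int ms)
{
	(void)ms;
}
```

Passing the delay as a callback keeps the sketch independent of any particular sleep primitive.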
David Marchand [Mon, 21 Nov 2016 18:06:13 +0000 (19:06 +0100)]
net: remove dead driver names
Since commit b1fb53a39d88 ("ethdev: remove some PCI specific handling"),
rte_eth_dev_info_get() relies on dev->data->drv_name to report the driver
name to caller.
Having the pmds set driver_info->driver_name is useless,
since ethdev overwrites it right after.
The only thing the pmd must do is:
- for pci drivers, call rte_eth_copy_pci_info() which then sets
data->drv_name
- for vdev drivers, manually set data->drv_name
At this stage, virtio-user does not properly report a driver name (fixed in
next commit).
Signed-off-by: David Marchand <david.marchand@6wind.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Jan Blunck <jblunck@infradead.org>
The list of segments to free was manipulated incorrectly, so only the
first segment was freed instead of all of them. The last one still
belongs to the NIC and thus should not be freed.
Fixes: a1bdb71a32da ("net/mlx5: fix crash in Rx") Reported-by: Liming Sun <lsun@mellanox.com> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
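The intended behaviour (free every segment of the chain except the last one) can be sketched on a plain singly linked list. The node type and function name are hypothetical; the real driver operates on mbuf chains, which this sketch does not model.

```c
#include <stdlib.h>

/* Hypothetical segment node standing in for a chained packet buffer. */
struct seg {
	struct seg *next;
};

/*
 * Free every segment except the last one and return how many were freed.
 * The last segment is handed back through *last because, as described
 * above, it still belongs to the NIC.
 */
static unsigned int
free_all_but_last(struct seg *head, struct seg **last)
{
	unsigned int freed = 0;

	*last = NULL;
	while (head != NULL) {
		/* Read the link before releasing the node it lives in. */
		struct seg *next = head->next;

		if (next == NULL) {
			*last = head;
			break;
		}
		free(head);
		freed++;
		head = next;
	}
	return freed;
}
```

Saving the next pointer before calling free() is the detail that such list walks tend to get wrong.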
Remy Horton [Mon, 14 Nov 2016 06:14:48 +0000 (14:14 +0800)]
net/i40e: fix xstats value mapping
The offsets used in rte_i40evf_stats_strings for transmission
statistics were wrong, returning the total byte count rather than
the respective (unicast, multicast, broadcast, drop, & error)
packet counts.
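A name/offset table of the kind this fix concerns can be sketched as follows. The struct, field names and table are hypothetical and are not the contents of rte_i40evf_stats_strings; the point is only that each entry must carry the offset of its own counter, otherwise every lookup returns the same field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-port Tx counters. */
struct tx_stats {
	uint64_t tx_bytes;
	uint64_t tx_unicast;
	uint64_t tx_multicast;
	uint64_t tx_broadcast;
};

struct xstat_name_off {
	const char *name;
	size_t offset;
};

/* Each entry points at its own field via offsetof(). */
static const struct xstat_name_off tx_xstats[] = {
	{ "tx_bytes", offsetof(struct tx_stats, tx_bytes) },
	{ "tx_unicast_packets", offsetof(struct tx_stats, tx_unicast) },
	{ "tx_multicast_packets", offsetof(struct tx_stats, tx_multicast) },
	{ "tx_broadcast_packets", offsetof(struct tx_stats, tx_broadcast) },
};

/* Read the counter for table entry idx from a stats snapshot. */
static uint64_t
xstat_read(const struct tx_stats *stats, size_t idx)
{
	uint64_t value;

	memcpy(&value, (const char *)stats + tx_xstats[idx].offset,
	       sizeof(value));
	return value;
}
```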
When the virtio PMD is used on top of a vhost that does not support
offloads, Rx offload capabilities are still advertised by
virtio_dev_info_get(). But if an application tries to start the PMD with
Rx offloads enabled (rxmode.hw_ip_checksum = 1), the initialization of
the device will fail with -ENOTSUP and the following log:
rx ip checksum not available on this host
This patch fixes the Rx offload capabilities returned by
virtio_dev_info_get() to be consistent with features advertised by the
host.
Fixes: 96cb6711939e ("net/virtio: support Rx checksum offload") Fixes: 86d59b21468a ("net/virtio: support LRO") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
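As a rough sketch of the idea in the virtio offload fix above, Rx offload capabilities can be derived from the host feature bits instead of being advertised unconditionally. The feature bit numbers are those of the virtio specification; the CAPA_* flags and the function name are hypothetical stand-ins for the ethdev capability bits, and requiring both TSO bits before reporting LRO is an assumption of this sketch.

```c
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_NET_F_GUEST_CSUM 1
#define VIRTIO_NET_F_GUEST_TSO4 7
#define VIRTIO_NET_F_GUEST_TSO6 8

/* Hypothetical capability flags (not the real ethdev constants). */
#define CAPA_RX_L4_CKSUM (1u << 0)
#define CAPA_RX_LRO      (1u << 1)

/* Advertise only the Rx offloads that the host features can back. */
static uint32_t
rx_offload_capa(uint64_t host_features)
{
	const uint64_t tso_mask = (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
				  (1ULL << VIRTIO_NET_F_GUEST_TSO6);
	uint32_t capa = 0;

	if (host_features & (1ULL << VIRTIO_NET_F_GUEST_CSUM))
		capa |= CAPA_RX_L4_CKSUM;
	/* Assumption: LRO is reported only when both TSO bits are present. */
	if ((host_features & tso_mask) == tso_mask)
		capa |= CAPA_RX_LRO;
	return capa;
}
```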
Tomasz Kulasek [Fri, 16 Dec 2016 15:15:16 +0000 (16:15 +0100)]
examples/performance-thread: add packet type parsing
Recent changes in the Niantic and Fortville NIC drivers cause the
vector Rx path to be chosen by default in the l3fwd-thread application.
This path doesn't support propagation of hw packet type recognition
to the packet_type field in mbuf, so packets cannot be classified
properly.
The approach to solve this problem is similar to commit 71a7e2424e07 ("examples/l3fwd: fix using packet type blindly").
To use the sw packet analyzer, a new command line option
"--parse-ptype" is introduced.
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Anand B Jyoti [Sun, 8 Jan 2017 21:55:49 +0000 (03:25 +0530)]
examples/ip_pipeline: check VLAN and MPLS parameters
This commit adds checks to the CLI commands for the following errors:
1. SVLAN and CVLAN IDs greater than 12 bits
2. MPLS ID greater than 20 bits
3. more MPLS labels than the supported maximum (to avoid array overflow)
It prevents running CLI commands with invalid parameters.
Signed-off-by: Anand B Jyoti <anand.b.jyoti@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
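The three checks listed in the ip_pipeline commit above can be sketched as range tests. The 12-bit and 20-bit limits follow from the commit message; the function names and the label capacity (MPLS_LABELS_MAX) are hypothetical and do not reflect the ip_pipeline sources.

```c
#include <stddef.h>
#include <stdint.h>

#define VLAN_ID_MAX    0xFFFu   /* largest 12-bit VLAN ID */
#define MPLS_LABEL_MAX 0xFFFFFu /* largest 20-bit MPLS label */
#define MPLS_LABELS_MAX 4       /* hypothetical array capacity */

/* Returns 0 if both VLAN IDs fit in 12 bits, -1 otherwise. */
static int
check_qinq(uint32_t svlan, uint32_t cvlan)
{
	if (svlan > VLAN_ID_MAX || cvlan > VLAN_ID_MAX)
		return -1;
	return 0;
}

/* Returns 0 if the label count and every label value are in range. */
static int
check_mpls(const uint32_t *labels, size_t n_labels)
{
	size_t i;

	if (n_labels > MPLS_LABELS_MAX)
		return -1;
	for (i = 0; i < n_labels; i++)
		if (labels[i] > MPLS_LABEL_MAX)
			return -1;
	return 0;
}
```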
Olivier Matz [Tue, 22 Nov 2016 13:52:15 +0000 (14:52 +0100)]
examples/l3fwd: rework long options parsing
Avoid the use of several strncpy() calls, since getopt is able to
map a long option to an id, which can be matched in the
same switch/case as short options.
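The mechanism described above can be sketched with getopt_long(): each long option gets a numeric id in the val field of struct option, and that id is handled in the same switch as the short options. The option set, ids and struct below are illustrative assumptions, not the l3fwd option table.

```c
#include <getopt.h>
#include <stdlib.h>

/* Long-option ids start above the char range used by short options. */
enum {
	OPT_LONG_MIN = 256,
	OPT_PARSE_PTYPE, /* --parse-ptype */
	OPT_MAX_PKT_LEN, /* --max-pkt-len N (illustrative) */
};

/* Hypothetical parsed settings. */
struct app_opts {
	unsigned long portmask;
	int parse_ptype;
	int max_pkt_len;
};

static const struct option long_opts[] = {
	{ "parse-ptype", no_argument, NULL, OPT_PARSE_PTYPE },
	{ "max-pkt-len", required_argument, NULL, OPT_MAX_PKT_LEN },
	{ NULL, 0, NULL, 0 },
};

/* Returns 0 on success, -1 on an unknown option or missing argument. */
static int
parse_args(int argc, char **argv, struct app_opts *opts)
{
	int opt;

	optind = 1; /* start scanning at the first argument */
	while ((opt = getopt_long(argc, argv, "p:", long_opts, NULL)) != -1) {
		switch (opt) {
		case 'p': /* short option */
			opts->portmask = strtoul(optarg, NULL, 16);
			break;
		case OPT_PARSE_PTYPE: /* long option, matched by id */
			opts->parse_ptype = 1;
			break;
		case OPT_MAX_PKT_LEN:
			opts->max_pkt_len = atoi(optarg);
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Because getopt_long() returns the id directly, no string comparison on the option name is needed in the handler.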
Olivier Matz [Tue, 22 Nov 2016 13:52:16 +0000 (14:52 +0100)]
examples/l2fwd: rework long options parsing
Do the same as in l3fwd to avoid strcmp() for long options.
For l2fwd, there is no long option that takes advantage of this new
mechanism, as --mac-updating and --no-mac-updating directly set a
flag without needing an entry in the switch/case.
So this patch just prepares the framework in case a new long option is
added in the future.
Yongseok Koh [Fri, 6 Jan 2017 22:40:10 +0000 (14:40 -0800)]
doc: fix links to Linux in contribution guide
A referenced document in the Linux Kernel has been moved to a
sub-directory, and the kernel community has moved to RST/Sphinx. The
links are replaced with links to the rendered HTML.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: John McNamara <john.mcnamara@intel.com>
Pablo de Lara [Mon, 19 Dec 2016 16:34:12 +0000 (16:34 +0000)]
doc: simplify l3fwd example guide
The L3 Forwarding sample app user guides have some inconsistencies
between the example command line and the configuration table.
Also, they showed an overly complicated configuration, using two
different NUMA nodes for two ports, which will probably lead
to a performance drop due to use of the cross-socket channel.
This patch simplifies the configuration of these examples,
by using a single NUMA node and a single queue per port.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>
Zhihong Wang [Wed, 7 Dec 2016 01:31:06 +0000 (20:31 -0500)]
eal: optimize aligned memcpy on x86
This patch optimizes rte_memcpy for well aligned cases, where both
dst and src addr are aligned to maximum MOV width. It introduces a
dedicated function called rte_memcpy_aligned to handle the aligned
cases with simplified instruction stream. The existing rte_memcpy
is renamed as rte_memcpy_generic. The selection between the two is
done at the entry of rte_memcpy.
The existing rte_memcpy is for generic cases: it handles unaligned
copies and makes stores aligned, and it even makes loads aligned for
micro-architectures like Ivy Bridge. However, alignment handling comes
at a price: it adds extra load/store instructions, which can sometimes
cause complications.
Take DPDK Vhost memcpy with the Mergeable Rx Buffer feature as an
example: the copy is aligned and remote, and there is a header write
alongside it which is also remote. In this case the memcpy instruction
stream should be simplified to reduce extra loads/stores, and therefore
reduce the probability of pipeline stalls caused by full load/store
buffers, so that the actual memcpy instructions are issued and the H/W
prefetcher goes to work as early as possible.
This patch is tested on Ivy Bridge, Haswell and Skylake, it provides
up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging
from 64 to 1500 bytes.
The test can also be conducted without NIC, by setting loopback
traffic between Virtio and Vhost. For example, modify the macro
TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h,
rebuild and start testpmd in both host and guest, then "start" on
one side and "start tx_first 32" on the other.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com> Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com>
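The entry-point selection described in the rte_memcpy commit above can be sketched as a single alignment test on both pointers. This is only an illustration of the dispatch: the function names are hypothetical, both paths simply call memcpy() here, and the 32-byte mask is an assumed value for the "maximum MOV width", which in practice depends on the instruction set in use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_MASK 0x1Fu /* assumed 32-byte maximum MOV width */

/* Stand-in for the generic path that handles unaligned copies. */
static void *
copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Stand-in for the simplified path used when both pointers are aligned. */
static void *
copy_aligned(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* OR-ing the addresses lets one mask test cover both pointers. */
static int
both_aligned(const void *dst, const void *src)
{
	return (((uintptr_t)dst | (uintptr_t)src) & ALIGN_MASK) == 0;
}

/* Entry point: pick the path once, before any data is copied. */
static void *
copy_dispatch(void *dst, const void *src, size_t n)
{
	if (both_aligned(dst, src))
		return copy_aligned(dst, src, n);
	return copy_generic(dst, src, n);
}
```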
Jianfeng Tan [Tue, 17 Jan 2017 07:10:29 +0000 (07:10 +0000)]
examples/l3fwd-power: add --parse-ptype option
To support those devices that do not provide packet type info when
receiving packets, add a new option, --parse-ptype, to analyze
packet type in the Rx callback.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com>
Jianfeng Tan [Tue, 17 Jan 2017 08:00:03 +0000 (08:00 +0000)]
net/virtio: setup Rx queue interrupts
This patch mainly allocates a structure to store the queue/irq mapping
and configures the mapping through PCI ops. It also creates an eventfd
for each Rx queue and tells the kernel about the eventfd/intr
binding.
Note: So far, we hard-code 1:1 queue/irq mapping (each rx queue has
one exclusive interrupt), like this:
vec 0 -> config irq
vec 1 -> rxq0
vec 2 -> rxq1
...
which means, the "vectors" option of QEMU should be configured with
a value >= N+1 (N is the number of the queue pairs).
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
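The hard-coded 1:1 mapping described in the Rx queue interrupt commit above reduces to simple arithmetic. The helper names are hypothetical; the vector layout (vector 0 for the config interrupt, vector q + 1 for Rx queue q) and the N + 1 requirement are taken from the commit message.

```c
#include <stdint.h>

#define CONFIG_IRQ_VEC 0 /* vec 0 -> config irq */

/* vec 1 -> rxq0, vec 2 -> rxq1, ... */
static uint16_t
rxq_to_vec(uint16_t rxq)
{
	return (uint16_t)(rxq + 1);
}

/* Minimum value for the QEMU "vectors" option with N queue pairs. */
static uint16_t
vectors_required(uint16_t nb_queue_pairs)
{
	return (uint16_t)(nb_queue_pairs + 1);
}
```

For example, a device with four queue pairs needs "vectors" to be at least 5.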
This patch implements interrupt enable/disable functions for each
Rx queue. We rely on the flags of the avail ring as the hint for the
virtio device to interrupt the virtio driver or not.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
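A minimal sketch of the flag-based hint mentioned in the interrupt enable/disable commit above, assuming the standard VRING_AVAIL_F_NO_INTERRUPT flag from the virtio specification. The ring struct and function names are hypothetical, and a real driver would also need the appropriate memory barriers, which are omitted here.

```c
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1 /* flag value from the virtio spec */

/* Hypothetical cut-down view of the available ring header. */
struct avail_ring_sketch {
	uint16_t flags;
	uint16_t idx;
};

/* Clearing the flag tells the device that interrupts are wanted. */
static void
rxq_intr_enable(struct avail_ring_sketch *avail)
{
	avail->flags &= (uint16_t)~VRING_AVAIL_F_NO_INTERRUPT;
}

/* Setting the flag asks the device not to interrupt (it is only a hint). */
static void
rxq_intr_disable(struct avail_ring_sketch *avail)
{
	avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
}
```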
Jianfeng Tan [Tue, 17 Jan 2017 07:10:22 +0000 (07:10 +0000)]
net/virtio: invoke method directly for setting IRQ config
Keeping the wrapper would require defining a prototype for it, which
makes things too complicated. Remove the wrapper and call
set_config_irq directly.
Suggested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Jianfeng Tan [Tue, 17 Jan 2017 07:10:21 +0000 (07:10 +0000)]
net/virtio: fix rewriting LSC flag
The LSC flag is set according to whether the VIRTIO_NET_F_STATUS
feature is negotiated. Copying the PCI info after that decision
overwrites the correct result.
Fixes: 198ab33677c9 ("net/virtio: move device initialization in a function") CC: stable@dpdk.org Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Tested-by: Lei Yao <lei.a.yao@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Jianfeng Tan [Fri, 13 Jan 2017 12:18:40 +0000 (12:18 +0000)]
net/virtio-user: enable multiqueue with kernel vhost
With vhost kernel, to enable multiqueue, the backend device in the
kernel must support the multiqueue feature. Specifically, with tap
as the backend, as linux/Documentation/networking/tuntap.txt shows,
we check if tap supports the IFF_MULTI_QUEUE feature.
For vhost kernel, each queue pair has a vhost fd, with a tap fd
bound to it. All tap fds are set with the same tap interface name.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Jianfeng Tan [Fri, 13 Jan 2017 12:18:39 +0000 (12:18 +0000)]
net/virtio-user: enable offloading
When used with the vhost kernel backend, we can offload in both directions.
- From vhost kernel to virtio_user, the offload is enabled so that
DPDK app can trust the flow is checksum-correct; and if DPDK app
sends it through another port, the checksum needs to be
recalculated or offloaded. It also applies to TSO.
- From virtio_user to vhost_kernel, the offload is enabled so that
kernel can trust the flow is L4-checksum-correct, no need to verify
it; if kernel will consume it, DPDK app should make sure the
l3-checksum is correctly set.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Jianfeng Tan [Fri, 13 Jan 2017 12:18:38 +0000 (12:18 +0000)]
net/virtio-user: support kernel vhost
This patch adds support for vhost kernel as the backend for virtio_user.
Three main hook functions are added:
- vhost_kernel_setup() to open char device, each vq pair needs one
vhostfd;
- vhost_kernel_ioctl() to communicate control messages with vhost
kernel module;
- vhost_kernel_enable_queue_pair() to open tap device and set it
as the backend of corresponding vhost fd (that is to say, vq pair).
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Jianfeng Tan [Fri, 13 Jan 2017 12:18:37 +0000 (12:18 +0000)]
net/virtio-user: abstract backend operations
Add a struct virtio_user_backend_ops to abstract three kinds of backend
operations:
- setup, create the unix socket connection;
- send_request, sync messages with backend;
- enable_qp, enable some queue pair.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
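The ops-table abstraction described in the backend operations commit above can be sketched as a struct of function pointers. The three member names come from the commit message; every signature, the device struct, the stub backend and dev_start() are assumptions made for illustration and are not the actual virtio_user definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct vu_dev_sketch;

/* Hypothetical signatures for the three operations named above. */
struct backend_ops_sketch {
	int (*setup)(struct vu_dev_sketch *dev);
	int (*send_request)(struct vu_dev_sketch *dev, uint32_t req, void *arg);
	int (*enable_qp)(struct vu_dev_sketch *dev, uint16_t pair_idx,
			 int enable);
};

struct vu_dev_sketch {
	const struct backend_ops_sketch *ops;
	int connected;
	uint32_t enabled_pairs; /* bitmask of enabled queue pairs */
};

/* Stub backend used only to exercise the table. */
static int
stub_setup(struct vu_dev_sketch *dev)
{
	dev->connected = 1;
	return 0;
}

static int
stub_send_request(struct vu_dev_sketch *dev, uint32_t req, void *arg)
{
	(void)req;
	(void)arg;
	return dev->connected ? 0 : -1;
}

static int
stub_enable_qp(struct vu_dev_sketch *dev, uint16_t pair_idx, int enable)
{
	if (pair_idx >= 32)
		return -1;
	if (enable)
		dev->enabled_pairs |= 1u << pair_idx;
	else
		dev->enabled_pairs &= ~(1u << pair_idx);
	return 0;
}

static const struct backend_ops_sketch stub_ops = {
	.setup = stub_setup,
	.send_request = stub_send_request,
	.enable_qp = stub_enable_qp,
};

/* Generic code calls through dev->ops and never names a backend. */
static int
dev_start(struct vu_dev_sketch *dev, uint16_t nb_pairs)
{
	uint16_t i;

	if (dev->ops->setup(dev) < 0)
		return -1;
	for (i = 0; i < nb_pairs; i++)
		if (dev->ops->enable_qp(dev, i, 1) < 0)
			return -1;
	return 0;
}
```

With this shape, a second backend only has to supply another ops table; the generic code in dev_start() stays unchanged.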
Jianfeng Tan [Fri, 13 Jan 2017 12:18:36 +0000 (12:18 +0000)]
net/virtio-user: move vhost-user specific code
To support vhost kernel as the backend of net_virtio_user in coming
patches, we move vhost_user specific structs and macros into
vhost_user.c, and only keep common definitions in vhost.h.
Besides, remove VHOST_USER_MQ feature check.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Jianfeng Tan [Fri, 13 Jan 2017 12:18:35 +0000 (12:18 +0000)]
net/virtio-user: fix not properly reset device
virtio_user is not properly reset when users call vtpci_reset(),
as it ignores VIRTIO_CONFIG_STATUS_RESET status in
virtio_user_set_status().
This might lead to initialization failure as it starts to re-init
the device before sending the RESET message to the backend. Besides,
previous callfds and kickfds are not closed.
To fix it, we add support to disable virtqueues when it's set to
DRIVER OK status, and re-init fields in struct virtio_user_dev.
Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver") Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer") Cc: stable@dpdk.org Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Jianfeng Tan [Fri, 13 Jan 2017 12:18:34 +0000 (12:18 +0000)]
net/virtio-user: fix wrongly get/set features
Before the commit 86d59b21468a ("net/virtio: support LRO"), features
in the virtio PMD were decided and properly set at device
initialization and were not changed afterwards. Since that commit,
features can be changed in virtio_dev_configure(), and are
re-negotiated if they change.
In virtio-user, device features are obtained only once, at the driver
probe phase, but we did not store them. So negotiating the added
feature bits during re-negotiation fails.
To fix it, we store the device features and use them for feature
negotiation at either the device initialization phase or the device
configure phase.
Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver") Cc: stable@dpdk.org Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Yuanhan Liu [Thu, 12 Jan 2017 05:37:00 +0000 (13:37 +0800)]
net/virtio: do not store PCI device pointer at shared memory
hw->dev, a pointer to pci_dev, was not actually used until the
refactoring that decoupled ethdev from the PCI device. Using it breaks
multiple process support again, since "hw" is stored in shared memory
while "pci_dev" is not: the primary and secondary processes could
have different addresses for it, while only one value can be stored.
Thus we should not store it in "hw"; instead, we can retrieve
it from the "eth_dev->device" field.
Fixes: ae34410a8a8a ("ethdev: move info filling of PCI into drivers") Fixes: eac901ce29be ("ethdev: decouple from PCI device") Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Yuanhan Liu [Thu, 12 Jan 2017 05:31:57 +0000 (13:31 +0800)]
net/virtio: access interrupt handler directly
Since commit 0e1b45a284b4 ("ethdev: decouple interrupt handling from
PCI device"), intr_handle is stored in the eth_dev struct, so we can
use it directly. Thus there is no need to get it from hw.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Yuanhan Liu [Fri, 6 Jan 2017 10:16:19 +0000 (18:16 +0800)]
net/virtio: fix multiple process support
The introduction of virtio 1.0 support brought yet another set of ops.
Unfortunately, it was not handled correctly, which broke multiple
process support.
The issue is that data/function pointers may differ between processes,
while the old code set them only once (for the primary process only).
That is, the function pointer the secondary process saw actually came
from the primary process address space. Accessing it would likely
result in a crash.
Thanks to the previous patches, we are now able to maintain locally the
info that may vary among processes, meaning every process can have its
own copy of each item, with the correct value set. This is what
this patch does:
- remap the PCI (IO port for legacy device and memory map for modern
device)
- set vtpci_ops correctly
After that, multiple process support works like a charm. (At least, it
passed my fuzzy test)
Fixes: b8f04520ad71 ("virtio: use PCI ioport API") Fixes: d5bbeefca826 ("virtio: introduce PCI implementation structure") Fixes: 6ba1f63b5ab0 ("virtio: support specification 1.0") Cc: stable@dpdk.org Reported-by: Juho Snellman <jsnell@iki.fi> Reported-by: Yaron Illouz <yaroni@radcom.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>