dpdk.git
8 years ago net/mlx: fix compilation with glibc 2.20
Adrien Mazarguil [Mon, 20 Jun 2016 13:31:46 +0000 (15:31 +0200)]
net/mlx: fix compilation with glibc 2.20

Since _BSD_SOURCE was deprecated in favor of _DEFAULT_SOURCE in Glibc 2.19
and entirely removed in 2.20, various BSD ioctl macros are not exposed
anymore when _XOPEN_SOURCE is defined, and linux/if.h now conflicts with
net/if.h.

Add _DEFAULT_SOURCE and keep _BSD_SOURCE for compatibility with older
versions.
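
A minimal sketch of the pattern described above, assuming a Linux/glibc
build. The helper name demo_ifreq_name_len is hypothetical and is not part of
the mlx drivers; it only shows that a BSD-derived definition (struct ifreq)
is visible once both macros are defined before the first system header:

```c
/* Feature-test macros must be defined before the first system header. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* understood by glibc >= 2.19 */
#endif
#ifndef _BSD_SOURCE
#define _BSD_SOURCE 1     /* kept for older glibc versions */
#endif

#include <stddef.h>
#include <net/if.h>

/* Hypothetical helper: struct ifreq is one of the BSD-derived definitions. */
size_t demo_ifreq_name_len(void)
{
	struct ifreq ifr;

	return sizeof(ifr.ifr_name);
}
```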

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
8 years ago net/pcap: fix crash on close
Reshma Pattan [Fri, 27 May 2016 12:06:20 +0000 (13:06 +0100)]
net/pcap: fix crash on close

Testpmd application will crash in fclose() upon quit after running
the below command.

"sudo gdb --args ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf0 -n 4
          --vdev 'eth_pcap0,tx_iface=enp1s0f1,rx_pcap=/tmp/test.pcap' --
          --port-topology=chained -i"

The reason is, pcap vdev creation with tx stream type as "iface"
as in above command doesn't need member "dumpers" of
"struct tx_pcaps", hence will not have memory allocated for it.
It contains garbage values, as the local object of struct tx_pcaps
is not initialized to 0 inside rte_pmd_pcap_dev_init().
So calling pcap_dump_close() on dumper as part of eth_dev_stop()
is causing segfault in fclose().

Fix is to initialize local object of struct tx_pcaps to 0.
Also initialize local object of struct rx_pcaps to 0.
So during eth_dev_stop(), pcap_dump_close() will not be called if dumper
is NULL.
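
The fix can be sketched as follows. This is an illustration only: struct
demo_tx_pcaps, DEMO_MAX_QUEUES and both helpers are hypothetical stand-ins,
not the pcap PMD's real definitions. It assumes, as on common platforms, that
an all-zero bit pattern is a NULL pointer:

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_QUEUES 4 /* hypothetical; not the driver's constant */

/* Simplified stand-in for the driver's struct tx_pcaps. */
struct demo_tx_pcaps {
	unsigned int num_of_tx;
	void *dumpers[DEMO_MAX_QUEUES]; /* unused for "iface" Tx streams */
};

/* Zero the local object so unused pointers are NULL, not stack garbage. */
void demo_tx_pcaps_init(struct demo_tx_pcaps *p)
{
	memset(p, 0, sizeof(*p));
}

/* Count the dumpers that a stop routine would actually close. */
unsigned int demo_count_open_dumpers(const struct demo_tx_pcaps *p)
{
	unsigned int i, open = 0;

	for (i = 0; i < DEMO_MAX_QUEUES; i++)
		if (p->dumpers[i] != NULL) /* NULL entries are skipped */
			open++;
	return open;
}
```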

Fixes: 4c173302c307 ("pcap: add new driver")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
8 years ago net/enic: fix Tx IP and UDP/TCP checksum offload
John Daley [Fri, 3 Jun 2016 00:22:57 +0000 (17:22 -0700)]
net/enic: fix Tx IP and UDP/TCP checksum offload

Private/conflicting ol_flags were used to enable UDP/TCP Tx
offloads. Use the common flags in PKT_TX_L4_MASK to support them.
When updating flags, also do some minor code rearranging for
slightly better performance.
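
A sketch of how a driver might decode the common L4 flags, assuming the L4
checksum request is a two-bit field under PKT_TX_L4_MASK. The DEMO_ names and
their values are local stand-ins chosen for illustration and should be
checked against rte_mbuf.h; this is not the enic code itself:

```c
#include <stdint.h>

/* Local stand-ins mirroring a two-bit L4 Tx checksum field. */
#define DEMO_TX_TCP_CKSUM  (1ULL << 52)
#define DEMO_TX_SCTP_CKSUM (2ULL << 52)
#define DEMO_TX_UDP_CKSUM  (3ULL << 52)
#define DEMO_TX_L4_MASK    (3ULL << 52)

enum demo_l4 { DEMO_L4_NONE, DEMO_L4_TCP, DEMO_L4_SCTP, DEMO_L4_UDP };

/* The L4 request is a field, not independent bits: compare the masked value. */
enum demo_l4 demo_l4_offload(uint64_t ol_flags)
{
	switch (ol_flags & DEMO_TX_L4_MASK) {
	case DEMO_TX_TCP_CKSUM:  return DEMO_L4_TCP;
	case DEMO_TX_SCTP_CKSUM: return DEMO_L4_SCTP;
	case DEMO_TX_UDP_CKSUM:  return DEMO_L4_UDP;
	default:                 return DEMO_L4_NONE;
	}
}
```

Testing a single bit would misclassify UDP as TCP here, because the UDP
value has both bits of the field set.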

Fixes: fefed3d1e62c ("enic: new driver")
Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: expand local Tx mbuf flags variable to 64-bits
John Daley [Fri, 3 Jun 2016 00:22:56 +0000 (17:22 -0700)]
net/enic: expand local Tx mbuf flags variable to 64-bits

The offload flags variable (ol_flags) in the rte_mbuf structure is 64 bits
wide, so a local copy of it must be 64 bits too. Moreover, a bit comparison
between a 16-bit variable and a 64-bit value makes no sense. This breaks Tx
vlan IP and L4 offloads.
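
The truncation can be reproduced with a small sketch. DEMO_TX_IP_CKSUM is a
hypothetical flag placed above bit 15 for illustration; the two helpers are
not driver functions:

```c
#include <stdint.h>

#define DEMO_TX_IP_CKSUM (1ULL << 54) /* hypothetical high-bit flag */

/* Buggy pattern: the 16-bit copy silently drops bits 16..63. */
int demo_ip_cksum_requested_u16(uint64_t ol_flags)
{
	uint16_t local = (uint16_t)ol_flags;

	return (local & DEMO_TX_IP_CKSUM) != 0;
}

/* Fixed pattern: the local copy is as wide as the mbuf field. */
int demo_ip_cksum_requested_u64(uint64_t ol_flags)
{
	uint64_t local = ol_flags;

	return (local & DEMO_TX_IP_CKSUM) != 0;
}
```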

Coverity issue: 13218
Fixes: fefed3d1e62c ("enic: new driver")

Suggested-by: Piotr Azarewicz <piotrx.t.azarewicz@intel.com>
Signed-off-by: John Daley <johndale@cisco.com>
Acked-by: Piotr Azarewicz <piotrx.t.azarewicz@intel.com>
8 years ago net/enic: add an assert macro
John Daley [Fri, 3 Jun 2016 00:22:55 +0000 (17:22 -0700)]
net/enic: add an assert macro

Add an ASSERT macro for the enic driver which is enabled when the log
level is >= RTE_LOG_DEBUG. Assert that number of mbufs to return to
the pool in the Tx function is never greater than the max allowed.

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: remove unused code
John Daley [Fri, 3 Jun 2016 00:22:54 +0000 (17:22 -0700)]
net/enic: remove unused code

Remove some files, functions and variables left unused after
Tx performance improvements.

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: optimize the Tx function
John Daley [Fri, 3 Jun 2016 00:22:53 +0000 (17:22 -0700)]
net/enic: optimize the Tx function

Reduce host CPU overhead of Tx packet processing:
* Use local variables inside per-packet loop instead of fields in structs.
* Factor book keeping and conditionals out of the per-packet loop where
  possible.
* Post buffers to the NIC at a maximum of every 64 packets.

Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: refactor Tx mbuf recycling
John Daley [Fri, 3 Jun 2016 00:22:52 +0000 (17:22 -0700)]
net/enic: refactor Tx mbuf recycling

Mbufs were returned to the pool one at a time. Use rte_mempool_put_bulk
instead. There were multiple function calls for each buffer returned.
Refactor this code into just 2 functions.

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: use Tx completion index instead of messages
John Daley [Fri, 3 Jun 2016 00:22:51 +0000 (17:22 -0700)]
net/enic: use Tx completion index instead of messages

The NIC can either DMA a separate completion message for each completed
send or periodically just DMA the index of the last completed send.
Switch to the latter method which improves cache locality and performance.
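
With the index-based method, the driver can derive how many sends have
completed from a single value. A hedged sketch, where the helper name, its
parameters, and the power-of-two ring size are all assumptions made for
illustration rather than the enic implementation:

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the index of the last completed send written
 * by the NIC and the next descriptor the driver expects to complete,
 * return how many descriptors can be reclaimed. Assumes ring_size is a
 * power of two and that fewer than ring_size sends are outstanding.
 */
uint32_t demo_tx_completed(uint32_t last_completed_idx,
			   uint32_t next_to_clean, uint32_t ring_size)
{
	return (last_completed_idx - next_to_clean + 1) & (ring_size - 1);
}
```

The driver reads one word per poll instead of walking a queue of completion
messages, which is where the cache-locality gain described above comes from.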

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: streamline mbuf handling in Tx path
John Daley [Fri, 3 Jun 2016 00:22:50 +0000 (17:22 -0700)]
net/enic: streamline mbuf handling in Tx path

The list of mbufs held by the driver on Tx was allocated in chunks
(a hold-over from the enic kernel mode driver). The structure used
next pointers across chunks which led to cache misses.

Allocate the array used to hold mbufs in flight on Tx with
rte_zmalloc_socket(). Remove unnecessary fields from the structure
and use head and tail pointers instead of next pointers.

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: remove unused functions in Tx path
John Daley [Fri, 3 Jun 2016 00:22:49 +0000 (17:22 -0700)]
net/enic: remove unused functions in Tx path

Functions existed which were never called. Remove them. Also
remove 'pmd' from the name of the Tx function to improve clarity.

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: put Tx and Rx functions into same file
John Daley [Fri, 3 Jun 2016 00:22:48 +0000 (17:22 -0700)]
net/enic: put Tx and Rx functions into same file

The Tx functions were in enic_ethdev.c and enic_main.c - files in which
they did not logically belong.  To make things consistent with most
other drivers, we therefore extract them and place them with the equivalent
Rx functions into a file called enic_rxtx.c.

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: count truncated packets
John Daley [Fri, 3 Jun 2016 00:22:47 +0000 (17:22 -0700)]
net/enic: count truncated packets

Truncated packets occur on enic if an mbuf is not big enough to
receive it or there aren't enough mbufs if rx scatter is in use.
They show up as error packets but unlike other error packets (like
packets with bad FCS) there are no nic drop counts incremented for them.
Truncated packets are calculated by subtracting hardware errors from
software errors. Note: this causes transient inaccuracies in the
ipackets count. Also, the length of truncated packets is counted
in ibytes even though truncated packets are dropped which can make
ibytes be slightly higher than it should be.
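
The subtraction described above can be sketched as below. The helper name is
hypothetical, and clamping at zero is an assumption added here to cope with
counters sampled at different times; the driver may handle this differently:

```c
#include <stdint.h>

/* Hypothetical helper: truncated = software-counted errors minus hardware-counted errors. */
uint64_t demo_rx_truncated(uint64_t sw_rx_errors, uint64_t hw_rx_errors)
{
	/* The two counters are not read atomically, so clamp at zero. */
	return sw_rx_errors > hw_rx_errors ? sw_rx_errors - hw_rx_errors : 0;
}
```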

Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: fix bad packet handling on Rx
John Daley [Fri, 3 Jun 2016 00:22:46 +0000 (17:22 -0700)]
net/enic: fix bad packet handling on Rx

Following the discussions from:
http://dpdk.org/ml/archives/dev/2015-July/021721.html
http://dpdk.org/ml/archives/dev/2016-April/038143.html

Remove the unused flag from enic driver. Also, the enic driver is
now modified to drop bad packets instead of using a non-existent
flag to try and identify them as bad.

Fixes: 947d860c821f ("enic: improve Rx performance")
Fixes: 5776c30293bb ("enic: fix error packets handling")
Fixes: 50765c820e98 ("enic: remove packet error conditional")

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/enic: fix Rx drop counters
John Daley [Fri, 3 Jun 2016 00:22:45 +0000 (17:22 -0700)]
net/enic: fix Rx drop counters

rx_no_bufs is a hardware counter of packets dropped on the
interface due to no host buffers and should be used to update
r_stats->imissed counter instead of rx_nombuf.

Include rx_drop in ierrors. rx_drop is incremented if packets
arrive when the receive queue is disabled.

Add a structure and functions for initializing and clearing
software counters. Add count of Rx mbuf allocation failures
(rx_nombuf) as the first counter.

Fixes: fefed3d1e62c ("enic: new driver")

Signed-off-by: John Daley <johndale@cisco.com>
8 years ago net/e1000: fix build with clang
Hiroyuki Mikita [Thu, 26 May 2016 11:36:39 +0000 (20:36 +0900)]
net/e1000: fix build with clang

GCC_VERSION is empty in case of clang:
/bin/sh: line 0: test: -ge: unary operator expected

It is the same issue as http://dpdk.org/dev/patchwork/patch/5994/

Fixes: 366113dbfb69 ("e1000: suppress misleading indentation warning")

Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
8 years ago net/af_packet: add byte counters
Rich Lane [Wed, 25 May 2016 21:03:20 +0000 (14:03 -0700)]
net/af_packet: add byte counters

Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
8 years ago net/i40e: fix unintended sign extension
Slawomir Mrozowicz [Fri, 20 May 2016 13:03:36 +0000 (15:03 +0200)]
net/i40e: fix unintended sign extension

Suspicious implicit sign extension: pf->fdir.match_counter_index
with type unsigned short (16 bits, unsigned) is promoted in
"pf->fdir.match_counter_index << 20" to type int (32 bits, signed),
then sign-extended to type unsigned long (64 bits, unsigned).
If "pf->fdir.match_counter_index << 20" is greater than 0x7FFFFFFF,
the upper bits of the result will all be 1.

To fix the issue explicitly cast pf->fdir.match_counter_index to uint32_t.
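
A sketch of the fixed pattern, plus a separate helper that shows what sign
extension does to a negative 32-bit value. Both function names are
hypothetical. The buggy shift itself is deliberately not reproduced, since
shifting a signed int into its sign bit is undefined behavior in C:

```c
#include <stdint.h>

/* Fixed pattern: widen to an unsigned 32-bit type before shifting. */
uint64_t demo_counter_field(uint16_t match_counter_index)
{
	return (uint32_t)match_counter_index << 20;
}

/* What sign extension looks like: a negative int converted to 64 bits. */
uint64_t demo_sign_extended(int32_t value)
{
	return (uint64_t)value; /* upper 32 bits become 1 when value < 0 */
}
```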

Coverity issue: 13315
Fixes: 05999aab4ca6 ("i40e: add or delete flow director")

Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
8 years ago net/i40e: support MTU configuration
Beilei Xing [Fri, 20 May 2016 15:17:04 +0000 (23:17 +0800)]
net/i40e: support MTU configuration

This patch enables configuring MTU for i40e.
Since changing the MTU requires reconfiguring the queues, the port must be
stopped before configuring the MTU.

Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
8 years ago net/i40e: fix disabling flex payload selection rule
Jingjing Wu [Thu, 12 May 2016 08:11:40 +0000 (16:11 +0800)]
net/i40e: fix disabling flex payload selection rule

When setting up the flexible payload selection rules, the value
NONUSE_FLX_PIT_DEST_OFF (== 63) is meant to disable the rule.
However, since the MK_FLX_PIT macro always added on an additional
offset of I40E_FLX_OFFSET_IN_FIELD_VECTOR (== 50) to the value passed,
the functionality to disable the rule was broken.
This patch fixes this by checking for the disable value and not adding
the offset in that case.
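
The check can be sketched as follows, using the two constants quoted in this
commit message. The helper demo_flx_pit_dest_off is hypothetical and covers
only the destination offset; the real MK_FLX_PIT macro also packs other
fields, which are omitted here:

```c
#include <stdint.h>

/* Values quoted in the commit message above. */
#define NONUSE_FLX_PIT_DEST_OFF         63
#define I40E_FLX_OFFSET_IN_FIELD_VECTOR 50

/* Hypothetical helper covering only the destination-offset part of the rule. */
uint32_t demo_flx_pit_dest_off(uint32_t dst_off)
{
	if (dst_off == NONUSE_FLX_PIT_DEST_OFF)
		return NONUSE_FLX_PIT_DEST_OFF; /* keep the "disabled" marker intact */
	return dst_off + I40E_FLX_OFFSET_IN_FIELD_VECTOR;
}
```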

Fixes: d8b90c4eabe9 ("i40e: take flow director flexible payload configuration")

Reported-by: Michael Habibi <mikehabibi@gmail.com>
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Zhe Tao <zhe.tao@intel.com>
8 years ago net/i40e: fix link management
Jingjing Wu [Thu, 12 May 2016 07:21:04 +0000 (15:21 +0800)]
net/i40e: fix link management

Previously, there was a known issue "On Intel® 40G Ethernet
Controller stopping the port does not really down the port link."

There were two reasons why the port was always kept up.
1. Old firmware versions had issues when "Set PHY config command"
   was used on 40G NICs.
2. The kernel i40e driver didn't call "Set PHY config command" when
   ifconfig up/down was used; it assumed the link was always up. But
   in DPDK, ports are forced down when an application quits. So if
   the port is then switched to being controlled by the kernel driver,
   the port cannot be brought up through "ifconfig <ethx> up".

This patch fixes this issue by adding in "Set PHY config command"
into our driver. This is now possible because with newer firmware
there is no longer a problem using this command.

With this fix, after DPDK quit, if the port is switched to being used
by the kernel driver, "ethtool -s <ethx> autoneg on" can be used to
turn on the auto negotiation, and then port can be brought up through
"ifconfig <ethx> up".
NOTE: requires kernel i40e driver version >= 1.4.X

Fixes: 2f1e22817420 ("i40e: skip link control as firmware workaround")
Fixes: 16c979f9adf2 ("i40e: disable setting of PHY configuration")

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
8 years ago net/bnx2x: update driver version to 1.0.1.1
Rasesh Mody [Thu, 12 May 2016 00:06:25 +0000 (17:06 -0700)]
net/bnx2x: update driver version to 1.0.1.1

Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
8 years ago net/bnx2x: use single doorbell for Tx
Rasesh Mody [Thu, 12 May 2016 00:06:24 +0000 (17:06 -0700)]
net/bnx2x: use single doorbell for Tx

Change the Tx routine to ring the doorbell once per burst
and not on every Tx packet. This driver-level optimization
is necessary to achieve line rates for larger frame
sizes (1k or more).
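
The idea can be sketched with a simulated queue. struct demo_txq and both
functions are hypothetical; the doorbell counter stands in for a write to a
device register and is not how bnx2x tracks anything:

```c
#include <stdint.h>

/* Hypothetical Tx queue; `doorbells` counts simulated register writes. */
struct demo_txq {
	uint16_t prod;      /* producer index */
	uint32_t doorbells; /* number of doorbell writes issued */
};

static void demo_ring_doorbell(struct demo_txq *q)
{
	q->doorbells++; /* stands in for a costly write to a device register */
}

/* Queue every packet of the burst, then notify the device once. */
uint16_t demo_tx_burst(struct demo_txq *q, uint16_t nb_pkts)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++)
		q->prod++; /* descriptor setup would go here */
	if (nb_pkts > 0)
		demo_ring_doorbell(q);
	return nb_pkts;
}
```

A burst of 32 packets then costs one doorbell write instead of 32.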

Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
8 years ago net/bnx2x: restructure Tx routine
Rasesh Mody [Thu, 12 May 2016 00:06:23 +0000 (17:06 -0700)]
net/bnx2x: restructure Tx routine

- Process Tx completions based on configured Tx free threshold and
  determine how many Tx BDs are required before invoking bnx2x_tx_encap()
- Change bnx2x_tx_encap() to void function as it can now never fail

Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
8 years ago net/bnx2x: fix dropped packet count in stats
Rasesh Mody [Thu, 12 May 2016 00:06:21 +0000 (17:06 -0700)]
net/bnx2x: fix dropped packet count in stats

Fix stats_get() routine to display drop counters under imissed counter.

Fixes: 540a211084a7 ("bnx2x: driver core")

Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
8 years ago net/qede: allow firmware to query LAN stats
Harish Patil [Sat, 7 May 2016 04:21:31 +0000 (21:21 -0700)]
net/qede: allow firmware to query LAN stats

Under certain scenarios, management firmware (MFW) periodically polls
the driver for LAN statistics. This patch implements the osal hook to
fill in the stats.

Fixes: ec94dbc57362 ("qede: add base driver")

Signed-off-by: Harish Patil <harish.patil@qlogic.com>
8 years ago net/qede: rename debug option
Rasesh Mody [Sat, 7 May 2016 04:21:30 +0000 (21:21 -0700)]
net/qede: rename debug option

Rename RTE_LIBRTE_QEDE_DEBUG_DRV to RTE_LIBRTE_QEDE_DEBUG_DRIVER
for consistency with other drivers.

Fixes: 3eae93a9bfd5 ("qede: enable PMD build")
Fixes: 2ea6f76aff40 ("qede: add core driver")

Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
8 years ago net/cxgbe: support register dump
Rahul Lakkireddy [Fri, 6 May 2016 07:43:19 +0000 (13:13 +0530)]
net/cxgbe: support register dump

Add operations to get register dump.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
8 years ago net/cxgbe: support EEPROM access
Rahul Lakkireddy [Fri, 6 May 2016 07:43:18 +0000 (13:13 +0530)]
net/cxgbe: support EEPROM access

Add operations to get/set EEPROM data.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
8 years ago net/cxgbe: set default PCIe completion timeout
Rahul Lakkireddy [Fri, 6 May 2016 07:43:17 +0000 (13:13 +0530)]
net/cxgbe: set default PCIe completion timeout

Program the PCIe completion timeout to 4 sec to give enough time
to allow completions to be received successfully in some older systems.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
8 years ago net/cxgbe: access to PCI config space
Rahul Lakkireddy [Fri, 6 May 2016 07:43:16 +0000 (13:13 +0530)]
net/cxgbe: access to PCI config space

Add helper functions to read/write PCI config space.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
8 years ago pci: fix config space access on FreeBSD
Rahul Lakkireddy [Fri, 6 May 2016 07:43:15 +0000 (13:13 +0530)]
pci: fix config space access on FreeBSD

PCIOCREAD and PCIOCWRITE ioctls to read/write PCI config space fail
with EPERM due to missing write permission.  Fix by opening /dev/pci/
with O_RDWR instead.

Fixes: 632b2d1deeed ("eal: provide functions to access PCI config")

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
8 years ago net/ixgbe: rename x86 vector driver file
Jianbo Liu [Wed, 11 May 2016 03:45:09 +0000 (09:15 +0530)]
net/ixgbe: rename x86 vector driver file

To be consistent with the naming for ARM NEON implementation,
ixgbe_rxtx_vec.c is renamed to ixgbe_rxtx_vec_sse.c.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
8 years ago net/i40evf: fix return value if admin queue command fails
Jingjing Wu [Tue, 10 May 2016 02:51:59 +0000 (10:51 +0800)]
net/i40evf: fix return value if admin queue command fails

Previously, if an adminq message is sent successfully, but no response is
received, function "i40evf_execute_vf_cmd" will return without error.
The root cause is that the value of "err" is overwritten. This patch fixes
this by ensuring the value of err is set appropriately for each cmd.

Fixes: ae19955e7c86 ("i40evf: support reporting PF reset")

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
8 years ago net/ixgbe: implement vector driver for ARM
Jianbo Liu [Fri, 6 May 2016 06:25:46 +0000 (11:55 +0530)]
net/ixgbe: implement vector driver for ARM

Use ARM NEON intrinsics to implement the ixgbe vPMD.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
[style fixes as highlighted by checkpatch.pl]
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
8 years ago net/ixgbe: extract non-x86 specific code from vector driver
Jianbo Liu [Fri, 6 May 2016 06:25:45 +0000 (11:55 +0530)]
net/ixgbe: extract non-x86 specific code from vector driver

Move scalar code which does not use x86 intrinsic functions to a new file,
"ixgbe_rxtx_vec_common.h", while keeping x86 code in ixgbe_rxtx_vec.c.
This allows the scalar code to be shared among vector drivers for
different platforms.

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
8 years ago net/vmxnet3: fix VLAN tag placed in wrong mbuf in chain
John Guzik [Tue, 12 Apr 2016 23:08:04 +0000 (16:08 -0700)]
net/vmxnet3: fix VLAN tag placed in wrong mbuf in chain

The VLAN tag information should be stored in the first mbuf of a chain
of buffers, not in the last one.

Fixes: 9fd5e98b62e4 ("vmxnet3: support RSS and refactor Rx offload")

Signed-off-by: John Guzik <john@shieldxnetworks.com>
Acked-by: Yong Wang <yongwang@vmware.com>
8 years ago scripts: enable qede in build test
Thomas Monjalon [Wed, 29 Jun 2016 08:56:19 +0000 (10:56 +0200)]
scripts: enable qede in build test

The driver qede can be automatically enabled if libz is available.

Fixes: ec94dbc57362 ("qede: add base driver")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago scripts: check first word of commit messages
Bruce Richardson [Tue, 28 Jun 2016 11:27:12 +0000 (12:27 +0100)]
scripts: check first word of commit messages

Avoid messages starting with "It" without describing what
it is talking about.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago mk: check shared library dependencies
Panu Matilainen [Tue, 21 Jun 2016 08:11:49 +0000 (11:11 +0300)]
mk: check shared library dependencies

Require all symbols used by a DSO to be resolvable via LDLIBS at
build-time. Previously it was possible to build a library with
incomplete dependencies which could then fail at run-time.

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago pdump: fix missing dependency on libpthread
Panu Matilainen [Tue, 21 Jun 2016 08:11:48 +0000 (11:11 +0300)]
pdump: fix missing dependency on libpthread

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
8 years ago mk: fix external dependencies of crypto drivers
Thomas Monjalon [Fri, 24 Jun 2016 22:25:01 +0000 (00:25 +0200)]
mk: fix external dependencies of crypto drivers

When linking drivers as shared libraries, the dependencies need
to be marked as DT_NEEDED entries.

The crypto dependencies (libsso and libIPSec) are static libraries.
To make them linked in the shared PMDs, the code must be relocatable:
    - libIPSec_MB.a must be built with -fPIC
    - libsso_kasumi.a must be built with KASUMI_CFLAGS=-DKASUMI_C

Fixes: 924e84f87306 ("aesni_mb: add driver for multi buffer based crypto")
Fixes: eec136f3c54f ("aesni_gcm: add driver for AES-GCM crypto operations")
Fixes: 3aafc423cf4d ("snow3g: add driver for SNOW 3G library")
Fixes: 2773c86d061a ("crypto/kasumi: add driver for KASUMI library")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago mk: fix internal dependencies
Thomas Monjalon [Fri, 24 Jun 2016 15:13:44 +0000 (17:13 +0200)]
mk: fix internal dependencies

Some libraries were missing their dependency on eal, mbuf, mempool,
ring and kvargs.
It is revealed by the linker option "-z defs".

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago pipeline: fix truncated dependency list
Panu Matilainen [Tue, 21 Jun 2016 08:11:47 +0000 (11:11 +0300)]
pipeline: fix truncated dependency list

In other libraries, the dependency list is always appended to, but
in commit 6cbf4f75e059 it is set with an assignment. This causes the
librte_eal dependency added in commit 6cbf4f75e059 to get discarded,
resulting in a missing dependency on librte_eal.

Fixes: 6cbf4f75e059 ("mk: fix missing internal dependencies")

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
8 years ago mk: fix external library link
Thomas Monjalon [Sat, 25 Jun 2016 11:07:32 +0000 (13:07 +0200)]
mk: fix external library link

When building an external library with rte.extlib.mk, the internal
libraries were not found because the linker search path was the
external library install directory (RTE_OUTPUT/lib).
It is fixed by searching in the internal library install directory
(RTE_SDK_BIN/lib).
When building an internal library, RTE_SDK_BIN = RTE_OUTPUT.

Fixes: c6417ce61f83 ("mk: add build-time library directory to linker path")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago mk: remove traces of combined library
Thomas Monjalon [Sat, 25 Jun 2016 09:19:15 +0000 (11:19 +0200)]
mk: remove traces of combined library

Fixes: 948fd64befc3 ("mk: replace the combined library with a linker script")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago cryptodev: uninline parameter parsing
Thomas Monjalon [Fri, 24 Jun 2016 15:34:26 +0000 (17:34 +0200)]
cryptodev: uninline parameter parsing

There is no need to have this parsing inlined in the header.
It brings a kvargs dependency to every crypto driver.
The functions are moved into rte_cryptodev.c.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
8 years ago mempool: fix symbol export
Thomas Monjalon [Fri, 24 Jun 2016 22:00:57 +0000 (00:00 +0200)]
mempool: fix symbol export

All new symbols in release 16.07 are exported with the version
string DPDK_16.07.
Also remove the empty local: section, which is not needed because it is
inherited from the DPDK_2.0 block.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
8 years ago scripts: add verbose option in build test help
Thomas Monjalon [Fri, 24 Jun 2016 10:16:32 +0000 (12:16 +0200)]
scripts: add verbose option in build test help

The verbose option was available but not advertised.

Fixes: 6e38dfe21389 ("scripts: add verbose test build option")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago scripts: relax line length check for fixed commit
Thomas Monjalon [Thu, 23 Jun 2016 22:40:43 +0000 (00:40 +0200)]
scripts: relax line length check for fixed commit

It is better to keep the line "Fixes:" longer than 75 characters
than splitting.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
8 years ago app/pdump: fix type casting of ring size
Reshma Pattan [Fri, 24 Jun 2016 16:36:23 +0000 (17:36 +0100)]
app/pdump: fix type casting of ring size

The ring_size value is wrongly cast to uint16_t.
It should be cast to uint32_t, as the maximum
ring size is 28 bits long. The wrong cast makes
ring size values bigger than 65535 wrap around.
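
The wrap-around is easy to reproduce in isolation. Both helpers below are
hypothetical and only contrast the two casts; they are not taken from the
pdump tool:

```c
#include <stdint.h>

/* Buggy pattern: values above 65535 wrap when narrowed to 16 bits. */
uint32_t demo_ring_size_u16(unsigned long parsed)
{
	return (uint16_t)parsed;
}

/* Fixed pattern: a 32-bit type holds any ring size of up to 28 bits. */
uint32_t demo_ring_size_u32(unsigned long parsed)
{
	return (uint32_t)parsed;
}
```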

Fixes: caa7028276b8 ("app/pdump: add tool for packet capturing")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
8 years ago app/pdump: fix string overflow
Reshma Pattan [Fri, 24 Jun 2016 16:36:22 +0000 (17:36 +0100)]
app/pdump: fix string overflow

Replace strncpy with snprintf to copy
the strings safely.
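
A minimal sketch of the safer copy, assuming a fixed-size destination buffer.
The helper demo_copy_name and its return convention are hypothetical; the
patch itself may simply call snprintf in place:

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy src into dst of dst_size bytes.
 * Unlike strncpy(), snprintf() always NUL-terminates when dst_size > 0.
 * Returns 0 on success, -1 if src did not fit and was truncated.
 */
int demo_copy_name(char *dst, size_t dst_size, const char *src)
{
	int n = snprintf(dst, dst_size, "%s", src);

	return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```

Checking the return value also lets the caller detect truncation, which
strncpy does not report.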

Coverity issue: 127351

Fixes: caa7028276b8 ("app/pdump: add tool for packet capturing")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
8 years ago pdump: fix string overflow
Reshma Pattan [Fri, 24 Jun 2016 16:36:21 +0000 (17:36 +0100)]
pdump: fix string overflow

Replace strncpy with snprintf to copy
the strings safely.

Coverity issue: 127350

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
8 years ago pdump: check missing home environment variable
Reshma Pattan [Fri, 24 Jun 2016 16:36:20 +0000 (17:36 +0100)]
pdump: check missing home environment variable

Inside pdump_get_socket_path(), getenv can return
a NULL pointer if SOCKET_PATH_HOME is not found
in the environment. A NULL check is added to
return -1 immediately. Since pdump_get_socket_path()
can now return -1, its return value is checked
wherever the function is called and an error message
is logged.
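
The check can be sketched as below. demo_socket_path is a hypothetical
stand-in for pdump_get_socket_path(); its signature and the path suffix are
assumptions for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build "<value of env_name>/.dpdk/pdump_sockets"
 * in buf. Returns 0 on success, -1 if the variable is unset or buf is
 * too small.
 */
int demo_socket_path(const char *env_name, char *buf, size_t len)
{
	const char *base = getenv(env_name);
	int n;

	if (base == NULL)
		return -1; /* never pass a NULL pointer on to snprintf() */
	n = snprintf(buf, len, "%s/.dpdk/pdump_sockets", base);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Callers are expected to test the result and log an error instead of using
the buffer.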

Coverity issue: 127344, 127347

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
8 years ago pdump: fix default socket path
Reshma Pattan [Fri, 24 Jun 2016 16:36:19 +0000 (17:36 +0100)]
pdump: fix default socket path

SOCKET_PATH_HOME specifies the environment variable "HOME",
so the macro should not contain "/pdump_sockets".
Remove "/pdump_sockets" from SOCKET_PATH_HOME and
SOCKET_PATH_VAR_RUN. With this change, pdump sockets are created under
/var/run/.dpdk/pdump_sockets for root users and
under HOME/.dpdk/pdump_sockets for non-root users.
pdump_get_socket_path() is changed to accommodate
the new socket paths.

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
8 years ago app/test: avoid freeing mbufs twice in qat test
Pablo de Lara [Mon, 27 Jun 2016 12:41:27 +0000 (13:41 +0100)]
app/test: avoid freeing mbufs twice in qat test

Test_multi_session was freeing mbufs used in the multiple sessions
created and setting obuf to NULL after it, but ibuf was not being
set to NULL, and therefore, it was being freed again (ibuf and obuf
are pointing at the same address), in the ut_teardown() function.

Fixes: 1b9cb73ecef1 ("app/test: fix qat autotest failure")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
8 years ago app/test: fix PCI class probing
Thomas Monjalon [Fri, 24 Jun 2016 12:24:55 +0000 (14:24 +0200)]
app/test: fix PCI class probing

The PCI test was failing because some fake devices had no PCI class.

Fixes: 1dbba1650c89 ("app/test: remove real PCI ids")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years ago app/test: avoid freeing mbuf twice
Pablo de Lara [Sat, 25 Jun 2016 16:11:21 +0000 (17:11 +0100)]
app/test: avoid freeing mbuf twice

In cryptodev tests, when input and output buffers were the same,
the mbuf was being freed twice, causing refcnt_atomic to be negative.
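
One way to guard against the double free is sketched here with plain
pointers and a counting stand-in for the real free routine. All names are
hypothetical, and the test code may structure the fix differently:

```c
#include <stddef.h>

static unsigned int demo_free_calls; /* counts simulated frees */

static void demo_buf_free(void *buf)
{
	if (buf != NULL)
		demo_free_calls++; /* stands in for releasing an mbuf */
}

/* Free the input and output buffers, releasing an aliased buffer only once. */
void demo_free_bufs(void **ibuf, void **obuf)
{
	if (*obuf == *ibuf)
		*obuf = NULL; /* in-place operation: both point at one buffer */
	demo_buf_free(*obuf);
	demo_buf_free(*ibuf);
	*ibuf = NULL;
	*obuf = NULL;
}

/* Accessor so callers can observe how many frees were issued. */
unsigned int demo_free_count(void)
{
	return demo_free_calls;
}
```

Clearing both pointers afterwards also keeps a later teardown step from
freeing them again.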

Fixes: 202d375c60bc ("app/test: add cryptodev unit and performance tests")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
8 years ago app/test: fix build with icc
Deepak Kumar Jain [Wed, 22 Jun 2016 16:13:55 +0000 (17:13 +0100)]
app/test: fix build with icc

icc complains that a variable may be used without being set.

Fixes: 97fe6461c7cbfb ("app/test: add SNOW 3G performance test")

Signed-off-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
Acked-by: John Griffin <john.griffin@intel.com>
8 years ago port: fix build without KNI
Panu Matilainen [Wed, 22 Jun 2016 11:34:02 +0000 (14:34 +0300)]
port: fix build without KNI

Commit 9fc37d1c071c is missing a conditional in the dependencies,
causing builds to fail when KNI is not enabled:
    == Build lib/librte_port
      LD librte_port.so.3
    /usr/bin/ld: cannot find -lrte_kni
    collect2: error: ld returned 1 exit status

Fixes: 9fc37d1c071c ("port: support KNI")

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
8 years ago kni: fix build with gcc 6.1
Pablo de Lara [Thu, 23 Jun 2016 14:38:24 +0000 (15:38 +0100)]
kni: fix build with gcc 6.1

Using gcc 6.1, in some cases, kni fails to compile
because of unused variables:

lib/librte_eal/linuxapp/kni/ixgbe_main.c:82:19:
error: ‘ixgbe_copyright’
defined but not used [-Werror=unused-const-variable=]

lib/librte_eal/linuxapp/kni/ixgbe_main.c:62:19:
error: ‘ixgbe_driver_string’
defined but not used [-Werror=unused-const-variable=]

Fixes: 3fc5ca2f6352 ("kni: initial import")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
8 years ago mk: fix parallel build of test resources
Thomas Monjalon [Thu, 23 Jun 2016 22:22:59 +0000 (00:22 +0200)]
mk: fix parallel build of test resources

The build was failing sometimes when building with multiple
parallel jobs:
    # rm build/build/app/test/*res*
    # make -j6
    objcopy: 'resource.tmp': No such file

The reason is that each resource was built from the same temporary file.
The failure is seen because of a race condition when removing the
temporary file after each resource creation.
It also means that some resources may be created from the wrong source.

The fix is to have a different input file for each resource.
The source file is not directly used because it may have a long path
which is used by objcopy to name the symbols after some transformations.
When linking a tar resource, the input file is already in the current
directory. The hard case is for simply linked resources.
The trick is to create a symbolic link of the source file if it is not
already in the current build directory.
Then there is a replacement of dot by an underscore to predict the
symbol names computed by objcopy which must be redefined.

There is an additional change for the test_resource_c which is both
a real source file and a test resource. An intermediate file
test_resource.res is created to avoid compiling resource.c from the
wrong directory through a symbolic link.

Fixes: 1e9e0a6270 ("app/test: fix resource creation with objcopy on FreeBSD")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
8 years agohash: add scalable multi-writer insertion with Intel TSX
Wei Shen [Thu, 16 Jun 2016 22:14:14 +0000 (15:14 -0700)]
hash: add scalable multi-writer insertion with Intel TSX

This patch introduced scalable multi-writer Cuckoo Hash insertion
based on a split Cuckoo Search and Move operation using Intel
TSX. It can do scalable hash insertion with 22 cores with little
performance loss and a negligible TSX abort rate.

* Added an extra rte_hash flag definition to switch default single writer
  Cuckoo Hash behavior to multiwriter.
    - If HTM is available, it would use hardware feature for concurrency.
    - If HTM is not available, it would fall back to spinlock.

* Created a rte_cuckoo_hash_x86.h file to hold all x86-arch related
  cuckoo_hash functions. rte_cuckoo_hash.c uses a compile-time flag to
  select the x86 file or other platform-specific implementations, while
  the HTM check is still done at runtime (same idea as
  RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT).

* Moved rte_hash private struct definitions to rte_cuckoo_hash.h, to allow
  rte_cuckoo_hash_x86.h or future platform dependent functions to include.

* The following new functions are created so that names stay consistent
  when TM support for new platforms is added.
    - rte_hash_cuckoo_move_insert_mw_tm: do insertion with bucket movement.
    - rte_hash_cuckoo_insert_mw_tm: do insertion without bucket movement.

* One extra multi-writer test case is added.

Signed-off-by: Wei Shen <wei1.shen@intel.com>
Signed-off-by: Sameh Gobriel <sameh.gobriel@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
8 years agombuf: fix dump format
Simon Kagstrom [Mon, 20 Jun 2016 10:44:35 +0000 (12:44 +0200)]
mbuf: fix dump format

Do not add 0x when using %p in format strings to avoid dump messages
with double 0x0x, e.g.,

  dump mbuf at 0x0x7fac7b17c800, phys=17b17c880, buf_len=2176
    pkt_len=2064, ol_flags=0, nb_segs=1, in_port=255
    segment at 0x0x7fac7b17c800, data=0x0x7fac7b17c8f0, data_len=2064
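
A minimal sketch of the problem, not the actual mbuf dump code: the two
helpers below are hypothetical and differ only in the literal 0x placed
in front of %p. The exact text produced by %p is implementation-defined;
on C libraries such as glibc it already starts with 0x, which is where
the doubled prefix comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: old style, literal "0x" in front of %p. */
static int
format_ptr_old(char *buf, size_t len, void *p)
{
	assert(buf != NULL);
	return snprintf(buf, len, "dump mbuf at 0x%p", p);
}

/* Hypothetical helper: fixed style, %p alone. */
static int
format_ptr_new(char *buf, size_t len, void *p)
{
	assert(buf != NULL);
	return snprintf(buf, len, "dump mbuf at %p", p);
}
```

The old form is always two characters longer than the fixed one, whatever
the C library prints for %p.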

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
8 years agombuf: use default mempool handler from config
David Hunt [Wed, 22 Jun 2016 09:27:29 +0000 (10:27 +0100)]
mbuf: use default mempool handler from config

By default, the mempool ops used for mbuf allocations is a multi
producer and multi consumer ring. We could imagine a target (maybe some
network processors?) that provides a hardware-assisted pool
mechanism. In this case, the default configuration for this architecture
would contain a different value for RTE_MBUF_DEFAULT_MEMPOOL_OPS.

Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
8 years agoapp/test: add mempool handler
David Hunt [Wed, 22 Jun 2016 09:27:28 +0000 (10:27 +0100)]
app/test: add mempool handler

Create a minimal custom mempool handler and check that it
passes basic mempool autotests.

Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
8 years agomempool: support handler operations
David Hunt [Wed, 22 Jun 2016 09:27:27 +0000 (10:27 +0100)]
mempool: support handler operations

Until now, the objects stored in a mempool were internally stored in a
ring. This patch introduces the possibility to register external handlers
replacing the ring.

The default behavior remains unchanged, but calling the new function
rte_mempool_set_ops_byname() right after rte_mempool_create_empty() allows
the user to change the handler that will be used when populating
the mempool.
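
The idea of selecting a handler by name before the pool is populated can
be sketched in plain C as follows. This is not the DPDK API: struct
pool_ops, struct pool, pool_set_ops_byname() and the "single_slot"
handler are hypothetical names used only to illustrate a table of
callbacks that replaces a fixed ring.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler: callbacks that replace the built-in ring. */
struct pool_ops {
	const char *name;
	int (*enqueue)(void *pool_data, void *obj);
	void *(*dequeue)(void *pool_data);
};

/* Hypothetical pool: the handler is chosen before the pool is used. */
struct pool {
	const struct pool_ops *ops;
	void *pool_data;
};

/* Trivial handler storing a single object, for illustration only. */
static int
slot_enqueue(void *pool_data, void *obj)
{
	void **slot = pool_data;

	if (*slot != NULL)
		return -1; /* slot already occupied */
	*slot = obj;
	return 0;
}

static void *
slot_dequeue(void *pool_data)
{
	void **slot = pool_data;
	void *obj = *slot;

	*slot = NULL;
	return obj;
}

static const struct pool_ops ops_table[] = {
	{ "single_slot", slot_enqueue, slot_dequeue },
};

/* Looks up a registered handler by name; returns 0 or -1 if unknown. */
static int
pool_set_ops_byname(struct pool *p, const char *name)
{
	size_t i;

	assert(p != NULL && name != NULL);
	for (i = 0; i < sizeof(ops_table) / sizeof(ops_table[0]); i++) {
		if (strcmp(ops_table[i].name, name) == 0) {
			p->ops = &ops_table[i];
			return 0;
		}
	}
	return -1;
}
```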

This patch also adds a set of default ops (function callbacks) based
on rte_ring.

Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
8 years agonet/virtio-user: fix 32-bit build
Thomas Monjalon [Thu, 23 Jun 2016 20:46:22 +0000 (22:46 +0200)]
net/virtio-user: fix 32-bit build

The compilation for 32-bit fails when CONFIG_RTE_VIRTIO_USER is enabled:

  drivers/net/virtio/virtio_user_ethdev.c:84:47:
    error: format ‘%llu’ expects argument of type ‘long long unsigned int’,
    but argument 5 has type ‘size_t {aka unsigned int}’
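
For reference, a portable way to print a size_t is the C99 %zu
conversion, sketched below. The helper name is hypothetical and the
actual change made in virtio_user_ethdev.c may differ (a cast to the
type expected by the format string would also work).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a length held in a size_t. */
static int
format_len(char *buf, size_t buflen, size_t len)
{
	assert(buf != NULL);
	/* %zu is the conversion for size_t whatever its width is,
	 * so the same format string builds on 32-bit and 64-bit. */
	return snprintf(buf, buflen, "len=%zu", len);
}
```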

Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
8 years agonet/i40e: support NSH packet type
Jingjing Wu [Tue, 3 May 2016 05:51:12 +0000 (13:51 +0800)]
net/i40e: support NSH packet type

NSH packet can be recognized by Intel X710/XL710 series.
This patch enables the new packet type.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Yulong Pei <yulong.pei@intel.com>
Acked-by: Zhe Tao <zhe.tao@intel.com>
8 years agombuf: add NSH packet type
Jingjing Wu [Tue, 3 May 2016 05:51:11 +0000 (13:51 +0800)]
mbuf: add NSH packet type

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Zhe Tao <zhe.tao@intel.com>
8 years agoethdev: fix doxygen formatting
Hiroyuki Mikita [Fri, 17 Jun 2016 17:27:34 +0000 (02:27 +0900)]
ethdev: fix doxygen formatting

This commit fixes some functions missing in API documentation.

Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
8 years agoethdev: align device structure with cache line
Jerin Jacob [Tue, 3 May 2016 12:42:07 +0000 (18:12 +0530)]
ethdev: align device structure with cache line

Elements of struct rte_eth_dev are used in the fast path.
Make struct rte_eth_dev cache aligned to avoid the cases where
rte_eth_dev elements share the same cache line with other structures.
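
A minimal sketch of the effect, using a hypothetical stand-in structure
and an assumed cache line size of 64 bytes rather than the real struct
rte_eth_dev and the DPDK alignment macro. It relies on the GCC/Clang
aligned attribute and on C11 static_assert.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for a device structure used in the fast path. */
struct fast_path_dev {
	void *rx_burst;
	void *tx_burst;
	uint16_t port_id;
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* The attribute raises the alignment and pads sizeof() to a multiple
 * of the cache line, so adjacent array elements never share a line. */
static_assert(alignof(struct fast_path_dev) == CACHE_LINE_SIZE,
	      "device structure must be cache aligned");
static_assert(sizeof(struct fast_path_dev) % CACHE_LINE_SIZE == 0,
	      "device structure must fill whole cache lines");
```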

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
8 years agoethdev: add RSS RETA size constant 256
Jerin Jacob [Wed, 22 Jun 2016 13:03:21 +0000 (18:33 +0530)]
ethdev: add RSS RETA size constant 256

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
8 years agoethdev: add tunnel and port RSS offload types
Jerin Jacob [Wed, 22 Jun 2016 13:03:20 +0000 (18:33 +0530)]
ethdev: add tunnel and port RSS offload types

- added VXLAN, GENEVE and NVGRE tunnel flow types
- added PORT flow type for accounting physical/virtual
port or channel number in flow creation

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
8 years agonet/virtio: fix used index retrieved only once
Huawei Xie [Sun, 19 Jun 2016 17:48:52 +0000 (01:48 +0800)]
net/virtio: fix used index retrieved only once

In the following loop:
    while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
            ...
    }
There is no external function call or any explicit memory barrier
in the loop, so the re-read of used->idx might be optimized away and
the value retrieved only once.

Use of volatile should normally be avoided, and access_once
is the Linux kernel's way of handling this issue; once we have that
macro in DPDK, we could change to that style.

virtio_recv_mergable_pkts might also have the same issue, so fix
it as well.
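
The volatile read can be sketched as follows. The structure and function
names are hypothetical stand-ins, not the virtio PMD code; the point is
only that reading through a volatile-qualified pointer forces a fresh
load each time the loop condition is evaluated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the used ring shared with the device. */
struct used_ring {
	uint16_t flags;
	uint16_t idx;
};

/* Reads idx through a volatile-qualified pointer, so the compiler
 * must reload it on every call and cannot hoist it out of a loop. */
static inline uint16_t
read_used_idx(const struct used_ring *used)
{
	assert(used != NULL);
	return *(const volatile uint16_t *)&used->idx;
}

/* Number of entries produced by the device and not yet consumed.
 * The 16-bit subtraction handles index wrap-around. */
static uint16_t
pending_entries(const struct used_ring *used, uint16_t cons_idx)
{
	return (uint16_t)(read_used_idx(used) - cons_idx);
}
```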

Fixes: 823ad647950a ("virtio: support multiple queues")
Fixes: 13ce5e7eb94f ("virtio: mergeable buffers")

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio: fix crash on querying xstats
Yuanhan Liu [Mon, 20 Jun 2016 10:43:32 +0000 (18:43 +0800)]
net/virtio: fix crash on querying xstats

Trying to access xstats_names after "if (xstats_names == NULL)" is
obviously wrong, and results in a crash when running "show
port xstats 0" in testpmd with virtio PMD.

The fix is straightforward; just reverse the check.
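
A sketch of the corrected pattern, with hypothetical names and a made-up
list of statistics rather than the virtio PMD code: the name buffer is
written only when the caller supplied one, and a NULL buffer is treated
as a pure query for the number of entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define XSTAT_NAME_LEN 64 /* hypothetical size */

struct xstat_name {
	char name[XSTAT_NAME_LEN];
};

/* Made-up statistics, for illustration only. */
static const char *const known_stats[] = { "good_packets", "good_bytes" };
#define NB_STATS (sizeof(known_stats) / sizeof(known_stats[0]))

/* Returns the number of statistics; fills names[] when it is not NULL. */
static int
xstats_get_names(struct xstat_name *names, unsigned int limit)
{
	unsigned int i;

	if (names != NULL) { /* the buggy form tested "== NULL" here */
		assert(limit >= NB_STATS);
		for (i = 0; i < NB_STATS; i++)
			snprintf(names[i].name, sizeof(names[i].name),
				 "%s", known_stats[i]);
	}
	return (int)NB_STATS;
}
```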

Fixes: baf91c395b18 ("net/virtio: fetch extended statistics with integer ids")

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: check hugepage fstat error
Huawei Xie [Tue, 14 Jun 2016 17:45:23 +0000 (01:45 +0800)]
vhost: check hugepage fstat error

Value returned from fstat is not checked for errors before being used.
This patch fixes following coverity issue.

    static uint64_t
    get_blk_size(int fd)
    {
     struct stat stat;

     fstat(fd, &stat);
     return (uint64_t)stat.st_blksize;
    >>>  CID 107103 (#1 of 1): Unchecked return value from library
         (CHECKED_RETURN)
    >>>  check_return: Calling fstat(fd, &stat) without checking
         return value.
    >>>  This library function may fail and return an error code.
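
A checked version of the function could look like the sketch below. The
choice of (uint64_t)-1 as the error value is an assumption made for this
example; callers would have to test for it before using the result.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Returns the block size of the file behind fd,
 * or (uint64_t)-1 if fstat() fails. */
static uint64_t
get_blk_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return (uint64_t)-1;

	return (uint64_t)st.st_blksize;
}
```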

Fixes: 8f972312b8f4 ("vhost: support vhost-user")

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: unmap log memory on cleanup
Ilya Maximets [Thu, 16 Jun 2016 09:16:37 +0000 (12:16 +0300)]
vhost: unmap log memory on cleanup

Fixes memory leak on QEMU migration.

Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: fix leak of file descriptors
Ilya Maximets [Thu, 16 Jun 2016 09:16:36 +0000 (12:16 +0300)]
vhost: fix leak of file descriptors

During migration of a vhost-user device, QEMU allocates a memfd
to store information about dirty pages and sends the fd to the
vhost-user process.

The file descriptor for this memory should be closed to prevent
"Too many open files" errors in the vhost-user process after
a number of migrations.

Ex.:
 # ls /proc/<ovs-vswitchd pid>/fd/ -alh
 total 0
 root qemu  .
 root qemu  ..
 root qemu  0 -> /dev/pts/0
 root qemu  1 -> pipe:[1804353]
 root qemu  10 -> socket:[1782240]
 root qemu  100 -> /memfd:vhost-log (deleted)
 root qemu  1000 -> /memfd:vhost-log (deleted)
 root qemu  1001 -> /memfd:vhost-log (deleted)
 root qemu  1004 -> /memfd:vhost-log (deleted)
 [...]
 root qemu  996 -> /memfd:vhost-log (deleted)
 root qemu  997 -> /memfd:vhost-log (deleted)

 ovs-vswitchd.log:
 |WARN|punix:ovs-vswitchd.ctl: accept failed: Too many open files

Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio-user: handle control queue in driver
Jianfeng Tan [Wed, 15 Jun 2016 09:07:17 +0000 (09:07 +0000)]
net/virtio-user: handle control queue in driver

In the virtio-user driver, when the ctrl-queue is notified, invoke the
API of the virtio-user device emulation to handle the ctrl-q command.

Besides, multi-queue requires ctrl-queue and ctrl-queue will be
enabled automatically when multi-queue is specified.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio-user: add multiple queues in device emulation
Jianfeng Tan [Wed, 15 Jun 2016 09:38:36 +0000 (17:38 +0800)]
net/virtio-user: add multiple queues in device emulation

The main purpose of this patch is to enable multi-queue. But
multi-queue requires ctrl-queue so that driver can send how many
queues will be enabled through ctrl-queue messages.

So we partially implement ctrl-queue to handle control command
with class of VIRTIO_NET_CTRL_MQ and with cmd of
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to handle mq support. This patch
provides a function, virtio_user_handle_cq(), for driver to handle
ctrl-queue messages.

Besides, multi-queue requires VIRTIO_NET_F_MQ and VIRTIO_NET_F_CTRL_VQ
are enabled when we do feature negotiation.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio-user: add multiple queues in vhost-user adapter
Jianfeng Tan [Wed, 15 Jun 2016 09:07:15 +0000 (09:07 +0000)]
net/virtio-user: add multiple queues in vhost-user adapter

This patch mainly adds a method to the vhost-user adapter for sending
queue enable/disable messages to the vhost-user backend, namely
VHOST_USER_SET_VRING_ENABLE.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio-user: add virtual device
Jianfeng Tan [Wed, 15 Jun 2016 09:03:25 +0000 (09:03 +0000)]
net/virtio-user: add virtual device

Add a new virtual device named virtio-user, which can be used just like
eth_ring, eth_null, etc. To reuse the code of the original virtio driver,
we make some adjustments in virtio_ethdev.c, such as removing the _static_
keyword from eth_virtio_dev_init() so that it can be reused by the virtual
device; we also add some checks to make sure it will not crash.

Configured parameters include:
  - queues (optional, 1 by default), number of queue pairs, multi-queue
    not supported for now.
  - cq (optional, 0 by default), not supported for now.
  - mac (optional), random value will be given if not specified.
  - queue_size (optional, 256 by default), size of virtqueues.
  - path (mandatory), path of vhost user.

When CONFIG_RTE_VIRTIO_USER is enabled (it is by default), the compiled
library can be used in both VM and container environment.

Examples:
path_vhost=<path_to_vhost_user> # use vhost-user as a backend

sudo ./examples/l2fwd/build/l2fwd -c 0x100000 -n 4 \
    --socket-mem 0,1024 --no-pci --file-prefix=l2fwd \
    --vdev=virtio-user0,mac=00:01:02:03:04:05,path=$path_vhost -- -p 0x1

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with no-huge.
 - Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
   hugepages.
 - Root privilege is a must (mainly because of sorting hugepages according
   to physical address).
 - Applications should not use file name like HUGEFILE_FMT ("%smap_%d").
 - Cannot work with vhost-net backend.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio-user: add new virtual PCI driver
Jianfeng Tan [Wed, 15 Jun 2016 09:03:24 +0000 (09:03 +0000)]
net/virtio-user: add new virtual PCI driver

This patch implements another new instance of struct virtio_pci_ops to
drive the virtio-user virtual device. Instead of rd/wr ioport or PCI
configuration space, this virtual pci driver will rd/wr the virtual
device struct virtio_user_hw, and when necessary, invokes APIs provided
by device emulation later to start/stop the device.

  ----------------------
  | ------------------ |
  | | virtio driver  | |----> (virtio_user_ethdev.c)
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate | |
  | |                | |
  | | vhost adapter  | |
  | ------------------ |
  ----------------------
            |
            |
            |
   ------------------
   | vhost backend  |
   ------------------

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio-user: add device emulation layer
Jianfeng Tan [Wed, 15 Jun 2016 09:03:23 +0000 (09:03 +0000)]
net/virtio-user: add device emulation layer

A few device emulation layer functions are added for the virtio driver to
call:
  - virtio_user_start_device()
  - virtio_user_stop_device()
  - virtio_user_dev_init()
  - virtio_user_dev_uninit()

These functions will get called by virtio driver, and they call vhost
adapter layer functions to implement the functionality.

All stats related to the virtual user device are logged in the
virtio_user_dev structure.

  ----------------------
  | ------------------ |
  | | virtio driver  | |
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate |-|----> (virtio_user_dev.c, virtio_user_dev.h)
  | |                | |
  | | vhost adapter  | |
  | ------------------ |
  ----------------------
            |
            |
            |
   ------------------
   | vhost backend  |
   ------------------

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio-user: add vhost-user adapter layer
Jianfeng Tan [Wed, 15 Jun 2016 09:03:22 +0000 (09:03 +0000)]
net/virtio-user: add vhost-user adapter layer

This patch provides the vhost adapter layer implementation. Two main
helper functions are provided to the upper layer (device emulation):
  - vhost_user_setup(), to set up vhost user backend;
  - vhost_user_sock(), to talk with vhost user backend.

  ----------------------
  | ------------------ |
  | | virtio driver  | |
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate | |
  | |                | |
  | | vhost adapter  |-|----> (vhost_user.c)
  | ------------------ |
  ----------------------
            |
            | -------------- --> (vhost-user protocol)
            |
   ------------------
   | vhost backend  |
   ------------------

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio: allow virtual address to fill vring descriptors
Jianfeng Tan [Wed, 15 Jun 2016 09:03:21 +0000 (09:03 +0000)]
net/virtio: allow virtual address to fill vring descriptors

This patch is related to how to calculate relative address for vhost
backend.

The principle is that: based on one or multiple shared memory regions,
vhost maintains a reference system with the frontend start address,
backend start address, and length for each segment, so that each
frontend address (GPA, Guest Physical Address) can be translated into
vhost-recognizable backend address. To make the address translation
efficient, we need to maintain as few regions as possible. In the case
of a VM, GPA is always locally continuous. But in other cases, such as
virtio-user, GPA continuity is not guaranteed; therefore, we use virtual
addresses here.

It basically means:
  a. when set_base_addr, VA address is used;
  b. when preparing RX's descriptors, VA address is used;
  c. when transmitting packets, VA is filled in TX's descriptors;
  d. in TX and CQ's header, VA is used.
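
The reference system described above can be sketched as a small region
table. The structure and function names are hypothetical and this is not
the vhost implementation; it only shows why fewer regions mean a cheaper
lookup, and that the frontend address may equally be a GPA or a VA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region: one shared memory segment as seen by both sides. */
struct mem_region {
	uint64_t frontend_start; /* GPA for a VM, VA for virtio-user */
	uint64_t backend_start;  /* address usable by the vhost backend */
	uint64_t len;
};

/* Translates a frontend address into a backend address.
 * Returns 0 when the address is not covered by any region. */
static uint64_t
frontend_to_backend(const struct mem_region *regions, size_t nregions,
		    uint64_t addr)
{
	size_t i;

	assert(regions != NULL || nregions == 0);
	for (i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		if (addr >= r->frontend_start &&
		    addr - r->frontend_start < r->len)
			return r->backend_start + (addr - r->frontend_start);
	}
	return 0;
}
```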

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio: hide vring address check inside PCI ops
Jianfeng Tan [Wed, 15 Jun 2016 09:03:20 +0000 (09:03 +0000)]
net/virtio: hide vring address check inside PCI ops

This patch moves the phys addr check from virtio_dev_queue_setup
to pci ops. To make that happen, make sure virtio_ops.setup_queue
returns the result if we pass through the check.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: fix null pointer dereference
Marcin Kerlin [Wed, 15 Jun 2016 09:47:22 +0000 (11:47 +0200)]
vhost: fix null pointer dereference

The return value of get_device() is not checked before it is
dereferenced. Fix this by adding the missing check.

Coverity issue: 119262

Fixes: 77d20126b4c2 ("vhost-user: handle message to enable vring")

Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: remove concurrent enqueue
Huawei Xie [Mon, 13 Jun 2016 11:52:12 +0000 (19:52 +0800)]
vhost: remove concurrent enqueue

No other DPDK PMD supports concurrent receiving or sending of
packets to the same queue. The upper application should deal with
this, normally through queue and core bindings.

For historical reasons, vhost internally supports concurrent lockless
enqueuing of packets to the same virtio queue through a costly cmpset operation.
This patch removes this internal lockless implementation and should improve
performance a bit.

Luckily DPDK OVS doesn't rely on this behavior.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agonet/virtio: fix crash when no devargs
Huawei Xie [Mon, 13 Jun 2016 14:53:08 +0000 (22:53 +0800)]
net/virtio: fix crash when no devargs

We skip kernel managed virtio devices if they aren't whitelisted.
Before checking if the virtio device is whitelisted, check if devargs
is specified.

Fixes: ac5e1d838dc1 ("virtio: skip error when probing kernel managed device")

Reported-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: arrange struct fields for better cache sharing
Yuanhan Liu [Tue, 3 May 2016 00:46:18 +0000 (17:46 -0700)]
vhost: arrange struct fields for better cache sharing

The ifname[] field takes so much space that it separates some frequently
used fields into different cache lines, say, features and broadcast_rarp.

This patch moves all those fields that will be accessed frequently in Rx/Tx
together (before the ifname[] field) to let them share one cache line.
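
The effect of the reordering can be checked with offsetof, as in the
sketch below. Both structures are simplified, hypothetical stand-ins
(only three fields, ifname[] assumed to be 4096 bytes, 64-byte cache
lines, structure start assumed cache aligned), not the real vhost device
structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumption */
#define IFNAME_LEN      4096 /* assumption */

/* Layout before: the large array splits the hot fields apart. */
struct dev_before {
	uint64_t features;
	char ifname[IFNAME_LEN];
	uint16_t broadcast_rarp;
};

/* Layout after: hot fields first, large array last. */
struct dev_after {
	uint64_t features;
	uint16_t broadcast_rarp;
	char ifname[IFNAME_LEN];
};

/* True when two offsets fall into the same cache line, assuming the
 * structure itself starts on a cache line boundary. */
static int
same_cache_line(size_t off_a, size_t off_b)
{
	return off_a / CACHE_LINE_SIZE == off_b / CACHE_LINE_SIZE;
}
```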

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
8 years agovhost: optimize dequeue for small packets
Yuanhan Liu [Tue, 3 May 2016 00:46:17 +0000 (17:46 -0700)]
vhost: optimize dequeue for small packets

A virtio driver normally uses at least 2 desc buffers for Tx: the
first for storing the header, and the others for storing the data.

Therefore, we could fetch the first data desc buf before the main
loop, and do the copy first before the check of "are we done yet?".
This could save one check for small packets that just have one data
desc buffer and need one mbuf to store it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
8 years agovhost: pre update used ring for Tx and Rx
Yuanhan Liu [Tue, 3 May 2016 00:46:16 +0000 (17:46 -0700)]
vhost: pre update used ring for Tx and Rx

Pre update and update used ring in batch for Tx and Rx at the stage
while fetching all avail desc idx. This would reduce some cache misses
and hence, increase the performance a bit.

Pre update is feasible because the guest driver will not start processing
those entries as long as we don't update "used->idx". (I'm not 100%
certain I'm not missing anything, though).

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
8 years agonet/vhost: add client option
Yuanhan Liu [Sat, 7 May 2016 05:51:04 +0000 (13:51 +0800)]
net/vhost: add client option

Add client option to vhost pmd, to let it act as the vhost-user client.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agoexamples/vhost: add client option
Yuanhan Liu [Sat, 7 May 2016 05:23:40 +0000 (13:23 +0800)]
examples/vhost: add client option

Add --client option to let vhost-switch act as the client.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: workaround stale vring base
Yuanhan Liu [Fri, 6 May 2016 22:04:05 +0000 (06:04 +0800)]
vhost: workaround stale vring base

When a DPDK app crashes (or quits, or gets killed), a restart of the DPDK
app would get a stale vring base from QEMU. That would break the kernel
virtio net driver completely, making it unusable until a driver
reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a much saner (though maybe not the most accurate) vring base
from used->idx. That would work because:

- there is a memory barrier between updating used ring entries and
  used->idx. So, even if we crashed while updating the used ring
  entries, it will not cause any issue, as the guest driver will not
  process those stale used entries, because used->idx is not updated yet.

- DPDK processes the vring in order, which means a crash may just lead
  to some packet retransmission for Tx and drops for Rx.

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
8 years agovhost: add reconnect ability
Yuanhan Liu [Thu, 12 May 2016 23:14:19 +0000 (07:14 +0800)]
vhost: add reconnect ability

Allow reconnecting on failure by default when:

- DPDK app starts first and QEMU (as the server) is not started yet.
  Without reconnecting, DPDK app would simply fail on vhost-user
  registration.

- QEMU restarts, say due to OS reboot.
  Without reconnecting, you can't re-establish the connection without
  restarting DPDK app.

This patch makes it work well for both cases above. It simply creates
a new thread and keeps calling "connect()" until it succeeds.

The reconnect could be disabled when RTE_VHOST_USER_NO_RECONNECT flag
is set.
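
The retry idea can be sketched as the loop below. It is a simplified
illustration with a hypothetical function name: the patch runs its loop
in a separate thread until connect() succeeds, while this sketch runs in
the caller and gives up after max_tries attempts so that it always
terminates.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Tries to connect to the unix socket at path up to max_tries times,
 * sleeping delay_sec seconds between attempts.
 * Returns the connected fd, or -1 if every attempt failed. */
static int
connect_with_retry(const char *path, unsigned int max_tries,
		   unsigned int delay_sec)
{
	struct sockaddr_un un;
	unsigned int i;
	int fd;

	assert(path != NULL);
	if (strlen(path) >= sizeof(un.sun_path))
		return -1;
	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	memcpy(un.sun_path, path, strlen(path) + 1);

	for (i = 0; i < max_tries; i++) {
		/* Use a fresh socket for each attempt. */
		fd = socket(AF_UNIX, SOCK_STREAM, 0);
		if (fd < 0)
			return -1;
		if (connect(fd, (struct sockaddr *)&un, sizeof(un)) == 0)
			return fd;
		close(fd);
		if (i + 1 < max_tries)
			sleep(delay_sec);
	}
	return -1;
}
```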

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: add vhost-user client mode
Yuanhan Liu [Fri, 6 May 2016 21:26:03 +0000 (05:26 +0800)]
vhost: add vhost-user client mode

Add a new parameter (flags) to rte_vhost_driver_register(). DPDK
vhost-user acts in client mode when the RTE_VHOST_USER_CLIENT flag
is set.  The flags would also allow future extensions without
breaking the API (again).

The rest is straightforward then: allocate a unix socket, and
bind/listen for server, connect for client.
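
The server/client split can be sketched as below. The function and the
SKETCH_CLIENT flag are hypothetical and stand in for the flags parameter
described above; this is not the vhost library code, only the socket
calls involved.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define SKETCH_CLIENT (1u << 0) /* hypothetical flag */

/* Allocates a unix socket for path. Without SKETCH_CLIENT the socket is
 * bound and put in listening state (server); with it, the socket
 * connects to an existing server. Returns the fd, or -1 on error. */
static int
open_vhost_socket(const char *path, unsigned int flags)
{
	struct sockaddr_un un;
	int fd;

	assert(path != NULL);
	if (strlen(path) >= sizeof(un.sun_path))
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	memcpy(un.sun_path, path, strlen(path) + 1);

	if (flags & SKETCH_CLIENT) {
		if (connect(fd, (struct sockaddr *)&un, sizeof(un)) < 0)
			goto err;
	} else {
		if (bind(fd, (struct sockaddr *)&un, sizeof(un)) < 0 ||
		    listen(fd, 1) < 0)
			goto err;
	}
	return fd;

err:
	close(fd);
	return -1;
}
```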

This extension is for vhost-user only, therefore we simply quit
and report error when any flags are given for vhost-cuse.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
8 years agovhost: rename structs for enabling client mode
Yuanhan Liu [Fri, 6 May 2016 20:13:22 +0000 (04:13 +0800)]
vhost: rename structs for enabling client mode

DPDK vhost-user has only acted as server so far, so using a struct named
"vhost_server" was okay. However, if we add client mode, the name doesn't
make sense any more. This patch renames it to "vhost_user_socket".

There was nothing obviously wrong with "connfd_ctx", but I think it's
better to rename it to "vhost_user_connection", as it does represent
a connection, a connection between the backend (DPDK) and the frontend
(QEMU).

Similarly, a few more renames are made, such as "vserver_new_vq_conn"
to "vhost_user_new_connection".

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>