John Daley [Thu, 22 Sep 2016 17:02:46 +0000 (10:02 -0700)]
net/enic: support scatter Rx in MTU update
Re-initialize Rq's when MTU is changed. This allows for more
efficient use of mbufs when moving from an MTU that is greater
than the mbuf size to one that is less. Also move to using Rx
scatter mode when moving from an MTU less than the mbuf size
to one that is greater.
Signed-off-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: John Daley <johndale@cisco.com>
Nelson Escobar [Thu, 22 Sep 2016 17:02:45 +0000 (10:02 -0700)]
net/enic: fix freeing memory for descriptor ring
The function vnic_dev_free_desc_ring() didn't actually free memory. Fix
this by first changing vnic_dev_alloc_desc_ring() to use the common
allocation function, then in vnic_dev_free_desc_ring call the common
free function.
Fixes: fefed3d1e62c ("enic: new driver") Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Nelson Escobar [Mon, 19 Sep 2016 18:50:09 +0000 (11:50 -0700)]
net/enic: unregister interrupt handler when stopping
enic_disable() wasn't calling rte_intr_disable() or
rte_intr_callback_unregister(). Stopping and restarting a port would
result in the same interrupt callback being registered multiple
times, which would then cause it to be called multiple times on
every interrupt.
Fixes: fefed3d1e62c ("enic: new driver") Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
Previously, the PTYPE field in the RX descriptors was not set properly
for QinQ packets. The wrong PTYPE was generated because the outer tag did
not have ORT/PIT configured, so fix this issue by configuring ORT/PIT.
This patch also changes the bitmask of the outer VLAN tag in the L2 header
to support RSS and flow director for QinQ.
Rich Lane [Tue, 2 Aug 2016 19:34:56 +0000 (12:34 -0700)]
net/i40e: fix null pointer dereferences when using VMDq+RSS
When using VMDQ+RSS, the queue ids used by the application are not
contiguous (see i40e_pf_config_rss). Most of the driver already handled
this, but there were a few cases where it assumed all configured queues
had been set up.
Fixes: 4861cde46116 ("i40e: new poll mode driver") Fixes: 6b4537128394 ("i40e: free queue memory when closing") Fixes: 8e109464c022 ("i40e: allow vector Rx and Tx usage") Signed-off-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Alex Zelezniak [Tue, 30 Aug 2016 01:23:29 +0000 (20:23 -0500)]
net/ixgbe: fix VF reset to apply to correct VF
In SR-IOV configuration, queues 0 - nb_rx_queues belong to VF0,
which means that with the current implementation when a reset mbox
message comes from any VF, it affects the settings of VF0.
Fix this by using PF queue index to update the correct queue.
Fixes: dbb0b8737f64 ("ixgbe: add vlan offload support") Signed-off-by: Alex Zelezniak <alexz@att.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Beilei Xing [Wed, 17 Aug 2016 01:58:06 +0000 (09:58 +0800)]
net/i40e: fix dropping packets with ethertype 0x88A8
In FW default settings, Ethertype 0x88A8 is treated as S-TAG,
and packets with S-TAG should be received in Port Virtualizer mode.
However, Port Virtualizer mode is not initialized in DPDK, so X710 will
drop packets with Ethertype 0x88A8.
This patch fixes this issue by turning off S-TAG identification.
John Daley [Wed, 17 Aug 2016 22:15:26 +0000 (15:15 -0700)]
net/enic: fix bad L4 checksum flag on ICMP packets
The bad L4 checksum flag was set on IP packets which were not
also TCP or UDP packets. This includes ICMP, IGMP and OSPF packets.
L4 ptypes were being treated as bits instead of values within the
L4 mask causing the code to check L4 checksum in the completion
queue and incorrectly set the L4 bad checksum flag.
Fixes: 947d860c821f ("enic: improve Rx performance") Reviewed-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: John Daley <johndale@cisco.com>
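The difference between treating an L4 packet type as a bit and as a value within the L4 mask can be sketched as follows. This is a minimal illustration, not the enic code: the constant and function names are hypothetical, and the values are assumed to mirror the usual encoding in which each L4 type is a distinct number inside one 4-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: each L4 type is a value in a 4-bit field,
 * not a bit of its own (assumed encoding for this sketch). */
#define PTYPE_L4_TCP  0x00000100u
#define PTYPE_L4_UDP  0x00000200u
#define PTYPE_L4_ICMP 0x00000500u
#define PTYPE_L4_MASK 0x00000f00u

/* Buggy test: treats the type as a flag. ICMP (0x500) shares bit 0x100
 * with TCP, so it is wrongly reported as TCP/UDP. */
static bool is_tcp_or_udp_bitwise(uint32_t ptype)
{
	return (ptype & (PTYPE_L4_TCP | PTYPE_L4_UDP)) != 0;
}

/* Correct test: isolate the L4 field, then compare against each value. */
static bool is_tcp_or_udp_by_value(uint32_t ptype)
{
	uint32_t l4 = ptype & PTYPE_L4_MASK;

	return l4 == PTYPE_L4_TCP || l4 == PTYPE_L4_UDP;
}
```

With the bitwise test an ICMP packet reaches the L4 checksum check; with the value comparison it does not.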
Xiao Wang [Fri, 5 Aug 2016 03:17:43 +0000 (11:17 +0800)]
net/fm10k: fix MAC address removal from switch
When testpmd quits with two ports, the second port's MAC address
remains in the MAC table of switch manager.
There needs to be some time for HW to quiesce when closing a port,
otherwise a subsequent port close won't be handled correctly.
This patch adds a delay after turning off a logic port, just as
the kernel driver does.
Fixes: 8b5c9ec20b7b ("fm10k: support VMDQ in MAC/VLAN filter") Reported-by: Xueqin Lin <xueqin.lin@intel.com> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Acked-by: Jing Chen <jing.d.chen@intel.com>
Nelson Escobar [Tue, 9 Aug 2016 21:42:04 +0000 (14:42 -0700)]
net/enic: move link checking init to probe time
The enic DMAs link status information to the host and this requires a
little setup. This setup was being done as a result of calling
rte_eth_dev_start(). But applications expect to be able to check link
status before calling rte_eth_dev_start().
This patch moves the link status setup to enic_init() which is called
at device probe time so that link status can be checked anytime.
Fixes: fefed3d1e62c ("enic: new driver") Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: John Daley <johndale@cisco.com>
The PMD only uses power-of-two numbers of Work Queue Elements (aka WQE),
so storing the number of elements as its log2 reduces the size of the
container needed to store it.
The PMD only uses power-of-two numbers of Completion Queue Elements (aka
CQE), so storing the number of elements as its log2 reduces the size of
the container needed to store it.
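A rough sketch of the log2 storage idea, assuming a hypothetical queue structure (struct queue_info and its helpers are illustrative names, not the mlx5 ones): a count up to 2^31 fits in a single byte, and the real count is recovered with a shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue descriptor: the element count is kept as log2(count). */
struct queue_info {
	uint8_t elts_n; /* log2 of the number of elements */
};

/* Return log2(n) for a power-of-two n. */
static uint8_t log2_pow2(uint32_t n)
{
	uint8_t l = 0;

	assert(n != 0 && (n & (n - 1)) == 0); /* must be a power of two */
	while (n > 1) {
		n >>= 1;
		l++;
	}
	return l;
}

/* Recover the element count from the stored exponent. */
static uint32_t queue_elts(const struct queue_info *q)
{
	return UINT32_C(1) << q->elts_n;
}

/* Wrap an index without a division: count - 1 is a mask for powers of two. */
static uint32_t queue_wrap(const struct queue_info *q, uint32_t idx)
{
	return idx & (queue_elts(q) - 1);
}
```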
Rework Work Queue Element (aka WQE) structures to fit PMD needs.
A WQE is an aggregation of 16-byte elements known as "data segments"
(aka dseg).
The only common part is the first two elements, i.e. the control segment,
which defines the job type, and the Ethernet segment, which embeds offload
requests along with other information. After that, it can have:
- a raw data packet,
- a data pointer to the packet itself,
- both.
Jerin Jacob [Thu, 21 Jul 2016 14:01:46 +0000 (19:31 +0530)]
net/thunderx: add tunneling extension info capability flag
Certain ThunderX SoC passes have an additional optional word
in the Rx descriptor to hold tunneling extension info.
Based on this capability, the location where the packet pointer
address is stored in the Rx descriptor will vary.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Jerin Jacob [Thu, 21 Jul 2016 14:01:45 +0000 (19:31 +0530)]
net/thunderx: remove generic passX references
The thunderx PMD needs to support multiple SoC
variants in the ThunderX family.
Remove generic pass references from the driver, as each SoC
can have the same pass number.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Bruce Richardson [Mon, 19 Sep 2016 14:36:54 +0000 (15:36 +0100)]
net/mlx: fix debug build with gcc 6.1
With recent gcc versions, e.g. gcc 6.1, compilation of mlx drivers with
debug enabled produces lots of errors complaining that "pedantic" is
not a warning level that can be ignored.
error: ‘-pedantic’ is not an option that controls warnings [-Werror=pragmas]
#pragma GCC diagnostic ignored "-pedantic"
^~~~~~~~~~~
These errors can be removed by changing the "-pedantic" to "-Wpedantic".
Fixes: 7fae69eeff13 ("mlx4: new poll mode driver") Fixes: 771fa900b73a ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters") Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
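The corrected pragma form can be sketched as below. The structure is a hypothetical stand-in for whatever code triggers pedantic warnings (here a zero-length array, which is a GNU extension); it is not taken from the mlx drivers.

```c
/* Sketch: silence pedantic warnings only around the offending code. */
#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic" /* "-pedantic" is rejected here */
#endif

/* Hypothetical stand-in for code that relies on a GNU extension. */
struct pedantic_example {
	int len;
	char data[0]; /* zero-length array: flagged by -Wpedantic */
};

#ifdef __GNUC__
#pragma GCC diagnostic pop
#endif
```

The push/pop pair keeps the suppression local, so the rest of the file is still checked.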
Ferruh Yigit [Fri, 26 Aug 2016 11:17:56 +0000 (12:17 +0100)]
net/pcap: fix missing Tx interface assignment
A missing pcap assignment may cause the pcap file/interface to be opened
again without the previous one being closed.
Fixes: 1e38a7c66923 ("pcap: fix storage of name and type in queues") Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Ferruh Yigit [Fri, 26 Aug 2016 11:17:38 +0000 (12:17 +0100)]
net/pcap: convert config option to a macro
pcap PMD is using ring PMD configuration parameters to set max number of
queues. This creates an unnecessary dependency and confusion.
Stop using configuration parameter to set max number of queues and
convert this variable into a macro within source code, to simplify
configuration file.
Default value of macro is same as ring parameter's default.
The pcap PMD does not need configuration detailed enough to set the Rx and
Tx max queue numbers separately, so the same macro is used for both.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
net/e1000: fix returned number of available Rx descriptors
Fixes: 0f6b7c7f7a37 ("igb: use DD bit to count RX available descriptors") Signed-off-by: Ali Volkan Atli <volkan.atli@argela.com.tr> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
For large packets, the NIC expects a pointer to a cache-aligned address.
The old inline code could break this assumption, which hurts
performance.
This function was supposed to be inlined, but was not because several
functions call it. It should always be inlined to avoid
external function calls and to optimize code in the data path.
net/mlx5: fix inconsistent return value in flow director
The return value in DPDK is a negative errno on failure.
Since internal functions in the mlx driver return positive
values, the value must be negated when it is returned to
the DPDK layer.
Fixes: 76f5c99 ("mlx5: support flow director") Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
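The sign convention described above can be illustrated with a small sketch. Both function names are hypothetical and only stand in for an internal helper and its outward-facing wrapper.

```c
#include <errno.h>

/* Hypothetical internal helper: returns 0 on success or a positive errno. */
static int internal_add_filter(int valid)
{
	return valid ? 0 : EINVAL;
}

/* Hypothetical outward-facing wrapper: callers expect a negative errno. */
static int dev_filter_add(int valid)
{
	int ret = internal_add_filter(valid);

	return -ret; /* 0 stays 0, EINVAL becomes -EINVAL */
}
```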
Sagi Grimberg [Tue, 2 Aug 2016 14:41:21 +0000 (17:41 +0300)]
net/mlx5: fix possible NULL dereference in Rx path
The user is allowed to call ->rx_pkt_burst() even without free
mbufs in the pool. In this scenario we'll fail allocating a rep mbuf
on the first iteration (where pkt is still NULL). This would cause us
to deref a NULL pkt (reset refcount and free).
Jeff Guo [Wed, 7 Sep 2016 09:38:40 +0000 (05:38 -0400)]
net/i40e: add packet type translation for X722
To make the PCTYPE in x722 compatible with original PCTYPE in
flow director (FD) filters, the PCTYPE in the FD programming
descriptor needs to be translated into a different PCTYPE using
GLQF_FD_PCTYPE table.
Translation needs to be done before the FD filter is programmed.
Signed-off-by: Jeff Guo <jia.guo@intel.com> Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Yong Wang [Mon, 29 Aug 2016 19:18:47 +0000 (12:18 -0700)]
net/vmxnet3: reallocate shared memzone on re-config
When adding a DPDK port to ovs-vswitchd with DPDK, the vmxnet3 device
fails to activate due to a mismatched magic number. This failure causes
the following operations to run: start the port, stop the port,
reconfigure and re-start the port.
During reconfigure, if there is an existing memzone, driver will reuse
it. But reconfigure may request different number of Tx/Rx queues.
This results in a memzone with wrong size and potential invalid memory
access.
To fix this, free the memzone if found and reserve a new one.
Signed-off-by: Yong Wang <yongwang@vmware.com> Reviewed-by: Guolin Yang <gyang@vmware.com> Reviewed-by: Daniele Di Proietto <ddiproietto@vmware.com> Tested-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Panu Matilainen [Wed, 5 Oct 2016 12:14:08 +0000 (15:14 +0300)]
ip_frag: fix missing dependency on hash library
Not sure what exactly changed and where, but I've started getting
build failures on Fedora rawhide i386:
lib/librte_ip_frag/ip_frag_internal.c:36:23: fatal error:
rte_jhash.h: No such file or directory
#include <rte_jhash.h>
^
Looking at librte_ip_frag, it clearly depends on librte_hash so
it's probably more a question of something commonly masking the issue.
Maxime Coquelin [Tue, 4 Oct 2016 12:05:24 +0000 (14:05 +0200)]
app/testpmd: reset headroom after txonly packet allocation
This patch fixes txonly raw packets allocations by resetting the
available headroom.
Indeed, some PMDs such as Virtio might prepend some data to the
packet, causing the mbuf's data_off field to be decremented each
time the mbuf gets re-allocated.
For the Virtio PMD, this means that single descriptors are used only
the first few times mbufs get allocated, as at some point there is not
enough headroom to store the header.
Another alternative would be to use the standard API to allocate the
packets, which does reset the headroom, but the impact on performance
is too big to consider this an option.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Maxime Coquelin [Tue, 4 Oct 2016 12:05:23 +0000 (14:05 +0200)]
mbuf: add function to reset headroom
Some applications use the rte_mbuf_raw_alloc() function to improve
performance by not resetting the mbuf's fields to their default state.
This can, however, be problematic for mbuf consumers that need some
headroom, meaning that data_off field gets decremented after
allocation. When the mbuf is re-used afterwards, there might not
be enough room for the consumer to prepend anything, if the data_off
field is not reset to its default value.
This patch adds a new rte_pktmbuf_reset_headroom() function that
applications can call to reset the data_off field.
This patch also replaces the current data_off assignments in the mbuf
lib with a call to this function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
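The headroom problem and its reset can be sketched with a cut-down stand-in for an mbuf. Everything here is hypothetical (struct mini_mbuf, PKT_HEADROOM and the helper names); it only mimics the behaviour the message describes and is not the rte_mbuf implementation.

```c
#include <stdint.h>

#define PKT_HEADROOM 128 /* assumed default for this sketch */

/* Hypothetical, cut-down stand-in for an mbuf. */
struct mini_mbuf {
	uint16_t buf_len;  /* total size of the data buffer */
	uint16_t data_off; /* offset of the first data byte in the buffer */
};

/* Restore data_off to the default headroom, capped by the buffer size. */
static void mini_mbuf_reset_headroom(struct mini_mbuf *m)
{
	m->data_off = m->buf_len < PKT_HEADROOM ? m->buf_len : PKT_HEADROOM;
}

/* A consumer prepending a header moves data_off down; 0 on success. */
static int mini_mbuf_prepend(struct mini_mbuf *m, uint16_t len)
{
	if (len > m->data_off)
		return -1; /* not enough headroom left */
	m->data_off -= len;
	return 0;
}
```

If a buffer is reused repeatedly without the reset, each prepend eats into the remaining headroom until the prepend fails; calling the reset after each raw allocation avoids that.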
Byron Marohn [Tue, 4 Oct 2016 23:25:15 +0000 (00:25 +0100)]
hash: modify lookup bulk pipeline
This patch replaces the pipelined rte_hash lookup mechanism with a
loop-and-jump model, which performs significantly better,
especially for smaller table sizes and smaller table occupancies.
Signed-off-by: Byron Marohn <byron.marohn@intel.com> Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
Byron Marohn [Tue, 4 Oct 2016 23:25:14 +0000 (00:25 +0100)]
hash: add vectorized comparison
In lookup bulk function, the signatures of all entries
are compared against the signature of the key that is being looked up.
Now that all the signatures are together, they can be compared
with vector instructions (SSE, AVX2), achieving higher lookup performance.
Also, entries per bucket are increased to 8 when using processors
with AVX2, as 256 bits can be compared at once, which is the size of
8x32-bit signatures.
Signed-off-by: Byron Marohn <byron.marohn@intel.com> Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
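A minimal sketch of comparing all signatures of a bucket at once, assuming four 32-bit signatures per bucket and SSE2. The function name and bucket size are illustrative, and a scalar fallback is included for builds without SSE2; this is not the rte_hash code itself.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define BUCKET_ENTRIES 4 /* assumed: four 32-bit signatures fill 128 bits */

/* Return a bitmask with bit i set when sigs[i] == sig. */
static unsigned int match_mask(const uint32_t sigs[BUCKET_ENTRIES],
			       uint32_t sig)
{
#if defined(__SSE2__)
	__m128i bucket = _mm_loadu_si128((const __m128i *)sigs);
	__m128i key = _mm_set1_epi32((int)sig);
	__m128i eq = _mm_cmpeq_epi32(bucket, key);

	/* The sign bit of each 32-bit lane becomes one bit of the result. */
	return (unsigned int)_mm_movemask_ps(_mm_castsi128_ps(eq));
#else
	unsigned int mask = 0;

	for (int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (unsigned int)(sigs[i] == sig) << i;
	return mask;
#endif
}
```

The same pattern widens to eight signatures with 256-bit registers, which is the reason given above for the larger bucket on AVX2.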
Byron Marohn [Tue, 4 Oct 2016 23:25:13 +0000 (00:25 +0100)]
hash: reorganize bucket structure
Move current signatures of all entries together in the bucket
and same with all alternative signatures, instead of having
current and alternative signatures together per entry in the bucket.
This will be beneficial in the next commits, where a vectorized
comparison will be performed, achieving better performance.
The alternative signatures have been moved away from
the current signatures, to make the key indices be consecutive
to the current signatures, as these two fields are used by lookup,
so they are in the same cache line.
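The layout change can be pictured with two hypothetical structures. Field names, types and the bucket size are assumptions made for this sketch rather than the library's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define BUCKET_ENTRIES 4 /* assumed bucket size for this sketch */

/* Before: each entry carries its own pair of signatures. */
struct bucket_per_entry {
	struct {
		uint32_t current;
		uint32_t alt;
	} sig[BUCKET_ENTRIES];
	uint32_t key_idx[BUCKET_ENTRIES];
};

/* After: current signatures are contiguous and followed directly by the
 * key indices; alternative signatures, unused by lookup, come last. */
struct bucket_grouped {
	uint32_t sig_current[BUCKET_ENTRIES];
	uint32_t key_idx[BUCKET_ENTRIES];
	uint32_t sig_alt[BUCKET_ENTRIES];
};
```

In the grouped form the current signatures occupy one contiguous 16-byte run, which a single vector load can cover.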
Pablo de Lara [Tue, 4 Oct 2016 23:25:12 +0000 (00:25 +0100)]
hash: reorder hash structure
In order to optimize lookup performance, hash structure
is reordered, so all fields used for lookup will be
in the first cache line.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
For periodic timers, if lag is introduced, the current code adds
extra delay when the next periodic timer is initialized, because it
does not take the accumulated delay into account. With this fix, the
next occurrence of the timer is scheduled taking the lag into
account.
Fixes: 9b15ba89 ("timer: use a skip list") Signed-off-by: Karmarkar Suyash <skarmarkar@sonusnet.com> Acked-by: Robert Sanford <rsanford@akamai.com>
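The drift can be shown with two ways of computing the next expiry. The structure and function names are hypothetical and the clock is in arbitrary ticks; this is a sketch of the idea, not the rte_timer code.

```c
#include <stdint.h>

/* Hypothetical periodic timer state, in arbitrary clock ticks. */
struct periodic_timer {
	uint64_t expire; /* tick at which the timer was due */
	uint64_t period; /* interval between occurrences */
};

/* Drifting variant: every late callback pushes all later ones back. */
static uint64_t next_expire_from_now(const struct periodic_timer *t,
				     uint64_t now)
{
	return now + t->period;
}

/* Lag-aware variant: schedule relative to when the timer was due. */
static uint64_t next_expire_from_due(const struct periodic_timer *t)
{
	return t->expire + t->period;
}
```

For a timer due at tick 1000 with period 100 that is serviced at tick 1030, the first variant schedules 1130 and the second 1100.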
Jean Tourrilhes [Tue, 4 Oct 2016 17:17:03 +0000 (10:17 -0700)]
mem: fix hugepage mapping error messages
Running secondary is tricky due to the need to map the memory region
at the right place in VM, which is whatever primary has chosen. If the
base address for primary happens to be already mapped in the
secondary, we will hit precisely these error messages (depending on
whether we fail on the config region or the hugepages). This is why there is
already a comment about ASLR.
The issue is that in most cases, remapping does not happen and "errno"
is not changed and therefore stale. In our case, we got a "permission
denied", which sent us down the wrong track. It's such a common error
for secondary that I feel this error message should be unambiguous and
helpful.
The call to close was also moved because close() may override errno.
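The commit moves the close() call; an equivalent way to keep the original error, shown here purely as an illustration, is to save errno across the call. The helper name is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical cleanup helper: close fd without losing the original error.
 * Returns the errno value that was current before close() ran. */
static int close_preserving_errno(int fd)
{
	int saved = errno; /* capture the error we actually want to report */

	close(fd);         /* may overwrite errno, e.g. with EBADF */
	errno = saved;
	return saved;
}
```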
When compiling with C++, the compiler treats
void (*rte_delay_us)(unsigned int us);
as a definition of the global variable,
so subsequent linking with librte_eal fails.
Fixes: b4d63fb62240 ("eal: customize delay function")
Steps to reproduce:
$ cat rttm1.cpp
#include <iostream>
#include <rte_eal.h>
#include <rte_cycles.h>
using namespace std;
int main(int argc, char *argv[])
{
    int ret = rte_eal_init(argc, argv);
    rte_delay_us(1);
    cout << "return code ";
    cout << ret;
    return ret;
}
$ g++ -m64 -I/${RTE_SDK}/${RTE_TARGET}/include -c -o rttm1.o rttm1.cpp
$ gcc -m64 -pthread -o rttm1 rttm1.o -ldl -Wl,-lstdc++ \
-L/${RTE_SDK}/${RTE_TARGET}/lib -Wl,-lrte_eal
.../librte_eal.a(eal_common_timer.o):
(.bss+0x0): multiple definition of `rte_delay_us'
rttm1.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status
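The multiple-definition error above comes from a header line that C reads as a tentative definition but C++ reads as a full definition. A common remedy, sketched here with hypothetical names (delay_us_fn, delay_us_stub) and assumed to resemble the actual fix, is to mark the header line extern and define the pointer in exactly one source file.

```c
/* Header side: 'extern' makes this a declaration only, in both C and C++. */
extern void (*delay_us_fn)(unsigned int us);

/* Source side: exactly one file provides the definition. */
static unsigned int last_delay;

static void delay_us_stub(unsigned int us)
{
	last_delay = us; /* hypothetical stand-in for a real busy-wait */
}

void (*delay_us_fn)(unsigned int us) = delay_us_stub;
```

With the declaration marked extern, including the header from a C++ file no longer emits a second copy of the pointer into that object file.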