git.droids-corp.org - dpdk.git/log

vhost: fix crash with multiqueue enabled

The patch fixes wrong handling of virtqueue array index when
GET_VRING_BASE message comes.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

i40e: support flow control

Feature Add: Rx/Tx flow control support for the i40e

All the Rx/Tx LFC enable/disable operation is done by the F/W,
so PMD driver need to use the Set PHY Config AD command to trigger the PHY
to do the auto-negotiation, after the Tx/Rx pause ability is negotiated,
the F/W will help us to set the related LFC enable/disable registers.
PMD driver also need to configure the related registers to control
how often to send the pause frame and what the value in the pause frame.

Signed-off-by: Zhe Tao <zhe.tao@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>

i40evf: fix write back with virtual channel 1.1

If DPDK is used on VF while the host is using Linux Kernel driver
as PF driver on FVL NIC, then VF Rx is reported only in batches of
4 packets. It is due to the kernel driver assumes VF driver is working
in interrupt mode, but DPDK VF is working in Polling mode.
This patch fixes this issue by using the V1.1 virtual channel with
Linux i40e PF driver.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>

i40e: drop flow control frames from VF

This patch adds a workaround to drop flow control frames from being
transmitted from VSIs.
With this patch in place a malicious VF cannot send flow control or PFC
packets out on the wire.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>

app/testpmd: modify mac in csum forwarding

For some ethnet-switch like intel RRC, all the packet forwarded
out by DPDK will be dropped in switch side, so the packet
generator will never receive the packet.

Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

fm10k: support Boulder Rapid device

Boulder Rapid is Intel new NIC within fm10k family.
This patch make DPDK driver support this new NIC.

Signed-off-by: Michael Qiu <michael.qiu@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>

fm10k: enable TSO support

This patch enables fm10k TSO feature for both non-tunneling packet
and tunneling packet.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Acked-by: Michael Qiu <michael.qiu@intel.com>

examples/l3fwd-power: disable Rx interrupt when waking up

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>

e1000: add Rx interrupt handler

When datapath rxq interrupt is enabled, enable related device rxq.
Remove the interrupt handler after device stopped.
e1000 only support one type of interrupt cause, so remove lsc interrupt
handler if rxq enabled.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>

e1000: support Rx interrupt setup

Enable rx interrupt support on e1000 physical and emulated device.
Implement rxq interrupt related functions in eth_dev_ops structure.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>

e1000: separate link and Rx interrupt disabling

Separate lsc and rxq interrupt for they have different interrupt handlers.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>

e1000: restrict link interrupt setup scope

Only mask lsc interrupt bit when setup device interrupt.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>

ixgbe: support new flow director modes for X550

Implement the new CLIs for fdir mac vlan and tunnel modes, including
flow_director_filter and flow_director_mask. Set the mask of fdir.
Add, delete or update the entities of filter.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

app/testpmd: new flow director commands

The different fdir mode needs different parameters, so, the parameter *mode*
is introduced to the CLI flow_director_filter and flow_director_mask. This
parameter can pormpt the user to input the appropriate parameters for different
mode.
Please be aware, as we should set the fdir mode, the value of the parameter
pkt-filter-mode, when we start testpmd. We cannot set a different mode for
mask or filter.

The new CLIs are added for the mac vlan and tunnel modes, like this,
flow_director_mask X mode MAC-VLAN vlan XXXX mac XX,
flow_director_mask X mode Tunnel vlan XXXX mac XX tunnel-type X tunnel-id XXXX,
flow_director_filter X mode MAC-VLAN add/del/update mac XX:XX:XX:XX:XX:XX
vlan XXXX flexbytes (X,X) fwd/drop queue X fd_id X,
flow_director_filter X mode Tunnel add/del/update mac XX:XX:XX:XX:XX:XX
vlan XXXX tunnel NVGRE/VxLAN tunnel-id XXXX flexbytes (X,X) fwd/drop queue X
fd_id X.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

app/testpmd: show new flow director modes

There're fdir mask and supported flow type in the output of the CLI,
show port fdir. But not every parameter has meaning for all the fdir
modes, and the supported flow type is meaningless for mac vlan and
tunnel modes. So, we output different thing for different mode.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

app/testpmd: add parameters for new flow director modes

For testpmd CLI's parameter pkt-filter-mode, there're new values supported for
fdir new modes, perfect-mac-vlan, perfect-tunnel.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

app/testpmd: initialize new flow director masks

When a port is enabled, there're default values for the parameters of
fdir mask. For the new parameters, the default values also need to be
set.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

ethdev: add more flow director modes

Define the new modes and modify the filter and mask structures for
the mac vlan and tunnel modes.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

igb: fix build without ieee1588 enabled

This patch marks rxq with RTE_SET_USED in
rx_desc_hlen_type_rss_to_pkt_flags(), when
ieee1588 is disabled. Previously a compilation
error occurred on unused-parameter.

Fixes: 1ce6591e238a ("igb: fix ieee1588 frame identification in i210")

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

ixgbevf: support RSS reta/hash query and update

This patch implements the VF RSS reta/hash query and update function
on 10G NICs. But the update function is only provided for x550. Because
the other NICs don't have the separate registers for VF, we don't want
to let a VF NIC change the shared RSS reta/hash registers. It may cause
PF and other VF NICs' behavior change without being noticed.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

ixgbevf: support RSS config on x550

On x550, there're separate registers provided for VF RSS while on the other
10G NICs, for example, 82599, VF and PF share the same registers.
This patch lets x550 use the VF specific registers when doing RSS configuration
on VF. The behavior of other 10G NICs doesn't change.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

ixgbe: support 512 RSS entries on x550

Comparing with the older NICs, x550's RSS redirection table is enlarged to 512
entries. As the original code is for the NICs which have a 128 entries RSS table,
it means only part of the RSS table is set on x550. So, RSS cannot work as
expected on x550, it doesn't redirect the packets evenly.
This patch configs the entries beyond 128 on x550 to let RSS work well, and also
update the query and update functions to support 512 entries.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

ixgbe: drop flow control frames from VF

This patch will drop flow control frames from being transmitted
from VSIs.
With this patch in place a malicious VF cannot send flow control
or PFC packets out on the wire.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>

ixgbe: prefetch cacheline after pointer becomes valid

At the original point the rx_pkts[pos( + n)] pointers are not initialized,
so the code is prefetching random data.

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

ixgbe: fix access to last byte of EEPROM

Incorrect operator in ixgbe_get_eeprom & ixgbe_set_eeprom prevents
last byte of EEPROM being read/written, and hence cannot be dumped
or updated in entirity using these functions.

Fixes: 0198848a47f5 ("ixgbe: add access to specific device info")

Signed-off-by: Remy Horton <remy.horton@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

ixgbe: add MAC short packet discard count to Rx errors

This patch adds the mspdc (MAC Short Packet Discard Count)
to the total rx errors, as discussed on the dev@dpdk mailing
list: http://comments.gmane.org/gmane.comp.networking.dpdk.devel/23717

Suggested-by: Igor Ryzhov <iryzhov@arccn.ru>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

ixgbe: remove jabber count from Rx errors

Remove receive jabber count (rjc) from ierrors count as the
register overlaps with the CRC error register, previously
causing some packets to be counted twice.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

ixgbe: fix statistics for 82598/82599 differences

Ixgbe based 82598 and 82599 have different priority receive link-on
register addresses. This is solved in base/ by providing in the
PXONRXC and PXONXCNT as separate macros. This patch ensures the
correct address is read, avoiding reading garbage values.

Also PXON2OFFCNT doesn't exist in 82598, so it is not read for
that MAC.

This issue has existed since the drivers were imported into DPDK,
but was not easily discoverable as xstats were not available.
Tested using testpmd> show port xstats all

Fixes: af75078fece3 ("first public release")

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

ixgbevf: fix TSO support

When setting TSO on VF ixgbe NICs, for example, 82599, x550, the
prompt that TSO is not supported will be printed. But TSO is
supported by VF ixgbe NICs.
We should add TSO to the capability flag, so, we will not see the
wrong prompt.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>

ixgbevf: fix statistic wraparound

Fix a misinterpretation of VF stats in ixgbe

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Roger Melton <rmelton@cisco.com>

igbvf: fix statistic wraparound

Fix a misinterpreatation of VF statistic macro in e1000/igb.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Roger Melton <rmelton@cisco.com>

igb: fix ieee1588 frame identification in i210

Fixed issue where the flag PKT_RX_IEEE1588_PTP was not being set
in Intel I210 NIC, as EtherType in RX descriptor is in bits 8:10 of
Packet Type and not in the default bits 0:2.

Fixes known issue "IEEE1588 support possibly not working
with an Intel Ethernet Controller I210 NIC"

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

igb: enable TSO support

This patch enables igb TSO feature, the feature works on both PF and VF.
The TCP segmentation offload needs to write the offload related information
into the advanced context descriptors, which is similar to checksum offload.

Signed-off-by: Wang Xiao W <xiao.w.wang@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000: fix total byte statistics

This patch fixes a bug in reading the 64 bit register reading
which was causing the total octets counters to show zero.
Now the code reads both the lower and higher 32 bits.
Tested in testpmd, byte values are correct.

Fixes: 805803445a02 ("e1000: support EM devices (also known as e1000/e1000e)")

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000: add new i218 devices

Add the new e1000 devices to the DPDK PCI device list.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: minor changes

Some minor code change. No functionality impact.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: allow both ULP and EEE in Sx state

This patch implements a modified flow that allows both ULP and EEE
in Sx (Sticky mode).

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: synchronize PHY interface on non-ME systems

On power up, the MAC - PHY interface needs to be set to PCIe, even if
cable is disconnected. In ME systems, the ME handles this on exit from
Sx(Sticky mode) state. In non-ME, the driver handles it. Added a check
for non-ME system to the driver code that handles that.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: increase timeout of reset check

Previously, in check_reset_block RSPCIPHY was polled for 100 ms before determining
that the ME veto is set. This needed to be increased to 300 ms.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: initialize 88E1543 PHY

The initialization process for 88E1543 PHY.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: disable IPv6 extension header parsing

All 1G Server products need to have IPv6 extension headers turned off.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: prevent ULP flow if cable connected

Enabling ulp on link down when cable is connect caused an infinite
loop of linkup/down indications in the NDIS driver.
After discussed, correct flow is to enable ULP only when cable is
disconnected.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: add flags to set EEE advertisement modes

Requires driver changes!

Change e1000_set_eee_i350 and e1000_set_eee_i354 to have flags allowing
changes in the advertised EEE speeds.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: support inverted format ETrackId

There are some images which contain ETrackID in inverted format. This patch
allows reading this format.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: support different EEARBC for i210

EEARBC has changed on i210. It means EEARBC has a different address on
i210 than on other NICs. So, add a new entity named EEARBC_I210 to the
register list and make sure the right one is being used on i210.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: add bit to disable packetbuffer read

Added bit FEXTNVM7[18], that controls disabling MAC packet buffer read.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: add defaults for i210 Rx/Tx PBSIZE

These are the defaults for the packet buffer size registers that need to
be explicitly set back if someone changes them and comes back to a normal
driver.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix K1 configuration

This patch is for the following updates to the K1 configurations:
Tx idle period for entering K1 should be 128 ns.
Minimum Tx idle period in K1 should be 256 ns.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix link detect flow

In case that auto-negotiate is not enabled, call
e1000_setup_copper_link_generic instead of e1000_phy_setup_autoneg.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix link check for i354 88E1112 PHY

e1000_check_for_link_media_swap() is supposed to check PHY page 0 for
copper and PHY page 1 for "other" (fiber) link. We switched back from
page 1 to page 0 too soon, before e1000_check_for_link_82575() is
executed and we were never finding link on fiber (other).

Note: The precedence of link type is controlled by the PHY settings.

If the link is copper, as the M88E1112 page address is set to 1, it should be
set back to 0 before checking this link.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix beacon duration for i217

Fix for I217 Packet Loss issue - The Management Engine sets the FEXTNVM4
Beacon Duration incorrectly. This fix ensures that the correct value will
always be set. Correct value for this field is 8 usec.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix TIPG for non 10 half duplex mode

TIPG value is increased when setting speed to 10 half to prevent
packet loss. However, it was never decreased again when speed
changes. This caused performance issues in the NDIS driver.
Fix this to restore TIPG to default value on non 10 half.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix reset of DH89XXCC SGMII

For DH89XXCC_SGMII, write flush leaves registers of this device trashed
(0xFFFFFFFF). Added check for this device.
Also, after both for Port SW Reset and Device Reset case, platform should
wait at least 3ms before reading any registers. Since waiting is
conditionally executed only for Device Reset - removed the condition.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix EEPROM access for i210

The i210 has two EEPROM access registers that are located in
non-standard offsets: EEARBC and EEMNGCTL. EEARBC was fixed previously
and EEMNGCTL should also be corrected.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix redundant PHY power down for i210

The wrong bit is being used in PHYREG16 for PHY power down. In addition,
the use of PHYREG 16 is unnecessary if bit 11 of PHYREG 0 is used.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix jumbo frame CRC failures

This is a patch to change the value of register 776.20[11:2] for jumbo
mode from 0x1A to 0x1F. This is to enlarge the gap between read and
write pointers in the TX Fifo.
And replace the magic number with a macro by the way.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: fix link flap on 82579

Several customers have reported a link flap issue on 82579. The symptoms
are random and intermittent link losses when 82579 is connected to specific
switches. Issue has been root caused as interoperability problem between
the NIC and at least some Broadcom PHYs in the Energy Efficient Ethernet
wake mechanism.
To fix the issue, we are disabling the Phase Locked Loop shutdown in 100M
Low Power Idle. This solution will cause an increase of power in 100M EEE
link. It may cost additional 28mW in this specific mode.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: return error in resume workaround

Add u32 return value to function e1000_resume_workarounds_pchlan,
so that calling function can detect PHY access failure during resuming
flow.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: check more errors for ESB2 init and reset

Adding code where missing to handle case where calls to
e1000_read_kmrn_reg_80003es2lan and e1000_write_kmrn_reg_80003es2lan return
an error value.
Also, when accessing the E1000_KMRNCTRLSTA_INBAND_PARAM offset to disable
far-end loopback on 80003es2lan devices, make the handling of a read or
write failure consistent between hw_init and hw_reset.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: check more NVM read errors

Adding code to a case where e1000_nvn_read is called, but there is no
consideration for when the read fails (returns an error code).
Also, this patch adds an error message to a base NVM reading function that
is missing it for consistency.
This patch is not covering all cases of these conditions, it only covers
the code used by the e1000e driver.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: return code after setting receive address register

Previously, the rar_set functions were of type void, and when they failed
to program an address register they would, at most,  put a message into
the log and end.  The fact that they failed to program an address into a
address register, if checked for, should be captured and passed back to
the caller so that the drivers can deal with the situation (or not) as
they deem best.
Drivers can ignore or use the return value.  No change to base drivers
is mandated by this change unless a driver wants to handle the failure
to program an address register (e.g. evaluate the return value).

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: remove useless return variables

Although this change should be optimized out by the compiler, just
return a constant directly rather than declare a variable.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: remove obsolete comment

The "FIXME" comment is revomed from e1000_acquire_swfw_sync_80003es2lan
but forgotten being removed from e1000_acquire_swfw_sync_82575 while
the similar changes were made to both.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: cleanup unused tag

Remove all NAHUM6LP_HW tags.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: apply paranoia to macro arguments

Macro arguments need to be in parens since we can pass in expressions.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: add new devices

Add some new i218 devices.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

e1000/base: update readme and copyright

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

vhost-user: enable multiple queue

By setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit, and
VIRTIO_NET_F_MQ feature bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>

vhost-user: handle message to enable vring

This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>

virtio: fix deadloop after wrong config read

The old code adjusts the config bytes we want to read depending on
what kind of features we have, but we later cast the entire buf we
read with "struct virtio_net_config", which is obviously wrong.

The wrong config reading results to a dead loop at virtio_send_command()
while starting testpmd.

The right way to go is to read related config bytes when corresponding
feature is set, which is exactly what this patch does.

Fixes: 823ad647950a ("virtio: support multiple queues")

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>

vhost: use queue id instead of constant ring index

Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to a proper value for a specific queue
when we have multiple queue support enabled.

For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
so it should not break anything.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>

vhost-user: prepare multiple queue setup

All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
first time for a specific queue pair.

This is a refactor work for enabling vhost-user multiple queue;
it should not break anything as it does no functional changes:
we don't support mq set, so there is only one mq at max.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>

vhost-user: announce queue number in message

Add VHOST_USER_GET_QUEUE_NUM message
to tell the frontend (qemu) how many queue pairs we support.

And it is initiated to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>

vhost-user: support protocol features

The two protocol features messages are introduced by qemu vhost
maintainer(Michael) for extendting vhost-user interface. Here is
an excerpta from the vhost-user spec:

    Any protocol extensions are gated by protocol feature bits,
    which allows full backwards compatibility on both master
    and slave.

The vhost-user multiple queue features will be treated as a vhost-user
extension, hence, we have to implement the two messages first.

VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any yet.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>

eal: default to using all cores

This is a useful default for simple applications where the assignment
of lcores to CPUs doesn't matter. It's also useful for more complex
applications that automatically assign tasks to cores based on the
NUMA topology.

Signed-off-by: Rich Lane <rich.lane@bigswitch.com>

eal: make the -n argument optional

Obtaining the correct value of memory channels, especially from a
running system, can be anything from difficult to plain impossible.
Since the value is merely an optimization and does not affect functionality
otherwise, its pointless to force such a guess on users initially, such
things belong to performance tuning phase.

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>

mempool: use a better default for number of memory channels

Optimize for quad-channel by default, this should work well for
all the cases, better than the previous value of one anyway.

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>

doc: fix syntax in testpmd user guide

Fix a number of RST issues in the testpmd user guide and
refactored the structure to:

* Remove redundant roadmap section.
* Merge Overview section into Introduction.
* Move "set fwd" to the start of its section.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

doc: fix pdf build warning

Fix a pdf doc build warning where a link wasn't recognised:

doc/guides/contributing/documentation.rst::
WARNING: unusable reference target found: inkscape.org

Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

eal: fix memory leak in stack dump

Free the memory allocated by the backtrace_symbols
to prevent the memory leak.

Signed-off-by: Zhe Tao <zhe.tao@intel.com>

igb_uio: remove unnecessary function to get device

Return value of igbuio_get_uio_pci_dev() is already kept in priv
variable.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

mk: quote KERNELCC to allow ccache build

Otherwise building with KERNELCC="ccache gcc" will fail:
ccache: invalid option -- 'p'

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

mbuf: move chaining from ip_frag library

Chaining/segmenting mbufs can be useful in many places, so make it
global.

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Johan Faltstrom <johan.faltstrom@netinsight.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

acl: improve rules sorting

Replace O(n^2) list sort with an O(n log n) merge sort.
The merge sort is based on the solution suggested in:
http://cslibrary.stanford.edu/105/LinkedListProblems.pdf
Tested sort_rules() improvement:
100K rules: O(n^2): 31382 milliseconds; O(n log n): 10 milliseconds
259K rules: O(n^2): 133753 milliseconds; O(n log n): 22 milliseconds

Signed-off-by: Mark Smith <marsmith@akamai.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

app/testpmd: detect numa socket count

Currently, there is a MAX_SOCKET macro which artificially limits the
number of NUMA sockets testpmd can use. Anything on a higher socket
ends up using socket zero. This patch replaces this with a variable
set during set_default_fwd_lcores_config() and uses RTE_MAX_NUMA_NODES
where a hard-coded max number of sockets is required.

Signed-off-by: Stephen Hurd <shurd@broadcom.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

mpipe: return error for init allocation failure

In function rte_pmd_mpipe_devinit, if rte_eth_dev_allocate
fails return error which is inline with other drivers.

Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Zhigang Lu <zlu@ezchip.com>

cfgfile: increase entry name and value sizes

This patch refers to the ABI change proposed for
librte_cfgfile(rte_cfgfile.h). In order to allow
for longer names and values, the values of macro
CFG_NAME_LEN and CFG_VAL_LEN is increased.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

examples/qos_sched: remove duplicated cfgfile library

This is a supplement for previous patch that was incomplete.
Previous commit message: This is a modification of qos_sched
example to use librte_cfgfile for parsing configuration file.

Fixes: db935d0171dd ("examples/qos_sched: use librte_cfgfile")

Signed-off-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>

eal: fix C++ build

'virtual' is a keyword and can't be used if the code is to compile with
C++ compilers.

If rte_devargs.h was included in C++ code, compilation with clang++
failed with an error. g++ did not fail, but only because of a bug
that treats it as an anonymous struct with a decl-specifier which it
ignores.

This simply renames the member to 'virt'.

Reported-by: Ming Zhao <mzhao@luminatewireless.com>
Signed-off-by: Christoph Gysin <christoph.gysin@gmail.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>

eal/linux: make alarm not affected by system time jump

Due to eal_alarm_callback() and rte_eal_alarm_set() use gettimeofday()
to get the current time, and gettimeofday() is affected by jumps.

For example, set up a rte_alarm which will be triggerd next second (
current time + 1 second) by rte_eal_alarm_set(). And the callback
function of this rte_alarm sets up another rte_alarm which will be
triggered next second (current time + 2 second).
Once we change the system time when the callback function is triggered,
it is possible that rte alarm functionalities work out of expectation.

Replace gettimeofday() with clock_gettime(CLOCK_MONOTONIC_RAW, &now)
could avoid this phenomenon.

Signed-off-by: Wen-Chi Yang <wolkayang@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

virtio: fix Coverity unsigned warnings

There are some places in virtio driver where uint16_t or int are used
where it would be safer to use unsigned.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

virtio: do not report link state feature unless available

If host does not support virtio link state (like current DPDK vhost)
then don't set the flag. This keeps applications from incorrectly
assuming that link state is available when it is not. It also
avoids useless "guess what works in the config".

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>

vhost: fix missing device checks

virtio-net search for it's device in reset_owner.
The function don't check the return result of get_config_ll_entry.
Using get_config_ll_entry in reset_owner don't show any error when the
device is not found. This patch fix this by using get_device instead
instead of get_config_ll_entry.

In user_get_vring_base, get_device return is not checked and may cause
segfault when device is not found.

Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: keep device identifier after reset owner

virtio-net clean and init device after a VHOST_USER_RESET_OWNER.
This reset device identifier to 0 and break ll_root listing logic.
This patch keep the old device identifier and re-write it on the cleaned
device.

Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

virtio: fix crash when releasing null queue

if input parameter vq is NULL, hw = vq->hw, causes a segmentation fault.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

eal: fix io permission for virtio interrupt handler

For virtio-net pmd, the interrupt management thread must be created after
this driver has initialised so that iopl() has been properly called and
its effects are inherited by all eal children threads.

Before this change, changing link status on a virtio-net device would
trigger a segfault in the interrupt thread :

$ mkdir -p /mnt/huge
$ echo 256 > /proc/sys/vm/nr_hugepages
$ mount -t hugetlbfs none /mnt/huge
$ lspci |grep Ethernet
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
$ modprobe uio
$ insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
$ echo 0000:00:03.0 > /sys/bus/pci/devices/0000\:00\:03.0/driver/unbind
$ echo 1af4 1000 > /sys/bus/pci/drivers/igb_uio/new_id
$ ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x6 -n 3 -w 0000:00:03.0 -- -i --txqflags=0xf01 --total-num-mbufs 2048
[snip]
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL: probe driver: 1af4:1000 rte_virtio_pmd
Interactive-mode selected
Configuring Port 0 (socket 0)
Port 0: DE:AD:DE:01:02:03
Checking link statuses...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Done
testpmd>

Then, from qemu monitor:
(qemu) set_link virtio-net-pci.0 off

testpmd> Segmentation fault

Fixes: 565b85dcd9f4 ("eal: set iopl only when needed")

Reported-by: Stephen Hemminger <shemming@brocade.com>
Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>

mlx4: do not expose broadcast address in MAC list

Use the last array entry to store the broadcast address and keep it hidden
by not reporting the entire array size.

This is done to prevent DPDK applications from attempting to modify or
remove it.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>

mlx4: save bound interface

Allows applications to retrieve the name of the related netdevice.

Signed-off-by: Francesco Santoro <francesco.santoro@6wind.com>

mlx4: fix missing offload flags in scattered Rx

They were dropped by mistake in the commit below.

Fixes: ab351fe1c95c ("mbuf: remove packet type from offload flags")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

enic: fix hash creation when not using first numa node

If dpdk is run with memory only available on socket != 0, then hash
creation will fail and flow director feature won't be available.
Fix this by asking for allocation on caller socket.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked by: Sujith Sankar <ssujith@cisco.com>