Cristian Dumitrescu [Wed, 4 Jun 2014 18:08:22 +0000 (19:08 +0100)]
port: IPv4 fragmentation
This port presents the IPv4 fragmentation operation as a Packet Framework port.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
[Thomas: update to new ip_frag library]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Cristian Dumitrescu [Wed, 4 Jun 2014 18:08:21 +0000 (19:08 +0100)]
port: ring
ring_reader input port (on top of single consumer rte_ring)
ring_writer output port (on top of single producer rte_ring)
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Cristian Dumitrescu [Wed, 4 Jun 2014 18:08:20 +0000 (19:08 +0100)]
port: ethdev
The input port ethdev_reader implements the Packet Framework port API
on top of the Intel DPDK poll mode driver for a NIC RX queue.
The output port ethdev_writer implements the Packet Framework port API
on top of the Intel DPDK poll mode driver for a NIC TX queue.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Cristian Dumitrescu [Wed, 4 Jun 2014 18:08:19 +0000 (19:08 +0100)]
port: new packet framework API
This file defines the port operations that have to be implemented
by Packet Framework ports.
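
As a rough illustration of what a set of port operations could look like, here is a minimal sketch of an operations table built from function pointers. Every name in it (sketch_port_in_ops, counter_port_*) is hypothetical and is not taken from the Packet Framework headers; the toy port hands out consecutive integers instead of packets.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical operations that an input port would have to implement. */
struct sketch_port_in_ops {
	void *(*f_create)(void *params);
	int (*f_free)(void *port);
	/* Store up to n_pkts items in pkts[]; return how many were read. */
	int (*f_rx)(void *port, uint32_t *pkts, uint32_t n_pkts);
};

/* Toy port: "receives" consecutive integers instead of real packets. */
struct counter_port {
	uint32_t next;
};

static void *counter_port_create(void *params)
{
	struct counter_port *p = malloc(sizeof(*p));

	if (p != NULL)
		p->next = (params != NULL) ? *(uint32_t *)params : 0;
	return p;
}

static int counter_port_free(void *port)
{
	free(port);
	return 0;
}

static int counter_port_rx(void *port, uint32_t *pkts, uint32_t n_pkts)
{
	struct counter_port *p = port;
	uint32_t i;

	for (i = 0; i < n_pkts; i++)
		pkts[i] = p->next++;
	return (int)n_pkts;
}

/* A pipeline would only ever see this table, not the port internals. */
static const struct sketch_port_in_ops counter_port_ops = {
	.f_create = counter_port_create,
	.f_free = counter_port_free,
	.f_rx = counter_port_rx,
};
```

The point of the indirection is that a caller can drive any port type through the same three calls.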
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Cristian Dumitrescu [Wed, 4 Jun 2014 18:08:17 +0000 (19:08 +0100)]
lpm: check rule existence
Added API function for LPM IPv4 and IPv6 to query for the existence
of a rule/route and return the next hop ID associated with the route
if route is present.
This is used by the Packet Framework LPM table for implementing a
routing table.
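
To make the idea of a rule-existence query concrete, here is a minimal sketch on a toy rule list. It is not the LPM library: the toy_lpm type, its linear search and all function names are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_LPM_MAX_RULES 16

struct toy_lpm_rule {
	uint32_t prefix;    /* network prefix, host byte order */
	uint8_t depth;      /* prefix length in bits, 1..32 */
	uint8_t next_hop;
};

struct toy_lpm {
	struct toy_lpm_rule rules[TOY_LPM_MAX_RULES];
	unsigned int n_rules;
};

/* Mask keeping the top 'depth' bits; depth must be in 1..32. */
static uint32_t toy_lpm_mask(uint8_t depth)
{
	return UINT32_MAX << (32 - depth);
}

static int toy_lpm_add(struct toy_lpm *lpm, uint32_t ip, uint8_t depth,
		uint8_t next_hop)
{
	struct toy_lpm_rule *r;

	if (depth < 1 || depth > 32 || lpm->n_rules == TOY_LPM_MAX_RULES)
		return -1;
	r = &lpm->rules[lpm->n_rules++];
	r->prefix = ip & toy_lpm_mask(depth);
	r->depth = depth;
	r->next_hop = next_hop;
	return 0;
}

/*
 * Return 1 and store the next hop if the exact rule (ip/depth) exists,
 * 0 if it does not, and -1 on invalid arguments.
 */
static int toy_lpm_is_rule_present(const struct toy_lpm *lpm, uint32_t ip,
		uint8_t depth, uint8_t *next_hop)
{
	unsigned int i;

	if (next_hop == NULL || depth < 1 || depth > 32)
		return -1;
	for (i = 0; i < lpm->n_rules; i++) {
		const struct toy_lpm_rule *r = &lpm->rules[i];

		if (r->depth == depth &&
				r->prefix == (ip & toy_lpm_mask(depth))) {
			*next_hop = r->next_hop;
			return 1;
		}
	}
	return 0;
}
```

Note that this is an exact match on prefix and depth, not a longest-prefix lookup: a /16 query does not match a stored /8 rule.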
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Cristian Dumitrescu [Wed, 4 Jun 2014 18:08:18 +0000 (19:08 +0100)]
mbuf: meta-data offset
Added zero-size field (offset in data structure) to specify the beginning
of packet meta-data in the packet buffer just after the mbuf.
The size of the packet meta-data is application specific and the packet
meta-data is managed by the application.
The packet meta-data should always be accessed through the provided macros.
This is used by the Packet Framework libraries (port, table, pipeline).
There is absolutely no performance impact due to this mbuf field, as it
does not take any space in the mbuf structure (zero-size field).
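
The idea of a field that only marks an offset can be sketched as follows. This is an illustration under stated assumptions, not the real mbuf: toy_mbuf and TOY_MBUF_METADATA_PTR are hypothetical names, and a C99 flexible array member stands in for the zero-size field.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf; not the real rte_mbuf layout. */
struct toy_mbuf {
	uint32_t pkt_len;
	uint32_t port;
	/* Takes no space: only marks where the meta-data begins. */
	uint8_t metadata[];
};

/* Pointer to the meta-data byte at offset 'off' after the mbuf. */
#define TOY_MBUF_METADATA_PTR(m, off) (&(m)->metadata[(off)])

/*
 * The meta-data size is chosen by the application, so the buffer is
 * allocated as header plus however many meta-data bytes are wanted.
 */
static struct toy_mbuf *toy_mbuf_alloc(size_t metadata_size)
{
	return calloc(1, sizeof(struct toy_mbuf) + metadata_size);
}
```

In this layout the meta-data offset equals the size of the structure, which is what "just after the mbuf" means here.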
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Thomas Monjalon [Tue, 17 Jun 2014 00:31:30 +0000 (02:31 +0200)]
ip_frag: clean includes
Add required rte_byteorder in rte_ip_frag.h.
Remove useless includes in *.c files.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Thomas Monjalon [Mon, 16 Jun 2014 21:10:08 +0000 (23:10 +0200)]
examples/vhost: restrict log type namespace
RTE_LOGTYPE_CONFIG, RTE_LOGTYPE_DATA and RTE_LOGTYPE_PORT are renamed
by adding a VHOST prefix.
This prevents a conflict with the new RTE_LOGTYPE_PORT of the Packet Framework.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Jingjing Wu [Mon, 16 Jun 2014 07:31:46 +0000 (15:31 +0800)]
app/testpmd: add commands for filters
Add commands in testpmd for NIC filters:
add_ethertype_filter
remove_ethertype_filter
get_ethertype_filter
add_2tuple_filter
remove_2tuple_filter
get_2tuple_filter
add_5tuple_filter
remove_5tuple_filter
get_5tuple_filter
add_syn_filter
remove_syn_filter
get_syn_filter
add_flex_filter
remove_flex_filter
get_flex_filter
Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Jingjing Wu [Mon, 16 Jun 2014 07:31:45 +0000 (15:31 +0800)]
ixgbe: add filters
This patch implements the following ixgbe NIC filters:
syn filter, ethertype filter and 5tuple filter for the Intel 82599 NIC
Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Jingjing Wu [Mon, 16 Jun 2014 07:31:44 +0000 (15:31 +0800)]
igb: add filters
This patch implements the following igb NIC filters:
syn filter, ethertype filter, 2tuple filter and flex filter for the Intel 82580 and i350 NICs
syn filter, ethertype filter and 5tuple filter for the Intel 82576 NIC
Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Jingjing Wu [Mon, 16 Jun 2014 07:31:43 +0000 (15:31 +0800)]
ethdev: add filters
This patch adds APIs for the NIC filters listed below:
ethertype filter, syn filter, 2tuple filter, flex filter, 5tuple filter
Signed-off-by: jingjing.wu <jingjing.wu@intel.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:47 +0000 (18:32 +0100)]
examples/ip_reassembly: overhaul
New stuff:
* Support for regular traffic as well as IPv4 and IPv6
* Simplified config
* Routing table printed out on start
* Uses LPM/LPM6 for lookup
* Unmatched traffic is sent to the originating port
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:46 +0000 (18:32 +0100)]
ip_frag: add IPv6 reassembly
Mostly a copy-paste of IPv4, with a few caveats.
The only supported packets are those in which the fragment extension
header immediately follows the IPv6 header.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:45 +0000 (18:32 +0100)]
examples/ip_fragmentation: overhaul
New stuff:
* Support for regular traffic as well as IPv4 and IPv6
* Simplified config
* Routing table printed out on start
* Uses LPM/LPM6 for lookup
* Unmatched traffic is sent to the originating port
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:44 +0000 (18:32 +0100)]
examples: rename ipv4_frag example to ip_fragmentation
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:43 +0000 (18:32 +0100)]
ip_frag: add IPv6 fragmentation support
Mostly a copy-paste of IPv4.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:42 +0000 (18:32 +0100)]
ip_frag: rename ipv4_fragmentation function
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:41 +0000 (18:32 +0100)]
ip_frag: refactor reassembly code into a proper library
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:40 +0000 (18:32 +0100)]
ip_frag: rename structures in fragmentation table
Technically, fragmentation table can work for both IPv4 and IPv6
packets, so we're renaming everything to be generic enough to make sense
in IPv6 context.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:39 +0000 (18:32 +0100)]
ip_frag: remove unneeded check and macro
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:38 +0000 (18:32 +0100)]
ip_frag: new internal common header
Moved out debug log macros into common, as reassembly code will later
need them as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:37 +0000 (18:32 +0100)]
ip_frag: fix code style
Issues were reported by checkpatch.pl.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:36 +0000 (18:32 +0100)]
ip_frag: refactor IPv4 fragmentation into a proper library
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
[Thomas: add in doxygen]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Wed, 28 May 2014 17:32:35 +0000 (18:32 +0100)]
ip_frag: move fragmentation/reassembly headers into a library
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:54 +0000 (15:52 +0100)]
tools: add vfio support to setup script
Support for loading/unloading VFIO drivers, binding/unbinding devices
to/from VFIO, also setting up correct userspace permissions.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:53 +0000 (15:52 +0100)]
tools: support vfio in dpdk_nic_bind
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Mon, 16 Jun 2014 12:05:28 +0000 (14:05 +0200)]
tools: rename igb_uio_bind to dpdk_nic_bind
Renaming the igb_uio_bind script to dpdk_nic_bind to have a generic name
before supporting two drivers.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:52 +0000 (15:52 +0100)]
igb_uio: remove PCI id table
Removing PCI ID list to make igb_uio more similar to a generic driver
like vfio-pci or uio_pci_generic. This is done to make it easier for
the binding script to support multiple drivers.
Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to PCI stub, VFIO and other generic
PCI drivers.
Therefore to bind a new device to igb_uio, the user will now have to
first write its PCI ID to "new_id" file inside the igb_uio driver
directory, and only then write the PCI ID to "bind". This is reflected
in changes to PCI binding script as well.
There's a weird behaviour of sysfs when a new device ID is added to
new_id. Subsequent writing to "bind" will result in IOError on
closing the file. This error is harmless but it triggers the
exception anyway, so in order to work around that, we check if the
device was actually bound to the driver before raising an error.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:49 +0000 (15:52 +0100)]
eal: add command line option to select vfio interrupt type
Unlike igb_uio, VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled, but will set VFIO interrupt type to either "legacy", "msi"
or "msix" if VFIO support is compiled. Note that VFIO initialization
will fail if the interrupt type selected is not supported by the system.
If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).
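
The selection logic described above could be sketched roughly as below. The enum, the function names and the exact fallback order after MSI-X are assumptions made for illustration; the text only states that all types are tried, starting with MSI-X.

```c
#include <string.h>

enum sketch_intr_mode {
	SKETCH_INTR_NONE = 0,   /* parameter not given on the command line */
	SKETCH_INTR_LEGACY,
	SKETCH_INTR_MSI,
	SKETCH_INTR_MSIX,
};

/* Parse the option value; return 0 on success, -1 if it is unknown. */
static int sketch_parse_intr_mode(const char *arg,
		enum sketch_intr_mode *mode)
{
	if (strcmp(arg, "legacy") == 0)
		*mode = SKETCH_INTR_LEGACY;
	else if (strcmp(arg, "msi") == 0)
		*mode = SKETCH_INTR_MSI;
	else if (strcmp(arg, "msix") == 0)
		*mode = SKETCH_INTR_MSIX;
	else
		return -1;
	return 0;
}

/*
 * Fill order[] with the modes to attempt and return how many there are.
 * An explicit request is tried alone, so initialization fails if the
 * system does not support it. Without a request, every mode is tried,
 * MSI-X first (the MSI-then-legacy order is an assumption).
 */
static int sketch_intr_try_order(enum sketch_intr_mode requested,
		enum sketch_intr_mode order[3])
{
	if (requested != SKETCH_INTR_NONE) {
		order[0] = requested;
		return 1;
	}
	order[0] = SKETCH_INTR_MSIX;
	order[1] = SKETCH_INTR_MSI;
	order[2] = SKETCH_INTR_LEGACY;
	return 3;
}
```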
In unit tests, we don't know if VFIO is compiled (eal_vfio.h header is
internal to Linuxapp EAL), so we check this flag regardless.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:48 +0000 (15:52 +0100)]
pci: enable vfio device binding
Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first, if not mapped then try IGB_UIO too.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:47 +0000 (15:52 +0100)]
vfio: add multiprocess support
Since VFIO cannot be used to map the same device twice, secondary
processes receive the device/group fd's by means of communicating over a
local socket. Only group and container fd's should be sent, as device
fd's can be obtained via ioctl() calls on the group fd.
For multiprocess, VFIO distinguishes between existing but unused groups
(e.g. groups that aren't bound to VFIO driver) and non-existing groups in
order to know if the secondary process requests a valid group, or if
secondary process requests something that doesn't exist.
VFIO multiprocess sync communicates over a simple protocol. It defines
two requests - request for group fd, and request for container fd.
Possible replies are: SOCKET_OK (an OK signal), SOCKET_ERR (error
signal) and SOCKET_NO_FD (a signal that indicates that the requested
VFIO group is valid, but no fd is present for that group - indicating
that the respective group is simply not bound to VFIO driver).
Here is the logic in a nutshell:
1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed
in case of any error, socket is closed and SOCKET_ERR is sent.
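
The decision made by the primary process in step 2 can be sketched as a pure function. The message names come from the description above, but their numeric values, the group-state enum and the function itself are hypothetical. Answering a container request with SOCKET_OK followed by an fd is also an assumption.

```c
/* Message names as in the text; the numeric values are made up. */
enum sketch_vfio_msg {
	SOCKET_REQ_CONTAINER = 0x100,
	SOCKET_REQ_GROUP,
	SOCKET_OK,
	SOCKET_NO_FD,
	SOCKET_ERR,
};

/* What the primary process knows about the requested group. */
enum sketch_group_state {
	SKETCH_GROUP_INVALID,   /* no such group */
	SKETCH_GROUP_UNBOUND,   /* exists, but not bound to VFIO */
	SKETCH_GROUP_BOUND,     /* exists and an fd is available */
};

/*
 * Choose the reply for a request. *fd_follows is set to 1 when the
 * reply must be followed by a file descriptor, and to 0 otherwise.
 */
static enum sketch_vfio_msg sketch_primary_reply(enum sketch_vfio_msg request,
		enum sketch_group_state group, int *fd_follows)
{
	*fd_follows = 0;
	switch (request) {
	case SOCKET_REQ_CONTAINER:
		*fd_follows = 1;
		return SOCKET_OK;
	case SOCKET_REQ_GROUP:
		if (group == SKETCH_GROUP_INVALID)
			return SOCKET_ERR;      /* step 2a */
		if (group == SKETCH_GROUP_UNBOUND)
			return SOCKET_NO_FD;    /* step 2b */
		*fd_follows = 1;
		return SOCKET_OK;               /* step 2c */
	default:
		return SOCKET_ERR;              /* not a valid request */
	}
}
```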
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:46 +0000 (15:52 +0100)]
vfio: DMA mapping
Adding code to support VFIO mapping (primary processes only). Most of
the things are done via ioctl() calls on either /dev/vfio/vfio (the
container) or a /dev/vfio/$GROUP_NR (IOMMU group).
In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
simply means that this particular device is not bound to VFIO)
8. maps BARs (MSI-X BAR cannot be mmaped, so skipping it)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
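
The viability rule from step 4 can be modelled in a few lines. This is only a model of the rule as stated above; in practice the kernel reports viability itself, and the function name and the driver-name array are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * drivers[i] is the name of the driver bound to the i-th device of an
 * IOMMU group, or NULL when that device is not bound to anything.
 * The group is viable when every device is either bound to vfio-pci
 * or unbound. Returns 1 if viable, 0 otherwise.
 */
static int sketch_group_is_viable(const char *const drivers[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (drivers[i] != NULL && strcmp(drivers[i], "vfio-pci") != 0)
			return 0;
	}
	return 1;
}
```

A single device left on another kernel driver is enough to make the whole group unusable, which is why binding scripts have to deal with groups rather than individual devices.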
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:44 +0000 (15:52 +0100)]
vfio: interrupts
Creating code to handle VFIO interrupts in EAL interrupts (supports all
types of interrupts).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:42 +0000 (15:52 +0100)]
vfio: header for build support
Add VFIO compilation option to linuxapp config.
Adding a header that will determine if VFIO support should be compiled
in. If VFIO is enabled in config (and it's enabled by default), then the
header will also check for kernel version. If VFIO is enabled in config
and if the kernel version is 3.6+, then VFIO_PRESENT will be defined.
This is the macro that should be used to determine if VFIO support is
being compiled in.
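
A header of this kind might be structured roughly as follows. Only VFIO_PRESENT and the 3.6 threshold come from the text; all SKETCH_* macros are hypothetical, and the two inputs are hard-coded here so that the fragment is self-contained (a real build would take them from the build config and the kernel headers).

```c
/* Same encoding idea as the kernel's version macro; name is made up. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical inputs, fixed here for the sake of the example. */
#define SKETCH_CONFIG_VFIO 1
#define SKETCH_LINUX_VERSION_CODE SKETCH_KERNEL_VERSION(3, 11, 3)

/* Define the marker only if VFIO is enabled and the kernel is 3.6+. */
#if SKETCH_CONFIG_VFIO && \
	SKETCH_LINUX_VERSION_CODE >= SKETCH_KERNEL_VERSION(3, 6, 0)
#define SKETCH_VFIO_PRESENT
#endif

/* Code elsewhere tests the single marker, not the two conditions. */
static int sketch_vfio_compiled_in(void)
{
#ifdef SKETCH_VFIO_PRESENT
	return 1;
#else
	return 0;
#endif
}
```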
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:41 +0000 (15:52 +0100)]
eal: move interrupt type out of igb_uio
Moving interrupt type enum out of igb_uio and renaming it to be more
generic. Such a strange header naming and separation is done mostly to
make coming virtio patches easier to port to dpdk.org tree.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:40 +0000 (15:52 +0100)]
igb_uio: make compilation optional
Currently, igb_uio is always compiled. Some Linux distributions may not
want to include igb_uio with DPDK, so we need to make sure that igb_uio
compilation for Linuxapp targets can be optional.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: HuilongX Xu <huilongx.xu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:39 +0000 (15:52 +0100)]
pci: rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
Rename the RTE_PCI_DRV_NEED_IGB_UIO to be more generic.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:38 +0000 (15:52 +0100)]
pci: distinguish between legitimate failures and non-fatal errors
Currently, EAL does not distinguish between actual failures and expected
initialization errors. E.g. sometimes the driver fails to initialize
because it was not supposed to be initialized in the first place, such
as device not being managed by said driver.
This patch makes EAL fail on actual initialization errors while still
skipping over expected initialization errors.
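
One way to express this distinction is a three-way return convention, sketched below. The convention (0 for success, positive for "not managed by this driver", negative for a real failure), the function names and the toy probe are assumptions for illustration, not the EAL code.

```c
#include <stddef.h>

/*
 * Hypothetical probe callback: 0 means the device was initialized,
 * a positive value means the device is not managed by this driver
 * (expected, skip it), a negative value means a real failure.
 */
typedef int (*sketch_probe_fn)(int dev_id);

/* Toy probe: negative ids fail, odd ids are "not ours", even ids work. */
static int sketch_toy_probe(int dev_id)
{
	if (dev_id < 0)
		return -1;
	return (dev_id % 2 != 0) ? 1 : 0;
}

/*
 * Probe all devices. Expected errors are skipped; the first actual
 * failure aborts the scan and is returned to the caller.
 */
static int sketch_probe_all(sketch_probe_fn probe, const int *devs,
		size_t n, size_t *n_ok)
{
	size_t i;

	*n_ok = 0;
	for (i = 0; i < n; i++) {
		int ret = probe(devs[i]);

		if (ret < 0)
			return ret;     /* actual initialization failure */
		if (ret > 0)
			continue;       /* expected: not managed, skip */
		(*n_ok)++;
	}
	return 0;
}
```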
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:37 +0000 (15:52 +0100)]
pci: fix code style
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:36 +0000 (15:52 +0100)]
pci: move uio mapping in a dedicated file
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:35 +0000 (15:52 +0100)]
pci: rework uio mapping to prepare for vfio
Separating mapping code and calls to open. This is preparatory work
for VFIO patch since it'll need to map BARs too but it doesn't use path
in mapped_pci_resource. Also, renaming structs to be more generic.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:50 +0000 (15:52 +0100)]
mem: make --no-huge use mmap instead of malloc
This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.
Technically, malloc is just fine, but we want to guarantee that
memory will be page-aligned, so using mmap to be safe.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Anatoly Burakov [Fri, 13 Jun 2014 14:52:45 +0000 (15:52 +0100)]
eal: remove useless compilation flag
eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Bruce Richardson [Fri, 13 Jun 2014 22:52:24 +0000 (23:52 +0100)]
ixgbe: new vectorized functions for Rx/Tx
New file containing optimized receive and transmit functions which
use 128bit vector instructions to improve performance. When conditions
permit, these functions will be enabled at runtime by the device
initialization routines already in the PMD.
The compilation of the vectorized RX and TX code paths is controlled by
a new setting in the build time configuration for the IXGBE driver. Also
added is a setting which allows an optional further performance increase
by disabling the use of the olflags field on packet RX.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: XiaonanX Zhang <xiaonanx.zhang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
[Thomas: code-style adjustments]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Konstantin Ananyev [Fri, 13 Jun 2014 11:26:53 +0000 (12:26 +0100)]
acl: new sample l3fwd-acl
Demonstrates the use of the ACL library in the DPDK application to
implement packet classification and L3 forwarding.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
[Thomas: some code-style changes]
Konstantin Ananyev [Fri, 13 Jun 2014 11:26:52 +0000 (12:26 +0100)]
acl: new test-acl application
Usage example and main test application for the ACL library.
Provides IPv4/IPv6 5-tuple classification.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
[Thomas: some code-style changes]
Konstantin Ananyev [Fri, 13 Jun 2014 11:26:51 +0000 (12:26 +0100)]
acl: update unit tests
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Konstantin Ananyev [Fri, 13 Jun 2014 11:26:50 +0000 (12:26 +0100)]
acl: new library
The ACL library is used to perform an N-tuple search over a set of rules with
multiple categories and find the best match for each category.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
[Thomas: some code-style changes]
Stephen Hemminger [Fri, 13 Jun 2014 01:32:50 +0000 (18:32 -0700)]
virtio: fix build with debug enabled
Remove useless message that breaks if VIRTIO_DEBUG_DRIVER is defined.
virtio_ethdev.c:224:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Stephen Hemminger [Fri, 13 Jun 2014 01:32:40 +0000 (18:32 -0700)]
virtio: checkpatch cleanups
This fixes style problems reported by checkpatch including:
* extra whitespace
* spaces before tabs
* strings broken across lines
* excessively long lines
* missing spaces after keywords
* unnecessary parentheses in return statements
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Thomas Monjalon [Thu, 12 Jun 2014 12:57:24 +0000 (14:57 +0200)]
config: minor cleanup
Move things to their right location and add a missing comment.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Bruce Richardson [Thu, 29 May 2014 10:12:17 +0000 (11:12 +0100)]
distributor: add unit tests
Add a set of unit tests and a basic performance test for the
distributor library. These tests cover all the major functionality of
the library on both distributor and worker sides.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Bruce Richardson [Thu, 29 May 2014 10:12:14 +0000 (11:12 +0100)]
distributor: new packet distributor library
This adds the code for a new Intel DPDK library for packet distribution.
The distributor is a component which is designed to pass packets
one-at-a-time to workers, with dynamic load balancing. Using the RSS
field in the mbuf as a tag, the distributor tracks what packet tag is
being processed by what worker and then ensures that no two packets with
the same tag are in-flight simultaneously. Once a tag is not in-flight,
then the next packet with that tag will be sent to the next available
core.
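
The tag-tracking rule can be sketched with a small table of in-flight tags. This is a toy model, not the library: the structure, the function names, the fixed worker count and the use of tag 0 as "idle" are all assumptions. A packet whose tag is already in flight is directed to the worker handling that tag, so it waits behind the earlier packet instead of running in parallel with it.

```c
#include <stdint.h>

#define SKETCH_NUM_WORKERS 4
#define SKETCH_NO_TAG 0u   /* assumption: tag 0 marks an idle worker */

struct sketch_distributor {
	/* Tag currently being processed by each worker. */
	uint32_t in_flight[SKETCH_NUM_WORKERS];
};

/*
 * Choose a worker for a packet with the given non-zero tag:
 * the worker already handling that tag if there is one, otherwise the
 * first idle worker, or -1 if every worker is busy with other tags.
 */
static int sketch_pick_worker(const struct sketch_distributor *d,
		uint32_t tag)
{
	int w;

	for (w = 0; w < SKETCH_NUM_WORKERS; w++)
		if (d->in_flight[w] == tag)
			return w;
	for (w = 0; w < SKETCH_NUM_WORKERS; w++)
		if (d->in_flight[w] == SKETCH_NO_TAG)
			return w;
	return -1;
}

static void sketch_worker_start(struct sketch_distributor *d, int worker,
		uint32_t tag)
{
	d->in_flight[worker] = tag;
}

static void sketch_worker_done(struct sketch_distributor *d, int worker)
{
	d->in_flight[worker] = SKETCH_NO_TAG;
}
```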
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
[Thomas: add doxygen @file comment]
Konstantin Ananyev [Wed, 11 Jun 2014 13:38:46 +0000 (14:38 +0100)]
examples/l3fwd: reorganise and optimize LPM code path
With latest HW and optimised RX/TX path there is a huge gap between
testpmd iofwd and l3fwd performance results.
So there is an attempt to optimise l3fwd LPM code path and reduce the gap:
- Instead of processing each input packet up to completion -
divide packet processing into several stages and perform
stage by stage for the whole burst.
- Unroll things by the factor of 4 whenever possible.
- Use SSE intrinsics for some operations (bswap, replace MAC addresses, etc).
- Avoid TX packet buffering whenever possible.
- Move some checks from RX/TX into setup phase.
Note that the new (optimized) code path can be switched on/off by setting
ENABLE_MULTI_BUFFER_OPTIMIZE macro to 1/0.
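
The first two ideas in the list (stage-by-stage processing of a whole burst, and unrolling by a factor of 4) can be sketched in scalar C as below. This is not the l3fwd code: the packet is reduced to a destination address, the "lookup" is a modulo and all names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_NUM_PORTS 4

/* Stand-in for a route lookup: maps a destination to an output port. */
static inline uint8_t sketch_lookup(uint32_t dst)
{
	return (uint8_t)(dst % SKETCH_NUM_PORTS);
}

/* Stage 1: resolve the output port for the whole burst, 4 at a time. */
static void sketch_stage_lookup(const uint32_t *dst, uint8_t *port,
		unsigned int n)
{
	unsigned int n4 = n & ~3u;   /* largest multiple of 4 <= n */
	unsigned int i;

	for (i = 0; i < n4; i += 4) {
		port[i] = sketch_lookup(dst[i]);
		port[i + 1] = sketch_lookup(dst[i + 1]);
		port[i + 2] = sketch_lookup(dst[i + 2]);
		port[i + 3] = sketch_lookup(dst[i + 3]);
	}
	for (; i < n; i++)           /* leftover packets, one by one */
		port[i] = sketch_lookup(dst[i]);
}

/* Stage 2: account each packet of the burst to its output port. */
static void sketch_stage_count(const uint8_t *port, unsigned int n,
		unsigned int tx_count[SKETCH_NUM_PORTS])
{
	unsigned int i;

	for (i = 0; i < n; i++)
		tx_count[port[i]]++;
}
```

Each stage finishes for the entire burst before the next one starts, instead of taking one packet through every stage in turn.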
Some performance data:
SUT: dual-socket board IVB 2.8GHz, 2x1GB pages.
4 ports on 4 NICs (all at socket 0) connected to the traffic generator.
kernel: 3.11.3-201.fc19.x86_64, gcc: 4.8.2.
64B packets, using the packet flooding method.
All 4 ports are managed by one logical core:
Optimised scalar PMD RX/TX was used.
DIFF % (NEW-OLD)
IPV4-CONT-BURST: +23%
IPV6-CONT-BURST : +13%
IPV4/IPV6-CONT-BURST: +8%
IPV4-4STREAMSX8: +7%
IPV4-4STREAMSX1: -2%
Test cases description:
IPV4-CONT-BURST - IPV4 packets; all packets from one input port
are destined for the same output port.
IPV6-CONT-BURST - IPV6 packets; all packets from one input port
are destined for the same output port.
IPV4/IPV6-CONT-BURST - mix of the first 2 with interleave=1
(e.g: IPV4,IPV6,IPV4,IPV6, ...)
IPV4-4STREAMSX1 - 4 streams of IPV4 packets, where all packets
from same stream are destined for the same output port
(e.g: IPV4_DST_P0, IPV4_DST_P1, IPV4_DST_P2, IPV4_DST_P3, IPV4_DST_P0, ...)
IPV4-4STREAMSX8 - same as above but packets for each stream
are coming in groups of 8
(e.g: IPV4_DST_P0 X 8, IPV4_DST_P1 X 8, IPV4_DST_P2 X 8, IPV4_DST_P3 X 8,
IPV4_DST_P0 X 8, ...)
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Konstantin Ananyev [Wed, 11 Jun 2014 13:38:45 +0000 (14:38 +0100)]
lpm: introduce rte_lpm_lookupx4
Allows looking up four IP addresses in an LPM table in one call.
Uses SSE intrinsics.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Pawel Wodkowski [Wed, 11 Jun 2014 07:20:37 +0000 (08:20 +0100)]
pci: remove conditions on device definitions
This patch removes obsolete code that prevents defining the
82575EB, I218 and I350 NICs.
Signed-off-by: Pawel Wodkowski <pawelx.wdkowski@intel.com>
[Thomas: remove conditions for I218]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ouyang Changchun [Mon, 26 May 2014 07:45:31 +0000 (15:45 +0800)]
app/testpmd: Tx rate limitation for queue and VF
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Ouyang Changchun [Mon, 26 May 2014 07:45:30 +0000 (15:45 +0800)]
ixgbe: Tx rate limitation for queue and VF
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Ouyang Changchun [Mon, 26 May 2014 07:45:29 +0000 (15:45 +0800)]
ethdev: Tx rate limitation for queue and VF
Add API to support setting TX rate for a queue and a VF.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Ouyang Changchun [Wed, 28 May 2014 07:15:02 +0000 (15:15 +0800)]
app/testpmd: add commands for link up and down
This patch adds commands to test the functionality of setting link up and down.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Ouyang Changchun [Wed, 28 May 2014 07:15:01 +0000 (15:15 +0800)]
ixgbe: link up and down
It is implemented by enabling or disabling TX laser.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Ouyang Changchun [Wed, 28 May 2014 07:15:00 +0000 (15:15 +0800)]
ethdev: API for link up and down
This patch adds API to support the functionality of setting link up and down.
It can be used to repeatedly stop and restart RX/TX of a port without
re-allocating resources for the port and re-configuring the port.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Konstantin Ananyev [Mon, 9 Jun 2014 17:26:17 +0000 (18:26 +0100)]
ethdev: fix compiler warning on PMD_DEBUG_TRACE formats
icc 12.1 complains about RTE_LOG() format:
"argument is incompatible with corresponding format string conversion"
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Konstantin Ananyev [Mon, 9 Jun 2014 17:26:16 +0000 (18:26 +0100)]
ethdev: prevent from starting/stopping already started/stopped device
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Konstantin Ananyev [Mon, 9 Jun 2014 17:26:15 +0000 (18:26 +0100)]
igb/ixgbe: reset queue pointers after releasing
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Konstantin Ananyev [Mon, 9 Jun 2014 17:26:14 +0000 (18:26 +0100)]
e1000: do not release queue on alloc error
If igb_alloc_rx_queue_mbufs() fails to allocate an mbuf for an RX queue,
it calls igb_rx_queue_release(rxq).
That causes rxq to be silently freed, without updating
dev->data->rx_queues[].
So any further reference to it will trigger a SIGSEGV.
The em PMD has the same problem.
To fix: igb_alloc_rx_queue_mbufs() should just return an error to the
caller and let the upper layer deal with the problem.
That's what ixgbe PMD is doing right now.
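
The ownership rule behind the fix can be sketched with a toy queue. None of this is driver code: the structure, the sizes, the function names and the 'avail' parameter (which simulates an exhausted buffer pool) are assumptions. The fill function only reports the error; the caller, which owns the slot in the queue array, releases the queue and clears the slot.

```c
#include <stdlib.h>

#define SKETCH_RING_SIZE 4
#define SKETCH_BUF_SIZE 64

struct sketch_rxq {
	void *bufs[SKETCH_RING_SIZE];
};

/*
 * Fill the ring with buffers. 'avail' simulates how many buffers the
 * pool can still provide. Returns 0 on success, -1 on failure.
 * On failure the queue itself is NOT freed: the caller still owns it.
 */
static int sketch_alloc_rxq_bufs(struct sketch_rxq *q, unsigned int avail)
{
	unsigned int i;

	for (i = 0; i < SKETCH_RING_SIZE; i++) {
		if (i >= avail)
			return -1;
		q->bufs[i] = malloc(SKETCH_BUF_SIZE);
		if (q->bufs[i] == NULL)
			return -1;
	}
	return 0;
}

/* Release the queue and clear the slot so no stale pointer remains. */
static void sketch_rxq_release(struct sketch_rxq **slot)
{
	unsigned int i;

	if (*slot == NULL)
		return;
	for (i = 0; i < SKETCH_RING_SIZE; i++)
		free((*slot)->bufs[i]);
	free(*slot);
	*slot = NULL;
}
```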
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Bruce Richardson [Tue, 3 Jun 2014 23:42:50 +0000 (00:42 +0100)]
remove trailing whitespaces
This commit removes trailing whitespace from lines in files. Almost all
files are affected, as the BSD license copyright header had trailing
whitespace on 4 lines in it [hence the number of files reporting 8 lines
changed in the diffstat].
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: remove spaces before tabs in libs]
[Thomas: remove more trailing spaces in non-C files]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Alan Carew [Thu, 5 Jun 2014 16:12:08 +0000 (17:12 +0100)]
pci: fix build for FreeBSD
Add __rte_unused to
pci_unbind_kernel_driver(struct rte_pci_device *dev)
Signed-off-by: Alan Carew <alan.carew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Alan Carew [Thu, 5 Jun 2014 16:12:07 +0000 (17:12 +0100)]
eal: fix build for FreeBSD
A recent change to rte_dump_tailq (commit
591a9d7985c1230652),
which now takes a FILE parameter, causes compilation to fail under FreeBSD.
The failure was traced to a missing include of stdio.h.
Errors:
rte_tailq.h: unknown type name 'FILE' void rte_dump_tailq(FILE *f);
rte_memory.h: unknown type name 'FILE' void rte_dump_physmem_layout(FILE *f);
Signed-off-by: Alan Carew <alan.carew@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Thomas Monjalon [Tue, 10 Jun 2014 13:42:57 +0000 (15:42 +0200)]
mk: factorize config rules
Error message for missing template is factorized in notemplate rule.
RTE_OUTPUT directory is marked as order-only prerequisite.
RTE_OUTPUT is always created after having been cleaned for rte_config.h.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Bruce Richardson [Wed, 14 May 2014 15:55:10 +0000 (16:55 +0100)]
mk: allow updates to build config on make install
When running "make config", an additional config.orig file is also
generated, which is intended to hold the original, clean configuration
from the template.
When running make install, we first check whether a .config file exists,
and run make config if it does not. If there is a file, we then
check if it's unmodified, in which case we regenerate a new .config to
take account of any possible updates to the template. Finally, in the
case where there is an existing .config file, and it HAS been modified,
we then do a check to see if the template has had further updates, and
throw an error if so. If no updates, we continue with the build using
the existing, user-modified config.
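The decision sequence above can be written down as a small decision function. The real logic lives in the makefiles; the C sketch below only models it, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; each value is one outcome described in the text. */
enum demo_config_action {
    DEMO_RUN_MAKE_CONFIG,   /* no .config yet: generate it from the template */
    DEMO_REGENERATE,        /* .config unmodified: regenerate to pick up updates */
    DEMO_ERROR,             /* .config modified and template updated: stop */
    DEMO_KEEP_EXISTING      /* .config modified, template unchanged: use it */
};

static enum demo_config_action
demo_install_config_action(bool config_exists, bool config_modified,
                           bool template_updated)
{
    /* A .config can only count as modified if it exists. */
    assert(config_exists || !config_modified);

    if (!config_exists)
        return DEMO_RUN_MAKE_CONFIG;
    if (!config_modified)
        return DEMO_REGENERATE;
    if (template_updated)
        return DEMO_ERROR;
    return DEMO_KEEP_EXISTING;
}
```

Here "modified" is assumed to mean that .config differs from config.orig, and "template updated" that config.orig no longer matches what the template would generate.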
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Thomas Monjalon [Mon, 19 May 2014 21:45:03 +0000 (23:45 +0200)]
mk: fix 32-bit link with gcc
Some linker options were not prefixed by -Wl, when using CC:
-z muldefs
-melf_i386 (CPU_LDFLAGS in 32-bit config)
I didn't see any error with -z muldefs but it isn't documented in the gcc
manual. So it's safer to explicitly pass it to the linker.
Also, building a 32-bit shared library raises this error:
gcc: error: unrecognized command line option ‘-melf_i386’
Using macro linkerprefix fixes it.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Konstantin Ananyev [Wed, 28 May 2014 14:47:02 +0000 (15:47 +0100)]
pcap: fix Tx mbuf corruption
If pcap_sendpacket() fails, then eth_pcap_tx shouldn't silently free that
mbuf and continue.
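One way to honour this rule is to stop the burst at the first failed send and report how many packets were actually transmitted, leaving the rest owned by the caller. The sketch below assumes that approach; the demo_* names are hypothetical, and the send hook only mimics the 0-on-success convention of pcap_sendpacket().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet type standing in for an mbuf. */
struct demo_pkt {
    const void *data;
    unsigned len;
    int freed;            /* set to 1 where a real driver would free the mbuf */
};

/* Hypothetical send hook; returns 0 on success, like pcap_sendpacket(). */
typedef int (*demo_send_fn)(const void *data, unsigned len, void *ctx);

static unsigned demo_tx_burst(struct demo_pkt **pkts, unsigned nb_pkts,
                              demo_send_fn send_fn, void *ctx)
{
    unsigned i;

    assert(pkts != NULL && send_fn != NULL);
    for (i = 0; i < nb_pkts; i++) {
        if (send_fn(pkts[i]->data, pkts[i]->len, ctx) != 0)
            break;             /* pkts[i] and later stay owned by the caller */
        pkts[i]->freed = 1;    /* release only what was actually sent */
    }
    return i;                  /* number of packets transmitted */
}

/* Helper for experiments: fails on the call whose index equals fail_at. */
struct demo_send_state {
    int calls;
    int fail_at;
};

static int demo_send_fail_at(const void *data, unsigned len, void *ctx)
{
    struct demo_send_state *st = ctx;

    (void)data;
    (void)len;
    return (st->calls++ == st->fail_at) ? -1 : 0;
}
```

Because the return value counts only transmitted packets, the caller can retry or free the remaining ones itself without any double free.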
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Jijiang Liu [Tue, 3 Jun 2014 13:00:15 +0000 (21:00 +0800)]
xen: fix memory size calculation
The unit of allocated_size is MB, so the change below is made.
Otherwise, it will fail to free memory when available memory is not enough.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Jijiang Liu [Tue, 3 Jun 2014 12:59:33 +0000 (20:59 +0800)]
xen: fix for contiguous region API in kernel 3.13
Since Linux kernel version 3.13.0,
the xen_create/destroy_contiguous_region() API has changed:
its first parameter is now a physical address.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Heng Ding <hengx.ding@intel.com>
Jijiang Liu [Fri, 23 May 2014 06:53:10 +0000 (14:53 +0800)]
xen: reserve memory at installing dom0_mm.ko
This patch changes the way memory is reserved in the Dom0 driver.
Memory is now reserved when the rte_dom0_mm.ko kernel module is installed,
instead of being requested dynamically during DPDK application startup.
In addition, the driver now requests memory in blocks of 4M first and,
if that fails, requests blocks of 2M.
The main reasons for these changes are as follows:
First, to reduce the impact of the memory fragmentation that builds up
after the system has been running for a long time.
Second, to reduce the number of memory segments.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ouyang Changchun [Thu, 29 May 2014 07:18:20 +0000 (15:18 +0800)]
virtio: support multiple queues
This patch adds support for the multiple queues feature in the DPDK-based virtio-net frontend.
It first gets the maximum queue number of virtio-net from the virtio PCI configuration and
then sends a command to negotiate the queue number with the backend. When receiving and
transmitting packets, the negotiated virtio-net queues serve RX/TX.
To use this feature, the backend also needs to support the multiple queues feature
and enable it.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ouyang Changchun [Thu, 29 May 2014 07:18:19 +0000 (15:18 +0800)]
virtio: code-style cleanup
This patch cleans up some coding style issues, and fixes some errors and warnings
reported by checkpatch.pl.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ouyang Changchun [Wed, 28 May 2014 08:06:38 +0000 (16:06 +0800)]
examples/vhost: zero copy mode
This patch adds user space vhost zero copy support. It removes packet copying between host and guest in RX/TX.
It introduces an extra ring to store the detached mbufs. At the initialization stage, all mbufs are put into
this ring; when a guest starts, vhost gets the available buffer addresses allocated by the guest for RX and
translates them into host space addresses, then attaches them to mbufs and puts the attached mbufs into
the mempool.
Queue starting and DMA refilling get mbufs from the mempool and use them to set the DMA addresses.
For TX, it gets the buffer addresses of available packets to be transmitted from the guest and translates
them to host space addresses, then attaches them to mbufs and puts them into TX queues.
After TX finishes, it pulls mbufs out of the mempool, detaches them and puts them back into the extra ring.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ouyang Changchun [Wed, 28 May 2014 08:06:37 +0000 (16:06 +0800)]
ixgbe: queue start and stop
This patch implements queue start and stop functionality in the IXGBE PMD;
it also enables hardware loopback for VMDQ mode in the IXGBE PMD.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ouyang Changchun [Wed, 28 May 2014 08:06:36 +0000 (16:06 +0800)]
ethdev: queue start and stop
This patch adds an API to support queue start and stop functionality for RX/TX.
It allows RX and TX queues to be started or stopped one by one, instead of starting
and stopping all of them at the same time.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Fri, 16 May 2014 08:58:43 +0000 (10:58 +0200)]
app/testpmd: allow to configure RSS hash key
Add the command "port config X rss-hash-key key" in the 'testpmd'
application to configure the RSS hash key used to compute the RSS
hash of input [IP] packets received on port X.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Fri, 16 May 2014 08:58:42 +0000 (10:58 +0200)]
ethdev: allow to get RSS hash functions and key
1) Add a new function "rss_hash_conf_get" in the PMD API to retrieve the
current configuration of the RSS functions and/or of the RSS key used
by a NIC to compute the RSS hash of input packets.
The new function uses the existing data structure "rte_eth_rss_conf" for
returning the RSS hash configuration.
2) Add the ixgbe-specific function "ixgbe_dev_rss_hash_conf_get" and the
igb-specific function "eth_igb_rss_hash_conf_get" to retrieve the RSS
hash configuration of ixgbe and igb controllers respectively.
3) Add the command "show port X rss-hash [key]" in the testpmd application
to display the RSS hash configuration of port X.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Fri, 16 May 2014 08:58:41 +0000 (10:58 +0200)]
app/testpmd: configure RSS without restart
The function cmd_config_rss_parsed() associated with the command
"port config rss all" required to first stop all ports, in order to
then entirely re-configure all ports with the new RSS hash computation
parameters.
Now use the new function rte_eth_dev_rss_hash_conf_update(), which dynamically
changes only the RSS hash computation parameters of a port, without needing
to stop the port first.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Fri, 16 May 2014 08:58:40 +0000 (10:58 +0200)]
ethdev: allow to set RSS hash computation flags and/or key
1) Add a new function "rss_hash_update" in the PMD API to dynamically
update the RSS flags and/or the RSS key used by a NIC to compute the RSS
hash of input packets.
The new function uses the existing data structure "rte_eth_rss_conf" for
the argument that contains the new hash flags and/or the new hash key to
use.
2) Add the ixgbe-specific function "ixgbe_dev_rss_hash_update" and the
igb-specific function "eth_igb_rss_hash_update" to update the RSS
hash configuration of ixgbe and igb controllers respectively.
Before changing anything, these 2 functions check that the update RSS
operation does not attempt to disable RSS, if RSS was enabled at port
initialization time, or does not attempt to enable RSS, if RSS was
disabled at port initialization time.
Note:
Configuring the RSS hash flags and the RSS key used by a NIC consists in
updating appropriate PCI registers of the NIC.
These operations have been manually tested with the interactive commands
"write reg" and "write regbit" of the testpmd application.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Fri, 16 May 2014 08:58:39 +0000 (10:58 +0200)]
ethdev: check RETA queue indices against number of queues
Each entry of the RSS redirection table (RETA) of igb and ixgbe ports
contains a 4-bit RX queue index, thus imposing RSS RX queue indices to
be strictly lower than 16.
In addition, if a RETA entry is configured with a RX queue index that is
strictly lower than 16, but is greater than or equal to the number of RX queues
of the port, then all input packets whose RSS hash value indexes that RETA
entry are silently dropped by the NIC.
Make the function rte_eth_dev_rss_reta_update() check that RX queue indices
that are supplied in the reta_conf argument are strictly lower than
ETH_RSS_RETA_MAX_QUEUE (16) and are strictly lower than the number of
RX queues of the port.
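The two bounds checks can be sketched as follows. This is a simplified model, not the actual rte_eth_dev_rss_reta_update() code: the RETA is represented as a plain byte array, and demo_check_reta and DEMO_RETA_MAX_QUEUE are hypothetical names (the constant mirrors the value 16 quoted above).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RETA_MAX_QUEUE 16  /* mirrors ETH_RSS_RETA_MAX_QUEUE (16) */

/*
 * Return 0 if every RETA entry names a usable RX queue, -EINVAL otherwise.
 * An entry is rejected if it does not fit the 4-bit field, or if it points
 * to a queue the port does not have (the NIC would drop those packets).
 */
static int demo_check_reta(const uint8_t *reta, size_t nb_entries,
                           uint16_t nb_rx_queues)
{
    size_t i;

    assert(reta != NULL);
    for (i = 0; i < nb_entries; i++) {
        if (reta[i] >= DEMO_RETA_MAX_QUEUE)
            return -EINVAL;
        if (reta[i] >= nb_rx_queues)
            return -EINVAL;
    }
    return 0;
}
```

For example, with 4 RX queues configured, an entry holding queue index 5 passes the 4-bit limit but fails the second check.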
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Konstantin Ananyev [Tue, 6 May 2014 14:33:01 +0000 (15:33 +0100)]
igbvf: fix mac type for 82576
e1000_vfadapt type corresponds to 82576 VF devices,
check e1000_set_mac_type() for more details.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Ivan Boule [Mon, 12 May 2014 14:12:30 +0000 (16:12 +0200)]
ixgbevf: assign a default mac address
When initializing a VF with no initial MAC address assigned by
the underlying Host PF driver, assign a default MAC address.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Mon, 12 May 2014 14:11:54 +0000 (16:11 +0200)]
ixgbevf: reset unused mailbox data registers
The VF_RESET message of the 82599 PF/VF communication protocol issued by a
Guest VF driver may include an optional permanent MAC address assigned to
the VF by the Guest OS, in order to make it recorded into the 82599 RAR
registers by the Host PF driver.
To indicate the absence of this optional MAC address, the VF_RESET command
assumes that a NULL MAC address is sent, instead of using a dedicated bit
for this purpose. However, when sending a VF_RESET command with no permanent
MAC address, the function ixgbe_reset_hw_vf() of the 82599 VF driver
directly invokes the function ixgbe_write_mbx_vf() with a message that does
not include a NULL MAC address, wrongly assuming that this function fills in
with zero all unused mailbox data registers.
More globally, it is safer to explicitly reset to zero all remaining mailbox
data registers that are not used to store the content of a message, in order
to reset the data sent in a previous VF/PF exchange (in either side),
including the last exchange performed by another Guest OS to which that VF
was previously assigned.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Mon, 12 May 2014 14:11:36 +0000 (16:11 +0200)]
ixgbevf: skip null and permanent mac addresses
On a 82599 VF, the deletion of a dynamically added MAC address consists in
first flushing all added MAC addresses, then in adding again all remaining MAC
addresses.
For this purpose, the function ixgbevf_remove_mac_addr() parses the pool
of MAC addresses associated with a VF, and must skip the VF permanent MAC
address that is stored into it, as well as all NULL MAC addresses.
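The selection step of that flush-and-re-add procedure might look like the sketch below. It is not the ixgbevf_remove_mac_addr() code itself: the demo_* names and the array-based pool are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ADDR_LEN 6

/* Hypothetical MAC address type. */
struct demo_mac {
    uint8_t bytes[DEMO_ADDR_LEN];
};

static int demo_mac_is_null(const struct demo_mac *a)
{
    static const struct demo_mac zero = {{0}};

    return memcmp(a->bytes, zero.bytes, DEMO_ADDR_LEN) == 0;
}

static int demo_mac_same(const struct demo_mac *a, const struct demo_mac *b)
{
    return memcmp(a->bytes, b->bytes, DEMO_ADDR_LEN) == 0;
}

/*
 * After all added addresses have been flushed, pick the pool entries that
 * must be added again. Copies them to out[] (capacity >= pool_len) and
 * returns how many there are.
 */
static size_t demo_addrs_to_readd(const struct demo_mac *pool, size_t pool_len,
                                  const struct demo_mac *perm,
                                  size_t removed_idx, struct demo_mac *out)
{
    size_t i, n = 0;

    assert(pool != NULL && perm != NULL && out != NULL);
    for (i = 0; i < pool_len; i++) {
        if (i == removed_idx)
            continue;          /* the address being deleted */
        if (demo_mac_is_null(&pool[i]))
            continue;          /* empty slot */
        if (demo_mac_same(&pool[i], perm))
            continue;          /* permanent address: never re-added */
        out[n++] = pool[i];
    }
    return n;
}
```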
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Mon, 12 May 2014 14:11:06 +0000 (16:11 +0200)]
ixgbevf: avoid adding twice the permanent mac address
During the initialization of a VF device, the rte_eth_dev_start() function
indirectly invokes the PMD "mac_addr_add" function with the permanent MAC
address assigned to the device.
In the case of 82599 VFs, this operation leads to exhausting the very
limited set of PF resources used to store VF MAC addresses.
To address this issue, do nothing in the function ixgbevf_add_mac_addr()
if the added MAC address is equal to the permanent MAC address of the VF.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Ivan Boule [Mon, 12 May 2014 14:10:46 +0000 (16:10 +0200)]
ixgbevf: add/remove mac address
Add missing PMD functions in the ixgbevf driver to add (respectively remove)
a MAC address to/from a 82599 VF.
For this purpose, these 2 functions use the VF/PF mailbox-based protocol.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Konstantin Ananyev [Tue, 6 May 2014 14:31:12 +0000 (15:31 +0100)]
ixgbevf: fix jumbo frame
When the latest Linux ixgbe PF is used, and a DPDK VF is used in a DPDK application,
jumbo frames are not received.
Also, if the Linux ixgbe PF has MTU set to 1500 (default),
then normal sized packets can be received by the DPDK VF.
However, if the Linux PF has MTU > 1500, then the DPDK VF receives no packets
(normal or jumbo).
With ixgbe_mbox_api_10, ixgbe simply didn't allow setting VF MTU > 1514 for 82599.
With ixgbe_mbox_api_11 it does, though now, if the PF uses jumbo frames,
it simply disables RX for all VFs.
So to work with a PF that is using jumbo frames, at startup each VF has to:
1. Negotiate mbox_api_11 with the PF.
2. Send the PF a SET_LPE message with the desired MTU.
Note that if the PF already uses an MTU bigger than the one asked for by the VF,
then the PF won't take any action.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Vadim Suraev [Wed, 21 May 2014 20:35:55 +0000 (23:35 +0300)]
timer: fix pending counter
Bug: When a timer is running
- if rte_timer_stop is called, the pending decrement is
skipped (decremented only if the timer is pending) and due
to the update flag the future processing is skipped so the
timer is counted as pending while it is stopped.
- the same applies when rte_timer_reset is called but then
the pending statistics is additionally incremented so the
timer is counted pending twice.
Solution: decrement the pending
statistics after returning from the callback. If
rte_timer_stop was called, it skipped decrementing the
pending statistics. If rte_timer_reset was called, the
pending statistics was incremented. If neither was called
and the timer is periodic, the pending statistics is
incremented when it is reloaded.
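The accounting rule can be checked with a toy model. The sketch below is not the rte_timer implementation: it tracks a single timer with a hypothetical three-state machine and a global demo_pending counter, and only reproduces the increments and decrements described above.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_STOP, DEMO_PENDING, DEMO_RUNNING };

struct demo_timer;
typedef void (*demo_cb)(struct demo_timer *t);

/* Hypothetical timer; field order: state, periodic flag, callback. */
struct demo_timer {
    enum demo_state state;
    int periodic;
    demo_cb cb;
};

static int demo_pending;        /* models the "pending" statistic */

static void demo_timer_reset(struct demo_timer *t)
{
    assert(t != NULL);
    if (t->state == DEMO_PENDING)
        demo_pending--;         /* leaving the pending list */
    t->state = DEMO_PENDING;
    demo_pending++;             /* entering the pending list */
}

static void demo_timer_stop(struct demo_timer *t)
{
    assert(t != NULL);
    if (t->state == DEMO_PENDING)
        demo_pending--;         /* a RUNNING timer is not decremented here */
    t->state = DEMO_STOP;
}

/* Expire one pending timer and run its callback. */
static void demo_timer_expire(struct demo_timer *t)
{
    assert(t != NULL && t->state == DEMO_PENDING);
    t->state = DEMO_RUNNING;
    t->cb(t);
    demo_pending--;             /* the fix: decrement after the callback */
    if (t->state != DEMO_RUNNING)
        return;                 /* stop or reset was called in the callback */
    if (t->periodic) {
        t->state = DEMO_PENDING;
        demo_pending++;         /* reload of a periodic timer */
    } else {
        t->state = DEMO_STOP;
    }
}

/* Example callbacks covering the three cases from the text. */
static void demo_cb_noop(struct demo_timer *t) { (void)t; }
static void demo_cb_stop(struct demo_timer *t) { demo_timer_stop(t); }
static void demo_cb_reset(struct demo_timer *t) { demo_timer_reset(t); }
```

In all three cases the counter ends up equal to the number of timers actually in the pending state: zero after a stop from the callback, one after a reset from the callback, and one after a periodic reload.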
Signed-off-by: Vadim Suraev <vadim.suraev@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Vadim Suraev [Wed, 21 May 2014 19:53:45 +0000 (22:53 +0300)]
timer: fix reloading after changes
Bug: when a periodic timer's callback is running, if another
timer is manipulated, the periodic timer is not reloaded.
Solution: set the update flag only if the modified timer is
in RUNNING state
Signed-off-by: Vadim Suraev <vadim.suraev@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Cristian Dumitrescu [Fri, 23 May 2014 15:21:24 +0000 (16:21 +0100)]
cmdline: fix infinite loop after EOF
Stop on EOF when reading commands from a file or a pipe.
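As a generic illustration of the pattern (not the cmdline library code), a POSIX read loop has to treat a return value of 0 as end of input. Without that check, a loop reading from a file or a closed pipe keeps spinning on zero-length reads. demo_drain_input is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Consume input from fd until EOF. Returns the number of bytes read,
 * or -1 on a read error.
 */
static long demo_drain_input(int fd)
{
    char buf[64];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF: stop instead of looping forever */
        total += n;             /* a real caller would parse buf here */
    }
    return total;
}
```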
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Bruce Richardson [Fri, 23 May 2014 14:39:24 +0000 (15:39 +0100)]
mk: fix linking drivers in static apps
The variable CONFIG_RTE_BUILD_SHARED_LIB was in rte.app.mk as
"RTE_BUILD_SHARED_LIB", which meant that none of the example apps linked
in the PMDs and just didn't work with any eth ports in any static builds.
This bug was introduced in commit
3660cdf990:
pcap: convert to use of PMD_REGISTER_DRIVER and fix linking
Link for l2fwd before patch:
"... -Wl,--whole-archive -Wl,-lrte_kni -Wl,-lrte_timer -Wl,-lrte_hash
-Wl,-lrte_lpm -Wl,-lrte_power -Wl,-lrte_meter -Wl,-lrte_sched -Wl,-lm
-Wl,-lrt -Wl,--start-group -Wl,-lrte_kvargs -Wl,-lrte_mbuf -Wl,-lethdev
-Wl,-lrte_malloc -Wl,-lrte_mempool -Wl,-lrte_ring -Wl,-lrte_eal
-Wl,-lrte_cmdline -Wl,-lrt -Wl,-lm -Wl,-ldl -Wl,--end-group
-Wl,--no-whole-archive"
Link for l2fwd after patch:
"... -Wl,--whole-archive -Wl,-lrte_kni -Wl,-lrte_timer -Wl,-lrte_hash
-Wl,-lrte_lpm -Wl,-lrte_power -Wl,-lrte_meter -Wl,-lrte_sched -Wl,-lm
-Wl,-lrt -Wl,--start-group -Wl,-lrte_kvargs -Wl,-lrte_mbuf -Wl,-lethdev
-Wl,-lrte_malloc -Wl,-lrte_mempool -Wl,-lrte_ring -Wl,-lrte_eal
-Wl,-lrte_cmdline -Wl,-lrte_pmd_vmxnet3_uio -Wl,-lrte_pmd_virtio_uio
-Wl,-lrte_pmd_ixgbe -Wl,-lrte_pmd_e1000 -Wl,-lrte_pmd_ring -Wl,-lrt
-Wl,-lm -Wl,-ldl -Wl,--end-group -Wl,--no-whole-archive"
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Helin Zhang [Thu, 22 May 2014 03:35:17 +0000 (11:35 +0800)]
kni: fix build on Fedora 18 with kernel 3.6.10
An "implicit-function-declaration" error is seen when building the KNI
kernel module on a Linux kernel 3.6.10 platform, as follows.
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
In function igb_get_eee:
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
2441:4: error: implicit declaration of function
mmd_eee_adv_to_ethtool_adv_t
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
In function igb_set_eee:
lib/librte_eal/linuxapp/kni/igb_ethtool.c:
2551:2: error: implicit declaration of function
ethtool_adv_to_mmd_eee_adv_t
The root cause is as follows.
On Fedora 18 with kernel 3.6.10, ETHTOOL_GEEE is defined in the Linux
header file "linux/ethtool.h", while it is not defined in most other
Linux kernel versions.
mmd_eee_cap_to_ethtool_sup_t(), mmd_eee_adv_to_ethtool_adv_t() and
ethtool_adv_to_mmd_eee_adv_t() in kcompat.h are disabled by "#if
!defined(ETHTOOL_GEEE) || (RHEL_RELEASE_CODE && RHEL_RELEASE_CODE <=
RHEL_RELEASE_VERSION(6,4))", while they are called in igb_get_eee() in
igb_ethtool.c, which is enabled by "#ifdef ETHTOOL_GEEE".
Reported-by: Prashant Upadhyaya <prashant.upadhyaya@aricent.com>
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Stephen Hemminger [Wed, 14 May 2014 16:25:27 +0000 (09:25 -0700)]
sched: fix grinder bug
The rte_scheduler will get stuck and not deliver any more packets
if there are two active subports and then one of them stops enqueuing
more packets. This is because of a bug in how the grinder state machines
are managed.
If a non-zero grinder is assigned (but not yet active), then the dequeue
would miss it and always return zero packets. The cure is to always
do a first pass over all grinders.
Signed-off-by: Stephen Hemminger <shemming@brocade.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>