dpdk.git
Pablo de Lara [Mon, 1 Dec 2014 11:40:45 +0000 (11:40 +0000)]
doc: new testpmd commands

Added info in testpmd functions section for the following commands:

- tunnel_filter add
- tunnel_filter rm
- rx_vxlan_port add
- rx_vxlan_port rm
- port stop/start queue
- set port mac address filter (for VF)
- tx_checksum set
- tso set
- tso show

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Siobhan Butler [Tue, 2 Dec 2014 21:11:42 +0000 (21:11 +0000)]
doc: add vhost library

As Vhost will be a library in DPDK 1.8, add a new section to the
Programmer's Guide to describe its use.

Signed-off-by: Siobhan Butler <siobhan.a.butler@intel.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Siobhan Butler [Tue, 2 Dec 2014 14:02:46 +0000 (14:02 +0000)]
doc: add distributor application

New distributor sample app section for the sample app user guide.

Signed-off-by: Siobhan Butler <siobhan.a.butler@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Declan Doherty [Mon, 1 Dec 2014 17:10:12 +0000 (17:10 +0000)]
doc: update bonding

Adding details for link status interrupts and link status polling.
Adding details for mode 4 / mode 5
Tidying up rst document to conform to 80 character line limit
Adding diagrams to explain bonding modes
Removed link_bonding.png file

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Bruce Richardson [Wed, 3 Dec 2014 16:56:59 +0000 (16:56 +0000)]
examples/multi_process: fix resilience by enabling Rx drop

The symmetric_mp example app is set up to allow two processes to
share a NIC port, with each pulling packets from one queue. In order
to have the app continue working when one of the processes dies, the
drop_en bit should be set in the NIC configuration. Without this bit
set, the NIC will stall once any queue fills. With the bit set, once
a queue fills, all subsequent packets for that queue are discarded,
allowing other queues to continue operating as normal.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Bruce Richardson [Thu, 4 Dec 2014 14:24:12 +0000 (14:24 +0000)]
table: fix lookup with incomplete bitmask

When a lookup was done on a table_array structure with an incomplete
bitmask, the result was always zero hits. This was because the
pkts_mask value was cleared as each entry was processed, and the result
was assigned at the end of the loop, when pkts_mask was zero.
Changing the assignment to occur at the start, before pkts_mask
gets cleared, fixes this issue.
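
The ordering problem can be illustrated with a minimal sketch. The function and
variable names below are hypothetical and do not reproduce the table_array
code; only the placement of the result assignment matters.

```c
#include <stdint.h>

/* Hypothetical stand-in for an array-table lookup: every packet whose bit is
 * set in pkts_mask is treated as a hit. */
static void
lookup_sketch(uint64_t pkts_mask, uint64_t *lookup_hit_mask)
{
    /* Record the hits first, while pkts_mask still holds the request. */
    *lookup_hit_mask = pkts_mask;

    /* Process one packet per iteration, clearing its bit as we go. */
    while (pkts_mask != 0) {
        uint64_t lowest_bit = pkts_mask & (~pkts_mask + 1);

        /* ... the entry for this packet would be fetched here ... */
        pkts_mask &= ~lowest_bit;
    }
    /* Assigning *lookup_hit_mask = pkts_mask here would always store 0. */
}
```

With the assignment placed after the loop, a caller passing mask 0x5 would see
no hits, which matches the symptom described above.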

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

Jingjing Wu [Thu, 4 Dec 2014 15:40:23 +0000 (23:40 +0800)]
i40e: setup flow director only if enabled

To avoid affecting FVL performance with the default settings, this
patch moves the flow director initialization from i40e_pf_setup to
i40e_dev_configure, where it depends on the mode in the fdir configuration info.
The resources used for flow director are then set up only if it is enabled.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Jijiang Liu [Tue, 2 Dec 2014 15:06:07 +0000 (23:06 +0800)]
mbuf: replace inner fields by outer fields semantic

Replace the inner_l2_len and inner_l3_len fields with the
outer_l2_len and outer_l3_len fields, and rework the csum forward engine
and the i40e PMD accordingly.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Jijiang Liu [Tue, 2 Dec 2014 15:06:06 +0000 (23:06 +0800)]
mbuf: add Tx offloading flags for tunnels

Replace PKT_TX_VXLAN_CKSUM with PKT_TX_UDP_TUNNEL_PKT to indicate that
a packet is a UDP tunneling packet, and introduce three TX offload flags for
the outer IP TX checksum: PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IPV4
and PKT_TX_OUTER_IPV6.
Rework the csum forward engine and the i40e PMD accordingly.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Jijiang Liu [Tue, 2 Dec 2014 15:06:05 +0000 (23:06 +0800)]
mbuf: remove aliasing of Tx offloading flags with Rx ones

PKT_TX_IPV4 and PKT_TX_IPV6 are redefined so that they no longer alias the
Rx flags. This avoids sending a packet with stale information, as in the
following sequence:
  - we receive an Ether/IP6/IP4/L4/data packet
  - the driver sets PKT_RX_IPV6_HDR
  - the stack decapsulates IP6
  - the stack sends the packet, which has the PKT_TX_IPV6 flag although it is an IPv4 packet.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Thomas Monjalon [Wed, 3 Dec 2014 20:12:00 +0000 (21:12 +0100)]
app/testpmd: fix endianness detection

Use endianness detection factorized in EAL.

The comment about arpa/inet.h is not valid anymore since
commit d07180f211c08 ("net: fix conflict with libc").

The macro _htons could also be moved to rte_byteorder.h
by providing some constant byte swapping.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Thomas Monjalon [Wed, 3 Dec 2014 20:01:19 +0000 (21:01 +0100)]
eal: detect endianness

There is no standard way to check endianness,
so we need to try different checks.
Previous trials were done in testpmd (see commits
51f694dd40f56 and 64741f237cf29) without full success.
This one is not guaranteed to work everywhere so it could
evolve when exceptions are found.

If endianness is not detected, there is a fallback on x86
to little endian. It could be forced before doing detection
but it would add some arch-dependent code in the generic header.

The option CONFIG_RTE_ARCH_BIG_ENDIAN introduced for IBM Power only
(commit a982ec81d84d53) can be removed. A compile-time check is better.
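
A compile-time check of this kind might look like the sketch below. The
SKETCH_* names are hypothetical and are not the macros added to EAL;
__BYTE_ORDER__ and the __ORDER_*__ values are the macros predefined by GCC and
clang. The final branch mirrors the little endian fallback described above.

```c
#include <stdint.h>

#define SKETCH_BIG_ENDIAN    1
#define SKETCH_LITTLE_ENDIAN 2

/* Try the compiler-provided macros first (GCC and clang define them). */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_BYTE_ORDER SKETCH_BIG_ENDIAN
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SKETCH_BYTE_ORDER SKETCH_LITTLE_ENDIAN
#else
/* Nothing detected: fall back to little endian, as on x86. */
#define SKETCH_BYTE_ORDER SKETCH_LITTLE_ENDIAN
#endif

/* Run-time probe, used only to cross-check the compile-time result. */
static int
runtime_byte_order(void)
{
    const uint16_t probe = 0x0102;
    const unsigned char *bytes = (const unsigned char *)&probe;

    return bytes[0] == 0x01 ? SKETCH_BIG_ENDIAN : SKETCH_LITTLE_ENDIAN;
}
```

The run-time probe is not needed in production code; it is there so the
compile-time answer can be verified on a given toolchain.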

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: Michael Qiu <michael.qiu@intel.com>
Alan Carew [Fri, 5 Dec 2014 14:19:07 +0000 (15:19 +0100)]
cmdline: fix overflow on bsd

When using test-pmd with flow director on FreeBSD, the application will
segfault or raise a bus error while parsing the command line. This is due to how
each command's result structure is represented during parsing, where the offsets
for each token's value are stored in a character array (char result_buf[BUFSIZ])
in cmdline_parse() (./lib/librte_cmdline/cmdline_parse.c).

The overflow occurs when BUFSIZ is less than the size of a command's result
structure; in this case "struct cmd_pkt_filter_result"
(app/test-pmd/cmdline.c) is 1088 bytes and BUFSIZ on FreeBSD is 1024 bytes, as
opposed to 8192 bytes on Linux.

The problem can be reproduced by running test-pmd on FreeBSD:
./testpmd -c 0x3 -n 4 -- -i --portmask=0x3 --pkt-filter-mode=perfect
And adding a filter:
add_perfect_filter 0 udp src 192.168.0.0 1024 dst 192.168.0.0 1024 flexbytes
0x800 vlan 0 queue 0 soft 0x17

This patch removes the OS dependency on BUFSIZ and defines and uses a
library #define CMDLINE_PARSE_RESULT_BUFSIZE 8192

Added boundary checking to ensure this buffer size cannot overflow, with
an error message being produced.
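
A boundary check of this kind can be sketched as follows. Only the
CMDLINE_PARSE_RESULT_BUFSIZE name and value come from this message; the helper
name, its parameters and the error text are hypothetical and do not reproduce
the library code.

```c
#include <stddef.h>
#include <stdio.h>

/* Value taken from the commit message; everything else is hypothetical. */
#define CMDLINE_PARSE_RESULT_BUFSIZE 8192

/* Return 0 if a token value of `size` bytes stored at `offset` fits in a
 * result buffer of `buf_size` bytes, -1 (with an error message) otherwise. */
static int
check_result_bounds(size_t offset, size_t size, size_t buf_size)
{
    if (offset > buf_size || size > buf_size - offset) {
        fprintf(stderr,
                "result at offset %zu (%zu bytes) exceeds %zu-byte buffer\n",
                offset, size, buf_size);
        return -1;
    }
    return 0;
}
```

The comparison is written as `size > buf_size - offset` rather than
`offset + size > buf_size` so that the check itself cannot wrap around.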

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
http://git.droids-corp.org/?p=libcmdline.git;a=commitdiff;h=b1d5b169352e57df3fc14c51ffad4b83f3e5613f

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Thomas Monjalon [Thu, 4 Dec 2014 15:13:45 +0000 (16:13 +0100)]
cmdline: revert fix overflow on bsd

Revert commit a0547e0a751100 because it is an old version
of the patch and was applied by mistake.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Thomas Monjalon [Tue, 2 Dec 2014 13:38:31 +0000 (14:38 +0100)]
enic: fix warnings

A lot of warnings were not seen because $(WERROR_FLAGS) was not set
in the Makefile. But they appear with toolchains that enforce more checks.

-Wno-deprecated seems useless.
-Wno-strict-aliasing is added to avoid false positives.

This patch cleans up unused variables, unused functions, wrong types,
static declarations, etc. A lot of functions have unused parameters,
which suggests that more clean-up could be needed.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Sujith Sankar <ssujith@cisco.com>
Chao Zhu [Thu, 4 Dec 2014 10:14:08 +0000 (18:14 +0800)]
kni: fix build on IBM Power

Because of the different cache line size, the alignment of struct
rte_kni_mbuf in rte_kni_common.h doesn't work on IBM Power. This patch
changes the alignment from 64 to the RTE_CACHE_LINE_SIZE macro.
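
The idea can be sketched as below, assuming GCC or clang (the aligned
attribute is a compiler extension). The structure and the SKETCH_* macro are
hypothetical stand-ins; the real struct rte_kni_mbuf and the real
RTE_CACHE_LINE_SIZE definition are not reproduced.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-architecture value: 128 bytes on IBM
 * Power and 64 bytes on x86, as stated elsewhere in this log. */
#if defined(__powerpc64__)
#define SKETCH_CACHE_LINE_SIZE 128
#else
#define SKETCH_CACHE_LINE_SIZE 64
#endif

/* Hypothetical structure used only to show the alignment. */
struct sketch_mbuf {
    void *buf_addr;
    char pad0[10];
    /* Start the next group of fields on a cache-line boundary without
     * hard-coding 64. */
    char pad1[4] __attribute__((aligned(SKETCH_CACHE_LINE_SIZE)));
    void *pool;
};
```

Because the alignment follows the macro, the layout adapts to the target
instead of assuming the x86 cache line size.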

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Alan Carew [Mon, 20 Oct 2014 15:23:13 +0000 (16:23 +0100)]
cmdline: fix overflow on bsd

When using test-pmd with flow director on FreeBSD, the application will
segfault or raise a bus error while parsing the command line. This is due to how
each command's result structure is represented during parsing, where the offsets
for each token's value are stored in a character array (char result_buf[BUFSIZ])
in cmdline_parse() (./lib/librte_cmdline/cmdline_parse.c).

The overflow occurs when BUFSIZ is less than the size of a command's result
structure; in this case "struct cmd_pkt_filter_result"
(app/test-pmd/cmdline.c) is 1088 bytes and BUFSIZ on FreeBSD is 1024 bytes, as
opposed to 8192 bytes on Linux.

This patch removes the OS dependency on BUFSIZ and defines and uses a
library #define CMDLINE_PARSE_RESULT_BUFSIZE 8192

The problem can be reproduced by running test-pmd on FreeBSD:
./testpmd -c 0x3 -n 4 -- -i --portmask=0x3 --pkt-filter-mode=perfect
And adding a filter:
add_perfect_filter 0 udp src 192.168.0.0 1024 dst 192.168.0.0 1024 flexbytes
0x800 vlan 0 queue 0 soft 0x17

Signed-off-by: Alan Carew <alan.carew@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Takayuki Usui [Wed, 3 Dec 2014 01:37:26 +0000 (10:37 +0900)]
kni: create interface in current network namespace

With this patch, KNI interface (e.g. vEth0) is created in the
network namespace where the DPDK application is running.
Otherwise, all interfaces are created in the default namespace
in the host.

put_net() is required, since get_net_ns_by_pid() increments
the reference counter of the network namespace with get_net().

Signed-off-by: Takayuki Usui <takayuki@midokura.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Helin Zhang [Wed, 3 Dec 2014 01:13:27 +0000 (09:13 +0800)]
i40e: fix build with 16-byte descriptors

The compile errors shown below occur when 'RTE_LIBRTE_I40E_16BYTE_RX_DESC=y' is set.
'fd_id' should be used instead of 'fd', as 'fd' is not defined in that structure
at all. In addition, the local variables 'flexbl' and 'flexbh' must be used only if
the 32-byte RX descriptor is selected.

error logs:
lib/librte_pmd_i40e/i40e_rxtx.c: In function i40e_rxd_build_fdir:
lib/librte_pmd_i40e/i40e_rxtx.c:431:28: error: volatile union <anonymous> has no member named fd
lib/librte_pmd_i40e/i40e_rxtx.c:427:19: error: unused variable flexbl [-Werror=unused-variable]
lib/librte_pmd_i40e/i40e_rxtx.c:427:11: error: unused variable flexbh [-Werror=unused-variable]

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Dennis Marinus [Tue, 2 Dec 2014 00:39:06 +0000 (16:39 -0800)]
table: fix maybe-uninitialized variable with gcc lto

This patch fixes a maybe-uninitialized warning when compiling DPDK with
GCC 4.9 + Link Time Optimization.

Signed-off-by: Dennis Marinus <dmarinus@amazon.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Thomas Monjalon [Mon, 1 Dec 2014 17:11:02 +0000 (18:11 +0100)]
ixgbe: fix build with bypass and debug enabled

Since commit aae1047905621 ("use the right debug macro"),
DEBUGOUT was replaced by PMD_DRV_LOG which requires at least
2 arguments. But the level argument was missing.

Commit 7a10de5e27 fixed the logs but not the macros FUNC_PTR_OR_*
which are not preprocessed if RTE_LIBRTE_IXGBE_DEBUG_DRIVER is disabled.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Bruce Richardson [Mon, 1 Dec 2014 11:38:55 +0000 (11:38 +0000)]
app/testpmd: fix macro check for little endian

Compiling with clang on FreeBSD gave a compilation error:
app/test-pmd/csumonly.c:84:5: fatal error: '__BYTE_ORDER' is not defined, evaluates to 0 [-Wundef]

Querying the preprocessor defines shows that both the define and the value used
are incorrect.
$ clang -dM -E - < /dev/null | grep BYTE
#define  __BYTE_ORDER__  __ORDER_LITTLE_ENDIAN__

Changing the check to  __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ then
resolves the issue.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Bruce Richardson [Mon, 1 Dec 2014 11:38:54 +0000 (11:38 +0000)]
app/testpmd: fix out-of-range error on bsd

The definition value for IPPROTO_DIVERT protocol uses a value
which is out of range of the uint8_t type, giving clang compiler
errors on FreeBSD.

app/test-pmd/icmpecho.c:231:7: fatal error: overflow converting case value
        to switch condition type (258 to 2) [-Wswitch]
                case IPPROTO_DIVERT: /**< divert pseudo-protocol */

This is fixed by making the code that returns the protocol name
take the protocol value as a uint16_t.
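
A minimal sketch of the idea follows. The function name and the
SKETCH_IPPROTO_DIVERT macro are hypothetical; 258 is the value quoted in the
clang error above, and the sketch avoids the system IPPROTO_DIVERT so that it
also builds where that macro is absent.

```c
#include <stdint.h>

/* 258 is the value quoted in the clang error; the macro name is
 * hypothetical. */
#define SKETCH_IPPROTO_DIVERT 258

/* Taking uint16_t lets the switch hold case labels above 255. */
static const char *
proto_name_sketch(uint16_t proto)
{
    switch (proto) {
    case 6:
        return "TCP";
    case 17:
        return "UDP";
    case SKETCH_IPPROTO_DIVERT:
        return "DIVERT";
    default:
        return "UNKNOWN";
    }
}
```

With a uint8_t parameter the same case label cannot be represented, which is
what clang reports as an overflow.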

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Sujith Sankar [Sat, 29 Nov 2014 07:17:37 +0000 (12:47 +0530)]
enic: fix build with clang

This patch fixes the warnings and error reported by clang compiler on Linux.

Reported-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Olivier Matz [Mon, 1 Dec 2014 10:36:13 +0000 (11:36 +0100)]
ixgbe: fix bitfield assignation with clang

Commit 1224decaa44 ("support TCP segmentation offload")
changed the way the bitfields are assigned in ixgbe, for example:

  tx_offload_mask.l2_len = ~0;

This results in a compilation error with clang:

  error: implicit truncation from 'int' to bitfield
    changes value from -1 to 127 [-Werror,-Wbitfield-constant-conversion]

Replacing the '=' with a '|=' fixes the issue.
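
As an illustration only, the sketch below uses a hypothetical structure with a
7-bit l2_len field, chosen to match the "-1 to 127" in the clang message; the
real ixgbe union and its other fields are not reproduced.

```c
/* Hypothetical layout: only the 7-bit width of l2_len is inferred from the
 * error message above. */
struct sketch_tx_offload {
    unsigned int l2_len:7;
    unsigned int l3_len:9;
};

static struct sketch_tx_offload
full_l2_mask(void)
{
    struct sketch_tx_offload mask = { 0, 0 };

    /* "mask.l2_len = ~0;" is the form clang rejects. The compound form
     * still sets every bit of the field. */
    mask.l2_len |= ~0;
    return mask;
}
```

Both forms leave the field with all bits set; the difference is only in how
the compiler diagnoses the constant conversion.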

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Sergio Gonzalez Monroy [Thu, 30 Oct 2014 10:57:42 +0000 (10:57 +0000)]
mk: fix linking with some linux toolchains

Ubuntu/Debian toolchain passes --as-needed flag to the linker by default.
Add --no-as-needed flag by default in linuxapp exec-env to ensure correct
linking.

The problem arises because librte_eal doesn't add a DT_NEEDED entry for
librte_mempool despite the fact that it references symbols in that library.
It does this because we don't explicitly link with -lrte_mempool when we
build librte_eal.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
David Marchand [Fri, 28 Nov 2014 15:42:44 +0000 (16:42 +0100)]
scripts: fix newline in configuration with bsd sed

Use of \n in sed expression is not portable and triggered an invalid
configuration on BSD (at least).
Replace with an explicit newline.

Reported-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Sujith Sankar [Fri, 28 Nov 2014 09:38:19 +0000 (15:08 +0530)]
enic: fix build by using standard integer types

ENIC PMD was giving compilation errors on ppc_64-power8-linuxapp-gcc because
of types such as u_int32_t.  This patch replaces all those with uint32_t and
similar ones.

Reported-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Pablo de Lara [Fri, 28 Nov 2014 15:10:16 +0000 (15:10 +0000)]
bond: fix build with gcc 4.3

GCC 4.3 complains that slow_pkts array in bond_ethdev_tx_burst_8023ad
may be used uninitialized, so it has been initialized to NULL.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Balazs Nemeth [Fri, 28 Nov 2014 09:21:45 +0000 (09:21 +0000)]
ixgbe: fix mbuf failure statistics in vector Rx

The statistic reported through the rx_nombuf field in struct
rte_eth_stats was not set when the vector PMD was used. The statistic
should report the number of mbufs that could _not_ be allocated during
rearm of the RX queue. The non-vector PMD reports it correctly. Whether
the vector or the non-vector PMD is used depends on runtime
configuration, so a change in configuration could
disable this statistic. To prevent this from happening, the
statistic should be reported by both implementations.

Signed-off-by: Balazs Nemeth <balazs.nemeth@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Thomas Monjalon [Thu, 27 Nov 2014 21:18:32 +0000 (22:18 +0100)]
version: 1.8.0-rc2

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Jia Yu [Thu, 27 Nov 2014 21:23:37 +0000 (21:23 +0000)]
bond: set offload capabilities flags

Before this fix, the bond device's offload capabilities were unset. This fix
takes the minimum common set of the slave devices' capabilities as the bond
device's capabilities. For simplicity, every slave device must
have a capability before the bond device can claim it,
even if some slave devices are unused (i.e. link down or standby).
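
The "minimum common set" amounts to a bitwise AND over the slaves'
capability masks. The sketch below uses hypothetical names throughout; it is
not the bonding PMD code, and the handling of an empty slave list is an
assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* A capability bit is kept only if every slave reports it, whether or not
 * that slave is currently active. With no slaves, nothing is advertised
 * (an assumption made for this sketch). */
static uint32_t
bond_common_capa(const uint32_t *slave_capa, size_t slave_count)
{
    uint32_t common;
    size_t i;

    if (slave_count == 0)
        return 0;

    common = slave_capa[0];
    for (i = 1; i < slave_count; i++)
        common &= slave_capa[i];
    return common;
}
```

For example, slaves reporting 0x7, 0x3 and 0x5 leave only bit 0x1 available on
the bond device.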

Signed-off-by: Jia Yu <jyu@vmware.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Daniel Mrzyglod [Thu, 27 Nov 2014 16:33:41 +0000 (16:33 +0000)]
bond: unit tests for mode 5

This patch adds unit tests for mode 5 (TLB) to the other
link bonding unit tests.

Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Daniel Mrzyglod [Thu, 27 Nov 2014 16:33:40 +0000 (16:33 +0000)]
bond: add mode 5

Add support for mode 5 (transmit load balancing) to the PMD.

This patch adds support for adaptive transmit load balancing (mode 5) to the
librte_pmd_bond library. The mode dynamically changes the transmitting slave
according to the computed load.

Further details are described here:
https://www.kernel.org/doc/Documentation/networking/bonding.txt
In this implementation, a callback sorts the slaves: it provides the burst
function with statistics about slave bandwidth usage and orders the
interfaces by that usage.

Difference in this implementation vs Linux implementation:
- We try to send all packets: if one interface has not sent all of its
packets, the remaining packets are sent through the other slaves, in the
order previously established by the callback function.

Some implementation details:
- Every 100 ms, the obytes statistics are collected from every slave.
- Every 10 ms, the callback sorts and updates the slaves in the table, based
on bandwidth and on the bytes successfully transmitted in the previous
iteration, which happens every 100 ms.
- A callback function updates these statistics, both for transparency and
because of the rather intensive computation involved in this mode.
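
The sort-then-spill behaviour described above can be sketched roughly as
follows. All names, the per-slave `room` field and the load metric are
hypothetical simplifications; the driver's actual data structures and timing
are not reproduced.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-slave record; field names are not taken from the driver. */
struct sketch_slave {
    int port_id;
    uint64_t load;      /* bytes sent during the last sampling period */
    unsigned int room;  /* packets this slave can still accept */
};

static int
by_load(const void *a, const void *b)
{
    const struct sketch_slave *sa = a, *sb = b;

    return (sa->load > sb->load) - (sa->load < sb->load);
}

/* Periodic callback: order the slaves from least to most loaded. */
static void
sort_slaves(struct sketch_slave *slaves, size_t n)
{
    qsort(slaves, n, sizeof(*slaves), by_load);
}

/* Burst: offer the packets to the least loaded slave first and hand whatever
 * it could not take to the next slave in the sorted order. */
static unsigned int
tx_burst_sketch(struct sketch_slave *slaves, size_t n, unsigned int nb_pkts)
{
    unsigned int sent = 0;
    size_t i;

    for (i = 0; i < n && sent < nb_pkts; i++) {
        unsigned int want = nb_pkts - sent;
        unsigned int took = want < slaves[i].room ? want : slaves[i].room;

        slaves[i].room -= took;
        sent += took;
    }
    return sent;
}
```

Keeping the sort in a periodic callback, as the message describes, keeps the
comparison work out of the per-burst path.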

Test report: http://dpdk.org/ml/archives/dev/2014-November/008729.html

Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Tested-by: SunX Jiajia <sunx.jiajia@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Pawel Wodkowski [Thu, 27 Nov 2014 18:01:10 +0000 (18:01 +0000)]
bond: add mode 4

This patch set adds support for dynamic link aggregation (mode 4) to the
librte_pmd_bond library. This mode provides auto negotiation/configuration
of peers as well as monitoring of link status changes using out-of-band
LACP (Link Aggregation Control Protocol) messages. For further details of
the LACP specification, see the IEEE 802.3ad/802.1AX standards. It is also
described here:
https://www.kernel.org/doc/Documentation/networking/bonding.txt

In this implementation we have an array of mode 4 settings for each slave.
It is also assumed that there is one aggregator for every port (it might
be unused if a better one is found).

Difference in this implementation vs Linux implementation:
- this implementation is not directly based on state machines; the current
  state is calculated from the actor and partner states (and other things too).

Some implementation details:
- during rx burst, every packet is checked to see whether it is an LACP or
  marker packet. If it is an LACP frame, it is passed to the mode 4 logic using
  the slave's rx ring and removed from the rx buffer before the buffer is returned
- in tx burst, packets from mode 4 (if any) are injected into each slave.
- there is a timer running in the background to process/produce mode 4
  frames from the rx functions and to the tx functions.
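
The per-packet check in rx burst can be sketched as below. The function and
macro names are hypothetical; the numeric values are the Slow Protocols
ethertype and the LACP and marker subtypes from IEEE 802.3. The sketch assumes
an untagged Ethernet frame and is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SLOW_ETHERTYPE 0x8809 /* IEEE 802.3 Slow Protocols */
#define SKETCH_SUBTYPE_LACP   0x01
#define SKETCH_SUBTYPE_MARKER 0x02

/* Return 1 if the untagged Ethernet frame is an LACP or marker PDU that
 * should be diverted to the mode 4 logic, 0 if it is ordinary traffic. */
static int
is_mode4_frame(const uint8_t *frame, size_t len)
{
    uint16_t ether_type;

    if (len < 15) /* 14-byte Ethernet header plus the subtype byte */
        return 0;

    ether_type = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ether_type != SKETCH_SLOW_ETHERTYPE)
        return 0;

    return frame[14] == SKETCH_SUBTYPE_LACP ||
           frame[14] == SKETCH_SUBTYPE_MARKER;
}
```

Frames for which this returns 1 would be removed from the burst returned to
the application, as the list above describes.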

Some requirements for this mode:
- for LACP mode to work, the rx and tx burst functions must be invoked
  at intervals of no more than 100 ms
- the buffer provided to rx burst should be at least 2x the slave count in
  size. This is not required but might increase performance, especially
  during the initial handshake.

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Sujith Sankar [Thu, 27 Nov 2014 17:14:40 +0000 (22:44 +0530)]
enic: fix vfio inclusion

Including vfio.h caused compilation errors when the kernel version was lower
than 3.6.0 and RTE_EAL_VFIO was set in the config.

Removed inclusion of vfio.h and replaced RTE_EAL_VFIO with VFIO_PRESENT.

Reported-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Thomas Monjalon [Thu, 27 Nov 2014 17:53:49 +0000 (18:53 +0100)]
enic: fix dependencies

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
David Marchand [Thu, 27 Nov 2014 11:42:38 +0000 (12:42 +0100)]
config: disable enic driver on Power

The enic driver fails to build because of non-standard types:

  CC enic_res.o
In file included from
lib/librte_pmd_enic/enic_res.c:36:0:
lib/librte_pmd_enic/enic_compat.h:92:1: error: unknown type name ‘u_int32_t’
 static inline u_int32_t ioread32(volatile void *addr)
 ^

Disable it on Power for now.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
David Marchand [Thu, 27 Nov 2014 11:29:05 +0000 (12:29 +0100)]
scripts: fix symbol overriding in configuration

When redefining the same symbol in configuration (basically after an inclusion),
we need to undefine the previous symbol to avoid "redefined" errors.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Thomas Monjalon [Thu, 27 Nov 2014 11:28:17 +0000 (12:28 +0100)]
net: fix conflict with libc

It was impossible to include both netinet/in.h and rte_ip.h
because the IP protocols were redefined.
The redefinition is removed because it is useless.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
Jia Yu [Fri, 7 Nov 2014 15:43:01 +0000 (07:43 -0800)]
app/testpmd: fix RSS flags size

Since commit 8a387fa85f02 ("ethdev: more RSS flags") in DPDK 1.7,
the number of RSS flags has increased.
According to the rss_hf definition in rte_eth_rss_conf, the value is a uint64_t.
Storing it in a uint16_t truncates the value and causes incorrect output. This
fix corrects the issue.
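
The truncation can be shown with a small sketch. The flag values below are
hypothetical and chosen only so that one of them lies above bit 15; they are
not the real RSS flag definitions.

```c
#include <stdint.h>

/* Hypothetical flags: one below bit 16, one above it. */
#define SKETCH_RSS_FLAG_LOW  (UINT64_C(1) << 0)
#define SKETCH_RSS_FLAG_HIGH (UINT64_C(1) << 20)

/* Storing a 64-bit flag set in a 16-bit variable keeps only bits 0..15. */
static uint16_t
truncated_rss_hf(uint64_t rss_hf)
{
    return (uint16_t)rss_hf;
}
```

Any flag defined above bit 15 silently disappears in the 16-bit copy, so the
displayed configuration no longer matches what was requested.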

Signed-off-by: Jia Yu <jyu@vmware.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Keith Wiles [Sun, 28 Sep 2014 05:28:44 +0000 (05:28 +0000)]
mempool: avoid dump crash with null pointer

Check the FILE *f and rte_mempool *mp pointers for NULL.

Signed-off-by: Keith Wiles <keith.wiles@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Sergio Gonzalez Monroy [Wed, 19 Nov 2014 12:26:06 +0000 (12:26 +0000)]
add prefix to cache line macros

CACHE_LINE_SIZE is a macro defined in machine/param.h on FreeBSD and
conflicts with the DPDK macro of the same name.
Add the RTE_ prefix to avoid conflicts.
CACHE_LINE_MASK and CACHE_LINE_ROUNDUP are also prefixed.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
[Thomas: updated on HEAD, including PPC]

Sergio Gonzalez Monroy [Thu, 20 Nov 2014 14:06:59 +0000 (14:06 +0000)]
eal/bsd: remove unused HPET support

The HPET support in the BSD EAL was copied directly from the Linux version,
but did not actually work on FreeBSD. We replace this old code with a simple
compiler message that informs the user that we don't support HPET on BSD if
they enable such support in the build-time configuration file.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Sergio Gonzalez Monroy [Thu, 20 Nov 2014 14:06:58 +0000 (14:06 +0000)]
eal/bsd: use sysctl to get TSC frequency

BSD provides the TSC frequency value through sysctl.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Thomas Monjalon [Thu, 27 Nov 2014 10:02:11 +0000 (11:02 +0100)]
doc: no more bare metal environment

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
David Marchand [Fri, 26 Sep 2014 14:04:02 +0000 (16:04 +0200)]
examples: no more bare metal environment

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
David Marchand [Fri, 26 Sep 2014 14:04:01 +0000 (16:04 +0200)]
app: no more bare metal environment

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
David Marchand [Fri, 26 Sep 2014 14:04:00 +0000 (16:04 +0200)]
eal: no more bare metal environment

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
David Marchand [Fri, 26 Sep 2014 14:03:59 +0000 (16:03 +0200)]
mk: no more bare metal environment

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
David Marchand [Fri, 26 Sep 2014 14:03:58 +0000 (16:03 +0200)]
config: no more bare metal environment

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Thomas Monjalon [Thu, 27 Nov 2014 09:35:56 +0000 (10:35 +0100)]
mbuf: sort TCP segmentation offload flag

Due to reordering conflicts, the TSO flag was not sorted.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
David Marchand [Mon, 24 Nov 2014 15:18:51 +0000 (16:18 +0100)]
eal/linux: fix remaining checks for 64-bit architectures

RTE_ARCH_X86_64 cannot be used to determine whether we are building for
64-bit CPUs. Instead, RTE_ARCH_64 should be used.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
jingjing.wu [Thu, 13 Nov 2014 12:49:55 +0000 (20:49 +0800)]
i40e: add ethertype filter

Handle the RTE_ETH_FILTER_ADD and RTE_ETH_FILTER_DELETE operations
on ethertype filter.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
jingjing.wu [Thu, 13 Nov 2014 12:49:54 +0000 (20:49 +0800)]
ethdev: add ethertype filter

A new ethertype filter structure is defined in rte_eth_ctrl.h
for the filter_ctrl API.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Sujith Sankar [Tue, 25 Nov 2014 17:26:41 +0000 (22:56 +0530)]
enic: build integration

Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: enable for BSD - not tested]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Sujith Sankar [Tue, 25 Nov 2014 17:26:43 +0000 (22:56 +0530)]
enic: new driver

Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Sujith Sankar [Tue, 25 Nov 2014 17:26:42 +0000 (22:56 +0530)]
enic/base: common code

VNIC common code is partially shared with ENIC kernel mode driver.

Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Sujith Sankar [Tue, 25 Nov 2014 17:26:40 +0000 (22:56 +0530)]
enic: license

Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Chao Zhu [Tue, 25 Nov 2014 22:17:17 +0000 (17:17 -0500)]
app/testpmd: fix build for IBM Power

This patch fixes compilation problems on the IBM Power architecture and turns
on the test-pmd build option in the configuration file. In effect, this
is a big-endian build fix.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Chao Zhu [Tue, 25 Nov 2014 22:17:16 +0000 (17:17 -0500)]
app/test: fix finding the second smallest memory segment

The current implementation in test_memzone.c has bugs in finding the
second smallest memory segment. What it finds is the last smallest memory
segment, not the second smallest one. This bug may cause test
failures in some cases. This patch fixes the bug.
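
As an illustration only (this is not the code from test_memzone.c), a
single-pass search for the second smallest segment can be sketched as below.
The struct seg type and the function name are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory segment; only the length matters here. */
struct seg {
    size_t len;
};

/*
 * Return the index of the second smallest segment, tracking the smallest
 * and the runner-up in one pass. Equal lengths count as distinct segments.
 */
static size_t second_smallest(const struct seg *segs, size_t n)
{
    size_t min = 0, second = 1;

    assert(segs != NULL && n >= 2);
    if (segs[second].len < segs[min].len) {
        min = 1;
        second = 0;
    }
    for (size_t i = 2; i < n; i++) {
        if (segs[i].len < segs[min].len) {
            second = min;   /* the old minimum becomes the runner-up */
            min = i;
        } else if (segs[i].len < segs[second].len) {
            second = i;
        }
    }
    return second;
}
```

Keeping both candidates up to date in the same loop avoids returning a segment
that merely ties with, or was the last to replace, the minimum.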

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agomem: support layout of IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:15 +0000 (17:17 -0500)]
mem: support layout of IBM Power

The mmap of hugepage files on IBM Power starts from high address to low
address. This is different from x86. This patch modifies the memory
segment detection code to get the correct memory segment layout on Power
architecture. This patch also adds a common ARCH_PPC_64 definition for
64 bit systems.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agomem: add huge page sizes for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:14 +0000 (17:17 -0500)]
mem: add huge page sizes for IBM Power

IBM Power architecture has different huge page sizes (16MB, 16GB) than
x86. This patch defines RTE_PGSIZE_16M and RTE_PGSIZE_16G in the
rte_page_sizes enum variable and adds huge page size support of DPDK
for IBM Power architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agomk: define cache size for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:13 +0000 (17:17 -0500)]
mk: define cache size for IBM Power

IBM Power architecture has a different cache line size (128 bytes) than
x86 (64 bytes). This patch defines CACHE_LINE_SIZE as 128 bytes, overriding
the default value of 64 bytes, to support IBM Power Architecture.
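
To show why the constant matters, here is a minimal sketch of per-core data
padded to one cache line each, so that two cores never share a line. The
DEMO_CACHE_LINE_SIZE macro and the per_core_counter type are hypothetical
names used for illustration; they are not DPDK definitions.

```c
#include <stdint.h>

/* Hypothetical build-time constant; 128 mirrors the Power value above. */
#ifndef DEMO_CACHE_LINE_SIZE
#define DEMO_CACHE_LINE_SIZE 128
#endif

/* Each counter occupies a full cache line, which avoids false sharing. */
struct per_core_counter {
    _Alignas(DEMO_CACHE_LINE_SIZE) uint64_t value;
};

static struct per_core_counter counters[4];

/* Increment the counter owned by one core and return the new value. */
static uint64_t counter_inc(unsigned int core)
{
    return ++counters[core % 4].value;
}
```

With a 64-byte constant on a machine with 128-byte lines, two neighbouring
counters would land in the same line and contend with each other.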

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/linux: disable iopl operation for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:12 +0000 (17:17 -0500)]
eal/linux: disable iopl operation for IBM Power

The iopl() call is mostly for the i386 architecture. On Power and other
architectures, it doesn't exist. This patch modifies rte_eal_iopl_init()
to make it return -1 for Power and other architectures. Thus
rte_config.flags will not contain the EAL_FLG_HIGH_IOPL flag on those
architectures.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/ppc: cpu flag checks for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:11 +0000 (17:17 -0500)]
eal/ppc: cpu flag checks for IBM Power

IBM Power processor doesn't have CPU flag hardware registers. This patch
uses the aux vector software register to get CPU flags and adds CPU flag
checking support for IBM Power architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/ppc: vector memcpy for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:10 +0000 (17:17 -0500)]
eal/ppc: vector memcpy for IBM Power

The SSE based memory copy in DPDK only supports x86. This patch adds
altivec based memory copy functions for IBM Power architecture. This
patch includes altivec.h which requires GCC version >= 4.8.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/ppc: spinlock operations for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:09 +0000 (17:17 -0500)]
eal/ppc: spinlock operations for IBM Power

This patch adds spinlock operations for IBM Power architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/ppc: prefetch operations for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:08 +0000 (17:17 -0500)]
eal/ppc: prefetch operations for IBM Power

This patch adds architecture specific prefetch operations for IBM Power
architecture.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/ppc: cpu cycle operations for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:07 +0000 (17:17 -0500)]
eal/ppc: cpu cycle operations for IBM Power

IBM Power architecture doesn't have a TSC register to get CPU cycles. This
patch implements a read of the time base register on IBM Power
architecture in place of the x86 TSC register.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/ppc: byte order operations for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:06 +0000 (17:17 -0500)]
eal/ppc: byte order operations for IBM Power

This patch adds architecture specific byte order operations for IBM Power
architecture. Power architecture supports both big endian and little
endian. This patch also adds a RTE_ARCH_BIG_ENDIAN macro.
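
A minimal sketch of what endian-aware conversion looks like follows. The
helper names are hypothetical, and the runtime probe stands in for a
build-time macro such as RTE_ARCH_BIG_ENDIAN; it is not the code added by
this patch.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Runtime probe, used here in place of a build-time endianness macro. */
static inline int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const uint8_t *)&probe == 0x01;
}

/* Convert a CPU-order value to big endian: a no-op on big endian hosts. */
static inline uint16_t cpu_to_be16(uint16_t x)
{
    return host_is_big_endian() ? x : swap16(x);
}
```

A build-time macro lets the compiler drop the branch entirely, which is
presumably why a macro is preferred over a runtime check in fast-path code.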

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agoeal/ppc: atomic operations for IBM Power
Chao Zhu [Tue, 25 Nov 2014 22:17:05 +0000 (17:17 -0500)]
eal/ppc: atomic operations for IBM Power

This patch adds architecture specific atomic operation file for IBM
Power architecture CPU.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years agomk: introduce IBM Power architecture
Chao Zhu [Tue, 25 Nov 2014 22:17:04 +0000 (17:17 -0500)]
mk: introduce IBM Power architecture

To make DPDK run on IBM Power architecture, configuration files for
Power architecture are added. Also, the compiling related .mk files are
added.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: David Marchand <david.marchand@6wind.com>
10 years agoapp/testpmd: add a verbose mode to checksum forward engine
Olivier Matz [Wed, 26 Nov 2014 15:04:55 +0000 (16:04 +0100)]
app/testpmd: add a verbose mode to checksum forward engine

If the user specifies 'set verbose 1' in testpmd command line,
the csum forward engine will dump some information about received
and transmitted packets, especially which flags are set and what
values are assigned to l2_len, l3_len, l4_len and tso_segsz.

This can help someone implementing TSO or hardware checksum offload to
understand how to configure the mbufs.

Example of output for one packet:

 --------------
 rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
 tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
 tx: m->tso_segsz=800
 tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
 --------------

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
10 years agoapp/testpmd: support TSO in checksum forward engine
Olivier Matz [Wed, 26 Nov 2014 15:04:54 +0000 (16:04 +0100)]
app/testpmd: support TSO in checksum forward engine

Add two new commands in testpmd:

- tso set <segsize> <portid>
- tso show <portid>

These commands can be used to enable TSO when transmitting TCP packets in
the csum forward engine. Ex:

  set fwd csum
  tx_checksum set ip hw 0
  tso set 800 0
  start

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
10 years agoixgbe: support TCP segmentation offload
Olivier Matz [Wed, 26 Nov 2014 15:04:53 +0000 (16:04 +0100)]
ixgbe: support TCP segmentation offload

Implement TSO (TCP segmentation offload) in ixgbe driver. The driver is
now able to use the PKT_TX_TCP_SEG mbuf flag and the mbuf hardware offload
information (l2_len, l3_len, l4_len, tso_segsz) to configure the hardware
support of TCP segmentation.

In ixgbe, when doing TSO, the IP length must not be included in the TCP
pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
used to fix the pseudo header checksum of the packet before giving it to
the hardware.

In the patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This should not impact performance as gcc (version 4.8 in my
case) is smart enough to convert the tests into a code that does not
contain any branch instruction.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
10 years agombuf: generic support for TCP segmentation offload
Olivier Matz [Wed, 26 Nov 2014 15:04:52 +0000 (16:04 +0100)]
mbuf: generic support for TCP segmentation offload

Some of the NICs supported by DPDK can accelerate TCP traffic by using
segmentation offload. The application prepares a packet with a valid TCP
header with size up to 64K and delegates the segmentation to the NIC.

Implement the generic part of TCP segmentation offload in rte_mbuf. It
introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
and tso_segsz (MSS of packets).

To delegate the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_TCP_CKSUM)
- set the flag PKT_TX_IPV4 or PKT_TX_IPV6
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
  the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum without taking ip_len into account,
  and set it in the TCP header, for instance by using
  rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
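
The last step can be sketched as follows. This is a self-contained
illustration, not the rte_ipv4_phdr_cksum() implementation: the function
names are hypothetical, and addresses and results are kept in host byte
order for readability, whereas real packet code works on network-order
fields.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Non-complemented IPv4 pseudo-header sum over source address,
 * destination address, protocol and L4 length (hypothetical helper).
 */
static uint16_t ipv4_phdr_sum(uint32_t src, uint32_t dst,
                              uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum += (src >> 16) + (src & 0xffff);
    sum += (dst >> 16) + (dst & 0xffff);
    sum += proto;
    sum += l4_len;
    return fold16(sum);
}

/* TSO variant: the length term is left out of the pseudo header. */
static uint16_t ipv4_phdr_sum_tso(uint32_t src, uint32_t dst, uint8_t proto)
{
    return ipv4_phdr_sum(src, dst, proto, 0);
}
```

The length is omitted for TSO presumably because each segment produced by
the NIC has its own length, which the hardware has to add itself.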

The API is inspired from ixgbe hardware (the next commit adds the
support for ixgbe), but it seems generic enough to be used for other
hw/drivers in the future.

This commit also reworks the way l2_len and l3_len are used in igb
and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.

Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
10 years agonet: new checksum functions
Olivier Matz [Wed, 26 Nov 2014 15:04:51 +0000 (16:04 +0100)]
net: new checksum functions

Introduce new functions to calculate checksums. These new functions
are derived from the ones provided in csumonly.c but slightly reworked.
There is still some room for future optimization of these functions
(maybe SSE/AVX, ...).

This API will be modified in the next commits by the introduction of
TSO that requires a different pseudo header checksum to be set in the
packet.
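
For readers unfamiliar with the underlying arithmetic, here is a plain C
sketch of the 16-bit one's-complement Internet checksum. It is a generic
byte-wise version with a hypothetical name, not the functions introduced by
this commit, and it makes no attempt at the optimizations mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum over a byte buffer, returned in host order.
 * The 32-bit accumulator is sufficient for buffers up to 128 KB.
 */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)
        sum += (uint32_t)buf[i] << 8;   /* odd trailing byte, zero padded */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Running it over an IPv4 header whose checksum field is zero yields the value
to store; running it over a header with a correct checksum yields 0.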

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
10 years agoapp/testpmd: rework checksum forward engine
Olivier Matz [Wed, 26 Nov 2014 15:04:50 +0000 (16:04 +0100)]
app/testpmd: rework checksum forward engine

The csum forward engine was becoming too complex to be used and
extended (the next commits want to add the support of TSO):

- no explanation about what the code does
- code is not factorized, lots of code duplicated, especially between
  ipv4/ipv6
- user command line api: use of bitmasks that need to be calculated by
  the user
- the user flags don't have the same semantics:
  - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
  - for other (vxlan), it selects between hardware checksum or no
    checksum
- the code relies too much on flags set by the driver without software
  alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
  compare a software implementation with the hardware offload.

This commit tries to fix these issues, and provide a simple definition
of what is done by the forward engine:

 * Receive a burst of packets, and for supported packet types:
 *  - modify the IPs
 *  - reprocess the checksum in SW or HW, depending on testpmd command line
 *    configuration
 * Then packets are transmitted on the output port.
 *
 * Supported packets are:
 *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
 *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
 *
 * The network parser supposes that the packet is contiguous, which may
 * not be the case in real life.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
10 years agoapp/testpmd: fix use of offload flags
Olivier Matz [Wed, 26 Nov 2014 15:04:49 +0000 (16:04 +0100)]
app/testpmd: fix use of offload flags

In testpmd the rte_port->tx_ol_flags flag was used in 2 incompatible
manners:
- sometimes used with testpmd specific flags (0xff for checksums, and
  bit 11 for vlan)
- sometimes assigned to m->ol_flags directly, which is wrong in case
  of checksum flags

This commit replaces the hardcoded values by named definitions, which
are not compatible with mbuf flags. The testpmd forward engines are
fixed to use the flags properly.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
10 years agombuf: get the name of offload flags
Olivier Matz [Wed, 26 Nov 2014 15:04:48 +0000 (16:04 +0100)]
mbuf: get the name of offload flags

In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.

This patch introduces 2 new functions rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name() that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.
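
The idea of a mask-to-name lookup can be sketched as below. The DEMO_ flag
names and bit positions are invented for illustration, and the functions are
not the rte_get_tx_ol_flag_name() implementation; the real flag values are
defined in rte_mbuf.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag values; the real definitions live in rte_mbuf.h. */
#define DEMO_TX_IP_CKSUM  (1ULL << 0)
#define DEMO_TX_TCP_CKSUM (1ULL << 1)
#define DEMO_TX_TCP_SEG   (1ULL << 2)

/* Return the name of a single-bit flag, or NULL if the mask is unknown. */
static const char *demo_tx_flag_name(uint64_t mask)
{
    switch (mask) {
    case DEMO_TX_IP_CKSUM:  return "DEMO_TX_IP_CKSUM";
    case DEMO_TX_TCP_CKSUM: return "DEMO_TX_TCP_CKSUM";
    case DEMO_TX_TCP_SEG:   return "DEMO_TX_TCP_SEG";
    default:                return NULL;
    }
}

/* Write the names of all known set flags, space separated; return length. */
static size_t demo_dump_tx_flags(uint64_t ol_flags, char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len > 0)
        out[0] = '\0';
    for (unsigned int bit = 0; bit < 64; bit++) {
        uint64_t mask = 1ULL << bit;
        const char *name = (ol_flags & mask) ? demo_tx_flag_name(mask) : NULL;

        if (name == NULL)
            continue;
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? " " : "", name);
        if (n < 0 || (size_t)n >= out_len - used)
            break;              /* output truncated or error */
        used += (size_t)n;
    }
    return used;
}
```

Because the application only walks bits and asks the library for names, it
no longer needs its own copy of the flag list.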

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
10 years agombuf: remove too specific flags mask
Olivier Matz [Wed, 26 Nov 2014 15:04:47 +0000 (16:04 +0100)]
mbuf: remove too specific flags mask

This definition is specific to Intel PMD drivers and its definition
"indicate what bits required for building TX context" shows that it
should not be in the generic rte_mbuf.h but in the PMD driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
10 years agombuf: add help about Tx checksum flags
Olivier Matz [Wed, 26 Nov 2014 15:04:46 +0000 (16:04 +0100)]
mbuf: add help about Tx checksum flags

Describe how to use hardware checksum API.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
10 years agombuf: reorder Tx flags
Olivier Matz [Wed, 26 Nov 2014 15:04:45 +0000 (16:04 +0100)]
mbuf: reorder Tx flags

The tx mbuf flags are now ordered from the lowest value to the
highest. Add comments to explain where to add new flags.

Also move PKT_TX_VXLAN_CKSUM to the right place.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
10 years agoixgbe: fix flags variable size to 64 bits
Olivier Matz [Wed, 26 Nov 2014 15:04:44 +0000 (16:04 +0100)]
ixgbe: fix flags variable size to 64 bits

Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
packet flags are now 64 bits wide. Some occurrences were forgotten in
the ixgbe driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
10 years agoigb/ixgbe: fix IP checksum calculation
Olivier Matz [Wed, 26 Nov 2014 15:04:43 +0000 (16:04 +0100)]
igb/ixgbe: fix IP checksum calculation

According to Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both
L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already
contains the fix.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
10 years agoapp/test: vm power management
Alan Carew [Tue, 25 Nov 2014 16:18:11 +0000 (16:18 +0000)]
app/test: vm power management

Updated the unit tests to cover both librte_power implementations as well as
the external API.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agopower: integration of vm power management
Alan Carew [Tue, 25 Nov 2014 16:18:10 +0000 (16:18 +0000)]
power: integration of vm power management

librte_power now contains both rte_power_acpi_cpufreq and rte_power_kvm_vm
implementations.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agopower: packet format for vm power management
Alan Carew [Tue, 25 Nov 2014 16:18:09 +0000 (16:18 +0000)]
power: packet format for vm power management

Provides a command packet format for host and guest.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agopower: common interface for guest and host
Alan Carew [Tue, 25 Nov 2014 16:18:08 +0000 (16:18 +0000)]
power: common interface for guest and host

Moved the current librte_power implementation to rte_power_acpi_cpufreq, with
renaming of functions only.
Added rte_power_kvm_vm implementation to support Power Management from a VM.

librte_power now hides the implementation based on the environment used.
A new call rte_power_set_env() can explicitly set the environment; if it is
not called, auto-detection takes place.

rte_power_kvm_vm is a subset of the librte_power APIs; the following are supported:
 rte_power_init(unsigned lcore_id)
 rte_power_exit(unsigned lcore_id)
 rte_power_freq_up(unsigned lcore_id)
 rte_power_freq_down(unsigned lcore_id)
 rte_power_freq_min(unsigned lcore_id)
 rte_power_freq_max(unsigned lcore_id)

The other unsupported APIs return -ENOTSUP

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agopower: vm communication channels in guest
Alan Carew [Tue, 25 Nov 2014 16:18:07 +0000 (16:18 +0000)]
power: vm communication channels in guest

Allows for the opening of Virtio-Serial devices on a VM, where a DPDK
application can send packets to the host based monitor. The packet format is
specified in channel_commands.h.
Each device appears as a serial device in path
/dev/virtio-ports/virtio.serial.port.<agent_type>.<lcore_num> where each lcore
in a DPDK application has exclusive access to a device/channel.
Each channel is opened in non-blocking mode; after a successful open a test
packet is sent to the host to ensure the host side is monitoring.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agoexamples/vm_power: cli in guest
Alan Carew [Tue, 25 Nov 2014 16:18:06 +0000 (16:18 +0000)]
examples/vm_power: cli in guest

Provides a small sample application(guest_vm_power_mgr) to run on a VM.
The application is run by providing a core mask(-c) and number of memory
channels(-n). The core mask corresponds to the number of lcore channels to
attempt to open. A maximum of 64 channels per VM is allowed. The channels must
be monitored by the host.
After successful initialisation a CPU frequency command can be sent to the host
using:
set_cpu_freq <lcore_num> <up|down|min|max>.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agoexamples/vm_power: vm power management application
Alan Carew [Tue, 25 Nov 2014 16:18:05 +0000 (16:18 +0000)]
examples/vm_power: vm power management application

For launching CLI thread and Monitor thread and initialising
resources.
Requires a minimum of two lcores to run, additional cores specified by eal core
mask are not used.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agoexamples/vm_power: cpu frequency in host
Alan Carew [Tue, 25 Nov 2014 16:18:04 +0000 (16:18 +0000)]
examples/vm_power: cpu frequency in host

A wrapper around librte_power(using ACPI cpufreq), providing locking around the
non-threadsafe library, allowing for frequency changes based on core masks and
core numbers from both the CLI thread and epoll monitor thread.

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agoexamples/vm_power: cli in host
Alan Carew [Tue, 25 Nov 2014 16:18:03 +0000 (16:18 +0000)]
examples/vm_power: cli in host

The CLI is used for administrating the channel monitor and manager and
manually setting the CPU frequency on the host.

Supports the following commands:
 add_vm [Mul-choice STRING]: add_vm|rm_vm <name>, add a VM for subsequent
  operations with the CLI or remove a previously added VM from the VM Power
  Manager

 rm_vm [Mul-choice STRING]: add_vm|rm_vm <name>, add a VM for subsequent
  operations with the CLI or remove a previously added VM from the VM Power
  Manager

 add_channels [Fixed STRING]: add_channels <vm_name> <list>|all, add
  communication channels for the specified VM, the virtio channels must be
  enabled in the VM configuration(qemu/libvirt) and the associated VM must be
  active. <list> is a comma-separated list of channel numbers to add, using the
  keyword 'all' will attempt to add all channels for the VM

 set_channel_status [Fixed STRING]:
  set_channel_status <vm_name> <list>|all enabled|disabled,  enable or disable
  the communication channels in list(comma-separated) for the specified VM,
  alternatively list can be replaced with keyword 'all'. Disabled channels will
  still receive packets on the host, however the commands they specify will be
  ignored. Set status to 'enabled' to begin processing requests again.

 show_vm [Fixed STRING]: show_vm <vm_name>, prints the information on the
  specified VM(s), the information lists the number of vCPUS, the pinning to
  pCPU(s) as a bit mask, along with any communication channels associated with
  each VM

 show_cpu_freq_mask [Fixed STRING]: show_cpu_freq_mask <mask>, Get the current
  frequency for each core specified in the mask

 set_cpu_freq_mask [Fixed STRING]: set_cpu_freq_mask <core_mask> <up|down|min|max>,
  Set the current frequency for the cores specified in <core_mask> by scaling
  each up/down/min/max.

 show_cpu_freq [Fixed STRING]: Get the current frequency for the specified core

 set_cpu_freq [Fixed STRING]: set_cpu_freq <core_num> <up|down|min|max>,
  Set the current frequency for the specified core by scaling up/down/min/max

 quit [Fixed STRING]: close the application

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agoexamples/vm_power: channel manager and monitor in host
Alan Carew [Tue, 25 Nov 2014 16:18:02 +0000 (16:18 +0000)]
examples/vm_power: channel manager and monitor in host

The manager is responsible for adding communications channels to the Monitor
thread, tracking and reporting VM state and employs the libvirt API for
synchronization with the KVM Hypervisor. The manager interacts with the
Hypervisor to discover the mapping of virtual CPUS(vCPUs) to the host
physical CPUS(pCPUs) and to inspect the VM running state.

The manager provides the following functionality to the CLI:
1) Connect to a libvirtd instance, default: qemu:///system
2) Add a VM to an internal list, each VM is identified by a "name" which must
   correspond to a valid libvirt Domain Name.
3) Add communication channels associated with a VM to the epoll based Monitor
   thread.
   The channels must exist and be in the form of:
   /tmp/powermonitor/<vm_name>.<channel_number>. Each channel is a
   Virtio-Serial endpoint configured as an AF_UNIX file socket and opened in
   non-blocking mode.
   Each VM can have a maximum of 64 channels associated with it.
4) Disable or re-enable VM communication channels. Channels once added to the
   Monitor thread remain in that thread's control, however acting on channel
   requests can be disabled and re-enabled via CLI.

The monitor is an epoll based infinite loop running in a separate thread that
waits on channel events from VMs and calls the corresponding functions. Channel
definitions from the manager are registered via the epoll event opaque pointer
when calling epoll_ctl(EPOLL_CTL_ADD), this allows for obtaining the channel's
file descriptor for reading EPOLLIN events and mapping the vCPU to pCPU(s)
associated with a request from a particular VM.
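
The opaque-pointer registration described above can be sketched as follows.
The struct channel layout and the monitor_ function names are assumptions
made for this example; they are not taken from the sample application.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical channel descriptor; field names are assumptions. */
struct channel {
    int fd;             /* endpoint to read requests from */
    unsigned int vcpu;  /* vCPU this channel belongs to */
};

/* Register a channel, storing its descriptor in the epoll opaque pointer. */
static int monitor_add(int epfd, struct channel *ch)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = ch };

    return epoll_ctl(epfd, EPOLL_CTL_ADD, ch->fd, &ev);
}

/* Wait for one event; return the readable channel, or NULL if none. */
static struct channel *monitor_wait_one(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    if (n != 1 || !(ev.events & EPOLLIN))
        return NULL;
    return ev.data.ptr;     /* same pointer that was registered */
}

/* Read pending request bytes from a channel; returns the read() result. */
static ssize_t monitor_read(struct channel *ch, void *buf, size_t len)
{
    return read(ch->fd, buf, len);
}
```

Since the event carries the channel pointer back, the loop needs no separate
lookup table from file descriptor to VM or vCPU.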

Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
10 years agoexamples/skeleton: very simple code for packet forwarding
Bruce Richardson [Fri, 14 Nov 2014 14:31:50 +0000 (14:31 +0000)]
examples/skeleton: very simple code for packet forwarding

This is a very simple example app for doing packet forwarding with the
Intel DPDK. It's designed to serve as a start point for people new to
the Intel DPDK and who want to develop a new app.

Therefore it's meant to:
* have as good a performance out-of-the-box as possible, using the
  best-known settings for configuring the PMDs, so that any new apps can
  be based off it.
* be kept as short as possible to make it easy to understand it and get
  started with it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
10 years agoeal/linux: map pci memory resources after hugepages
Anatoly Burakov [Tue, 11 Nov 2014 10:09:25 +0000 (10:09 +0000)]
eal/linux: map pci memory resources after hugepages

A multi-process DPDK application must mmap hugepages and PCI resources
into the same virtual address space. By default the virtual addresses
are chosen by the primary process automatically when calling the mmap.
But sometimes the chosen virtual addresses aren't usable in secondary
process - for example, secondary process is linked with more libraries
than primary process, and the library occupies the same address space
that the primary process has requested for PCI mappings.

This patch makes EAL try and map PCI BARs right after the hugepages
(instead of location chosen by mmap) in virtual memory, so that PCI BARs
have less chance of ending up in random places in virtual memory.

Signed-off-by: Liang Xu <liang.xu@cinfotech.cn>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
10 years agoconfig: support 128 cores
Didier Pallard [Tue, 22 Apr 2014 10:20:25 +0000 (10:20 +0000)]
config: support 128 cores

New platforms have more than 64 cores.
Set default max cores number to 128.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
10 years agoeal: add option --master-lcore
Simon Kuenzer [Tue, 8 Jul 2014 08:28:30 +0000 (10:28 +0200)]
eal: add option --master-lcore

Enable users to specify the lcore id that is used as master lcore.

Signed-off-by: Simon Kuenzer <simon.kuenzer@neclab.eu>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>