dpdk.git
9 years ago  eal/x86: optimize memcpy for SSE and AVX
Zhihong Wang [Thu, 29 Jan 2015 02:38:47 +0000 (10:38 +0800)]
eal/x86: optimize memcpy for SSE and AVX

Main code changes:

1. Differentiate architectural features based on CPU flags
    a. Implement separate move functions for SSE/AVX/AVX2 to make full use of cache bandwidth
    b. Implement separate copy flows specifically optimized for the target architecture

2. Rewrite the memcpy function "rte_memcpy"
    a. Add store aligning
    b. Add load aligning based on architectural features
    c. Put block copy loop into inline move functions for better control of instruction order
    d. Eliminate unnecessary MOVs

3. Rewrite the inline move functions
    a. Add move functions for unaligned load cases
    b. Change instruction order in copy loops for better pipeline utilization
    c. Use intrinsics instead of assembly code

4. Remove slow glibc call for constant copies

Test report: http://dpdk.org/ml/archives/dev/2015-January/011848.html

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
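The "store aligning" idea in item 2a can be sketched as follows. This is only an illustration under stated assumptions, not the rte_memcpy code from this commit: the helper name `copy_store_aligned` is hypothetical, only SSE2 is used, and the function falls back to plain memcpy() when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not the DPDK implementation: copy n bytes so that
 * every 16-byte store in the main loop hits a 16-byte aligned address. */
void *copy_store_aligned(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    /* Head: copy just enough bytes to bring d up to a 16-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)d & 15);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: possibly unaligned loads, always aligned stores. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_store_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }
#endif
    /* Tail, or the whole copy when SSE2 is unavailable. */
    memcpy(d, s, n);
    return dst;
}
```

Only the destination is aligned here; the load-aligning variants mentioned in item 2b depend on the target architecture and are not shown.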
9 years ago  app/test: extend memcpy test coverage
Zhihong Wang [Thu, 29 Jan 2015 02:38:46 +0000 (10:38 +0800)]
app/test: extend memcpy test coverage

Main code changes:
1. Added more typical data points for a thorough performance test
2. Added unaligned test cases since it's common in DPDK usage

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
9 years ago  app/test: remove unnecessary memcpy test cases
Zhihong Wang [Thu, 29 Jan 2015 02:38:45 +0000 (10:38 +0800)]
app/test: remove unnecessary memcpy test cases

Removed unnecessary test cases for base move functions
since the function "func_test" covers them all.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
9 years ago  app/test: disable variable tracking assignment for memcpy
Zhihong Wang [Thu, 29 Jan 2015 02:38:44 +0000 (10:38 +0800)]
app/test: disable variable tracking assignment for memcpy

VTA is for debugging only; it increases compile time and binary size,
especially when there are a lot of inlines.
So disable it, since the memcpy test contains a lot of inline calls.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
9 years ago  timer: fix reset return value
Robert Sanford [Wed, 25 Feb 2015 04:09:49 +0000 (23:09 -0500)]
timer: fix reset return value

- API rte_timer_reset() should return -1 when the timer is in the
RUNNING or CONFIG state. Instead, it ignores the return value of
internal function __rte_timer_reset() and always returns 0.
We change rte_timer_reset() to return the value returned by
__rte_timer_reset().

- Enhance timer stress test 2 to report how many timer reset
collisions occur, i.e., how many times rte_timer_reset() fails
due to a timer being in the CONFIG state.

Signed-off-by: Robert Sanford <rsanford2@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
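The return-value bug described above can be sketched with simplified stand-ins. The types and function names below are hypothetical and much smaller than the real rte_timer code; the sketch only shows a wrapper that drops the internal result versus one that propagates it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; not the DPDK definitions. */
enum timer_state { TIMER_STOP, TIMER_PENDING, TIMER_RUNNING, TIMER_CONFIG };
struct timer { enum timer_state state; };

/* Internal helper: refuses to touch a timer that is running or being
 * configured by someone else. */
int timer_reset_internal(struct timer *t)
{
    if (t->state == TIMER_RUNNING || t->state == TIMER_CONFIG)
        return -1;
    t->state = TIMER_PENDING;
    return 0;
}

/* Before the fix: the result of the helper is ignored. */
int timer_reset_buggy(struct timer *t)
{
    assert(t != NULL);
    (void)timer_reset_internal(t);
    return 0;
}

/* After the fix: the caller sees the failure. */
int timer_reset_fixed(struct timer *t)
{
    assert(t != NULL);
    return timer_reset_internal(t);
}
```

With a timer in the CONFIG state, the buggy wrapper reports success although nothing was reset, while the fixed wrapper returns -1.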
9 years ago  timer: fix stress test on multiple runs
Robert Sanford [Wed, 25 Feb 2015 04:09:48 +0000 (23:09 -0500)]
timer: fix stress test on multiple runs

Fix timer stress test to succeed on multiple runs.

Signed-off-by: Robert Sanford <rsanford2@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
9 years ago  timer: pause in reset sync
Robert Sanford [Wed, 25 Feb 2015 04:09:47 +0000 (23:09 -0500)]
timer: pause in reset sync

In rte_timer_reset_sync(), insert rte_pause() into the loop that waits
for rte_timer_reset() to succeed.

Signed-off-by: Robert Sanford <rsanford2@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
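A retry loop with a pause between attempts might look like the sketch below. `try_reset()` is a hypothetical stand-in for rte_timer_reset(), and `cpu_pause()` is an assumed substitute for rte_pause() that maps to the x86 PAUSE instruction when SSE2 is available and to nothing otherwise.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#define cpu_pause() _mm_pause()
#else
#define cpu_pause() ((void)0)
#endif

/* Hypothetical stand-in for rte_timer_reset(): fails while the timer is
 * still busy, modelled here by a countdown. */
int try_reset(int *busy_rounds)
{
    if (*busy_rounds > 0) {
        (*busy_rounds)--;
        return -1;
    }
    return 0;
}

/* Retry until the reset succeeds, pausing between attempts.
 * Returns the number of retries that were needed. */
unsigned reset_sync(int *busy_rounds)
{
    unsigned retries = 0;

    assert(busy_rounds != NULL);
    while (try_reset(busy_rounds) != 0) {
        cpu_pause();
        retries++;
    }
    return retries;
}
```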
9 years ago  examples/l2fwd-jobstats: add new example
Pawel Wodkowski [Tue, 24 Feb 2015 16:33:24 +0000 (17:33 +0100)]
examples/l2fwd-jobstats: add new example

This app demonstrates usage of the new rte_jobstats library.
It is basically the original l2fwd with the following modifications to meet
library requirements:
- main_loop() was split into two jobs: forward job and flush job. Logic
for those jobs is almost the same as in the original application.
- stats are moved to an rte_alarm callback to avoid introducing the overhead
of printing.
- stats are expanded to show rte_jobstats statistics.
- added new parameter '-l' to enable automatic thousands separator.

Comparing the original l2fwd and l2fwd-jobstats apps shows the approach
needed to properly write your own application with rte_jobstats
measurements.

New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of rte_jobstats
library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.

Fixes: 2caeb8c0141d ("examples/l2fwd-jobstats: new example")
[Thomas: files were missing in the previous commit]

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
9 years ago  eal: fix missing symbol in version map
Cunming Liang [Wed, 25 Feb 2015 03:39:48 +0000 (11:39 +0800)]
eal: fix missing symbol in version map

Because per_lcore__socket_id and rte_sys_gettid are missing from the version map,
compilation fails when CONFIG_RTE_BUILD_SHARED_LIB is enabled.

Fixes: ef76436c6834 ("eal: get unique thread id")
Fixes: 9e29251b2afa ("eal: thread affinity API")

Reported-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: John McNamara <john.mcnamara@intel.com>
9 years ago  doc: update programmers guide for uio_pci_generic
Bruce Richardson [Tue, 24 Feb 2015 16:27:40 +0000 (16:27 +0000)]
doc: update programmers guide for uio_pci_generic

Since DPDK now has support for the in-tree uio_pci_generic driver,
update the programmers guide document to reference this module, and to use it
in preference to the igb_uio driver, which is DPDK-specific.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
9 years ago  doc: update linux guide for uio_pci_generic use
Bruce Richardson [Tue, 24 Feb 2015 16:27:39 +0000 (16:27 +0000)]
doc: update linux guide for uio_pci_generic use

Since DPDK now has support for the in-tree uio_pci_generic driver,
update the GSG document to reference this module, and to use it
in preference to the igb_uio driver, which is DPDK-specific.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
9 years ago  eal/linux: remove unnecessary check for primary instance
Bruce Richardson [Tue, 24 Feb 2015 13:30:47 +0000 (13:30 +0000)]
eal/linux: remove unnecessary check for primary instance

In pci_uio_map_resource we check that we are in a primary process
before calling pci_uio_set_bus_master. However, there is already
an earlier check which means that we are always in a primary instance
at this point in the code, so the check can be removed.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years ago  eal/linux: populate uio maps from pci resources array
Bruce Richardson [Tue, 24 Feb 2015 13:30:46 +0000 (13:30 +0000)]
eal/linux: populate uio maps from pci resources array

Rather than scanning the resource file in sysfs a second time, we
can pull the information on physical addresses of BARs from the
pci resource information already present in the dev structure.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years ago  eal/linux: mmap uio resources using resourceX files
Bruce Richardson [Tue, 24 Feb 2015 13:30:45 +0000 (13:30 +0000)]
eal/linux: mmap uio resources using resourceX files

Instead of distinguishing the BAR mappings via offset within a single
file, originally /dev/uioX, switch to mapping each individual bar via
the appropriately numbered resourceX file.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years ago  maintainers: claim responsibility for jobstats library and example
Pawel Wodkowski [Tue, 24 Feb 2015 16:33:25 +0000 (17:33 +0100)]
maintainers: claim responsibility for jobstats library and example

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years ago  examples/l2fwd-jobstats: new example
Pawel Wodkowski [Tue, 24 Feb 2015 16:33:24 +0000 (17:33 +0100)]
examples/l2fwd-jobstats: new example

This app demonstrates usage of the new rte_jobstats library.
It is basically the original l2fwd with the following modifications to meet
library requirements:
- main_loop() was split into two jobs: forward job and flush job. Logic
for those jobs is almost the same as in the original application.
- stats are moved to an rte_alarm callback to avoid introducing the overhead
of printing.
- stats are expanded to show rte_jobstats statistics.
- added new parameter '-l' to enable automatic thousands separator.

Comparing the original l2fwd and l2fwd-jobstats apps shows the approach
needed to properly write your own application with rte_jobstats
measurements.

New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of rte_jobstats
library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
9 years ago  jobstats: new library
Pawel Wodkowski [Tue, 24 Feb 2015 16:33:23 +0000 (17:33 +0100)]
jobstats: new library

This library provides an API to measure time spent in particular parts of
code and to calculate optimal polling time.

To calculate those statistics, application code needs to be divided into
parts (called jobs) that do something. It is up to the application to decide
what is considered a job.

A series of jobs must be surrounded with the rte_jobstats_context_start()
and rte_jobstats_context_finish() calls. After that, jobs might be
started.  Each job must be surrounded with rte_jobstats_start() and
rte_jobstats_finish() calls.

After a job finishes its execution, the period in which it should be called
again is adjusted. It might be used to minimize time wasted on
unnecessary polls/calls. The adjustment is based on data provided by the job
itself (e.g. number of packets it processed).

After all jobs in a series are executed, the following statistics are updated
and might be used by the application. Statistics can be reset. Some of
the provided statistics:
 - total/min/max execution - time spent in executing jobs.
 - total/min/max management - time spent outside execution area. This
value might be used to measure overhead of scheduling jobs. This time
also contains overhead of the rte_jobstats library itself.
 - number of loops that executed at least one job
 - executed jobs
 - time when statistics were reset.

Each job provides total/min/max execution time and execution count
statistics.

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
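The period adjustment described above can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the `struct job` layout, the `job_finish()` name and the fixed-step rule are not taken from the rte_jobstats sources. It only shows the idea of recording execution time and nudging the polling period according to a value the job reports about itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a job; field names and the rule below are illustrative. */
struct job {
    uint64_t period;      /* current polling period, in arbitrary ticks */
    uint64_t min_period;  /* lower bound for period */
    uint64_t max_period;  /* upper bound for period */
    uint64_t target;      /* desired work per run, e.g. packets */
    uint64_t exec_cnt;    /* number of finished runs */
    uint64_t total_exec;  /* summed execution time */
};

/* Called when a run ends: record its duration, then move the period one
 * step toward the target. Assumes min_period <= period <= max_period. */
void job_finish(struct job *j, uint64_t exec_time, uint64_t value,
                uint64_t step)
{
    assert(j != NULL);
    j->exec_cnt++;
    j->total_exec += exec_time;

    if (value < j->target) {
        /* Too little work per run: poll less often. */
        if (j->max_period - j->period < step)
            j->period = j->max_period;
        else
            j->period += step;
    } else if (value > j->target) {
        /* Too much work per run: poll more often. */
        if (j->period - j->min_period < step)
            j->period = j->min_period;
        else
            j->period -= step;
    }
}
```

A run that processed fewer packets than the target lengthens the period, a run that processed more shortens it, and both are clamped to the configured bounds.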
9 years ago  app/test: fix missing NULL pointer checks
Daniel Mrzyglod [Tue, 27 Jan 2015 15:44:53 +0000 (16:44 +0100)]
app/test: fix missing NULL pointer checks

In test_sched, we are missing NULL pointer checks after create_mempool()
and rte_pktmbuf_alloc(). Add in these checks using TEST_ASSERT_NOT_NULL macros.

VERIFY macro was removed and replaced by standard test ASSERTS from "test.h" header.
This provides additional information to track when the failure occurred.

Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
9 years ago  devargs: restore empty devargs
David Marchand [Tue, 24 Feb 2015 09:41:31 +0000 (10:41 +0100)]
devargs: restore empty devargs

Following commit c07691ae1089, an implicit change has been done in the
devargs API.
This triggers problems in virtual pmds that did not check for parameter
validity as it was implicitly valid.

Fix this by restoring the empty argument as "" and adding a note in the API.
Restore associated tests.

Fixes: c07691ae1089 ("devargs: remove limit on parameters length")

Reported-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Tested-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years ago  doc: new eal multi-pthread feature
Cunming Liang [Mon, 16 Feb 2015 07:34:10 +0000 (15:34 +0800)]
doc: new eal multi-pthread feature

The patch adds the multi-pthread section under the EAL chapter of prog_guide.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
9 years ago  ring: add optional yield to avoid spin forever
Cunming Liang [Tue, 17 Feb 2015 02:08:15 +0000 (10:08 +0800)]
ring: add optional yield to avoid spin forever

Add a sched_yield() syscall if the thread spins for too long,
waiting for another thread to finish its operations on the ring.
That gives a pre-empted thread a chance to proceed and finish
its ring enqueue/dequeue operation.
The purpose is to reduce contention on the ring.
ring_perf_test shows no additional performance penalty.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
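The spin-then-yield behaviour can be sketched as below. This is an illustration using C11 atomics, not the rte_ring code: `wait_for_tail()` and its `pause_limit` parameter are hypothetical names, and a limit of 0 is assumed to mean "never yield".

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Spin until *tail reaches `expected`. After every `pause_limit`
 * fruitless iterations, give up the CPU with sched_yield() so that a
 * pre-empted thread holding up the ring gets a chance to run.
 * Returns the number of yields performed. */
unsigned wait_for_tail(atomic_uint *tail, unsigned expected,
                       unsigned pause_limit)
{
    unsigned spins = 0;
    unsigned yields = 0;

    assert(tail != NULL);
    while (atomic_load_explicit(tail, memory_order_acquire) != expected) {
        if (pause_limit != 0 && ++spins >= pause_limit) {
            sched_yield();
            spins = 0;
            yields++;
        }
    }
    return yields;
}
```

When the other thread has already published its update, the loop body never runs and no yield happens, which is consistent with the commit's observation that the common path shows no additional cost.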
9 years ago  ring: support non-EAL thread
Cunming Liang [Tue, 17 Feb 2015 02:08:14 +0000 (10:08 +0800)]
ring: support non-EAL thread

The ring debug stats do not take non-EAL threads into account.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  mempool: support non-EAL thread
Cunming Liang [Tue, 17 Feb 2015 02:08:13 +0000 (10:08 +0800)]
mempool: support non-EAL thread

For a non-EAL thread, bypass the per-lcore cache and use the ring pool directly.
This allows using rte_mempool in either an EAL thread or any user pthread.
A non-EAL thread relies directly on rte_ring, which is non-preemptive.
Running multiple pthreads per cpu that compete for the same rte_mempool is not recommended:
performance will be bad, and there is a critical risk if the scheduling policy is RT.
No significant performance decrease was found with mempool_perf_test.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  timer: support non-EAL thread
Cunming Liang [Tue, 17 Feb 2015 02:08:16 +0000 (10:08 +0800)]
timer: support non-EAL thread

Allow setting up timers only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. a dynamically created thread will be able to reset/stop a timer for an lcore thread,
but it will not be allowed to set up a timer for itself or another non-lcore thread.
rte_timer_manage() for a non-lcore thread simply does nothing and returns straight away.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  spinlock: support non-EAL thread
Cunming Liang [Tue, 17 Feb 2015 02:08:12 +0000 (10:08 +0800)]
spinlock: support non-EAL thread

In a non-EAL thread, lcore_id is always LCORE_ID_ANY.
It can't be used as a unique id for the recursive spinlock.
Use rte_gettid() to replace it.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
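Why a recursive lock needs a per-thread unique owner id can be sketched with C11 atomics. The `rec_lock` type and its functions are hypothetical and are not the rte_spinlock implementation; the caller passes in the unique id that rte_gettid() would provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define OWNER_NONE (-1)

/* Hypothetical recursive lock, keyed by a unique thread id. */
struct rec_lock {
    atomic_int owner;  /* id of the owning thread, or OWNER_NONE */
    int count;         /* recursion depth, touched only by the owner */
};

void rec_lock_take(struct rec_lock *l, int tid)
{
    assert(l != NULL && tid != OWNER_NONE);
    if (atomic_load(&l->owner) != tid) {
        int expected = OWNER_NONE;
        /* Spin until the lock is free and our id is installed. */
        while (!atomic_compare_exchange_weak(&l->owner, &expected, tid))
            expected = OWNER_NONE;
    }
    l->count++;
}

void rec_lock_release(struct rec_lock *l, int tid)
{
    assert(l != NULL && atomic_load(&l->owner) == tid && l->count > 0);
    if (--l->count == 0)
        atomic_store(&l->owner, OWNER_NONE);
}
```

If every non-EAL thread presented the same id, the `owner != tid` test would treat them all as the current owner and let them in together, which is why a value shared by all such threads cannot serve as the key.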
9 years ago  log: support non-EAL thread
Cunming Liang [Tue, 17 Feb 2015 02:08:10 +0000 (10:08 +0800)]
log: support non-EAL thread

For non-EAL threads, *_lcore_id* is invalid and probably larger than RTE_MAX_LCORE.
The patch adds the check and allows only EAL threads to use the EAL per-thread log level and log type.
Other threads share the global log level.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  eal: initialize lcore and socket id
Cunming Liang [Tue, 17 Feb 2015 02:08:11 +0000 (10:08 +0800)]
eal: initialize lcore and socket id

Set _lcore_id and _socket_id to (-1) by default.
For non-EAL threads, _lcore_id shall always be LCORE_ID_ANY.
Libraries using _lcore_id as an index need to take care.
_socket_id is always SOCKET_ID_ANY until the thread changes its affinity
with rte_thread_set_affinity().

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  malloc: avoid unknown socket id
Cunming Liang [Tue, 17 Feb 2015 02:08:09 +0000 (10:08 +0800)]
malloc: avoid unknown socket id

Add a check on rte_socket_id() to avoid an unexpected return value such as (-1).
With rte_malloc_socket(), the socket id is assigned by socket_arg.
If socket_arg is set to SOCKET_ID_ANY, it is expected to use the socket id to which the current core belongs.
As the thread may have affinity to a cpuset, the cores in the cpuset may belong to different NUMA nodes.
The value of _socket_id may then be SOCKET_ID_ANY (-1), a case that the original malloc_get_numa_socket() did not expect.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
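The selection logic described above might be sketched like this. The function name, its parameters and in particular the fallback socket are assumptions made for illustration; the commit message does not say which socket is used when the current one is unknown.

```c
#include <assert.h>

#define SOCKET_ID_ANY (-1)

/* Hypothetical helper: choose the NUMA socket for an allocation.
 *   socket_arg       what the caller asked for, or SOCKET_ID_ANY
 *   current_socket   what rte_socket_id() would report for this thread
 *   fallback_socket  assumed known-good socket used as a last resort */
int pick_numa_socket(int socket_arg, int current_socket, int fallback_socket)
{
    assert(fallback_socket >= 0);
    if (socket_arg != SOCKET_ID_ANY)
        return socket_arg;        /* caller asked for a specific socket */
    if (current_socket != SOCKET_ID_ANY)
        return current_socket;    /* thread is tied to a single socket */
    return fallback_socket;       /* cpuset spans sockets: avoid using -1 */
}
```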
9 years ago  eal: apply thread affinity by assigned cpuset
Cunming Liang [Tue, 17 Feb 2015 02:08:07 +0000 (10:08 +0800)]
eal: apply thread affinity by assigned cpuset

EAL threads use the assigned cpuset to set core affinity during startup.
The 1:1 mapping is kept if no '--lcores' option is used.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  eal: thread affinity API
Cunming Liang [Tue, 17 Feb 2015 02:08:03 +0000 (10:08 +0800)]
eal: thread affinity API

1. add two TLS *_socket_id* and *_cpuset*
2. add one internal API, eal_cpu_socket_id/eal_thread_dump_affinity
3. add two public API, rte_thread_set/get_affinity
4. update EAL version map for EAL public API

The API works for both EAL threads and non-EAL threads.
When calling rte_thread_set_affinity, the *_socket_id* and
*_cpuset* of the calling thread will be updated if the thread
successfully sets the cpu affinity.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  eal: get unique thread id
Cunming Liang [Tue, 17 Feb 2015 02:08:06 +0000 (10:08 +0800)]
eal: get unique thread id

The rte_gettid() wraps the linux and freebsd syscall gettid().
It provides a persistent unique thread id for the calling thread.
It saves the unique id in TLS on the first call.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
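Caching the id in thread-local storage on the first call can be sketched as follows. The function names are hypothetical and this is not the rte_gettid() source. The Linux branch uses the raw gettid syscall; on other platforms the sketch falls back to getpid(), which is labelled as a placeholder because it is not unique per thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>
#endif

/* Hypothetical helper: ask the OS for the id of the calling thread. */
long sys_thread_id(void)
{
#if defined(__linux__)
    return (long)syscall(SYS_gettid);
#else
    /* Placeholder only: a process id is NOT unique per thread. */
    return (long)getpid();
#endif
}

/* Return the cached id, querying the OS only on the first call made by
 * each thread. */
long cached_thread_id(void)
{
    static _Thread_local long tid = -1;  /* one copy per thread */

    if (tid == -1)
        tid = sys_thread_id();
    assert(tid > 0);
    return tid;
}
```

Later calls from the same thread return the stored value without entering the kernel.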
9 years ago  eal: get socket id from cpu id
Cunming Liang [Tue, 17 Feb 2015 02:08:02 +0000 (10:08 +0800)]
eal: get socket id from cpu id

It defines eal_cpu_socket_id(), which exposes the originally private cpu_socket_id().
The function is only used inside EAL. It returns the socket_id of the specified cpu_id.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  eal: fix strnlen return value with icc
Cunming Liang [Tue, 17 Feb 2015 02:08:01 +0000 (10:08 +0800)]
eal: fix strnlen return value with icc

The problem is that strnlen() here may return an invalid value with 32-bit icc
(actually it returns its second parameter, e.g. sysconf(_SC_ARG_MAX)).
It starts to manifest when the max_len parameter is > 2M and using icc -m32 -O2 (or above).

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  app/test: add unit tests for --lcores option
Cunming Liang [Sun, 15 Feb 2015 05:47:31 +0000 (13:47 +0800)]
app/test: add unit tests for --lcores option

The patch adds unit tests for the new eal option "--lcores".

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
9 years ago  eal: new option --lcores for cpu assignment
Cunming Liang [Tue, 17 Feb 2015 02:08:00 +0000 (10:08 +0800)]
eal: new option --lcores for cpu assignment

It supports one new eal long option '--lcores' for EAL thread cpuset assignment.

The format pattern:
--lcores='<lcores[@cpus]>[<,lcores[@cpus]>...]'
lcores and cpus can each be a single number, a range, or a group.
'(' and ')' are necessary if it's a group.
If '@cpus' is not supplied, cpus takes the same value as lcores.

e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL threads as below
  lcore 0 runs on cpuset 0x41 (cpu 0,6)
  lcore 1 runs on cpuset 0x2 (cpu 1)
  lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
  lcore 3,4,5 run on cpuset 0x5 (cpu 0,2)
  lcore 6 runs on cpuset 0x41 (cpu 0,6)
  lcore 7 runs on cpuset 0x80 (cpu 7)
  lcore 8 runs on cpuset 0x100 (cpu 8)

Test report: http://dpdk.org/ml/archives/dev/2015-February/013383.html

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Qun Wan <qun.wan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
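The cpuset values in the example above (0x41 for cpus 0,6; 0xe0 for cpus 5-7) are plain bitmasks with one bit per cpu. The sketch below converts a cpu list into such a mask. It is illustrative only and not the EAL parser: `cpu_list_to_mask()` is a hypothetical name, it handles only cpus 0 to 63, and it does not parse parentheses or the '@' separator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert a cpu list such as "0,6" or "5-7" into a bitmask with bit N
 * set for cpu N. Returns 0 for an empty or malformed list. */
uint64_t cpu_list_to_mask(const char *s)
{
    uint64_t mask = 0;

    assert(s != NULL);
    while (*s != '\0') {
        char *end;
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi;

        if (end == s || lo > 63)
            return 0;
        hi = lo;
        if (*end == '-') {              /* range "lo-hi" */
            s = end + 1;
            hi = strtoul(s, &end, 10);
            if (end == s || hi > 63 || hi < lo)
                return 0;
        }
        for (unsigned long c = lo; c <= hi; c++)
            mask |= UINT64_C(1) << c;
        if (*end == ',')
            end++;                      /* move on to the next entry */
        else if (*end != '\0')
            return 0;
        s = end;
    }
    return mask;
}
```

For instance, "0,6" sets bits 0 and 6, giving 0x41, and "5-7" sets bits 5, 6 and 7, giving 0xe0, matching the table in the commit message.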
9 years ago  eal: add cpuset into lcore config
Cunming Liang [Tue, 17 Feb 2015 02:07:58 +0000 (10:07 +0800)]
eal: add cpuset into lcore config

The patch adds 'cpuset' into the per-lcore configuration 'lcore_config[]',
as an lcore is no longer always pinned 1:1 to a physical cpu.
The lcore now stands for an EAL thread rather than a logical cpu.

It doesn't change the default behavior of 1:1 mapping, but allows
setting the affinity of an EAL thread to multiple cpus.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  enic: fix bsd namespace conflict
Cunming Liang [Tue, 17 Feb 2015 02:08:08 +0000 (10:08 +0800)]
enic: fix bsd namespace conflict

Some macros are already defined by freebsd 'sys/param.h'.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  eal/bsd: fix namespace conflict
Cunming Liang [Tue, 17 Feb 2015 02:07:59 +0000 (10:07 +0800)]
eal/bsd: fix namespace conflict

Fix namespace with EAL prefix.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  eal/bsd: standardize init sequence between linux and bsd
Cunming Liang [Tue, 17 Feb 2015 02:08:05 +0000 (10:08 +0800)]
eal/bsd: standardize init sequence between linux and bsd

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years ago  maintainers: claim VFIO and IVSHMEM
Anatoly Burakov [Tue, 24 Feb 2015 11:19:18 +0000 (11:19 +0000)]
maintainers: claim VFIO and IVSHMEM

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years ago  mk: fix build with Debian/Ubuntu-specific gcc version
Panu Matilainen [Tue, 24 Feb 2015 10:46:56 +0000 (12:46 +0200)]
mk: fix build with Debian/Ubuntu-specific gcc version

Commit 71f0ab1849b4fc3ca928deb566df12ca725ed150 broke compilation
on some versions of Debian and Ubuntu where gcc has been modified
to only emit MAJOR.MINOR part of the version from 'gcc -dumpversion'.
Drop the micro-version from gcc version comparisons to work around
this, it wasn't being used for anything anyway.

Fixes: 71f0ab1849b4 ("mk: rework gcc version detection to permit versions newer than 4.x")

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years ago  eal: add help option
Thomas Monjalon [Thu, 29 Jan 2015 16:51:17 +0000 (17:51 +0100)]
eal: add help option

Help is printed with -h or --help.

Help is also printed for an unknown option.
This was broken since the rework of options.

Fixes: 489a9d6c9f77 ("merge bsd and linux common options parsing")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
9 years ago  eal: sort and align options lists
Thomas Monjalon [Thu, 29 Jan 2015 16:42:31 +0000 (17:42 +0100)]
eal: sort and align options lists

Options listing in usage help was a mess.
The main usage line is fixed and shorter.
The options in usage output are logically sorted (cpu/mem/dev/proc),
aligned and lightly reworded.
The options in declarations are alphabetically sorted.
Code in switch statement is not moved.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
9 years ago  doc: describe ACL memory size build parameter
Konstantin Ananyev [Wed, 18 Feb 2015 16:28:51 +0000 (16:28 +0000)]
doc: describe ACL memory size build parameter

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Siobhan Butler <siobhan.a.butler@intel.com>
9 years ago  doc: describe ACL classification methods
Konstantin Ananyev [Wed, 18 Feb 2015 16:28:50 +0000 (16:28 +0000)]
doc: describe ACL classification methods

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Siobhan Butler <siobhan.a.butler@intel.com>
9 years ago  doc: add restrictions for ACL rule fields
Konstantin Ananyev [Wed, 18 Feb 2015 16:28:49 +0000 (16:28 +0000)]
doc: add restrictions for ACL rule fields

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Siobhan Butler <siobhan.a.butler@intel.com>
9 years ago  enic: change probe log message level
Stephen Hemminger [Sat, 14 Feb 2015 15:32:59 +0000 (10:32 -0500)]
enic: change probe log message level

Drivers should be silent on boot.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
9 years ago  enic: replace use of printf with log
Stephen Hemminger [Sat, 14 Feb 2015 15:32:58 +0000 (10:32 -0500)]
enic: replace use of printf with log

Device driver should log via DPDK log, not via printf, whose output is
sent to /dev/null in a daemon application.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Marchand <david.marchand@6wind.com>
[Thomas: include rte_log.h]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years ago  mk: rework gcc version detection to permit versions newer than 4.x
Panu Matilainen [Mon, 23 Feb 2015 14:53:56 +0000 (16:53 +0200)]
mk: rework gcc version detection to permit versions newer than 4.x

Separately comparing major and minor versions becomes seriously clumsy
with major version changes. Convert the entire version string into
a numeric value (i.e. 4.6.0 becomes 460 and 5.0.0 becomes 500) and use
that for comparisons; eliminate unnecessary negations while at it.
This makes the comparisons simpler and more obvious, and makes gcc 5.0
naturally recognized as at least as capable as the newest 4.x.

This three-digit scheme would run into trouble if gcc ever went to
two-digit version segments, but that hasn't happened in the last 10+
years so it seems like a safe assumption.

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
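The three-digit scheme is simple arithmetic and can be restated in C. This is only a restatement of the numbers given in the commit message; the actual change lives in the makefiles, and `gcc_version_number()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* "4.6.0" -> 460, "5.0.0" -> 500. Returns -1 if the string does not have
 * three numeric segments or if a segment needs more than one digit. */
int gcc_version_number(const char *version)
{
    unsigned major, minor, micro;

    assert(version != NULL);
    if (sscanf(version, "%u.%u.%u", &major, &minor, &micro) != 3)
        return -1;
    if (major > 9 || minor > 9 || micro > 9)
        return -1;  /* the three-digit scheme cannot represent this */
    return (int)(major * 100 + minor * 10 + micro);
}
```

The sketch rejects a two-segment string such as "4.8", which is the kind of input that the Debian/Ubuntu-specific 'gcc -dumpversion' output produces, and it rejects two-digit segments, the limitation acknowledged in the commit message.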
9 years ago  ixgbe: remove unused function causing error with clang
Keith Wiles [Mon, 23 Feb 2015 18:24:43 +0000 (12:24 -0600)]
ixgbe: remove unused function causing error with clang

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years ago  fm10k: fix clang warning flags
Jeff Shaw [Wed, 18 Feb 2015 17:57:43 +0000 (09:57 -0800)]
fm10k: fix clang warning flags

This commit fixes the following error which was reported when
compiling with clang by removing the option.

error: unknown warning option '-Wno-unused-but-set-variable'

Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
9 years ago  fm10k: fix build with unused debug function
Jeff Shaw [Wed, 18 Feb 2015 18:07:40 +0000 (10:07 -0800)]
fm10k: fix build with unused debug function

This commit fixes the following error which was reported when
compiling with clang by moving the function inside an
RTE_LIBRTE_FM10K_DEBUG_RX ifdef block.

error: unused function 'dump_rxd'

Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
9 years ago  examples/packet_ordering: move creation of reorder buffer
Sergio Gonzalez Monroy [Fri, 20 Feb 2015 12:10:53 +0000 (12:10 +0000)]
examples/packet_ordering: move creation of reorder buffer

There was no error checking after calling rte_reorder_create.
Move the creation of the reorder buffer before launching threads
in case of memory error.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
9 years ago  mbuf: fix a couple of doxygen comments
Sergio Gonzalez Monroy [Fri, 20 Feb 2015 12:10:52 +0000 (12:10 +0000)]
mbuf: fix a couple of doxygen comments

Fix a couple of doxygen comments in mbuf structure:
 - seqn had no doxygen syntax.
 - usr was not generating proper link to function.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years ago  doc: add reorder api to doxygen
Sergio Gonzalez Monroy [Fri, 20 Feb 2015 12:10:51 +0000 (12:10 +0000)]
doc: add reorder api to doxygen

Add missing reorder library directory to doxygen configuration.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years ago  lib: fix C++11 compilation
Stefan Puiu [Fri, 20 Feb 2015 13:23:26 +0000 (15:23 +0200)]
lib: fix C++11 compilation

In C++11 concatenated string literals need to have a space in between.
Found with clang++-3.4, IIRC g++-4.8 also complains about this.

Sample error message:
error: invalid suffix on literal; C++11 requires a space between literal
and identifier [-Wreserved-user-defined-literal]

Signed-off-by: Stefan Puiu <stefan.puiu@gmail.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
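The spacing rule can be shown with a format macro. Using `PRIu64` here is an assumption for illustration, since the commit message does not name the macros involved; `format_counter()` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format a 64-bit counter into buf. Note the spaces around PRIu64:
 * written as "count=%"PRIu64"\n", a C++11 compiler treats the macro name
 * as a user-defined literal suffix and rejects the line, while the
 * spaced form below is valid in both C and C++11. */
int format_counter(char *buf, size_t len, uint64_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "count=%" PRIu64 "\n", value);
}
```

The spaced form compiles identically in C, so headers shared between C and C++ code can use it unconditionally.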
9 years ago  kni: optimize Rx burst
Hemant Agrawal [Wed, 23 Jul 2014 06:45:12 +0000 (12:15 +0530)]
kni: optimize Rx burst

The current implementation of rte_kni_rx_burst polls the fifo for buffers.
Irrespective of success or failure, it allocates the mbufs and tries to put them into the alloc_q;
if the buffers are not added to alloc_q, it frees them.
This wastes lots of cpu cycles in allocating and freeing the buffers if alloc_q is full.

The logic has been changed to:
1. Initially allocate and add buffers (burst size) to alloc_q
2. Add buffers to alloc_q only when you are pulling out the buffers.

Signed-off-by: Hemant Agrawal <hemant@freescale.com>
Reviewed-by: Jay Rolette <rolette@infiniteio.com>
9 years ago  lpm: fix overflow issue
Igor Ryzhov [Fri, 20 Feb 2015 13:16:46 +0000 (16:16 +0300)]
lpm: fix overflow issue

LPM table overflow may occur if the table is full and the added rule has
the biggest depth that already has some rules.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years ago  pipeline: fix port meta for non-default entries
Ildar Mustafin [Sat, 21 Feb 2015 08:31:21 +0000 (11:31 +0300)]
pipeline: fix port meta for non-default entries

Signed-off-by: Ildar Mustafin <imustafin@bk.ru>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
9 years agovhost: support dynamically registering server
Huawei Xie [Mon, 23 Feb 2015 17:36:33 +0000 (17:36 +0000)]
vhost: support dynamically registering server

* support calling rte_vhost_driver_register after rte_vhost_driver_session_start
* add mutex to protect fdset from concurrent access
* add busy flag in fdentry. this flag is set before cb and cleared after cb is finished.

mutex lock scenario in vhost:

* event_dispatch(in rte_vhost_driver_session_start) runs in a separate thread, infinitely
processing vhost messages through cb(callback).
* event_dispatch acquires the lock, gets the cb and its context, marks the busy flag,
and releases the mutex.
* vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and adds the new fd into fdset.
* vserver_message_handler cb frees data context, marks remove flag to request to delete
connfd(connection fd) from fdset.
* after cb returns, event_dispatch
  1. clears busy flag.
  2. if there is a remove request, calls fdset_del, which acquires the mutex, checks the busy flag, and
removes connfd from fdset.
* rte_vhost_driver_unregister(not implemented) runs in another thread, acquires the mutex,
calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data context.

The above steps ensure the fd data context isn't freed while a cb is using it.

VM(s) should have been shut down before rte_vhost_driver_unregister.
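
The locking scenario can be sketched as follows. This is a single-process
illustration and not the vhost code: the toy_* names and the fixed-size
table are hypothetical, and toy_fdset_del returning -1 for a busy entry
(instead of waiting) is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_MAX_FDS 4

/* The callback sets *remove to ask the dispatcher to drop the fd afterwards. */
typedef void (*toy_fd_cb)(int fd, void *ctx, int *remove);

struct toy_fdentry {
	int fd;        /* -1 marks a free slot */
	toy_fd_cb cb;
	void *ctx;
	int busy;      /* set while cb is running */
};

struct toy_fdset {
	struct toy_fdentry fd[TOY_MAX_FDS];
	pthread_mutex_t mutex;
};

static void toy_fdset_init(struct toy_fdset *s)
{
	pthread_mutex_init(&s->mutex, NULL);
	for (int i = 0; i < TOY_MAX_FDS; i++) {
		s->fd[i].fd = -1;
		s->fd[i].busy = 0;
	}
}

/* Returns the slot index, or -1 when the set is full. */
static int toy_fdset_add(struct toy_fdset *s, int fd, toy_fd_cb cb, void *ctx)
{
	int slot = -1;

	assert(fd >= 0 && cb != NULL);
	pthread_mutex_lock(&s->mutex);
	for (int i = 0; i < TOY_MAX_FDS && slot < 0; i++) {
		if (s->fd[i].fd == -1) {
			s->fd[i] = (struct toy_fdentry){ fd, cb, ctx, 0 };
			slot = i;
		}
	}
	pthread_mutex_unlock(&s->mutex);
	return slot;
}

/* Returns 0 if removed, -1 if the fd is unknown or its cb is still running. */
static int toy_fdset_del(struct toy_fdset *s, int fd)
{
	int ret = -1;

	assert(fd >= 0);
	pthread_mutex_lock(&s->mutex);
	for (int i = 0; i < TOY_MAX_FDS; i++) {
		if (s->fd[i].fd == fd && !s->fd[i].busy) {
			s->fd[i].fd = -1;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&s->mutex);
	return ret;
}

/* One dispatch step for a slot: the mutex is not held while cb runs. */
static void toy_dispatch(struct toy_fdset *s, int slot)
{
	struct toy_fdentry e;
	int remove = 0;

	pthread_mutex_lock(&s->mutex);
	e = s->fd[slot];
	if (e.fd == -1) {
		pthread_mutex_unlock(&s->mutex);
		return;
	}
	s->fd[slot].busy = 1;
	pthread_mutex_unlock(&s->mutex);

	e.cb(e.fd, e.ctx, &remove);

	pthread_mutex_lock(&s->mutex);
	s->fd[slot].busy = 0;
	pthread_mutex_unlock(&s->mutex);
	if (remove)
		toy_fdset_del(s, e.fd);
}

/* Example callback: counts calls in *ctx and requests removal. */
static void toy_close_cb(int fd, void *ctx, int *remove)
{
	(void)fd;
	++*(int *)ctx;
	*remove = 1;
}
```

Because the callback runs without the mutex, it may itself call
toy_fdset_add; the busy flag is what keeps a concurrent delete from
removing the entry underneath it.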

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: support ifname for vhost-user
Huawei Xie [Mon, 23 Feb 2015 17:36:32 +0000 (17:36 +0000)]
vhost: support ifname for vhost-user

for vhost-cuse, ifname is the name of the tap device
for vhost-user, ifname is the name of the unix domain socket path

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: support vhost-user
Huawei Xie [Mon, 23 Feb 2015 17:36:31 +0000 (17:36 +0000)]
vhost: support vhost-user

In rte_vhost_driver_register(), the vhost unix domain socket listener fd is created
and added to the polled (select-based) fdset.

In rte_vhost_driver_session_start(), fds in the fdset are checked for
processing. If there is a new connection from qemu, the accepted connection fd is
added to the polled fdset. The listener and connection fds in the fdset are
then both checked. When there is a message on the connection fd, its
callback vserver_message_handler is called to process vhost-user messages.

To identify which virtio device belongs to which guest VM, we could call
rte_vhost_driver_register with a different socket path per VM. Virtio devices from
the same VM will connect to the VM-specific socket. The socket path information is
stored in the virtio_net structure.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: add select based event driven processing
Huawei Xie [Mon, 23 Feb 2015 17:36:30 +0000 (17:36 +0000)]
vhost: add select based event driven processing

for more generic event driven processing, refer to:
http://libevent.org/

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: implement cuse memory table
Huawei Xie [Mon, 23 Feb 2015 17:36:29 +0000 (17:36 +0000)]
vhost: implement cuse memory table

remove set_memory_table ops

vhost-cuse and vhost-user will each implement their own set_memory_region handler.

In the current vhost-cuse implementation, guest NUMA memory isn't supported.
Guest memory is assumed to be backed by only one file.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
9 years agovhost: make host memory mapping more generic
Huawei Xie [Mon, 23 Feb 2015 17:36:28 +0000 (17:36 +0000)]
vhost: make host memory mapping more generic

This function accepts a virtual address and pid (qemu), and maps it into
the current process's (vhost's) address space.

The memory behind the virtual address should be backed by a file,
and the virtual address should be the starting address.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: copy host memory mapping to a new cuse file
Huawei Xie [Mon, 23 Feb 2015 17:36:27 +0000 (17:36 +0000)]
vhost: copy host memory mapping to a new cuse file

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: move fd copying into cuse subdirectory
Huawei Xie [Mon, 23 Feb 2015 17:36:26 +0000 (17:36 +0000)]
vhost: move fd copying into cuse subdirectory

File descriptor is copied from qemu process into vhost process.
vhost-user doesn't need eventfd kernel module to copy fds between processes.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
9 years agovhost: rename header file
Huawei Xie [Mon, 23 Feb 2015 17:36:25 +0000 (17:36 +0000)]
vhost: rename header file

Rename vhost-net-cdev.h to vhost-net.h.
This file defines common operations provided by virtio-net(.c).

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: move cuse related handling in a subdirectory
Huawei Xie [Mon, 23 Feb 2015 17:36:24 +0000 (17:36 +0000)]
vhost: move cuse related handling in a subdirectory

Create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse.

vhost-cuse driver will be divided into two parts: cuse driver specific message
handling(in cuse directory) and common message handling(in virtio-net.c).

vhost ioctl message is pre-processed in cuse and then sent to virtio-net
if it is not terminated.

virtio-net.c provides common message handling for both vhost-cuse and vhost-user.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agovhost: enable virtio control channel Rx mode
Huawei Xie [Mon, 23 Feb 2015 17:36:23 +0000 (17:36 +0000)]
vhost: enable virtio control channel Rx mode

VIRTIO_NET_F_CTRL_RX is dependent on VIRTIO_NET_F_CTRL_VQ.
Observed that the virtio-net driver in the guest would crash with only CTRL_RX enabled.

In virtnet_send_command:

/* Caller should know better */
BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
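
The dependency can be expressed as a check on the feature bitmap. The sketch
below is an illustration, not the change made by this patch: the helper
toy_fixup_features is hypothetical, while the two bit numbers are the ones
defined by the virtio-net specification.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio-net specification. */
#define VIRTIO_NET_F_CTRL_VQ 17
#define VIRTIO_NET_F_CTRL_RX 18

/* Hypothetical helper: never offer CTRL_RX without CTRL_VQ. */
static uint64_t toy_fixup_features(uint64_t features)
{
	const uint64_t ctrl_vq = 1ULL << VIRTIO_NET_F_CTRL_VQ;
	const uint64_t ctrl_rx = 1ULL << VIRTIO_NET_F_CTRL_RX;

	if (!(features & ctrl_vq))
		features &= ~ctrl_rx;
	/* Invariant: CTRL_RX implies CTRL_VQ. */
	assert(!(features & ctrl_rx) || (features & ctrl_vq));
	return features;
}
```

A feature set that passes through this helper can no longer reach the
BUG_ON above because of a missing control virtqueue.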

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
9 years agoexamples/rxtx_callbacks: show use of callbacks
Bruce Richardson [Mon, 23 Feb 2015 18:30:10 +0000 (18:30 +0000)]
examples/rxtx_callbacks: show use of callbacks

Example showing how callbacks can be used to insert a timestamp
into each packet on RX. On TX the timestamp is used to calculate
the packet latency through the app, in cycles.
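
The timestamp idea can be sketched without any DPDK types. The struct and
function names below are hypothetical, and the cycle counter is passed in
as a plain argument so the sketch stays self-contained; where the example
application stores the timestamp is not stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Toy packet carrying only the cycle count recorded on RX. */
struct toy_pkt {
	uint64_t rx_cycles;
};

/* RX side: stamp every packet of the burst with the current cycle count. */
static uint16_t toy_add_timestamps(struct toy_pkt **pkts, uint16_t nb_pkts,
				   uint64_t now)
{
	for (uint16_t i = 0; i < nb_pkts; i++)
		pkts[i]->rx_cycles = now;
	return nb_pkts;
}

/* TX side: sum of the per-packet latencies of the burst, in cycles. */
static uint64_t toy_total_latency(struct toy_pkt **pkts, uint16_t nb_pkts,
				  uint64_t now)
{
	uint64_t cycles = 0;

	for (uint16_t i = 0; i < nb_pkts; i++) {
		assert(now >= pkts[i]->rx_cycles);
		cycles += now - pkts[i]->rx_cycles;
	}
	return cycles;
}
```

Dividing the accumulated total by the number of packets seen gives the
average latency through the application.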

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
9 years agoethdev: support optional Rx and Tx callbacks
Bruce Richardson [Mon, 23 Feb 2015 18:30:09 +0000 (18:30 +0000)]
ethdev: support optional Rx and Tx callbacks

Add optional support for inline processing of packets inside the RX
or TX call. For an RX callback, what happens is that we get a set of
packets from the NIC and then pass them to a callback function, if
configured, to allow additional processing to be done on them, e.g.
filling in more mbuf fields, before passing back to the application.
On TX, the packets are similarly post-processed before being handed
to the NIC for transmission.
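
A minimal sketch of such a callback chain, assuming each callback receives
the burst and returns the number of packets that remain. The types and
names are hypothetical and only mirror the idea, not the ethdev API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*toy_rx_cb_fn)(void **pkts, uint16_t nb_pkts,
				 void *user_param);

/* Singly linked list of callbacks configured on a queue. */
struct toy_rx_cb {
	toy_rx_cb_fn fn;
	void *user_param;
	struct toy_rx_cb *next;
};

/* Run every configured callback over the burst, in order. */
static uint16_t toy_run_rx_cbs(const struct toy_rx_cb *cb, void **pkts,
			       uint16_t nb_pkts)
{
	while (cb != NULL) {
		assert(cb->fn != NULL);
		nb_pkts = cb->fn(pkts, nb_pkts, cb->user_param);
		cb = cb->next;
	}
	return nb_pkts;
}

/* Example callback: keeps only the first half of the burst. */
static uint16_t toy_keep_half(void **pkts, uint16_t nb_pkts, void *user_param)
{
	(void)pkts;
	(void)user_param;
	return nb_pkts / 2;
}

/* Example callback: adds the burst size to a counter in user_param. */
static uint16_t toy_count(void **pkts, uint16_t nb_pkts, void *user_param)
{
	(void)pkts;
	*(unsigned *)user_param += nb_pkts;
	return nb_pkts;
}
```

When no callback is configured the list is empty and the burst is passed
through unchanged, which keeps the feature optional.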

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years agoethdev: rename interrupt callbacks field
Bruce Richardson [Mon, 23 Feb 2015 18:30:08 +0000 (18:30 +0000)]
ethdev: rename interrupt callbacks field

The 'callbacks' member of the rte_eth_dev structure has been renamed
to 'link_intr_cbs' to make it clear that it refers to callbacks from
NIC interrupts. This allows us to add other types of callbacks to
the structure without ambiguity.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years agoi40e: enable internal switch of PF
Jingjing Wu [Thu, 29 Jan 2015 01:41:55 +0000 (09:41 +0800)]
i40e: enable internal switch of PF

This patch enables PF's internal switch by setting ALLOWLOOPBACK
flag when VEB is created. With this patch, traffic from PF can be
switched on the VEB.

Test report: http://www.dpdk.org/ml/archives/dev/2015-February/013237.html

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
9 years agoi40e: fix vsi configuration
Jingjing Wu [Thu, 29 Jan 2015 01:41:54 +0000 (09:41 +0800)]
i40e: fix vsi configuration

In i40e_vsi_config_tc_queue_mapping, the flag indicating another valid
setting should be added to valid_sections with an OR operation, not
assigned to it; otherwise it will overwrite the flags set before.

Test report: http://www.dpdk.org/ml/archives/dev/2015-February/013237.html

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Tested-by: Min Cao <min.cao@intel.com>
9 years agoi40e: workaround for XL710 performance
Helin Zhang [Mon, 29 Dec 2014 01:41:28 +0000 (09:41 +0800)]
i40e: workaround for XL710 performance

On XL710, performance is far below expectation on recent
firmware versions if promiscuous mode is disabled, or if promiscuous
mode is enabled and the port MAC address is equal to the packet
destination MAC address. The fix for this issue may not be
integrated in the next firmware version, so a workaround in the
software driver is needed. For XL710, it modifies the initial
values of 3 internal-only registers, which are the same registers as on X710.
Note that the register values for X710 and XL710 could be different,
and the workaround can be removed when the issue is fixed in firmware in
the future.

Test report: http://www.dpdk.org/ml/archives/dev/2015-February/012749.html

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
9 years agoeal/linux: allow to map BARs with MSI-X tables
Dan Aloni [Wed, 28 Jan 2015 22:04:53 +0000 (00:04 +0200)]
eal/linux: allow to map BARs with MSI-X tables

While VFIO doesn't allow us to map complete BARs with MSI-X tables,
it does allow us to map around them in PAGE_SIZE granularity. There
might be adapters that provide their registers in the same BAR
but on a different page. For example, Intel's NVME adapter, though
not a network adapter, provides only one MMIO BAR that contains
the MSI-X table.
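
Mapping around the table at page granularity comes down to rounding the
table's start down and its end up to a page boundary. A sketch of that
arithmetic, with hypothetical names and no VFIO calls:

```c
#include <assert.h>
#include <stdint.h>

struct toy_region {
	uint64_t offset;
	uint64_t size;
};

/*
 * Split a BAR into the parts that do not share a page with the MSI-X table.
 * Writes up to two regions to out[] and returns how many were written.
 */
static int toy_map_around_msix(uint64_t bar_size, uint64_t tbl_offset,
			       uint64_t tbl_size, uint64_t page_size,
			       struct toy_region out[2])
{
	uint64_t hole_start, hole_end;
	int n = 0;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(tbl_offset + tbl_size <= bar_size);

	/* Round the table start down and the table end up to a page. */
	hole_start = tbl_offset & ~(page_size - 1);
	hole_end = (tbl_offset + tbl_size + page_size - 1) & ~(page_size - 1);
	if (hole_end > bar_size)
		hole_end = bar_size;

	if (hole_start > 0)
		out[n++] = (struct toy_region){ 0, hole_start };
	if (hole_end < bar_size)
		out[n++] = (struct toy_region){ hole_end, bar_size - hole_end };
	return n;
}
```

For a 0x4000-byte BAR with a 0x100-byte table at offset 0x2000 and 4 KiB
pages, this yields the regions [0, 0x2000) and [0x3000, 0x4000).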

Signed-off-by: Dan Aloni <dan@kernelim.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
9 years agombuf: remove build option to disable refcnt
Sergio Gonzalez Monroy [Wed, 18 Feb 2015 11:03:03 +0000 (11:03 +0000)]
mbuf: remove build option to disable refcnt

This patch removes all references to RTE_MBUF_REFCNT, setting the refcnt
field in the mbuf struct permanently.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years agombuf: introduce indirect attached flag
Sergio Gonzalez Monroy [Wed, 18 Feb 2015 11:03:02 +0000 (11:03 +0000)]
mbuf: introduce indirect attached flag

Currently for mbufs with refcnt, we cannot free mbufs with external memory
buffers (ie. vhost zero copy), as they are recognized as indirect
attached mbufs and therefore we free the direct mbuf it points to,
resulting in an error in the case of external memory buffers.

We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with external buffer is the same as freeing a non attached mbuf.
The flag is set during attach and cleared on detach.
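
The free logic described above can be sketched with a toy mbuf. All names,
the struct layout and the bit position of the flag are hypothetical; only
the decision "touch the direct mbuf if and only if the flag is set" is
taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IND_ATTACHED (1ULL << 62) /* hypothetical bit position */

struct toy_mbuf {
	uint64_t ol_flags;
	int refcnt;
	struct toy_mbuf *direct; /* only meaningful when the flag is set */
};

/* Attach mi to the buffer of md: md gains a reference, mi gets the flag. */
static void toy_attach(struct toy_mbuf *mi, struct toy_mbuf *md)
{
	md->refcnt++;
	mi->direct = md;
	mi->ol_flags |= TOY_IND_ATTACHED;
}

static void toy_detach(struct toy_mbuf *mi)
{
	mi->direct = NULL;
	mi->ol_flags &= ~TOY_IND_ATTACHED;
}

/* Drops one reference on m; returns how many mbufs went back to the pool. */
static int toy_free(struct toy_mbuf *m)
{
	int released = 0;

	assert(m != NULL && m->refcnt > 0);
	if (--m->refcnt > 0)
		return 0;
	if (m->ol_flags & TOY_IND_ATTACHED) {
		struct toy_mbuf *md = m->direct;

		toy_detach(m);
		if (--md->refcnt == 0)
			released++;
	}
	/* An mbuf with an external buffer has no flag: only m is released. */
	return released + 1;
}
```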

So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.

This patch would allow the removal of the RTE_MBUF_REFCNT config option,
setting refcnt for all mbufs permanently.

The patch also modifies the vhost example as it was using the
RTE_MBUF_INDIRECT macro to detect if it was an mbuf with external buffer.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
9 years agokni: add build option to disable preempting
Marc Sune [Fri, 13 Feb 2015 14:25:25 +0000 (15:25 +0100)]
kni: add build option to disable preempting

This patch introduces CONFIG_RTE_KNI_PREEMPT_DEFAULT flag. When set to 'no',
KNI kernel thread(s) do not call schedule_timeout_interruptible(), which
improves overall KNI performance at the expense of CPU cycles (polling).

The default value is 'yes', maintaining the current behaviour.

Signed-off-by: Marc Sune <marc.sune@bisdn.de>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
9 years agoapp/test: remove redundant compile checks
Yerden Zhumabekov [Thu, 29 Jan 2015 08:50:47 +0000 (14:50 +0600)]
app/test: remove redundant compile checks

Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years agohash: slice CRC data into 8-byte pieces
Yerden Zhumabekov [Thu, 29 Jan 2015 08:50:26 +0000 (14:50 +0600)]
hash: slice CRC data into 8-byte pieces

Calculating hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining part of the data
is hashed using CRC32 functions with either 8-byte or 4-byte operands.
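
The slicing loop can be sketched as below. Two assumptions apply: the
bit-by-bit toy_crc32c_bytes stands in for the 8-byte and 4-byte CRC32
primitives (which would be a single instruction or table lookups), and the
tail of fewer than 4 bytes is hashed byte-wise, which may differ from how
the library packs the remainder into an operand. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit CRC32C (reflected polynomial 0x82F63B78) over n bytes. */
static uint32_t toy_crc32c_bytes(const uint8_t *p, size_t n, uint32_t crc)
{
	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hash len bytes: 8-byte pieces first, then one 4-byte piece, then the rest. */
static uint32_t toy_crc32c_sliced(const void *data, size_t len, uint32_t init)
{
	const uint8_t *p = data;
	uint32_t crc = init;

	assert(data != NULL || len == 0);
	while (len >= 8) {
		crc = toy_crc32c_bytes(p, 8, crc); /* 8-byte operand step */
		p += 8;
		len -= 8;
	}
	if (len >= 4) {
		crc = toy_crc32c_bytes(p, 4, crc); /* 4-byte operand step */
		p += 4;
		len -= 4;
	}
	if (len > 0)
		crc = toy_crc32c_bytes(p, len, crc); /* byte-wise tail (assumption) */
	return crc;
}
```

With an initial value of 0xFFFFFFFF and a final inversion, the sketch
reproduces the standard CRC-32C check value 0xE3069283 for the string
"123456789".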

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years agohash: fallback to software CRC32 implementation
Yerden Zhumabekov [Thu, 29 Jan 2015 08:50:03 +0000 (14:50 +0600)]
hash: fallback to software CRC32 implementation

Initially, SSE4.2 support is detected via the constructor function.

Added rte_hash_crc_set_alg() function to detect and set CRC32
implementation if necessary. SSE4.2 is allowed by default.

rte_hash_crc_*byte() functions were reworked so they choose the available
CRC32 implementation at runtime.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years agohash: add CRC function for 8 bytes
Yerden Zhumabekov [Thu, 29 Jan 2015 08:49:47 +0000 (14:49 +0600)]
hash: add CRC function for 8 bytes

SSE4.2 provides CRC32 intrinsic with 8-byte operand.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years agohash: replace built-in functions implementing SSE4.2
Yerden Zhumabekov [Thu, 29 Jan 2015 08:49:17 +0000 (14:49 +0600)]
hash: replace built-in functions implementing SSE4.2

Give up using built-in intrinsics and use our own assembly
implementation. Remove #include entry as well.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years agohash: add assembly implementation of CRC32 intrinsics
Yerden Zhumabekov [Thu, 29 Jan 2015 08:48:59 +0000 (14:48 +0600)]
hash: add assembly implementation of CRC32 intrinsics

Added:
- crc32c_sse42_u32() emits 'crc32l' asm instruction;
- crc32c_sse42_u64() emits 'crc32q' asm instruction;
- crc32c_sse42_u64_mimic(), a wrapper for running on 32-bit platforms.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years agohash: add software CRC32 implementation
Yerden Zhumabekov [Thu, 29 Jan 2015 08:48:41 +0000 (14:48 +0600)]
hash: add software CRC32 implementation

Add lookup tables for the CRC32 algorithm, and crc32c_1word() and
crc32c_2words() functions returning the hash of a 32-bit and a 64-bit
operand respectively.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
9 years agoapp/testpmd: support NVGRE in Tx checksum offload
Jijiang Liu [Fri, 20 Feb 2015 17:01:47 +0000 (17:01 +0000)]
app/testpmd: support NVGRE in Tx checksum offload

Enhance csum fwd engine based on current TX checksum framework in order
to test TX checksum offload for NVGRE packets.

It includes:
 - IPv4 and IPv6 packet
 - outer L3, inner L3 and L4 checksum offload for Tx side.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
9 years agoapp/testpmd: support NVGRE in Rx tunnel filtering
Jijiang Liu [Fri, 20 Feb 2015 17:01:46 +0000 (17:01 +0000)]
app/testpmd: support NVGRE in Rx tunnel filtering

Extend the "tunnel_filter" command in testpmd to test the RX tunnel filter API for NVGRE packets.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
9 years agoi40e: support NVGRE in Rx tunnel filtering
Jijiang Liu [Fri, 20 Feb 2015 17:01:45 +0000 (17:01 +0000)]
i40e: support NVGRE in Rx tunnel filtering

The filter types supported for NVGRE packets are listed below:
   1. Inner MAC and Inner VLAN ID.
   2. Inner MAC address, inner VLAN ID and tenant ID.
   3. Inner MAC and tenant ID.
   4. Inner MAC address.
   5. Outer MAC address, tenant ID and inner MAC address.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
9 years agoether: add transparent ethernet bridging type
Jijiang Liu [Fri, 20 Feb 2015 17:01:44 +0000 (17:01 +0000)]
ether: add transparent ethernet bridging type

Add an Ethernet type definition for Transparent Ethernet Bridging.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
9 years agoapp/testpmd: support new rss offloads
Helin Zhang [Wed, 4 Feb 2015 07:16:33 +0000 (15:16 +0800)]
app/testpmd: support new rss offloads

RSS offloads supported 'ip' and 'udp' only, which did not demonstrate
all of the hardware capabilities. The modifications add support for
the new RSS offloads 'tcp', 'sctp', 'ether' and 'all'.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
9 years agoapp/testpmd: fix some indent
Helin Zhang [Wed, 4 Feb 2015 07:16:27 +0000 (15:16 +0800)]
app/testpmd: fix some indent

Added code style fixes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
9 years agoethdev: unification of RSS offload types
Helin Zhang [Wed, 4 Feb 2015 07:16:32 +0000 (15:16 +0800)]
ethdev: unification of RSS offload types

RSS offload types were defined separately for 1/10G and 40G NICs,
and had no relationship with flow types. The modifications
unify the RSS offload types for all PMDs. Unified RSS offload types
have new and common names which can be used by any PMD or
application, and are decoupled from specific hardware.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: merge with fm10k]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years agoethdev: unification of flow types
Helin Zhang [Wed, 4 Feb 2015 07:16:31 +0000 (15:16 +0800)]
ethdev: unification of flow types

Flow types were actually defined for i40e hardware specifically,
and could not be used for defining RSS offload types of all
PMDs. This patch removes the flow types enum and uses macros instead,
with new names. The new macros can be used for defining RSS
offload types later. Modifications are also made in i40e and
testpmd accordingly.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: merge with new flow director API]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years agoethdev: fix size of flow type mask array
Helin Zhang [Wed, 4 Feb 2015 07:16:30 +0000 (15:16 +0800)]
ethdev: fix size of flow type mask array

The size of the flow type mask array was calculated incorrectly. The fix
aligns the flow type maximum index ID up to a multiple of the
element bit width, and then divides it by the element bit width.
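
The calculation can be sketched as follows, assuming 32-bit array elements
and a maximum that is one past the last valid flow type index. The names
are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bit width of one array element, assumed to be a uint32_t. */
#define TOY_UINT32_BIT ((unsigned)(CHAR_BIT * sizeof(uint32_t)))

/* Number of uint32_t elements needed to hold flow_type_max mask bits. */
static unsigned toy_mask_array_size(unsigned flow_type_max)
{
	unsigned aligned;

	assert(flow_type_max <= UINT_MAX - (TOY_UINT32_BIT - 1));
	/* Align up to a multiple of the element width, then divide. */
	aligned = (flow_type_max + TOY_UINT32_BIT - 1)
			/ TOY_UINT32_BIT * TOY_UINT32_BIT;
	return aligned / TOY_UINT32_BIT;
}
```

Without the alignment step, a maximum of 18 would give 18 / 32 = 0
elements; with it, the result is 1.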

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
9 years agoethdev: minor comment changes
Helin Zhang [Wed, 4 Feb 2015 07:16:28 +0000 (15:16 +0800)]
ethdev: minor comment changes

Added code style fixes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
9 years agoi40e: remove some useless line breaks
Helin Zhang [Wed, 4 Feb 2015 07:16:29 +0000 (15:16 +0800)]
i40e: remove some useless line breaks

Added code style fixes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
9 years agoethdev: remove old ethertype filter ABI
Thomas Monjalon [Sun, 22 Feb 2015 02:04:44 +0000 (03:04 +0100)]
ethdev: remove old ethertype filter ABI

The old ethertype filter API was removed in commit 75db20648,
but was still in (newly integrated) version map for ABI.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
9 years agoethdev: remove old ntuple filter API
Jingjing Wu [Tue, 10 Feb 2015 04:48:32 +0000 (12:48 +0800)]
ethdev: remove old ntuple filter API

The following structures are removed:
 - rte_2tuple_filter
 - rte_5tuple_filter
The following APIs are removed:
 - rte_eth_dev_add_2tuple_filter
 - rte_eth_dev_remove_2tuple_filter
 - rte_eth_dev_get_2tuple_filter
 - rte_eth_dev_add_5tuple_filter
 - rte_eth_dev_remove_5tuple_filter
 - rte_eth_dev_get_5tuple_filter
It also moves the TCP_*_FLAG macros to rte_eth_ctrl.h, and removes the macro
TCP_UGR_FLAG, which duplicates TCP_URG_FLAG.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
[Thomas: remove also from version map]