eal/x86: optimize memcpy for SSE and AVX
Main code changes:
1. Differentiate architectural features based on CPU flags
a. Implement separate move functions for SSE/AVX/AVX2 to make full use of cache bandwidth (see sketch 1 below)
b. Implement a separate copy flow specifically optimized for each target architecture
2. Rewrite the memcpy function "rte_memcpy"
a. Add store alignment (see sketch 2 below)
b. Add load alignment based on architectural features
c. Put the block copy loop into inline move functions for better control of instruction order
d. Eliminate unnecessary MOVs
3. Rewrite the inline move functions
a. Add move functions for unaligned load cases
b. Change instruction order in copy loops for better pipeline utilization (see sketch 3 below)
c. Use intrinsics instead of assembly code
4. Remove the slow glibc call for constant-size copies (see sketch 4 below)
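
Sketch 1 (illustrative only, not the code in this patch): separate move
helpers selected at build time from the compiler's CPU feature macros, so
each build uses the widest register the target supports. The helper name
mov32_sketch is an assumption for illustration.

#include <stdint.h>
#include <immintrin.h>

#ifdef __AVX2__
/* AVX2 build: copy 32 bytes with a single 256-bit load/store pair */
static inline void
mov32_sketch(uint8_t *dst, const uint8_t *src)
{
        __m256i ymm0 = _mm256_loadu_si256((const __m256i *)src);
        _mm256_storeu_si256((__m256i *)dst, ymm0);
}
#else
/* SSE build: copy the same 32 bytes as two 128-bit load/store pairs */
static inline void
mov32_sketch(uint8_t *dst, const uint8_t *src)
{
        __m128i xmm0 = _mm_loadu_si128((const __m128i *)(src + 0));
        __m128i xmm1 = _mm_loadu_si128((const __m128i *)(src + 16));
        _mm_storeu_si128((__m128i *)(dst + 0), xmm0);
        _mm_storeu_si128((__m128i *)(dst + 16), xmm1);
}
#endif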
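
Sketch 2 (illustrative only): the store-aligning idea behind 2.a/2.b. An
unaligned head copy brings the destination to a 16-byte boundary so the
bulk loop can use aligned stores with possibly unaligned loads; an
overlapping tail copy finishes the remainder. Name and structure are
assumptions, not the exact rte_memcpy flow.

#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

static inline void *
memcpy_store_aligned_sketch(void *dst, const void *src, size_t n)
{
        uint8_t *d = dst;
        const uint8_t *s = src;

        if (n < 16) {
                /* tiny copies: a byte loop keeps the sketch simple;
                 * the real code has dedicated small-size paths */
                while (n--)
                        *d++ = *s++;
                return dst;
        }

        /* unaligned 16-byte head, then advance to the next 16-byte
         * boundary of dst so the bulk stores below are aligned */
        _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
        size_t head = 16 - ((uintptr_t)d & 15);
        d += head; s += head; n -= head;

        while (n >= 16) {
                /* aligned store, possibly unaligned load */
                _mm_store_si128((__m128i *)d,
                                _mm_loadu_si128((const __m128i *)s));
                d += 16; s += 16; n -= 16;
        }

        if (n) /* overlapping tail re-writes a few already-copied bytes */
                _mm_storeu_si128((__m128i *)(d + n - 16),
                                 _mm_loadu_si128((const __m128i *)(s + n - 16)));
        return dst;
}

The overlapping tail avoids a scalar cleanup loop at the cost of
re-copying up to 15 bytes, which is safe because memcpy semantics already
forbid overlapping source and destination buffers.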
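
Sketch 3 (illustrative only): an intrinsics-based inline move helper for
3.b/3.c. All loads are issued before the stores so the loads can overlap
in the pipeline instead of serializing into load/store pairs; the name
mov64_sketch is an assumption.

#include <stdint.h>
#include <immintrin.h>

static inline void
mov64_sketch(uint8_t *dst, const uint8_t *src)
{
        /* group the four loads first ... */
        __m128i xmm0 = _mm_loadu_si128((const __m128i *)(src + 0 * 16));
        __m128i xmm1 = _mm_loadu_si128((const __m128i *)(src + 1 * 16));
        __m128i xmm2 = _mm_loadu_si128((const __m128i *)(src + 2 * 16));
        __m128i xmm3 = _mm_loadu_si128((const __m128i *)(src + 3 * 16));
        /* ... then issue the four stores */
        _mm_storeu_si128((__m128i *)(dst + 0 * 16), xmm0);
        _mm_storeu_si128((__m128i *)(dst + 1 * 16), xmm1);
        _mm_storeu_si128((__m128i *)(dst + 2 * 16), xmm2);
        _mm_storeu_si128((__m128i *)(dst + 3 * 16), xmm3);
}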
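
Sketch 4 (illustrative only): the idea behind item 4. Before, sizes known
at compile time were routed to glibc memcpy; the rewrite sends every size
through the inline copy flow. All names here are placeholders, not the
DPDK macros.

#include <stddef.h>
#include <string.h>

/* placeholder for the intrinsics-based flow sketched above */
static inline void *
inline_copy_sketch(void *dst, const void *src, size_t n)
{
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
                *d++ = *s++;
        return dst;
}

/* before: compile-time-constant sizes bypassed the optimized flow */
#define copy_old(dst, src, n)                                      \
        (__builtin_constant_p(n) ? memcpy((dst), (src), (n))       \
                                 : inline_copy_sketch((dst), (src), (n)))

/* after: one inline path regardless of whether n is constant */
#define copy_new(dst, src, n) inline_copy_sketch((dst), (src), (n))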
Test report: http://dpdk.org/ml/archives/dev/2015-January/011848.html
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>