eal/arm: add vector memcpy for ARMv7
The SSE based memory copy in DPDK only support x86. This patch
adds ARM NEON based memory copy functions for ARM architecture.
The implementation improves memory copy of short or well aligned
data buffers. The following measurements show improvements over
the libc memcpy on Cortex CPUs.
by X % faster
Length (B) a15 a7 a9
1 4.9 15.2 3.2
7 56.9 48.2 40.3
8 37.3 39.8 29.6
9 69.3 38.7 33.9
15 60.8 35.3 23.7
16 50.6 35.9 35.0
17 57.7 35.7 31.1
31 16.0 23.3 9.0
32 65.9 13.5 21.4
33 3.9 10.3 -3.7
63 2.0 12.9 -2.0
64 66.5 0.0 16.5
65 2.7 7.6 -35.6
127 0.1 4.5 -18.9
128 66.2 1.5 -51.4
129 -0.8 3.2 -35.8
255 -3.1 -0.9 -69.1
256 67.9 1.2 7.2
257 -3.6 -1.9 -36.9
320 67.7 1.4 0.0
384 66.8 1.4 -14.2
511 -44.9 -2.3 -41.9
512 67.3 1.4 -6.8
513 -41.7 -3.0 -36.2
1023 -82.4 -2.8 -41.2
1024 68.3 1.4 -11.6
1025 -80.1 -3.3 -38.1
1518 -47.3 -5.0 -38.3
1522 -48.3 -6.0 -37.9
1600 65.4 1.3 -27.3
2048 59.5 1.5 -10.9
3072 52.3 1.5 -12.2
4096 45.3 1.4 -12.5
5120 40.6 1.5 -14.5
6144 35.4 1.4 -13.4
7168 32.9 1.4 -13.9
8192 28.2 1.4 -15.1
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: David Marchand <david.marchand@6wind.com>