eal/x86: improve memcpy performance
author Zhihong Wang <zhihong.wang@intel.com>
Wed, 25 May 2016 01:23:03 +0000 (21:23 -0400)
committer Thomas Monjalon <thomas.monjalon@6wind.com>
Wed, 15 Jun 2016 14:20:04 +0000 (16:20 +0200)
commit 4b42e90ef0e421dc777f2b2e377eb237cd3675fa
tree 16365e92702de61d558aca5db7f8441b7e54d23f
parent 43b194b43344b262b00f6c8a24443b46ed82009b

This patch fixes an rte_memcpy performance issue on Haswell and
Broadwell for vhost when the copy size is larger than 256 bytes.

It is observed that for large copies, such as those of 1024- or
1518-byte packets, rte_memcpy suffers from a high ratio of
store-buffer-full events, which cause the pipeline to stall in
scenarios like vhost enqueue. This can be alleviated by adjusting the
instruction layout. Note that this issue may not be visible in micro
tests.
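
As an illustration only, the sketch below shows one way the layout of
a block copy can be varied: all loads of a 128-byte block are issued
before any of its stores. It is not the code in this patch. The names
chunk32 and copy_128_blocks are hypothetical, and the 128-byte block
size is an assumption chosen for the example. A compiler is free to
reorder or vectorize these accesses, so the final instruction layout
should be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte chunk standing in for one 256-bit register. */
typedef struct {
	uint8_t b[32];
} chunk32;

/*
 * Hypothetical helper, not the DPDK implementation: copy n bytes in
 * 128-byte blocks, issuing four 32-byte loads before the four stores.
 * Any tail shorter than 128 bytes is handed to memcpy.
 * src and dst must not overlap.
 */
static void
copy_128_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
	chunk32 c0, c1, c2, c3;

	while (n >= 128) {
		/* Load phase: read the whole block first. */
		memcpy(&c0, src + 0, 32);
		memcpy(&c1, src + 32, 32);
		memcpy(&c2, src + 64, 32);
		memcpy(&c3, src + 96, 32);

		/* Store phase: write the block out afterwards. */
		memcpy(dst + 0, &c0, 32);
		memcpy(dst + 32, &c1, 32);
		memcpy(dst + 64, &c2, 32);
		memcpy(dst + 96, &c3, 32);

		src += 128;
		dst += 128;
		n -= 128;
	}

	/* Remaining bytes, if any. */
	if (n > 0)
		memcpy(dst, src, n);
}
```

Calling copy_128_blocks(dst, src, 1518) copies eleven full blocks and
leaves a 110-byte tail to memcpy.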

How to reproduce?

Run PHY-VM-PHY using vhost/virtio, or vhost/virtio loopback, with
large packets such as 1024- or 1518-byte ones. If PHY-VM-PHY is used,
make sure the packet generation rate is not the bottleneck.

Test report: http://dpdk.org/ml/archives/dev/2016-May/039716.html

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
lib/librte_eal/common/include/arch/x86/rte_memcpy.h