eal/x86: optimize memcpy for AVX512 platforms
Implement an AVX512 memcpy and select the appropriate implementation at
compile time based on predefined macros, to make full use of hardware
resources and deliver high performance.
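As an illustration only, the compile-time selection could look roughly
like the sketch below. It assumes the compiler-predefined macros
__AVX512F__ and __AVX2__; the names sketch_memcpy, copy_block and
COPY_BLOCK are hypothetical and are not taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX512F__) || defined(__AVX2__)
#include <immintrin.h>
#endif

/*
 * Hypothetical sketch, not the DPDK rte_memcpy code: pick the widest
 * copy primitive that the build target advertises through the
 * compiler's predefined macros.
 */
#if defined(__AVX512F__)
#define COPY_BLOCK 64
static inline void copy_block(uint8_t *dst, const uint8_t *src)
{
	/* One unaligned 512-bit load and store. */
	_mm512_storeu_si512((void *)dst,
			    _mm512_loadu_si512((const void *)src));
}
#elif defined(__AVX2__)
#define COPY_BLOCK 32
static inline void copy_block(uint8_t *dst, const uint8_t *src)
{
	/* One unaligned 256-bit load and store. */
	_mm256_storeu_si256((__m256i *)dst,
			    _mm256_loadu_si256((const __m256i *)src));
}
#else
#define COPY_BLOCK 16
static inline void copy_block(uint8_t *dst, const uint8_t *src)
{
	/* Portable fallback: let the compiler expand a fixed-size copy. */
	memcpy(dst, src, COPY_BLOCK);
}
#endif

/* Copy n bytes in COPY_BLOCK-sized chunks, then handle the tail. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	assert(dst != NULL && src != NULL);

	while (n >= COPY_BLOCK) {
		copy_block(d, s);
		d += COPY_BLOCK;
		s += COPY_BLOCK;
		n -= COPY_BLOCK;
	}
	if (n > 0)
		memcpy(d, s, n); /* remaining bytes, fewer than one block */
	return dst;
}
```

Because the choice is made by the preprocessor, only one copy_block
variant is compiled in, and the binary must be built for the target
platform (for example with -mavx512f or -march=native) to get the
widest path.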
In current DPDK, memcpy accounts for a large proportion of execution
time in libraries such as Vhost, especially for large packets, so this
patch can bring considerable benefits on AVX512 platforms.
The implementation is based on the current DPDK memcpy framework.
Background can be found in these threads:
http://dpdk.org/ml/archives/dev/2014-November/008158.html
http://dpdk.org/ml/archives/dev/2015-January/011800.html
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>