On Xeon, 512b accesses are available, so movdir64 instruction is able to
perform 512b read and write to DLB producer port. In order for movdir64
to be able to pull its data from store buffers (store-buffer-forwarding)
(before actual write), data should be in single 512b write format.
This commit add change when code is built for Xeon with 512b AVX support
to make single 512b write of all 4 QEs instead of 4x64b writes.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com> Acked-by: Kent Wires <kent.wires@intel.com>