Reorganise SSE code-path a bit by moving lo/hi dwords shuffle
out from calc_addr().
That allows to make calc_addr() for SSE and AVX2 practically identical
and opens opportunity for further code deduplication.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>