mempool/octeontx2: add optimized dequeue operation for arm64
This patch adds an optimized arm64 instruction based routine to leverage
CPU pipeline characteristics of octeontx2. The theme is to fill the
pipeline with CASP operations as much HW can do so that HW can do alloc()
HW ops in full throttle.