ring: fix single consumer dequeue performance
Use of rte_smb_wmb() instead of rte_smb_rmb() in sc dequeue function
creates the additional overhead of waiting for all the STOREs
to be completed to local buffer from ring buffer memory.
The sc dequeue function demands only LOAD-STORE barrier where LOADs
from ring buffer memory needs to be completed before tail pointer update.
Changing to rte_smb_rmb() to enable the required LOAD-STORE barrier.
Fixes:
ecc7d10e448e ("ring: guarantee dequeue ordering before tail update")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>