ring/c11: relax ordering for load and store of the head
When calling __atomic_compare_exchange_n, use relaxed ordering for the
success case, as multiple producers/consumers do not release updates to
each other so no need for acquire or release ordering.
Because the thread fence in place, ordering for the first iteration can
be relaxed.
Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core,4 threads/core,2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i
Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34
With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.59
MP/MC bulk enq/dequeue (size: 8): 10.54
SP/SC bulk enq/dequeue (size: 32): 1.73
MP/MC bulk enq/dequeue (size: 32): 2.38
No significant improvement, nor regression was seen, as the optimisation
is not at the critical path.
Fixes:
39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>