test/ring: replace sync with atomic builtins
author Phil Yang <phil.yang@arm.com>
Mon, 8 Apr 2019 03:02:31 +0000 (11:02 +0800)
committer Thomas Monjalon <thomas@monjalon.net>
Mon, 8 Jul 2019 14:35:55 +0000 (16:35 +0200)
The '__sync' built-in functions are deprecated; the '__atomic'
built-ins should be used instead. The '__sync' built-ins are full
barriers, while the '__atomic' built-ins offer less restrictive
one-way barriers, which helps performance.
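
The difference between the two builtin families can be sketched as
below. This is an illustration only, not part of the patch: the helper
names (counter_inc_sync, counter_inc_relaxed, counter_inc_acq_rel,
counter_demo) are hypothetical, and the acquire-release variant is
included just to show that the caller chooses the ordering. The patch
itself uses __ATOMIC_RELAXED.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy builtin: no memory-order argument, always a full barrier. */
static inline uint32_t
counter_inc_sync(uint32_t *c)
{
	return __sync_add_and_fetch(c, 1);
}

/* __atomic builtin, relaxed: the add is atomic, but no ordering is
 * imposed on surrounding loads and stores. */
static inline uint32_t
counter_inc_relaxed(uint32_t *c)
{
	return __atomic_add_fetch(c, 1, __ATOMIC_RELAXED);
}

/* __atomic builtin, acquire-release: stronger than relaxed, shown
 * only for comparison (hypothetical, not used by the patch). */
static inline uint32_t
counter_inc_acq_rel(uint32_t *c)
{
	return __atomic_add_fetch(c, 1, __ATOMIC_ACQ_REL);
}

/* All three variants return the value after the increment. */
static inline void
counter_demo(void)
{
	uint32_t c = 0;

	assert(counter_inc_sync(&c) == 1);
	assert(counter_inc_relaxed(&c) == 2);
	assert(counter_inc_acq_rel(&c) == 3);
}
```

All three helpers return the same values; they differ only in the
ordering guarantees they request from the compiler and the CPU.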

Here is the example test result on TX2:
sudo ./arm64-armv8a-linuxapp-gcc/app/test -c 0x7fffffe \
-n 4 --socket-mem=1024,0 --file-prefix=~ -- -i
RTE>>ring_perf_autotest

*** ring_perf_autotest without this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.22
MP/MC bulk enq/dequeue (size: 8): 11.50
SP/SC bulk enq/dequeue (size: 32): 1.85
MP/MC bulk enq/dequeue (size: 32): 2.66

*** ring_perf_autotest with this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.13
MP/MC bulk enq/dequeue (size: 8): 9.83
SP/SC bulk enq/dequeue (size: 32): 1.96
MP/MC bulk enq/dequeue (size: 32): 2.30

So for the ring performance test, this patch improves the performance
of ring operations by 11%.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
app/test/test_ring_perf.c

index 6eccccf..b6ad703 100644
@@ -162,7 +162,11 @@ enqueue_bulk(void *p)
        unsigned i;
        void *burst[MAX_BURST] = {0};
 
-       if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+       if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+       if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
                while(lcore_count != 2)
                        rte_pause();
 
@@ -198,7 +202,11 @@ dequeue_bulk(void *p)
        unsigned i;
        void *burst[MAX_BURST] = {0};
 
-       if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+       if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+       if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
                while(lcore_count != 2)
                        rte_pause();
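
The two hunks above implement a two-thread rendezvous: each thread
increments a shared counter and, unless it is the second to arrive,
spins until the counter reaches 2. A minimal standalone sketch of that
pattern follows, assuming POSIX threads in place of DPDK lcores. The
names arrived, worker and run_rendezvous are hypothetical, and the
empty spin body stands in for rte_pause().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint32_t arrived; /* hypothetical stand-in for lcore_count */

static void *
worker(void *arg)
{
	(void)arg;
	/* The second thread to arrive sees 2 and skips the wait. */
	if (__atomic_add_fetch(&arrived, 1, __ATOMIC_RELAXED) != 2)
		while (__atomic_load_n(&arrived, __ATOMIC_RELAXED) != 2)
			; /* the DPDK test calls rte_pause() here */
	return NULL;
}

/* Start two workers, wait for both, return the final counter value
 * (0 if a thread could not be created). */
static uint32_t
run_rendezvous(void)
{
	pthread_t t[2];
	int created = 0;
	int i, rc;

	__atomic_store_n(&arrived, 0, __ATOMIC_RELAXED);
	for (i = 0; i < 2; i++) {
		if (pthread_create(&t[i], NULL, worker, NULL) != 0)
			break;
		created++;
	}
	if (created != 2) /* release any waiter so join cannot hang */
		__atomic_store_n(&arrived, 2, __ATOMIC_RELAXED);
	for (i = 0; i < created; i++) {
		rc = pthread_join(t[i], NULL);
		assert(rc == 0);
		(void)rc;
	}
	return created == 2 ?
		__atomic_load_n(&arrived, __ATOMIC_RELAXED) : 0;
}
```

Relaxed ordering is enough in this sketch because the counter only
gates when the threads start and publishes no other data.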