spinlock: use WFE to reduce contention on aarch64
author	Gavin Hu <gavin.hu@arm.com>
Wed, 7 Jul 2021 05:48:37 +0000 (13:48 +0800)
committer	Thomas Monjalon <thomas@monjalon.net>
Fri, 9 Jul 2021 19:33:01 +0000 (21:33 +0200)
While acquiring a spinlock, cores repeatedly poll the lock variable
in a tight loop. Replace this inner polling loop with the
rte_wait_until_equal_32() API, which on aarch64 uses WFE to park the
core until the lock variable changes.

Micro-benchmarks and testpmd/l3fwd traffic tests on ThunderX2,
Ampere eMAG80 and Arm N1SDP ran cleanly, with no notable performance
gain or degradation measured.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
lib/eal/include/generic/rte_spinlock.h

index 87ae7a4..40fe49d 100644
@@ -65,8 +65,8 @@ rte_spinlock_lock(rte_spinlock_t *sl)
 
        while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
                                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
-               while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
-                       rte_pause();
+               rte_wait_until_equal_32((volatile uint32_t *)&sl->locked,
+                              0, __ATOMIC_RELAXED);
                exp = 0;
        }
 }
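
For context, on aarch64 a wait-until-equal loop can be built from
SEVL/WFE so the core sleeps until the monitored cache line is
written. The sketch below is illustrative only (the function name is
hypothetical, not DPDK's exact per-arch implementation), with a
portable polling fallback for other architectures:

```c
#include <stdint.h>

/* Illustrative WFE-based wait; not DPDK's exact implementation. */
static inline void
wfe_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected)
{
#if defined(__aarch64__)
	uint32_t value;

	/* SEVL primes the event register so the first WFE falls
	 * through; LDAXR arms the exclusive monitor, so a write to
	 * the cache line generates an event that wakes the next WFE. */
	__asm__ volatile("sevl" ::: "memory");
	do {
		__asm__ volatile("wfe" ::: "memory");
		__asm__ volatile("ldaxr %w0, [%1]"
				 : "=&r"(value)
				 : "r"(addr)
				 : "memory");
	} while (value != expected);
#else
	/* Portable fallback: plain polling. */
	while (*addr != expected)
		;
#endif
}
```

Parking in WFE stops the waiting core from executing instructions
until an event arrives, which reduces power draw and interconnect
traffic compared with a busy-poll loop.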