spinlock: use WFE to reduce contention on aarch64