net/i40e: reduce L1 cache misses in NEON Rx
author Feifei Wang <feifei.wang2@arm.com>
Fri, 23 Jul 2021 03:10:49 +0000 (11:10 +0800)
committer Qi Zhang <qi.z.zhang@intel.com>
Tue, 10 Aug 2021 03:02:16 +0000 (05:02 +0200)
commit 319df9f9bf1ad4a854e1e8b9fe087580909b8263
tree 1b3b0aaf32776ba605228e02a7bfe05362edf238
parent decc3b6aa5bf2776c872825d42301cf585d78bc2

On the N1 platform, loading the packet mbufs and descs is a hot spot
that limits the performance of the "desc_to_ptype_v" and
"desc_to_olflags_v" functions in the i40e Rx NEON path. This is because
the packet mbufs and descs have been evicted from the l1d-cache to the
l2d-cache by the time these functions access them.

To reduce l1d-cache misses and improve performance, reorder the code:
move the "desc_to_ptype_v" and "desc_to_olflags_v" calls forward to the
point where the packet mbufs and descs have just been loaded.
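
The reordering can be illustrated with a minimal scalar sketch. This is
not the driver code: the struct layouts, the bit positions, the helper
names (set_ptype, set_olflags, set_len) and the two rx_burst_* functions
are hypothetical stand-ins chosen for illustration, and the real path
uses NEON intrinsics. Both variants compute the same result; only the
distance between the loads and their first use differs.

```c
#include <stddef.h>
#include <stdint.h>

#define BURST 4 /* descriptors handled per loop iteration in this sketch */

/* Hypothetical, simplified stand-ins for the Rx descriptor and the mbuf. */
struct desc { uint64_t qw0, qw1; };
struct pkt  { uint32_t ptype; uint64_t ol_flags; uint16_t len; };

/* Stand-ins for desc_to_ptype_v() and desc_to_olflags_v().
 * The bit positions are illustrative, not the real i40e layout. */
static void set_ptype(const struct desc *d, struct pkt **p)
{
	for (int i = 0; i < BURST; i++)
		p[i]->ptype = (uint32_t)((d[i].qw1 >> 8) & 0xff);
}

static void set_olflags(const struct desc *d, struct pkt **p)
{
	for (int i = 0; i < BURST; i++)
		p[i]->ol_flags = d[i].qw1 & 0xf;
}

/* Stand-in for the remaining per-burst work (here: length extraction). */
static void set_len(const struct desc *d, struct pkt **p)
{
	for (int i = 0; i < BURST; i++)
		p[i]->len = (uint16_t)(d[i].qw1 >> 16);
}

/* Old order: load descs and mbuf pointers, do the other work, and only
 * then touch the mbufs and descs again for ptype/olflags. */
static void rx_burst_late(const struct desc *ring, struct pkt **sw_ring,
			  struct pkt **rx_pkts)
{
	struct desc d[BURST];

	for (int i = 0; i < BURST; i++) {
		d[i] = ring[i];
		rx_pkts[i] = sw_ring[i];
	}
	set_len(d, rx_pkts);
	set_ptype(d, rx_pkts);
	set_olflags(d, rx_pkts);
}

/* New order: compute ptype/olflags right after the loads, while the
 * mbufs and descs are most likely still in the L1 data cache. */
static void rx_burst_early(const struct desc *ring, struct pkt **sw_ring,
			   struct pkt **rx_pkts)
{
	struct desc d[BURST];

	for (int i = 0; i < BURST; i++) {
		d[i] = ring[i];
		rx_pkts[i] = sw_ring[i];
	}
	set_ptype(d, rx_pkts);
	set_olflags(d, rx_pkts);
	set_len(d, rx_pkts);
}
```

Because the three helpers write disjoint fields, moving two of them
forward does not change the output, which is why the change is a pure
code-order optimization.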

Test Result:
dpdk:21.08-rc1
gcc-9
For n1sdp, the patch improves performance by 1.8%.
For thunderx2, there is no performance change.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
drivers/net/i40e/i40e_rxtx_vec_neon.c