net/mlx5: prefetch CQEs for a faster decompression
author Alexander Kozyrev <akozyrev@mellanox.com>
Tue, 24 Mar 2020 14:45:30 +0000 (16:45 +0200)
committer Ferruh Yigit <ferruh.yigit@intel.com>
Tue, 21 Apr 2020 11:57:05 +0000 (13:57 +0200)
commit 28a4b96321a3376174dad8778c5245708c469ca7
tree c8953dd7dad7f438245bbe7f71e0090f5d853aae
parent 70fa0b4ed083ea2444848716f5d396aba499b560

Invalidating consumed CQEs incurs a performance penalty because
the non-sequential CQE access causes many cache misses.
Prefetch the CQEs to improve data locality and speed up CQE
decompression. Prefetching reduces the CPI (cycles per instruction)
of the rxq_cq_decompress_v() function from 1 to 0.85 in my
environment, resulting in a 2% boost in Mpps for the 64B-frame
single-core test.
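
The idea can be illustrated with a minimal sketch. This is not the
driver code from the patch: struct demo_cqe, DEMO_INVALID_OP_OWN and
demo_invalidate() are hypothetical names, the 64-byte entry layout is
an assumption, and the GCC/Clang builtin __builtin_prefetch() stands in
for whatever prefetch helper the driver uses. It only shows the pattern
of requesting the cache lines first and writing to them afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker value written into a consumed entry. */
#define DEMO_INVALID_OP_OWN 0xf0

/* Hypothetical completion entry: one cache line (64 bytes assumed). */
struct demo_cqe {
	uint8_t pad[63];
	uint8_t op_own;
} __attribute__((aligned(64)));

/*
 * Invalidate n consumed entries starting at index start in a ring of
 * ring_size entries (ring_size must be a power of two).
 * Returns the number of entries invalidated.
 */
static unsigned int
demo_invalidate(struct demo_cqe *cq, unsigned int ring_size,
		unsigned int start, unsigned int n)
{
	const unsigned int mask = ring_size - 1;
	unsigned int i;

	assert(ring_size != 0 && (ring_size & mask) == 0);
	/* Pass 1: request every cache line that is about to be written. */
	for (i = 0; i < n; ++i)
		__builtin_prefetch(&cq[(start + i) & mask], 1, 3);
	/* Pass 2: do the writes; the lines are likely in cache by now. */
	for (i = 0; i < n; ++i)
		cq[(start + i) & mask].op_own = DEMO_INVALID_OP_OWN;
	return n;
}
```

Splitting the work into two passes lets the memory requests overlap
instead of stalling on each miss in turn. A prefetch is only a hint, so
the result is the same with or without it; any gain has to be measured
on the target machine.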

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
drivers/net/mlx5/mlx5_rxtx_vec_neon.h
drivers/net/mlx5/mlx5_rxtx_vec_sse.h