net/mlx5: fix Tx doorbell memory barrier
Configuring UAR as IO-mapped makes maximum throughput decline by
noticeable amount. If UAR is configured as write-combining register,
a write memory barrier is needed on ringing a doorbell.
rte_wmb() is mostly effective when the size of a burst is comparatively
small. Revert the register back to write-combining and enforce a write
memory barrier instead, except for vectorized Tx burst routines.
Application can change it by setting MLX5_SHUT_UP_BF under its own
necessity.
Fixes:
9f9bebae5530 ("net/mlx5: don't map doorbell register to write combining")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>