net/i40e: improve scalar Tx performance
author Feifei Wang <feifei.wang2@arm.com>
Wed, 30 Jun 2021 06:40:35 +0000 (14:40 +0800)
committer Qi Zhang <qi.z.zhang@intel.com>
Tue, 6 Jul 2021 02:59:01 +0000 (04:59 +0200)
commit 95e7bb6a5fc9e371e763b11ec15786e4d574ef8e
tree 6f61283cf676b5b59eb4b13b811c9f06d3db01cf
parent 79d559de896b29f6b557a27a8b6381944e39a87d

For the i40e scalar Tx path, enabling FAST_FREE_MBUF mode guarantees
that, per queue, all mbufs come from the same mempool and have
refcnt = 1.

The buffers can therefore be freed in bulk, rather than one at a time,
when mbuf fast free mode is enabled.
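
The difference between the two free strategies can be sketched as
follows. This is a minimal, self-contained illustration and not the
driver code: toy_pool, toy_mbuf and every toy_* function are
hypothetical stand-ins for the mempool and mbuf types, and
TOY_MAX_FREE_SZ only mirrors the 64-entry free batch size listed in the
test setup. The put_calls counter exists purely to make the batching
visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_FREE_SZ 64   /* assumed batch size, mirrors the 64 in the test setup */
#define TOY_POOL_CAP    1024 /* assumed pool capacity for this sketch */

/* Hypothetical stand-in for a mempool: a stack of free object pointers. */
struct toy_pool {
	void *objs[TOY_POOL_CAP];
	unsigned int count;     /* objects currently held by the pool */
	unsigned int put_calls; /* number of put operations performed */
};

/* Hypothetical stand-in for an mbuf. */
struct toy_mbuf {
	struct toy_pool *pool;
	uint16_t refcnt;
};

/* Return n objects to the pool in a single call. */
void toy_pool_put_bulk(struct toy_pool *p, void * const *objs, unsigned int n)
{
	assert(p->count + n <= TOY_POOL_CAP);
	for (unsigned int i = 0; i < n; i++)
		p->objs[p->count++] = objs[i];
	p->put_calls++;
}

/* General path: inspect every mbuf and return it to its own pool. */
void toy_free_each(struct toy_mbuf **ring, uint16_t n)
{
	for (uint16_t i = 0; i < n; i++) {
		struct toy_mbuf *m = ring[i];

		ring[i] = NULL;
		if (m->refcnt == 1) {
			void *obj = m;

			toy_pool_put_bulk(m->pool, &obj, 1);
		} else {
			m->refcnt--; /* still referenced elsewhere */
		}
	}
}

/*
 * Fast-free path: valid only when every mbuf in the ring comes from the
 * same pool and has refcnt == 1, so no per-mbuf check is needed.
 */
void toy_free_bulk(struct toy_mbuf **ring, uint16_t n)
{
	void *batch[TOY_MAX_FREE_SZ];
	uint16_t done = 0;

	while (done < n) {
		uint16_t chunk = (uint16_t)(n - done);

		if (chunk > TOY_MAX_FREE_SZ)
			chunk = TOY_MAX_FREE_SZ;
		for (uint16_t i = 0; i < chunk; i++) {
			batch[i] = ring[done + i];
			ring[done + i] = NULL;
		}
		/* One put for the whole chunk; the pool is taken from the first mbuf. */
		toy_pool_put_bulk(((struct toy_mbuf *)batch[0])->pool, batch, chunk);
		done = (uint16_t)(done + chunk);
	}
}
```

In this sketch, freeing 32 buffers takes one pool operation with
toy_free_bulk() and 32 with toy_free_each(); a run of 100 buffers is
split into chunks of 64 and 36.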

Following are the test results with this patch:

MRR L3FWD Test:
two ports & bi-directional flows & one core
RX API: i40e_recv_pkts_bulk_alloc
TX API: i40e_xmit_pkts_simple
ring_descs_size = 1024;
Ring_I40E_TX_MAX_FREE_SZ = 64;
tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH = 32;
tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH = 32;

For the scalar path on Arm platforms with the default 'tx_rs_thresh':
On N1SDP, performance is improved by 7.9%;
On ThunderX2, performance is improved by 7.6%.

For the scalar path on the x86 platform with the default 'tx_rs_thresh':
Performance is improved by 4.7%.

Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
drivers/net/i40e/i40e_rxtx.c