net/i40e: improve scalar Tx performance
For i40e scalar Tx path, if implement FAST_FREE_MBUF mode, it means
per-queue all mbufs come from the same mempool and have refcnt = 1.
Thus we can use bulk free of the buffers when mbuf fast free mode is
enabled.
Following are the test results with this patch:
MRR L3FWD Test:
two ports & bi-directional flows & one core
RX API: i40e_recv_pkts_bulk_alloc
TX API: i40e_xmit_pkts_simple
ring_descs_size = 1024;
Ring_I40E_TX_MAX_FREE_SZ = 64;
tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH = 32;
tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH = 32;
For scalar path in arm platform with default 'tx_rs_thresh':
In n1sdp, performance is improved by 7.9%;
In thunderx2, performance is improved by 7.6%.
For scalar path in x86 platform with default 'tx_rs_thresh':
performance is improved by 4.7%.
Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>