net/bnxt: avoid unnecessary work in AVX2 Rx
author Lance Richardson <lance.richardson@broadcom.com>
Mon, 15 Nov 2021 18:24:09 +0000 (13:24 -0500)
committer Ajit Khaparde <ajit.khaparde@broadcom.com>
Tue, 16 Nov 2021 22:56:37 +0000 (23:56 +0100)
Each call to the AVX2 vector burst receive function makes at
least one pass through the function's inner loop, loading
256 bytes of completion descriptors and copying 8 rte_mbuf
pointers regardless of whether there are any packets to be
received.

Unidirectional forwarding performance is improved by about
3-4% if we ensure that at least one packet can be received
before entering the inner loop.

Fixes: c4e4c18963b0 ("net/bnxt: add AVX2 RX/Tx")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
drivers/net/bnxt/bnxt_rxtx_vec_avx2.c

index e4905b4..54e3af2 100644
@@ -98,6 +98,10 @@ recv_burst_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
        rte_prefetch0(&cp_desc_ring[cons + 8]);
        rte_prefetch0(&cp_desc_ring[cons + 12]);
 
+       /* Return immediately if there is not at least one completed packet. */
+       if (!bnxt_cpr_cmp_valid(&cp_desc_ring[cons], raw_cons, cp_ring_size))
+               return 0;
+
        /* Ensure that we do not go past the ends of the rings. */
        nb_pkts = RTE_MIN(nb_pkts, RTE_MIN(rx_ring_size - mbcons,
                                           (cp_ring_size - cons) / 2));
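
The effect of the early return can be illustrated with a small standalone model. This is only a sketch, not the driver's code: it assumes a completion ring whose entries carry a valid (phase) bit that is compared against the wrap state of a free-running consumer index. All names prefixed with `toy_` and the `TOY_RING_SIZE` constant are hypothetical, and the loop that counts entries merely stands in for the vector inner loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical; must be a power of two */

/* Hypothetical completion entry: only the producer-written phase bit. */
struct toy_cmpl {
	uint32_t valid;
};

/*
 * An entry is complete when its valid bit matches the phase expected for
 * the current pass over the ring. The expected phase flips on every wrap
 * of the free-running index raw_cons.
 */
static bool
toy_cmpl_valid(const struct toy_cmpl *c, uint32_t raw_cons, uint32_t ring_size)
{
	bool expected = !(raw_cons & ring_size);

	return (c->valid != 0) == expected;
}

/*
 * Toy burst receive. *batch_passes counts how often the expensive batch
 * work (a stand-in for the vector inner loop) was entered.
 */
static uint16_t
toy_recv_burst(const struct toy_cmpl *ring, uint32_t raw_cons,
	       uint16_t nb_pkts, unsigned int *batch_passes)
{
	uint32_t cons = raw_cons & (TOY_RING_SIZE - 1);
	uint16_t nb_rx = 0;

	/* Early exit: nothing has completed at the consumer index. */
	if (!toy_cmpl_valid(&ring[cons], raw_cons, TOY_RING_SIZE))
		return 0;

	(*batch_passes)++;
	/* Count completed entries without running past the end of the ring. */
	while (nb_rx < nb_pkts && cons + nb_rx < TOY_RING_SIZE &&
	       toy_cmpl_valid(&ring[cons + nb_rx], raw_cons + nb_rx,
			      TOY_RING_SIZE))
		nb_rx++;

	return nb_rx;
}

/* Exercise both paths: an idle ring, then a ring with two completions. */
static int
toy_demo(void)
{
	struct toy_cmpl ring[TOY_RING_SIZE] = { { 0 } };
	unsigned int passes = 0;

	assert(toy_recv_burst(ring, 0, 4, &passes) == 0);
	assert(passes == 0); /* batch work skipped on the idle ring */

	ring[0].valid = 1;
	ring[1].valid = 1;
	assert(toy_recv_burst(ring, 0, 4, &passes) == 2);
	assert(passes == 1);

	return 0;
}
```

Calling `toy_demo()` from a `main()` exercises both paths. On the idle ring the batch counter stays at zero, which is the saving the patch aims for when the receive function is polled and no packets are pending.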