vmxnet3: improve Rx performance
This patch includes two small performance optimizations
on the rx path:
(1) It adds unlikely hints on various infrequent error
paths to the compiler to make branch prediction more
efficient.
(2) It also moves a constant assignment out of the pkt
polling loop. This saves one branching per packet.
Performance evaluation configs:
- On the DPDK-side, it's running some l3 forwarding app
inside a VM on ESXi with one core assigned for polling.
- On the client side, pktgen/dpdk is used to generate
64B tcp packets at line rate (14.8M PPS).
Performance results on a Nehalem box (4cores@2.8GHzx2)
shown below. CPU usage is collected factoring out the
idle loop cost.
- Before the patch, ~900K PPS with 65% CPU of a core
used for DPDK.
- After the patch, only 45% of a core used, while
maintaining the same packet rate.
Signed-off-by: Yong Wang <yongwang@vmware.com>