cxgbe: optimize forwarding performance for 40G