Prefetching only helps performance if it is done several 100
instructions before the actual use. The purpose of the prefetch
is to read ahead, it doesn't help if the next instruction
will block.
The code in the bnxt driver was doing these unnecessary prefetches.