Wenzhuo Lu [Fri, 6 Nov 2015 07:49:30 +0000 (15:49 +0800)]
app/testpmd: fix flow director help and doc
After implementing the fdir new modes for x550, the CLIs are modified.
Forgot to update the related help info and doc.
Fixes: 53b2bb9b7ea7 ("app/testpmd: new flow director commands") Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Pablo de Lara [Fri, 6 Nov 2015 15:15:13 +0000 (15:15 +0000)]
app/test: fix unit test for option -n
eal_flags_autotest was broken after commit 19bfa4dd ("eal: make the -n argument optional"),
since the unit test was checking that app would not run
if -n flag was missing, which now it is possible.
Also, subtest test_missing_n_flag() has been renamed
to test_invalid_n_flag(), as now -n flag is not compulsory.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Tiwei Bie [Mon, 19 Oct 2015 13:13:10 +0000 (21:13 +0800)]
eal: do not reset getopt lib
Someone may need to call rte_eal_init() with a fake argc/argv array
in the middle of using getopt() to parse its own unrelated argc/argv
parameters. So getopt lib shouldn't be reset by rte_eal_init().
Now eal will always save optind, optarg and optopt (and optreset on
FreeBSD) at the beginning, initialize optind (and optreset on FreeBSD)
to 1 before calling getopt_long(), then restore all values after.
Suggested-by: Don Provan <dprovan@bivio.net> Suggested-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Tiwei Bie <btw@mail.ustc.edu.cn> Reviewed-by: Don Provan <dprovan@bivio.net> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: David Marchand <david.marchand@6wind.com>
Tim Shearer [Tue, 27 Oct 2015 21:38:55 +0000 (17:38 -0400)]
ethdev: fix link status race condition
Calling the Ethernet driver's link_update function from
rte_eth_dev_start can result in a race condition if the NIC raises
the link interrupt at the same time.
Depending on the interrupt handler implementation, the race can cause
the it to think that it received two consecutive link up interrupts,
and it exits without calling the user callback. Appears to impact
E1000/IGB and virtio drivers only.
Signed-off-by: Tim Shearer <tim.shearer@overturenetworks.com>
Chao Zhu [Wed, 4 Nov 2015 06:15:26 +0000 (14:15 +0800)]
config: turn off fm10k driver on IBM POWER
The fm10k vector driver is specific for x86 platform which can't compile
on IBM POWER for lacking of tmmintrin.h header file. This patch turns
off fm10k driver compilation on IBM POWER to prevent compile issue.
Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com> Acked-by: Michael Qiu <michael.qiu@intel.com>
The i40e driver was using a #define value for the max number of rxtx interrupts
supported. This value was defined only for linux, giving an error when compiling
on FreeBSD.
CC i40e_ethdev.o
/usr/home/bruce/dpdk.org/drivers/net/i40e/i40e_ethdev.c:3885:9: fatal error: use of undeclared
identifier 'RTE_MAX_RXTX_INTR_VEC_ID'
Copying the necessary #define into the FreeBSD EAL header fixes the compile
error.
Fixes: d37641029ada ("eal/linux: add interrupt vectors") Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This patch removes l3_l4_xsum_errors from rx errors.
The reason to remove it is that UDP packets have an optional checksum, and
when not calculated the checksum field should be set to zero. When the
checksum is not calculated (zero-ed out), the hardware still counts a valid
UDP packet as an l3_l4_xsum_error.
This hardware issue is documented in 82599 errata, titled:
"Integrity Error Reported for IPv4/UDP Packets with Zero Checksum"
The solution is to remove l3_l4_xsum_errors from rx_errors, as discussed on
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/25590/
Fixes: f6bf669b9900 ("ixgbe: account more Rx errors") Suggested-by: Martin Weiser <martin.weiser@allegro-packets.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Pablo de Lara [Wed, 16 Sep 2015 21:51:24 +0000 (22:51 +0100)]
ethdev: check queue state before starting or stopping
Following the same approach taken with dev_started field
in rte_eth_dev_data structure, this patch adds two new fields
in it, rx_queue_state and tx_queue_state arrays, which track
which queues have been started and which not.
This is important to avoid trying to start/stop twice a queue,
which will result in undefined behaviour
(which may cause RX/TX disruption).
Mind that only the PMDs which have queue_start/stop functions
have been changed to update this field, as the functions will
check the queue state before switching it.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Cunming Liang [Wed, 4 Nov 2015 08:45:40 +0000 (16:45 +0800)]
i40evf: support Rx interrupt
The patch enables rx interrupt support on i40e VF and some necessary
change on PF IOV mode to support VF.
On PF side, running in IOV mode via uio won't allow rx interrupt
which is exclusive with mbox interrupt in single vector competition.
On VF side, one single vector is shared for all the rx queues.
Cunming Liang [Wed, 4 Nov 2015 08:45:39 +0000 (16:45 +0800)]
i40e: support Rx interrupt
The patch enables rx interrupt support on i40e PF non-IOV mode.
Per queue rx interrupt works on vfio, however on uio, all rx queues
share one interrupt vector.
Cunming Liang [Wed, 4 Nov 2015 08:45:35 +0000 (16:45 +0800)]
ixgbe: fix VF start with PF stopped
When ixgbe runs as a PF, mbox interrupt is prerequisite to make VF
start normally.
And PF sometimes won't 'dev_start', so the mbox interrupt register
during 'dev_init' is required.
The patch rolls back the interrupt register for mbox,lsc to the 'dev_init'.
As UIO doesn't support multiple vector, mbox has to occupy the only one.
It adds condition check on 'dev_start', rxq interrupt is not allowed
when PF running in IOV mode via UIO.
Cunming Liang [Wed, 4 Nov 2015 08:45:38 +0000 (16:45 +0800)]
igb: fix VF start with PF stopped
When igb runs as a PF, mbox interrupt is prerequisite to make VF
start normally.
And PF sometimes won't 'dev_start', so the mbox interrupt register
during 'dev_init' is required.
The patch rolls back the interrupt register for mbox,lsc to the 'dev_init'.
As UIO doesn't support multiple vector, mbox has to occupy the only one.
It adds condition check on 'dev_start', rxq interrupt is not allowed
when PF running in IOV mode via UIO.
Cunming Liang [Wed, 4 Nov 2015 08:45:34 +0000 (16:45 +0800)]
eal: query multi-vector interrupt capability
VFIO allows multiple MSI-X vector, others doesn't, but maybe will
allow it in the future.
Device drivers need to be aware of the capability.
It's better to avoid condition check on interrupt type (VFIO) everywhere,
instead a capability api is more flexible for the condition change.
Signed-off-by: Cunming Liang <cunming.liang@intel.com> Acked-by: David Marchand <david.marchand@6wind.com>
Cunming Liang [Wed, 4 Nov 2015 08:45:28 +0000 (16:45 +0800)]
eal: reserve VFIO vector zero for misc interrupt
During VFIO_DEVICE_SET_IRQS, the previous order is
{Q0_fd, ... Qn_fd, misc_fd}.
The vector number of misc is indeterminable which is
ugly to some NIC (e.g. i40e, fm10k).
The patch adjusts the order in {misc_fd, Q0_fd, ... Qn_fd},
always reserve the first vector to misc interrupt.
Signed-off-by: Cunming Liang <cunming.liang@intel.com> Acked-by: David Marchand <david.marchand@6wind.com>
Xutao Sun [Wed, 4 Nov 2015 09:20:48 +0000 (17:20 +0800)]
i40e: fix statistics
The old statistics on i40e only counted the packets on ports.
So the discarding packets on VSI were not counted.
This patch is to make statistics for packets both on ports and VSI.
Also update release notes.
Signed-off-by: Xutao Sun <xutao.sun@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Pablo de Lara [Mon, 12 Oct 2015 12:52:57 +0000 (13:52 +0100)]
kni: rename macro for igb nlflags
Rename HAVE_NDO_BRIDGE_GETLINK_FILTER_MASK macro for
a more meaningful HAVE_NDO_BRIDGE_GETLINK_NLFLAGS,
as the macro is used to know if igb_ndo_bridge_getlink
function has nlflags parameter.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Dex Chen [Tue, 27 Oct 2015 02:56:37 +0000 (10:56 +0800)]
kni: allow per-net instances
There is a global variable 'device_in_use' which is used to make sure
only one instance is using /dev/kni device. If you were using LXC, you
will find there is only one instance of KNI example could be run even
different namespaces were created.
In order to have /dev/kni used simultaneously in different namespaces,
making all of global variables as per network namespace variables.
With regard to single kernel thread mode, there will be one kernel
thread for each of network namespace.
When an application using huge-pages crash or exists, the hugetlbfs
backing files are not cleaned up. This is a patch to clean those files.
There are multi-process DPDK applications that may be benefited by those
backing files. Therefore, I have made that configurable so that the
application that does not need those backing files can remove them, thus
not changing the current default behavior. The application itself can
clean it up, however the rationale behind DPDK cleaning it up is, DPDK
created it and therefore, it is better it unlinks it.
Na Na [Tue, 3 Nov 2015 02:17:41 +0000 (10:17 +0800)]
lpm: fix incorrect reuse of already allocated tbl8
Fixes an initialization issue of 'valid_group' in the delete_depth_small().
When adding an entry to a tbl8, the .valid_group field should always be set,
so that future adds do not accidently find and use this table, thinking it is
currently invalid, i.e. unused, and thereby overwrite existing entries.
Signed-off-by: Na Na <nana.nn@alibaba-inc.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Na Na [Tue, 3 Nov 2015 02:17:40 +0000 (10:17 +0800)]
lpm: fix condition check in delete
Fixes an issue of check logic in delete_depth_small function.
For a tbl24 entry, the 'ext_entry' field indicates whether we need
to use tbl8_gindex to read the next_hop from a tbl8 entry, or whether
it can be read directly from this entry.
If a route is deleted, the prefix of previous route is used to override
the deleted route.
When checking the depth of the previous route the conditional checks
both the ext_entry and the depth, but the "else" leg fails to take
account that the condition could fail for one of two possible reasons,
leading to an incorrect flow when 'ext_entry == 0' is true,
but 'lpm->tbl24[i].depth > depth' is false.
The fix here is to add a condition check to the else leg so that it
only executes when ext_entry is set.
Signed-off-by: Na Na <nana.nn@alibaba-inc.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Pablo de Lara [Thu, 17 Sep 2015 10:30:48 +0000 (11:30 +0100)]
hash: fix incorrect lookup if key is all zero
If user has not added an all zero key in the hash table,
and tries to look it up, it results in an incorrect hit,
as dummy slot in the key table has all zero as well.
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation") Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Pablo de Lara [Fri, 30 Oct 2015 14:37:28 +0000 (14:37 +0000)]
hash: fix scaling by reducing contention
If using multiple cores on a system with hardware transactional
memory support, thread scaling does not work, as there was a single
point in the hash library which is a bottleneck for all threads,
which is the "free_slots" ring, which stores all the indices of
the free slots in the table.
This patch fixes the problem, by creating a local cache per logical core,
which stores locally indices of free slots,
so most times, writer threads will not interfere each other.
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation") Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Pablo de Lara [Fri, 2 Oct 2015 16:07:13 +0000 (17:07 +0100)]
hash: free internal ring when freeing hash
Since freeing a ring is now possible, then when freeing
a hash table, its internal ring can be freed as well.
Therefore when a new table, with the same name as a previously
freed table, is created, there is no need to look up
the already allocated ring.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Pablo de Lara [Fri, 2 Oct 2015 15:53:44 +0000 (16:53 +0100)]
ring: support freeing
When creating a ring, a memzone is created to allocate it in memory,
but the ring could not be freed, as memzones could not be.
Since memzones can be freed now, then rings can be as well,
taking into account if they were initialized using pre-allocated memory
(in which case, memory should be freed externally) or using rte_memzone_reserve
(with rte_ring_create), freeing the memory with rte_memzone_free.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Helin Zhang [Tue, 3 Nov 2015 16:28:35 +0000 (00:28 +0800)]
i40e: select GRE key length for filtering
By default, only first 3 bytes of GRE key will be used for hash or
FD calculation. With these changes, it can select 3 or 4 bytes of
GRE key for hash or FD calculation.
Helin Zhang [Tue, 3 Nov 2015 16:07:54 +0000 (00:07 +0800)]
i40e: configure input fields for RSS or flow director
The default input set of fields of a received packet are loaded from
firmware, which cannot be modified even users want to use different
fields for RSS or flow director. Here adds more flexibilities of
selecting packet fields for hash calculation or flow director for
users.
Helin Zhang [Tue, 3 Nov 2015 15:40:28 +0000 (23:40 +0800)]
i40e: enlarge the number of supported queues
It enlarges the number of supported queues to hardware allowed
maximum. There was a software limitation of 64 per physical port
which is not reasonable.
Only seen since IPv6 RSS support was added, confusion about the purpose of
the hash_rxq_type_from_n() function has caused it to return invalid hash RX
queue types.
Refactor function for its intended purpose, rename it
hash_rxq_type_from_pos() and update comment with a better description.
Fixes: a76133214d88 ("mlx5: use separate indirection table for default hash Rx queue") Reported-by: Olga Shern <olgas@mellanox.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The following error occurs when CONFIG_RTE_LIBRTE_MLX5_DEBUG=y:
drivers/net/mlx5/mlx5.c:381:4: error: ISO C forbids braced-groups within expressions
RTE_MIN() uses the non-standard ({ ... }) syntax to declare variables within
parentheses, which is rejected by -pedantic.
Since the RSS_INDIRECTION_TABLE_SIZE check is meant to go away as soon as
DPDK supports larger/variable indirection tables, put it in a separate
condition.
Fixes: 634efbc2c8c0 ("mlx5: support RETA query and update") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
remove pci_dev, pci_drv, rte_bond_pmd and pci_id_table.
handle numa_node for vdevs
handle RTE_ETH_DEV_INTR_LSC for vdevs
rename the function valid_bonded_device to check_for_bonded_device
remove branches on pci_dev
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com> Acked-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: John W. Linville <linville@tuxdriver.com>
Ravi Kerur [Wed, 23 Sep 2015 21:16:17 +0000 (14:16 -0700)]
ethdev: clean port id retrieval when attaching
Removed following functions
> rte_eth_dev_save and
> rte_eth_dev_get_changed_port
Added 2 new functions
> rte_eth_dev_get_port_by_name
> rte_eth_dev_get_port_by_addr
Compiled on Linux for following targets
> x86_64-native-linuxapp-gcc
> x86_64-native-linuxapp-clang
> x86_x32-native-linuxapp-gcc
Compiled on FreeBSD for following targets
> x86_64-native-bsdapp-clang
> x86_64-native-bsdapp-gcc
Tested on Linux/FreeBSD:
> port attach eth_null
> port start all
> port stop all
> port close all
> port detach 0
> port attach eth_null
> port start all
> port stop all
> port close all
> port detach 0
Successful run of checkpatch.pl on the diffs
Successful validate_abi on Linux for following targets
Ivan Boule [Thu, 29 Oct 2015 08:46:15 +0000 (09:46 +0100)]
virtio: fix size of MAC address array
Make the virtio PMD allocate the array of unicast MAC addresses with
the maximum of entries (VIRTIO_MAX_MAC_ADDRS) that it exports.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com> Signed-off-by: David Marchand <david.marchand@6wind.com> Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Didier Pallard [Thu, 29 Oct 2015 08:47:53 +0000 (09:47 +0100)]
ixgbe: remove useless fields in checksum offload
According to Table 7-38: Valid Fields by Offload Option
of Intel ® 82599 10 GbE Controller Datasheet,
L4LEN field is not needed for L4 XSUM computation by the hardware.
So remove l4_len from tx_offload_mask in ixgbe_set_xmit_ctx
function used to build the context transmitted to the hardware.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Signed-off-by: David Marchand <david.marchand@6wind.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
ConnectX-4 adapters do not have a constant indirection table size, which is
set at runtime from the number of RX queues. The maximum size is retrieved
using a hardware query and is normally 512.
Since the current RETA API cannot handle a variable size, any query/update
command causes it to be silently updated to RSS_INDIRECTION_TABLE_SIZE
entries regardless of the original size.
Also due to the underlying type of the configuration structure, the maximum
size is limited to RSS_INDIRECTION_TABLE_SIZE (currently 128, at most 256
entries).
A port stop/start must be done to apply the new RETA configuration.
Helin Zhang [Tue, 13 Oct 2015 07:19:50 +0000 (15:19 +0800)]
i40evf: support AQ based RSS config
It supports both Admin queue based and directly writing registers
based RSS hash key and lookup table configuration, as X722 supports
AQ based configuration.
Signed-off-by: Helin Zhang <helin.zhang@intel.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Helin Zhang [Tue, 13 Oct 2015 07:19:49 +0000 (15:19 +0800)]
i40e: support AQ based RSS config
It supports both Admin queue based and directly writing registers
based RSS hash key and lookup table configuration, as X722 supports
AQ based configuration.
Signed-off-by: Helin Zhang <helin.zhang@intel.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Helin Zhang [Tue, 13 Oct 2015 07:19:48 +0000 (15:19 +0800)]
i40e: support X722 and its A0 hardware
In order to provide users early access of X722 and its A0 hardware,
new device IDs are added, and also compilation with those support
in base driver is enabled.
Signed-off-by: Helin Zhang <helin.zhang@intel.com> Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Bruce Richardson [Wed, 30 Sep 2015 12:12:20 +0000 (13:12 +0100)]
ring: store memzone pointer
Add a new field to the rte_ring structure to store the memzone pointer which
contains the ring. For rings created using rte_ring_create(), the field will
be set automatically.
This new field will allow users of the ring to query the numa node a ring is
allocated on, or to get the physical address of the ring, if so needed.
The rte_ring structure will also maintain ABI compatibility, as the
structure members, after the new one, are set to be cache line aligned,
so leaving a space.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Bruce Richardson [Wed, 30 Sep 2015 12:12:19 +0000 (13:12 +0100)]
ring: enhance device setup from rings
The ring ethdev creation function creates an ethdev, but does not
actually set it up for use. Even if it's just a single ring, the user
still needs to create a mempool, call rte_eth_dev_configure, then call
rx and tx setup functions before the ethdev can be used.
This patch changes things so that the ethdev is fully set up after the
call to create the ethdev. The above-mentionned functions can still be
called - as will be the case, for instance, if the NIC is created via
commandline parameters - but they no longer are essential.
The function now also sets rte_errno appropriately on error, so the
caller can get a better indication of why a call may have failed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Maryam Tahhan [Tue, 20 Oct 2015 10:34:18 +0000 (11:34 +0100)]
ethdev: do not deprecate imissed counter
Remove the deprecation tag and notice for imissed as it is a generic
register that accounts for packets that were dropped by the HW,
because there are no available mbufs (RX queues are full). imissed is
different to ierrors and can help with general debug.
This patch removes the mac local fault count and
mac remote fault count from rx errors. The mac
fault count registers count faults, not packets,
and hence should not be added to packet counters.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Update the strings used for presenting stats to adhere
to the scheme previously presented. Updated xstats_get()
function to handle Q information only if xstats() is not
implemented in the PMD, providing the PMD with the needed
flexibility to expose its extended Q stats.
Add extended statistic section to the programmers
guide, poll mode driver section. This section describes
how the strings stats are formatted, and how the client
code can use this to gather information about the stat.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Maryam Tahhan <maryam.tahhan@intel.com>
Marcel Apfelbaum [Thu, 15 Oct 2015 11:08:39 +0000 (14:08 +0300)]
vhost: enable virtio 1.0
Make vhost-user virtio 1.0 compatible by adding it to the
supported features and keeping the header length
the same as for mergeable RX buffers.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Huawei Xie [Thu, 29 Oct 2015 14:53:26 +0000 (22:53 +0800)]
virtio: add vector Rx
With fixed avail ring, we don't need to get desc idx from avail ring.
virtio driver only has to deal with desc ring.
This patch uses vector instruction to accelerate processing desc ring.
Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Huawei Xie [Thu, 29 Oct 2015 14:53:24 +0000 (22:53 +0800)]
virtio: optimize ring layout
In DPDK based switching environment, mostly vhost runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example, with generic implementation, for each guest buffer,
a) virtio driver allocates a descriptor from free descriptor list
b) modify the entry of avail ring to point to allocated descriptor
c) after packet is received, free the descriptor
When vhost fetches the avail ring, it need to fetch the modified L1 cache from
virtio core, which is a heavy cost in current CPU implementation.
This idea of this optimization is:
allocate the fixed descriptor for each entry of avail ring, so avail ring will
always be the same during the run.
This removes L1M cache transfer from virtio core to vhost core for avail ring.
(Note we couldn't avoid the cache transfer for descriptors).
Besides, descriptor allocation and free operation is eliminated.
This also makes vector procesing possible to further accelerate the processing.
This is the layout for the avail ring(take 256 ring entries for example), with
each entry pointing to the descriptor with the same index.
avail
idx
+
|
+----+----+---+-------------+------+
| 0 | 1 | 2 | ... | 254 | 255 | avail ring
+-+--+-+--+-+-+---------+---+--+---+
| | | | | |
| | | | | |
v v v | v v
+-+--+-+--+-+-+---------+---+--+---+
| 0 | 1 | 2 | ... | 254 | 255 | desc ring
+----+----+---+-------------+------+
|
|
+----+----+---+-------------+------+
| 0 | 1 | 2 | | 254 | 255 | used ring
+----+----+---+-------------+------+
|
+
This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.
++
||
||
+-----+-----+-----+--------------+------+------+------+
| 0 | 1 | ... | 127 || 128 | 129 | ... | 255 | avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| | | || | | |
v v v || v v v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... | 255 || 128 | 129 | ... | 255 | desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| | | || | | |
v v v || v v v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 0 | 1 | ... | 127 || 0 | 1 | ... | 127 | desc ring for tx dat
+-----+-----+-----+--------------+------+------+------+
||
||
++
Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Vector RX function will process 4 packets at a time. When the RX
ring wrapps to the tail and the left descriptor size is not multiple
of 4, SW will overwrite memory that not belongs to it and cause crash.
The fix will allocate additional 4 HW/SW spaces at the tail to avoid
overwrite.
Vector TX use different way to manage TX queue, it's necessary
to use different functions to reset TX queue and release mbuf
in TX queue. So, introduce 2 function pointers to do such ops.