Matan Azrad [Mon, 29 Jun 2020 14:08:20 +0000 (14:08 +0000)]
vhost: notify virtq file descriptor update
When virtq call or kick file descriptors are changed in the device
configuration when the queue is ready, the application and the vDPA
driver should be notified to be aligned to the new file descriptors.
Notify the state to be disabled before the file descriptor update and
return it back to be enabled after the update.
Matan Azrad [Mon, 29 Jun 2020 14:01:56 +0000 (14:01 +0000)]
vdpa/mlx5: control completion queue event mode
The CQ polling is necessary in order to manage guest notifications when
the guest doesn't work with poll mode (callfd != -1).
The CQ polling scheduling method can affect the host CPU utilization and
the traffic bandwidth.
Define 3 modes to control the CQ polling scheduling:
1. A timer thread which automatically adjusts its delays to the coming
traffic rate.
2. A timer thread with fixed delay time.
3. Interrupts: Each CQE burst arms the CQ in order to get an interrupt
event in the next traffic burst.
When traffic becomes off, mode 3 is taken automatically.
The interrupt management takes a lot of CPU cycles but forward traffic
event to the guest very fast.
Timer thread save the interrupt overhead but may add delay for the guest
notification.
Add device arguments to control on the mode.
Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Matan Azrad [Mon, 29 Jun 2020 14:01:55 +0000 (14:01 +0000)]
vdpa/mlx5: optimize completion queue poll
The vDPA driver uses a CQ in order to know when traffic works were
completed by the HW.
Each traffic burst completion adds a CQE to the CQ.
When the vDPA driver detects CQEs in the CQ, it triggers the guest
notification for the corresponding queue and consumes all of them.
There is collapse feature in the HW that configures the HW to write all
the CQEs in the first entry of the CQ.
Using this feature, the vDPA driver can read only the first CQE,
validate that the completion counter inside the CQE was changed and if
so, to notify the guest.
Use CQ collapse feature in order to improve the poll utilization.
Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Matan Azrad [Mon, 29 Jun 2020 14:01:54 +0000 (14:01 +0000)]
vdpa/mlx5: optimize notification events
When the virtio guest driver doesn't work with poll mode, the driver
creates event mechanism in order to schedule completion notifications
for each virtq burst traffic.
When traffic comes to a virtq, a CQE will be added to the virtq CQ by
the FW.
The driver requests interrupt for the next CQE index, and when interrupt
is triggered, the driver polls the CQ and notifies the guest by virtq
callfd writing.
According to the described method, the interrupts will be triggered for
each burst of traffic. The burst size depends on interrupt latency.
Interrupts management takes a lot of CPU cycles and using it for each
traffic burst takes big portion of CPU capacity.
When traffic is on, using timer for CQ poll scheduling instead of
interrupts saves a lot of CPU cycles.
Move CQ poll scheduling to be done by timer in case of running traffic.
Request interrupts only when traffic is off.
The timer scheduling management is done by a new dedicated thread uses
a usleep command.
Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Michael Baum [Wed, 24 Jun 2020 13:50:39 +0000 (13:50 +0000)]
net/mlx5: fix iterator type in Rx queue management
The mlx5_check_vec_rx_support function in the mlx5_rxtx_vec.c file
passes the RX queues array in the loop. Similarly, the mlx5_mprq_enabled
function in the mlx5_rxq.c file passes the RX queues array in the loop.
In both cases, the iterator of the loop is called i and the variable
representing the array size is called rxqs_n.
The i variable is of UINT16_T type while the rxqs_n variable is of
unsigned int type. The size of the rxqs_n variable is much larger than
the number of iterations allowed by the i type, theoretically there may
be a situation where the value of the rxqs_n will be greater than can be
represented by 16 bits and the loop will never end.
Michael Baum [Wed, 24 Jun 2020 13:46:41 +0000 (13:46 +0000)]
net/mlx5: use anonymous Direct Verbs allocator argument
The mlx5_dev_spawn function defines an struct mlx5dv_ctx_allocators type
variable several hundred rows after it starts, with the only use it
being passed as a parameter to the mlx5_glue->dv_set_context_attr
function.
However, according to DPDK Coding Style Guidelines, variables should be
declared at the start of a block of code rather than in the middle.
Therefore, to improve the Coding Style, the variable is passed directly
to the function without declaring it before.
Signed-off-by: Michael Baum <michaelba@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
Michael Baum [Wed, 24 Jun 2020 13:44:27 +0000 (13:44 +0000)]
net/mlx4: remove useless assignment
The mlx4_ibv_device_to_pci_addr function defines a variable called ret
inside a loop and uses it.
During the loop, the function assigns a value within the variable and
breaks from the loop, so that this assigning has done nothing and is
actually unnecessary.
Remove the unnecessary assigning.
Signed-off-by: Michael Baum <michaelba@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
Michael Baum [Wed, 24 Jun 2020 13:33:50 +0000 (13:33 +0000)]
common/mlx5: remove useless assignment
The mlx5_dev_to_pci_addr function defines a variable called ret inside a
loop and uses it.
During the loop, the function assigns a value within the variable and
breaks from the loop, so that this assigning has done nothing and is
actually unnecessary.
Remove the unnecessary assigning.
Signed-off-by: Michael Baum <michaelba@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
Michael Baum [Wed, 24 Jun 2020 13:29:55 +0000 (13:29 +0000)]
net/mlx4: use anonymous Direct Verbs allocator argument
The mlx4_pci_probe function defines an struct mlx4dv_ctx_allocators type
variable several hundred rows after it starts, with the only use it
being passed as a parameter to the mlx4_glue->dv_set_context_attr
function.
However, according to DPDK Coding Style Guidelines, variables should be
declared at the start of a block of code rather than in the middle.
Therefore, to improve the Coding Style, the variable is passed directly
to the function without declaring it before.
Signed-off-by: Michael Baum <michaelba@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
Michael Baum [Wed, 24 Jun 2020 13:20:31 +0000 (13:20 +0000)]
common/mlx5: fix code arrangement in tag allocation
Flow tag action is supported only when the driver has DR or DV support.
The tag allocation is adjusted to the modes DV or DR.
In case both DR and DV are not supported in the system, the driver
handles static code for error report.
This error code, wrongly, was compiled when DV is supported while in
this case it cannot be accessed at all.
Ignore the aforementioned static error code in case of DV by
preprocessor commands rearrangement.
Fixes: cbb66daa3c85 ("net/mlx5: prepare Direct Verbs for Direct Rule") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
Shiri Kuzin [Tue, 23 Jun 2020 08:41:07 +0000 (11:41 +0300)]
net/mlx5: add parameter for LACP packets control
The new devarg will control the steering of the lacp traffic.
When setting dv_lacp_by_user = 0 the lacp traffic will be
steered to kernel and managed there.
When setting dv_lacp_by_user = 1 the lacp traffic will
not be steered and the user will need to manage it.
Add support for the vfr flag to the mark manager.
The vf representor flag is added to class table so it can be set in
the template details.
Also added the vfr flag process in mark database.
Mike Baucom [Fri, 12 Jun 2020 12:50:18 +0000 (18:20 +0530)]
net/bnxt: refactor mapper opcodes
Unify the opcodes of the different enums into a single enum for reuse of
common processors. Also the ADD_PAD opcode is now SET_TO_ZERO.
This change better reflects the intent of the opcode and allows it to be
used in more circumstances without overloading the term pad.
The fields that were setting a constant zero have now been switched to
use the new SET_TO_ZERO opcode as an optimization. The SET_TO_ZERO does
not copy data into the key/result/mask fields, but rather simply
increments the write pointer.
Venkat Duvvuru [Fri, 12 Jun 2020 12:50:15 +0000 (18:20 +0530)]
net/bnxt: modify IPv6 VTC flow field parsing
ipv6 vtc_flow contains three fields
1. Version
2. Priority / Traffic Class
3. Flow Label
Currently, these are not parsed separately and also not set separately
in the field bitmap by the flow parser. However, the template treats
them separately. As a result, the flow matching doesn't succeed because
the bitmaps of parser and the template doesn't match.
This patch fixes this problem by parsing the above mentioned fields
individually to align with the template.
Shuanglin Wang [Fri, 12 Jun 2020 12:50:12 +0000 (18:20 +0530)]
net/bnxt: set maximum flow count
User could set max flow count by passing a devarg
"-w 0000:0d:00.0,max_num_kflows=64" to a DPDK application;
The value must be not less than 32K and be power-of-2;
the default value is 32K.
net/bnxt: refactor and rename some fields and enums
- rename regfile_wr_idx to regfile_idx
The regfile index shall be used for both write and read operations.
Hence the field is renamed.
- remove the unused enum BNXT_ULP_REGFILE_INDEX_CACHE_ENTRY_PTR
- rename the enums in the bnxt_ulp_resource_sub_type
The enums in the bnxt_ulp_resource_sub_type are renamed to reflect
the table types explicitly.
- rename an enum in the regfile index
The BNXT_ULP_REGFILE_INDEX_ACTION_PTR_MAIN is renamed to
BNXT_ULP_REGFILE_INDEX_MAIN_ACTION_PTR since it is the main
action pointer.
- remove cache_tbl_id enums
The bnxt_ulp_cache_tbl_id enums are not required any longer
since the index is now calculated using resource sub type
and direction.
Extend index table processing to process action templates.
The index table processing is extended to address encapsulation fields
so that action template index table can be processed by a common index
processing function that can process both class and action index
tables.
This enables using the action bitmap to update the action result
fields in the flow creation instead of using computed header fields.
Direction bit needs to be added to the action bitmap during
flow parsing, so that egress flows can be matched to the
template signature.
An example would be the usage of the vlan pop action bitmap that is
used to updated action result field as part of this commit.
Also the ulp action bitmap enumeration values that
contain open flow string are renamed.
net/bnxt: update compute field list and access macros
The compute field is extended to support action fields and not
just header fields, hence CHF is changed to CF. The access macro
for compute field is renamed to address this.
net/bnxt: add computed header field in result opcode
Added support for computed header fields in the result field
processing. The computed header fields are fields that are extracted
from header fields or derived from data that is not part of the flow
command but shall be used in setting up of the flow rule.
net/bnxt: remove fields from bitmap and mapper table
Remove unnecessary fields from bitmap and mapper table.
- remove svif and VLAN info from header bitmap
The svif and vlan information are removed from header bitmap
signature so that the matching algorithm does not use these
fields to perform matching. So flows with or without vlan
tag could use the same flow template.
- remove mem field from mapper class table
Remove the unused mem field in the ulp mapper class table structure
Maxime Coquelin [Fri, 26 Jun 2020 14:04:37 +0000 (16:04 +0200)]
vhost: introduce wrappers for some vDPA ops
This patch is preliminary work to make the vDPA device
structure opaque to the user application. Some callbacks
of the vDPA devices are used to query capabilities before
attaching to a Vhost port. This patch introduces wrappers
for these ops.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>
Maxime Coquelin [Fri, 26 Jun 2020 14:04:34 +0000 (16:04 +0200)]
vhost: replace device ID in applications
This patch replaces the use of vDPA device ID with
vDPA device pointer. The goals is to remove the vDPA
device ID to avoid confusion with the Vhost ID.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>
Maxime Coquelin [Fri, 26 Jun 2020 14:04:32 +0000 (16:04 +0200)]
vhost: replace device ID in vDPA ops
This patch is a preliminary step to get rid of the
vDPA device ID. It makes vDPA callbacks to use the
vDPA device struct as a reference instead of the ID.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Adrián Moreno <amorenoz@redhat.com>
Maxime Coquelin [Fri, 26 Jun 2020 14:04:29 +0000 (16:04 +0200)]
bus/fslmc: fix iterating on a class type
This patches fixes a null pointer dereferencing that happens
when the device string passed to the iterator is NULL. This
situation can happen when iterating on a class type.
For example:
Maxime Coquelin [Fri, 26 Jun 2020 14:04:28 +0000 (16:04 +0200)]
bus/dpaa: fix iterating on a class type
This patches fixes a null pointer dereferencing that happens
when the device string passed to the iterator is NULL. This
situation can happen when iterating on a class type.
For example:
David Marchand [Tue, 16 Jun 2020 09:47:00 +0000 (11:47 +0200)]
net/mvpp2: fix non-EAL thread support
Caught by code inspection, for a non-EAL thread identified with
rte_lcore_id() == LCORE_ID_ANY, the code currently arbitrarily uses
lcore 0 while there is no guarantee this lcore is used.
After enabling promiscuous mode all packets whose destination MAC
address is a multicast address were being dropped. This fix configures
H/W to receive all traffic in promiscuous mode. Promiscuous mode also
overrides allmulticast mode on/off status.
Fixes: 40e9f6fc1558 ("net/qede: enable VF-VF traffic with unmatched dest address") Cc: stable@dpdk.org Signed-off-by: Devendra Singh Rawat <dsinghrawat@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Rasesh Mody <rmody@marvell.com>
net/mlx5: fix host physical function representor naming
The new kernel adds the names like "pf0" for Host PCI physical
function representor on Bluefield SmartNIC hosts. This patch
provides correct HPF representor recognition over the kernel
versions 5.7 and laters.
The following port naming formats are supported:
- missing physical port name (no sysfs/netlink key) at all,
master is assumed
- decimal digits (for example "12"), representor is
assumed, the value is the index of attached VF
- "p" followed by decimal digits, for example "p2", master
is assumed
- "pf" followed by PF index, for example "pf0", Host PF
representor is assumed on SmartNIC systems.
- "pf" followed by PF index concatenated with "vf" followed by
VF index, for example "pf0vf1", representor is assumed.
If index of VF is "-1" it is a special case of Host PF
representor, this representor must be indexed in devargs
as 65535, for example representor=[0-3,65535] will
allow representors for VF0, VF1, VF2, VF3 and for host PF.
Fixes: 79aa430721b1 ("common/mlx5: split common file under Linux directory") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
Junyu Jiang [Wed, 24 Jun 2020 02:09:39 +0000 (02:09 +0000)]
net/ice: initialize and update RSS based on user config
Initialize and update RSS configure based on user request
(rte_eth_rss_conf) from dev_configure and .rss_hash_update ops.
All previous default configure has been removed.
Ophir Munk [Fri, 19 Jun 2020 07:30:08 +0000 (07:30 +0000)]
common/mlx5: move some getter functions from net driver
Getter functions such as: 'mlx5_os_get_ctx_device_name',
'mlx5_os_get_ctx_device_path', 'mlx5_os_get_dev_device_name',
'mlx5_os_get_umem_id' are implemented under net directory. To enable
additional devices (e.g. regex, vdpa) to access these getter functions
they are moved under common directory.
As part of this commit string sizes DEV_SYSFS_NAME_MAX and
DEV_SYSFS_PATH_MAX are increased by 1 to make sure that the destination
string size in strncpy() function is bigger than the source string size.
This update will avoid GCC version 8 error -Werror=stringop-truncation.
Suanming Mou [Thu, 18 Jun 2020 08:12:50 +0000 (16:12 +0800)]
net/mlx5: optimize free counter lookup
Currently, when allocate a new counter, it needs loop the whole
container pool list to get a free counter.
In the case with millions of counters allocated, and all the pools
are empty, allocate the new counter will still need to loop the
whole container pool list first, then allocate a new pool to get a
free counter. It wastes the cycles during the pool list traversal.
Add a global free counter list in the container helps to get the free
counters more efficiently.
Suanming Mou [Thu, 18 Jun 2020 07:24:44 +0000 (15:24 +0800)]
net/mlx5: optimize single counter pool search
For single counter, when allocate a new counter, it needs to find the pool
it belongs in order to do the query together.
Once there are millions of counters allocated, the pool array in the
counter container will become very large. In this case, the pool search
from the pool array will become extremely slow.
Save the minimum and maximum counter ID to have a quick check of current
counter ID range. And start searching the pool from the last pool in the
container will mostly get the needed pool since counter ID increases
sequentially.
Suanming Mou [Thu, 18 Jun 2020 07:24:43 +0000 (15:24 +0800)]
net/mlx5: manage shared counters in three-level table
Currently, to check if any shared counter with same ID existing, it will
have to loop the counter pools to search for the counter. Even add the
counter to the list will also not so helpful while there are thousands
of shared counters in the list.
Change Three-Level table to look up the counter index saved in the
relevant table entry will be more efficient.
This patch introduces the Three-level table to save the ID relevant
counter index in the table. Then the next while the same ID comes, just
check the table entry of this ID will get the counter index directly.
No search will be needed.
Suanming Mou [Thu, 18 Jun 2020 07:24:42 +0000 (15:24 +0800)]
net/mlx5: add three-level table utility
For the case which data is linked with sequence increased index, the
array table will be more efficient than hash table once need to search
one data entry in large numbers of entries. Since the traditional hash
tables has fixed table size, when huge numbers of data saved to the hash
table, it also comes lots of hash conflict.
But simple array table also has fixed size, allocates all the needed
memory at once will waste lots of memory. For the case don't know the
exactly number of entries will be impossible to allocate the array.
Then the multiple level table helps to balance the two disadvantages.
Allocate a global high level table with sub table entries at first,
the global table contains the sub table entries, and the sub table will
be allocated only once the corresponding index entry need to be saved.
e.g. for up to 32-bits index, three level table with 10-10-12 splitting,
with sequence increased index, the memory grows with every 4K entries.
The currently implementation introduces 10-10-12 32-bits splitting
Three-Level table to help the cases which have millions of entries to
save. The index entries can be addressed directly by the index, no
search will be needed.
Matan Azrad [Thu, 18 Jun 2020 18:59:42 +0000 (18:59 +0000)]
common/mlx5: support DevX virtq stats operations
Add DevX API to create and query virtio queue statistics
from the HW. The next counters are supported by the HW per
virtio queue:
received_desc.
completed_desc.
error_cqes.
bad_desc_errors.
exceed_max_chain.
invalid_buffer.
Matan Azrad [Thu, 18 Jun 2020 18:59:41 +0000 (18:59 +0000)]
vhost: introduce operation to get vDPA queue stats
The vDPA device offloads all the datapath of the vhost
device to the HW device.
In order to expose to the user traffic information this
patch introduces new 3 APIs to get traffic statistics, the
device statistics name and to reset the statistics per
virtio queue.
The statistics are taken directly from the vDPA driver
managing the HW device and can be different for each vendor
driver.
Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Maxime Coquelin [Thu, 28 May 2020 09:03:47 +0000 (11:03 +0200)]
vhost: enable reply-ack systematically
As announced during v20.05 release cycle, this
patch makes reply-ack protocol feature to be enabled
unconditionally.
This protocol feature makes the communication between the
master and the slave more robust, avoiding for example
possible undefined behaviour with VHOST_USER_SET_MEM_TABLE.
Also, reply-ack support will be required for upcoming
VHOST_USER_SET_STATUS request.
Note that this protocol feature was disabled by default
because Qemu version 2.7.0 to 2.9.0 had a bug causing a
deadlock when reply-ack was negotiated and multiqueue
enabled. These Qemu version are now very old and no more
maintained, so we can reasonably consider we no more
support them.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>