William Tu [Thu, 5 Aug 2021 17:48:19 +0000 (17:48 +0000)]
eal/windows: export version function
When OVS initializes, it calls rte_version() to get the DPDK version.
This patch fixes the error below by exporting the rte_version symbol.
libopenvswitch.a(dpdk.c.obj) : error LNK2019: unresolved external symbol
rte_version referenced in function dpdk_init
Fixes: 5b637a848195 ("eal: fix querying DPDK version at runtime") Cc: stable@dpdk.org Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
- Enable IAVF PMD build on Windows
- Replace x86intrin.h with rte_vect.h to avoid __m_prefetchw conflicting
types
- Fix for pointer and integer sign warnings using Clang compiler on
Windows
- Add extra cflags '-fno-asynchronous-unwind-tables'
to avoid MinGW build error:
Error: invalid register for .seh_savexmm
Based on the rte_eth_dev_socket_id() documentation,
set the default numa_node to -1. When the API is unsuccessful,
set numa_node to 0.
This change matches the Linux code more closely.
Fixes: bf7cf1f947bd ("bus/pci: fix unknown NUMA node value on Windows") Cc: stable@dpdk.org Reported-by: Vipin Varghese <vipin.varghese@intel.com> Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com> Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Fixes: e1a00536c8ed ("kvargs: add a new library to parse key/value arguments") Fixes: 3ab385063cb9 ("kvargs: add get by key") Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>
A quite common scenario with kvargs is to look up a <key>=<value> pair in
a kvlist. For instance, check if name=foo is present in
name=toto,name=foo,name=bar. This is currently done in drivers/bus with
rte_kvargs_process() + the rte_kvargs_strcmp() handler.
This approach is not straightforward, and can be replaced by this new
function.
rte_kvargs_strcmp() is then removed.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>
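To illustrate the lookup described above, here is a minimal standalone sketch that checks whether a key=value pair occurs in a comma-separated list. It does not use the kvargs library: kvlist_has_pair() is a hypothetical helper written for this example, and the parsing is deliberately simplified (no quoting, no list values).

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not the DPDK API): return true if "key=value"
 * appears as one element of the comma-separated string "args".
 */
static bool
kvlist_has_pair(const char *args, const char *key, const char *value)
{
	size_t klen = strlen(key);
	size_t vlen = strlen(value);
	const char *p = args;

	while (*p != '\0') {
		/* Length of the current element, up to the next comma. */
		size_t len = strcspn(p, ",");

		if (len == klen + 1 + vlen &&
		    strncmp(p, key, klen) == 0 &&
		    p[klen] == '=' &&
		    strncmp(p + klen + 1, value, vlen) == 0)
			return true;

		p += len;
		if (*p == ',')
			p++;
	}
	return false;
}
```

With the list from the commit message, kvlist_has_pair("name=toto,name=foo,name=bar", "name", "foo") is true, while a lookup for name=fo is false because whole elements are compared.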
The function rte_kvargs_get() is used by eal and pci bus driver since
its introduction in commit 3ab385063cb9 ("kvargs: add get by key") and
commit d2a66ad79480 ("bus: add device arguments name parsing"), in
DPDK 21.05.
Let's promote it as stable.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>
net/af_packet: remove timestamp from packet status
We should eliminate the timestamp status from the packet
status. This should only matter if timestamping is enabled
on the socket, but we might hit a kernel bug, which is fixed
in newer releases.
For interfaces of type 'veth', the sent skb is forwarded
to the peer and back into the network stack which timestamps
it on the RX path if timestamping is enabled globally
(which happens if any socket enables timestamping).
When the skb is destructed, tpacket_destruct_skb() is called
and it calls __packet_set_timestamp() which doesn't check
the flags on the socket and returns the timestamp if it is
set in the skb (and for veth it is, as mentioned above).
See the following kernel commit for reference [1]:
net: packetmmap: fix only tx timestamp on request
The packetmmap tx ring should only return timestamps if requested
via setsockopt PACKET_TIMESTAMP, as documented. This allows
compatibility with non-timestamp aware user-space code which checks
tp_status == TP_STATUS_AVAILABLE; not expecting additional timestamp
flags to be set in tp_status.
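The check described above can be sketched as follows. This is an illustration only, not the driver code: the constants are redefined locally with a SKETCH_ prefix so the example stands alone, and their values are assumed to mirror those in <linux/if_packet.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match <linux/if_packet.h>; redefined so the sketch stands alone. */
#define SKETCH_TP_STATUS_AVAILABLE       0u
#define SKETCH_TP_STATUS_TS_SOFTWARE     (1u << 29)
#define SKETCH_TP_STATUS_TS_RAW_HARDWARE (1u << 31)

/*
 * Hypothetical helper: decide whether a TX ring slot is free, ignoring
 * timestamp bits that an affected kernel may leave set in tp_status.
 */
static bool
sketch_tx_slot_is_available(uint32_t tp_status)
{
	tp_status &= ~(SKETCH_TP_STATUS_TS_SOFTWARE |
		       SKETCH_TP_STATUS_TS_RAW_HARDWARE);
	return tp_status == SKETCH_TP_STATUS_AVAILABLE;
}
```

A slot whose status carries only a timestamp bit is then treated as available instead of being skipped.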
Junxiao Shi [Thu, 9 Sep 2021 14:42:06 +0000 (14:42 +0000)]
net/memif: fix chained mbuf determination
Previously, TX functions called rte_pktmbuf_is_contiguous to determine
whether an mbuf is chained. However, rte_pktmbuf_is_contiguous is
designed to work on the first mbuf of a packet only. In case a packet
contains three or more segment mbufs in a chain, it may cause truncated
packets or rte_mbuf_sanity_check panics.
This patch updates TX functions to determine chained mbufs using
mbuf_head->nb_segs field, which works in all cases. Moreover, the
second cache line is still accessed only when a chained mbuf is
actually present.
Fixes: 09c7e63a71f9 ("net/memif: introduce memory interface PMD") Fixes: 43b815d88188 ("net/memif: support zero-copy slave") Cc: stable@dpdk.org Signed-off-by: Junxiao Shi <git@mail1.yoursunny.com> Reviewed-by: Jakub Grajciar <jgrajcia@cisco.com>
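The idea of driving the segment walk from the head's nb_segs can be sketched as follows. The struct is a simplified stand-in for struct rte_mbuf (an assumption, not the real layout), and sketch_pkt_bytes() is a hypothetical helper, not a memif function.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_mbuf (assumption, not the real layout). */
struct sketch_mbuf {
	struct sketch_mbuf *next; /* next segment, NULL on the last one */
	uint16_t nb_segs;         /* meaningful on the head segment only */
	uint16_t data_len;        /* bytes held by this segment */
};

/*
 * Sum the bytes of a packet. The number of segments is read once from
 * the head, so non-head segments are never asked whether they are
 * "contiguous".
 */
static uint32_t
sketch_pkt_bytes(const struct sketch_mbuf *head)
{
	const struct sketch_mbuf *seg = head;
	uint16_t left = head->nb_segs;
	uint32_t total = 0;

	while (left-- > 0 && seg != NULL) {
		total += seg->data_len;
		seg = seg->next;
	}
	return total;
}
```

In a three-segment chain the middle segment carries nb_segs == 1, so a per-segment contiguity test would stop after it, whereas the head-driven loop visits all three segments.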
Thomas Monjalon [Mon, 30 Aug 2021 10:42:32 +0000 (12:42 +0200)]
ethdev: group constant definitions in Doxygen
A lot of flags are part of a group but are documented alone.
The Doxygen syntax @{ and @} for grouping is used
to make flags appear together and have a common description.
Some Rx/Tx offload flags and RSS definitions are not grouped
because they need to be all properly documented first.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Kevin Traynor <ktraynor@redhat.com>
Shared RSS resources were released before checking that the shared RSS
has no more references. If it had, the destruction was aborted, leaving
the shared RSS in an invalid state where it could no longer be used.
Move reference counter check before resource release.
Fixes: d2046c09aa64 ("net/mlx5: support shared action for RSS") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
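The ordering fix can be illustrated with a small standalone sketch. All names are hypothetical and the structure is reduced to a counter and a flag. It assumes that a count of 1 means only the owner holds the object.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of a shared object with a reference count. */
struct sketch_shared_rss {
	uint32_t refcnt;      /* assumed: 1 means only the owner holds it */
	bool resources_held;  /* stands for the underlying resources */
};

/*
 * Check the reference counter first; release resources only once it is
 * known that the destruction will go through.
 */
static int
sketch_shared_rss_destroy(struct sketch_shared_rss *rss)
{
	if (rss->refcnt > 1)
		return -EBUSY; /* still referenced: object left untouched */

	rss->resources_held = false;
	rss->refcnt = 0;
	return 0;
}
```

When the call returns -EBUSY the object is still fully usable, which is the property the original ordering lost.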
When an indirect action is used in a flow rule with a pattern that
causes RSS expansion, each device flow generated by the expansion
incremented the reference counter of the action. When such a flow was
destroyed, its action reference counter was decremented only once.
The action remained marked as being used and could not be destroyed.
COUNT, AGE, and CONNTRACK indirect actions were affected
(for AGE the error was not immediately observable).
Increment action counter only once for the original flow rule.
Fixes: 81073e1f8ce1 ("net/mlx5: support shared age action") Fixes: 2d084f69aa26 ("net/mlx5: add translation of connection tracking action") Fixes: f3191849f2c2 ("net/mlx5: support flow count action handle") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
net/mlx5: report error on indirect CT action destroy
When an indirect CT action of mlx5 PMD could not be destroyed,
rte_action_handle_destroy() was returning (-1), but the error
structure was not filled. This led to a segfault in testpmd
on an attempt to print it. Fill the details for each possible
cause of this error.
Michael Baum [Sun, 12 Sep 2021 10:36:28 +0000 (13:36 +0300)]
common/mlx5: fix resource cleaning in device removal
The common remove function calls, in a loop, the remove function of
each registered driver.
If all removes succeed, it returns 0 without freeing the device that
was allocated in the probe function. Otherwise, it frees the device.
In fact, exactly the opposite behavior is expected: if all removes
fail, return an error without freeing the device that was allocated in
the probe function. Otherwise, free the device and return 0.
Replace it with the correct behavior.
Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
Michael Baum [Sun, 12 Sep 2021 10:36:27 +0000 (13:36 +0300)]
common/mlx5: fix device list operations concurrency
The mlx5 common driver has a global list of probed mlx5 devices.
The probe function creates a device and inserts it into the list.
Similarly, the remove function removes the device from the list.
These operations are not safe, as they can be performed in parallel
by different threads.
Add a global lock for the list and take it on insert and remove.
Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
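A minimal sketch of a lock-protected device list is shown below. It uses a POSIX mutex and a hand-rolled singly linked list for the sake of a standalone example. The names are hypothetical and the actual driver may use different list and lock primitives.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device node; the real structure holds much more. */
struct sketch_dev {
	struct sketch_dev *next;
	int id;
};

static struct sketch_dev *sketch_dev_list;
static pthread_mutex_t sketch_dev_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert at the head of the global list, under the lock. */
static void
sketch_dev_insert(struct sketch_dev *dev)
{
	pthread_mutex_lock(&sketch_dev_list_lock);
	dev->next = sketch_dev_list;
	sketch_dev_list = dev;
	pthread_mutex_unlock(&sketch_dev_list_lock);
}

/* Unlink a device if present, under the lock. */
static void
sketch_dev_remove(struct sketch_dev *dev)
{
	struct sketch_dev **pp;

	pthread_mutex_lock(&sketch_dev_list_lock);
	for (pp = &sketch_dev_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			dev->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&sketch_dev_list_lock);
}

/* Count the listed devices, under the lock. */
static size_t
sketch_dev_count(void)
{
	const struct sketch_dev *d;
	size_t n = 0;

	pthread_mutex_lock(&sketch_dev_list_lock);
	for (d = sketch_dev_list; d != NULL; d = d->next)
		n++;
	pthread_mutex_unlock(&sketch_dev_list_lock);
	return n;
}
```

Every access to the list head, including read-only traversal, goes through the same lock, so concurrent probe and remove calls cannot observe a half-updated list.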
Michael Baum [Sun, 12 Sep 2021 10:36:26 +0000 (13:36 +0300)]
common/mlx5: fix class combination validation
The common probe function gets as a user argument the classes it should
create, and checks whether the combination is valid.
In case the device already exists, it checks the combination of the
requested classes with the classes that the device already has.
However, the function does not check the combination when the device
does not exist and has to be created.
Michael Baum [Sun, 12 Sep 2021 10:36:25 +0000 (13:36 +0300)]
net/mlx5: fix duplicate pattern option default
In order to allow/disallow configuring rules with identical patterns,
the new device argument 'allow_duplicate_pattern' was introduced.
The default is to allow, and it is initialized to 1 in PCI probe
function.
However, on auxiliary bus probing (for Sub-Function) it is not
initialized at all, so it's actually initialized to 0.
Move the initialization to default config function which is called from
both.
Fixes: 919488fbfa71 ("net/mlx5: support Sub-Function") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
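The fix pattern (one default-config routine shared by every probe path) can be sketched as below. The struct and function names are hypothetical and only the allow_duplicate_pattern field comes from the commit message.

```c
#include <string.h>

/* Hypothetical, reduced device configuration. */
struct sketch_dev_config {
	int allow_duplicate_pattern;
};

/* Single place where defaults are set. */
static void
sketch_default_config(struct sketch_dev_config *cfg)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->allow_duplicate_pattern = 1; /* default: allow */
}

/* Both probe paths start from the same defaults. */
static void
sketch_pci_probe(struct sketch_dev_config *cfg)
{
	sketch_default_config(cfg);
}

static void
sketch_auxiliary_probe(struct sketch_dev_config *cfg)
{
	sketch_default_config(cfg);
}
```

Because neither probe function sets the field itself, a new bus type added later inherits the default as long as it calls the shared routine.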
Michael Baum [Sun, 12 Sep 2021 10:36:24 +0000 (13:36 +0300)]
net/mlx5: fix PF leak on PCI probing failure
During PCI probe, the internal probe function is called per PF.
If one of the calls failed, the previously probed PFs were not
properly destroyed.
Fix this by destroying all previously probed PFs.
Fixes: 08c2772fc747 ("net/mlx5: support list of representor PF") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
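The unwinding described above can be sketched as follows. This is a standalone model with hypothetical names: a boolean per PF stands for "probed", and the failing_pf parameter exists only to simulate a failure.

```c
#include <stdbool.h>

#define SKETCH_MAX_PF 4

/* Hypothetical per-PF state: true once the PF has been probed. */
static bool sketch_pf_probed[SKETCH_MAX_PF];

/* Simulated per-PF probe; fails for the PF given by failing_pf. */
static int
sketch_probe_one(int pf, int failing_pf)
{
	if (pf == failing_pf)
		return -1;
	sketch_pf_probed[pf] = true;
	return 0;
}

static void
sketch_destroy_one(int pf)
{
	sketch_pf_probed[pf] = false;
}

/* Probe PFs in order; on failure, destroy those already probed. */
static int
sketch_probe_all(int nb_pf, int failing_pf)
{
	int pf;

	for (pf = 0; pf < nb_pf; pf++) {
		if (sketch_probe_one(pf, failing_pf) != 0) {
			while (--pf >= 0)
				sketch_destroy_one(pf);
			return -1;
		}
	}
	return 0;
}
```

The inner loop walks back from the failing index, so only PFs that were actually probed are destroyed.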
1. Added support to specify l4 port masks in the template. Also enabled
source mac in the wild card key for ingress flows.
2. Added support to enable offload for ipv6 traffic within the vxlan
tunnel connection.
3. The number of flow counters is reduced from 7168 to 6912 for Whitney.
The stats operation is updated to reflect counts for packets
at egress from CFA instead of ingress to CFA.
4. The miss path for the l2 context table is updated with correct
parif and default action handler to handle the miss path for
egress flows.
5. This support enables allocation of encapsulation, modification and
action records dynamically based on a given flow actions.
6. Reduce the l2context resource requests during open_session. Move the
SMAC from the L2Context to the EM/WM
7. Remap the parif in the bd action in order to eliminate incorrect
replication of broadcast packets. The layer 4 source port mask
was incorrectly updated in the outer layer 4 source port mask
instead of inner layer 4. Add the l3 proto to egress rules, switch
to using computed fields for l4 ports, add internal smac to f1/f2
flows, add l3 proto to ingress ipv6 flows
Signed-off-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com> Signed-off-by: Mike Baucom <michael.baucom@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Farah Smith [Mon, 20 Sep 2021 07:42:13 +0000 (13:12 +0530)]
net/bnxt: add SRAM manager shared session
Fix shared session support issues due to SRAM manager
additions. Shared session does not support slices within
RM blocks. Calculate resources required without slices
and determine base addresses using old methods for the
shared session.
Randy Schacher [Mon, 20 Sep 2021 07:42:12 +0000 (13:12 +0530)]
net/bnxt: allocate space dynamically for EM defrag
The dynamic pool allocation defrag function currently uses stack
allocation. To improve use of stack space, dynamically allocate
and deallocate memory for use to defragment the dynamic pool of
EM resources.
Signed-off-by: Randy Schacher <stuart.schacher@broadcom.com> Reviewed-by: Peter Spreadborough <peter.spreadborough@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
1. Add support for egress flows with port and count action for
Thor platform.
2. Added templates to support VXLAN encapsulation feature for Thor.
3. Added support for VXLAN decap and VLAN pop actions along with
the ingress flow.
4. Added templates to enable VXLAN decap support for f1 and f2 flows.
5. Added templates for Thor VF Rep support.
6. Added Thor ingress mod table actions for NAT, NAPT, and TTL.
7. Added mirror/sample table support.
8. Added support for IPv6 flows for Thor.
The encapsulation record processing is enhanced to handle data
dynamically. Different combinations of VXLAN encapsulation using
no VLAN, a single VLAN, or double VLANs are supported, for both
IPv4 and IPv6.
Add support for tunnel offload APIs. Specifically the following
are supported.
tunnel_decap_set, tunnel_match, tunnel_action_decap_release,
tunnel_item_release.
This provides support for VXLAN decap action where two flows
can indicate tunnel offload rule. The first flow indicates the
tunnel properties and second flow indicates the inner packet
structure. The templates are updated to support this
feature.
Template adds non-VFR based support for testpmd with:
matches to include
- DMAC, SIP, DIP, Proto, Sport, Dport
- SIP, DIP, Proto, Sport, Dport
actions:
- count, drop
Farah Smith [Mon, 20 Sep 2021 07:42:05 +0000 (13:12 +0530)]
net/bnxt: add SRAM manager model
The SRAM manager supports allocation and free of variable sized
records within SRAM memory. These record sizes are 8, 16, 32, or
64B. The SRAM manager algorithm will not fragment memory during
run time. Previous implementation only included fixed size 64B
records regardless of the size required.
Signed-off-by: Farah Smith <farah.smith@broadcom.com> Reviewed-by: Shahaji Bhosle <sbhosle@broadcom.com> Reviewed-by: Peter Spreadborough <peter.spreadborough@broadcom.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
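As a small illustration of variable-sized records, the helper below picks the smallest of the four record sizes named above that can hold a request. How the SRAM manager actually selects a size is not described in the commit message, so this rounding rule and the function name are assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest supported record size
 * (8, 16, 32 or 64 bytes) that holds "bytes", or 0 if none does.
 */
static uint32_t
sketch_sram_record_size(uint32_t bytes)
{
	uint32_t size;

	if (bytes == 0 || bytes > 64)
		return 0;

	for (size = 8; size < bytes; size <<= 1)
		;
	return size;
}
```

Under this rule a 9-byte request takes a 16B record instead of a fixed 64B one, which is the kind of saving the commit message points at.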
common/cnxk: align NPA stack to ROC cache line size
Network Pool accelerator (NPA) is part of ROC (Rest Of Chip). So
NPA structures should be aligned to ROC Cache line size and not
CPU cache line size.
Non alignment of NPA stack to ROC cache line will result in
undefined runtime NPA behaviour.
Fixes: f765f5611240 ("common/cnxk: add NPA pool HW operations") Cc: stable@dpdk.org Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
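The difference between the two alignments can be shown with a plain align-up helper. The sizes used here (128 bytes for the ROC cache line, 64 bytes for the CPU cache line) are illustrative assumptions, not values taken from the commit, and the names are hypothetical.

```c
#include <stdint.h>

/* Illustrative values only; both are assumptions for this sketch. */
#define SKETCH_ROC_ALIGN       128u
#define SKETCH_CPU_CACHE_LINE   64u

/* Round "v" up to a multiple of "align" (align must be a power of two). */
static uint64_t
sketch_align_up(uint64_t v, uint64_t align)
{
	return (v + align - 1) & ~(align - 1);
}
```

With these assumed sizes, a 130-byte stack would be rounded to 192 bytes for the CPU cache line but to 256 bytes for the ROC cache line, so the two choices give different layouts.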
An issue has been observed where fields of indirect buffers are
accessed after being freed by the driver. Also fix freeing
of direct buffers to the correct aura.
Fixes: 5cbe184802aa ("net/octeontx: support fast mbuf free") Cc: stable@dpdk.org Signed-off-by: David George <david.george@sophos.com> Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Avoid using stashing option of stype in NPA in cn10k-a0 stepping.
This is a workaround for a HW Errata due to which NPA stashing operations
will never result in writing the data into L2 cache. But instead, it will
be written into LLC.
Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Made following updates to ROC (Rest of Chip) models.
- Use consistent upper/lower case in macros defining different
ROC models.
- Add API to detect cn96 Cx stepping.
- Make all current cn10k models as A0 stepping.
Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
Robin Zhang [Wed, 25 Aug 2021 08:34:35 +0000 (08:34 +0000)]
net/iavf: enable interrupt polling
For VFs hosted by Intel 700 series NICs, the internal Rx interrupt and
the adminq interrupt share the same source, which causes a lot of CPU
cycles to be wasted in the interrupt handler on the Rx path.
The patch disables the PCI interrupt and removes the interrupt handler,
replacing it with a low-frequency (50ms) interrupt polling daemon which
is implemented by registering an alarm callback periodically.
The virtual channel capability bit VIRTCHNL_VF_OFFLOAD_WB_ON_ITR can be
used to negotiate whether the iavf PMD needs to enable the background
alarm, so ideally this change will not impact VFs hosted by Intel 800
series NICs.
This patch implements the same logic as an earlier i40e commit:
commit 864a800d706d ("net/i40e: remove VF interrupt handler")
net/iavf: remove support for IP fragment default RSS
To support independent IP fragment default RSS, considerable
additional work needs to be done, so this feature is removed
to avoid some unexpected behavior that has been observed.
Users can still use rte_flow to create RSS for IP fragment
packets explicitly.
net/ice: remove support for IP fragment default RSS
To support independent IP fragment default RSS, considerable
additional work needs to be done, so this feature is removed
to avoid some unexpected behavior that has been observed.
Users can still use rte_flow to create RSS for IP fragment
packets explicitly.
In the iavf_dev_rx_queue_start function, if the iavf_switch_queue
or iavf_switch_queue_lv function fails, the previously allocated mbufs
are not released, resulting in a leak. The patch fixes the problem.
Fixes: 9cf9c02bf6ee ("net/iavf: add enable/disable queues for large VF") Cc: stable@dpdk.org Signed-off-by: Qiming Chen <chenqiming_huawei@163.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Simei Su [Wed, 15 Sep 2021 05:34:22 +0000 (13:34 +0800)]
net/ice: support 1PPS
The E810 supports four single-ended GPIO signals (SDP[20:23]). The 1PPS
signal is output via SDP[20:23] and can be measured with an oscilloscope.
This feature can be turned on by a devarg which selects the GPIO pin
index. Pin index 0 means SDP20, pin index 1 means SDP21 and so on.
An example test command is:
./build/app/dpdk-testpmd -a af:00.0,pps_out='[pin:2]' -c f -n 4 -- -i
Signed-off-by: Simei Su <simei.su@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>
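The pin index mapping stated above is simple enough to write down directly. The helper name is hypothetical; only the mapping (index 0 is SDP20, up to index 3 for SDP23) comes from the commit message.

```c
/*
 * Hypothetical helper: map a pps_out pin index (0..3) to its SDP
 * number (20..23). Returns -1 for an out-of-range index.
 */
static int
sketch_pps_pin_to_sdp(int pin)
{
	if (pin < 0 || pin > 3)
		return -1;
	return 20 + pin;
}
```

For the example command, pin:2 therefore selects SDP22.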
net/ice/base: add API for parser profile initialization
Add API ice_parser_profile_init to init a parser profile based on
a parser result and a mask buffer. The ice_parser_profile can be fed to
the low level FXP engine to create a HW profile / field vector directly.
UDP tunnels can be added/deleted for vxlan, geneve, ecpri through
the APIs below:
ice_parser_vxlan_tunnel_set
ice_parser_geneve_tunnel_set
ice_parser_ecpri_tunnel_set
net/ice/base: init parse graph CAM table for parser
Parse DDP section ICE_SID_RXPARSER_CAM or ICE_SID_RXPARSER_PG_SPILL
into an array of struct ice_pg_cam_item.
Parse DDP section ICE_SID_RXPARSER_NOMATCH_CAM or
ICE_SID_RXPARSER_NOMATCH_SPILL into an array of struct ice_pg_nm_cam_item.
Parse DDP section ICE_SID_RXPARSER_IMEM into an array of
struct ice_imem_item.
The Instruction Memory (IMEM) section contains three VLIW instructions
for the ALUs, a key extraction instruction for the Parse Graph CAM, and
several other fields.
net/ice/base: add parser create and destroy skeleton
Add new parser module which can parse a packet in binary
and generate information like ptype, protocol/offset pairs
and flags which can be used to feed the FXP profile creation
directly.
The patch added skeleton of the parser instance create and
destroy APIs:
ice_parser_create
ice_parser_destroy
net/ice/base: add get/set functions for shared parameters
Add functions used by the driver for setting and getting the shared
driver parameters. These will be used by the driver in order to share
the PTP clock index identifier between PF drivers.
net/ice/base: allow tool access to manageability register
E810-T supports signed netlists and to support this, the NVM update
tool needs to be able to read the GL_MNG_DEF_DEVID register. Add
said register to the allowlist in ice_validate_nvm_rw_reg.
Change the type of one of the input parameters (addr) in the
ice_read_cgu_reg_e822 and ice_write_cgu_reg_e822 functions. This avoids
a narrowing conversion from addr to cgu_msg.msg_addr_low.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Acked-by: Junfeng Guo <junfeng.guo@intel.com>
net/ice/base: allow to enable LAN and loopback in switch
Currently shared code API does not allow to set/unset lb_en
and lan_en flags for advanced rules during their creation.
Because of that we have to use a workaround in switchdev
which is to update rule immediately after its creation.
This change will allow us to set/unset those flags right
away.
net/ice/base: use macro instead of open-coded division
For some operating systems, 64-bit division requires using specific
implementations. Use the DIV_64BIT macro to replace open-coded division
so that the driver may convert this to the appropriate operating-system
specific implementation when necessary.
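A sketch of the pattern follows. The fallback definition of DIV_64BIT shown here is an assumption made for illustration, since the commit message does not show how the macro is defined, and sketch_ns_to_us() is a hypothetical caller.

```c
#include <stdint.h>

/*
 * Assumed fallback: plain division. An OS-specific build could define
 * DIV_64BIT before this point to call its own 64-bit division helper.
 */
#ifndef DIV_64BIT
#define DIV_64BIT(n, d) ((n) / (d))
#endif

/* Hypothetical caller: convert nanoseconds to microseconds. */
static uint64_t
sketch_ns_to_us(uint64_t ns)
{
	return DIV_64BIT(ns, 1000ULL);
}
```

Callers never spell out the "/" operator on 64-bit operands, so swapping the implementation is a one-line change in the macro definition.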
In some devices, the function numbers used are non-contiguous. For
example, some two port devices will report as functions 0 and 2.
When distributing RSS and FDIR masks, which are global resources across
the active devices, it is required to have a contiguous PF id, which can
be described as a logical PF id. In the case above, function 0 would
have a logical PF id of 0, and function 2 would have a logical PF id of
1.
Using logical PF id can properly describe which slice of resources can
be used by a particular PF.
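One way to derive such a logical PF id is to count the active functions with a lower number. The sketch below assumes the set of active functions is available as a bitmap, and both the bitmap representation and the function name are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bit i of active_funcs is set when PCI function i
 * is active. The logical PF id of "func" is the number of active
 * functions below it. Returns -1 if "func" itself is not active.
 */
static int
sketch_logical_pf_id(uint8_t active_funcs, unsigned int func)
{
	unsigned int i;
	int logical = 0;

	if (func > 7 || !(active_funcs & (1u << func)))
		return -1;

	for (i = 0; i < func; i++) {
		if (active_funcs & (1u << i))
			logical++;
	}
	return logical;
}
```

For the two-port example above (functions 0 and 2 active, bitmap 0x05), function 0 maps to logical id 0 and function 2 maps to logical id 1.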
ethdev: add IPv4 and L4 checksum RSS offload types
This patch defines new RSS offload types for IPv4 and
L4(TCP/UDP/SCTP) checksum, which are required when users want
to distribute packets based on the IPv4 or L4 checksum field.
For example, the flow "flow create 0 ingress pattern eth / ipv4 / end
actions rss types ipv4-chksum end queues end / end" causes all
matching packets to be distributed to queues on the basis of the
IPv4 checksum.
security: add option to configure tunnel header verification
Add option to indicate whether outer header verification
needs to be done as part of inbound IPsec processing.
With inline IPsec processing, SA lookup would be happening
in the Rx path of rte_ethdev. When rte_flow is configured to
support more than one SA, SPI would be used to lookup SA.
In such cases, additional verification would be required to
ensure duplicate SPIs are not getting processed in the inline path.
For lookaside cases, the same option can be used by application
to offload tunnel verification to the PMD.
These verifications would help in averting possible DoS attacks.
Soft expiry is not a mandatory IPsec feature. It is verified separately
with IPsec unit tests. So configuration of the same is not required.
Also, soft expiry tracking can cause perf degradation with some PMDs.
Since a separate UT is available and the same setting in ipsec-secgw is
not verifying the functionality, remove the same by clearing life
configuration.
Signed-off-by: Anoob Joseph <anoobj@marvell.com> Acked-by: Akhil Goyal <gakhil@marvell.com>
Anoob Joseph [Tue, 28 Sep 2021 10:59:54 +0000 (16:29 +0530)]
security: add SA lifetime configuration
Add SA lifetime configuration to register soft and hard expiry limits.
Expiry can be in units of number of packets or bytes. Crypto op
status is also updated to include new field, aux_flags, which can be
used to indicate cases such as soft expiry in case of lookaside
protocol operations.
In case of soft expiry, the packets are successfully IPsec processed but
the soft expiry would indicate that SA needs to be reconfigured. For
inline protocol capable ethdev, this would result in an eth event while
for lookaside protocol capable cryptodev, this can be communicated via
`rte_crypto_op.aux_flags` field.
In case of hard expiry, the packets will not be IPsec processed and
would result in error.
Signed-off-by: Anoob Joseph <anoobj@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>
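The soft and hard expiry behavior described above can be modeled with a small standalone sketch. It counts packets only (the commit also allows byte limits). A limit of 0 is assumed to mean "no limit", and the exact comparison against the limits is likewise an assumption. None of the names below are the rte_security API.

```c
#include <stdint.h>

/* Hypothetical status values for this sketch. */
enum sketch_sa_status {
	SKETCH_SA_OK,
	SKETCH_SA_SOFT_EXPIRED, /* packet processed, SA should be reconfigured */
	SKETCH_SA_HARD_EXPIRED  /* packet not processed */
};

/* Hypothetical SA state with packet-based limits (0 = no limit, assumed). */
struct sketch_sa {
	uint64_t packets;
	uint64_t soft_limit;
	uint64_t hard_limit;
};

/* Account for one packet and report the resulting expiry state. */
static enum sketch_sa_status
sketch_sa_process(struct sketch_sa *sa)
{
	if (sa->hard_limit != 0 && sa->packets >= sa->hard_limit)
		return SKETCH_SA_HARD_EXPIRED;

	sa->packets++;

	if (sa->soft_limit != 0 && sa->packets >= sa->soft_limit)
		return SKETCH_SA_SOFT_EXPIRED;

	return SKETCH_SA_OK;
}
```

With a soft limit of 2 and a hard limit of 3, the first packet returns OK, the second and third are processed but flagged as soft-expired, and the fourth is rejected as hard-expired without being counted.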