From: Yongseok Koh Date: Wed, 9 May 2018 11:13:50 +0000 (-0700) Subject: net/mlx5: add Multi-Packet Rx support X-Git-Url: http://git.droids-corp.org/?a=commitdiff_plain;h=7d6bf6b866b8c25ec06539b3eeed1db4f785577c;p=dpdk.git net/mlx5: add Multi-Packet Rx support Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe bandwidth by posting a single large buffer for multiple packets. Instead of posting a buffer per a packet, one large buffer is posted in order to receive multiple packets on the buffer. A MPRQ buffer consists of multiple fixed-size strides and each stride receives one packet. Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is comparatively small, or PMD attaches the Rx packet to the mbuf by external buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external buffers will be allocated and managed by PMD. Signed-off-by: Yongseok Koh Acked-by: Shahaf Shuler --- diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 13cc3f8314..a7d5c90bcf 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -105,10 +105,14 @@ Limitations - A multi segment packet must have less than 6 segments in case the Tx burst function is set to multi-packet send or Enhanced multi-packet send. Otherwise it must have less than 50 segments. + - Count action for RTE flow is **only supported in Mellanox OFED**. + - Flows with a VXLAN Network Identifier equal (or ends to be equal) to 0 are not supported. + - VXLAN TSO and checksum offloads are not supported on VM. + - VF: flow rules created on VF devices can only match traffic targeted at the configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``). @@ -119,6 +123,13 @@ Limitations the device. In case of ungraceful program termination, some entries may remain present and should be removed manually by other means. +- When Multi-Packet Rx queue is configured (``mprq_en``), a Rx packet can be + externally attached to a user-provided mbuf with having EXT_ATTACHED_MBUF in + ol_flags. As the mempool for the external buffer is managed by PMD, all the + Rx mbufs must be freed before the device is closed. Otherwise, the mempool of + the external buffers will be freed by PMD and the application which still + holds the external buffers may be corrupted. + Statistics ---------- @@ -229,6 +240,53 @@ Run-time configuration - x86_64 with ConnectX-4, ConnectX-4 LX and ConnectX-5. - POWER8 and ARMv8 with ConnectX-4 LX and ConnectX-5. +- ``mprq_en`` parameter [int] + + A nonzero value enables configuring Multi-Packet Rx queues. Rx queue is + configured as Multi-Packet RQ if the total number of Rx queues is + ``rxqs_min_mprq`` or more and Rx scatter isn't configured. Disabled by + default. + + Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe bandwidth + by posting a single large buffer for multiple packets. Instead of posting a + buffers per a packet, one large buffer is posted in order to receive multiple + packets on the buffer. A MPRQ buffer consists of multiple fixed-size strides + and each stride receives one packet. MPRQ can improve throughput for + small-packet tarffic. + + When MPRQ is enabled, max_rx_pkt_len can be larger than the size of + user-provided mbuf even if DEV_RX_OFFLOAD_SCATTER isn't enabled. PMD will + configure large stride size enough to accommodate max_rx_pkt_len as long as + device allows. Note that this can waste system memory compared to enabling Rx + scatter and multi-segment packet. + +- ``mprq_log_stride_num`` parameter [int] + + Log 2 of the number of strides for Multi-Packet Rx queue. Configuring more + strides can reduce PCIe tarffic further. If configured value is not in the + range of device capability, the default value will be set with a warning + message. The default value is 4 which is 16 strides per a buffer, valid only + if ``mprq_en`` is set. + + The size of Rx queue should be bigger than the number of strides. + +- ``mprq_max_memcpy_len`` parameter [int] + + The maximum length of packet to memcpy in case of Multi-Packet Rx queue. Rx + packet is mem-copied to a user-provided mbuf if the size of Rx packet is less + than or equal to this parameter. Otherwise, PMD will attach the Rx packet to + the mbuf by external buffer attachment - ``rte_pktmbuf_attach_extbuf()``. + A mempool for external buffers will be allocated and managed by PMD. If Rx + packet is externally attached, ol_flags field of the mbuf will have + EXT_ATTACHED_MBUF and this flag must be preserved. ``RTE_MBUF_HAS_EXTBUF()`` + checks the flag. The default value is 128, valid only if ``mprq_en`` is set. + +- ``rxqs_min_mprq`` parameter [int] + + Configure Rx queues as Multi-Packet RQ if the total number of Rx queues is + greater or equal to this value. The default value is 12, valid only if + ``mprq_en`` is set. + - ``txq_inline`` parameter [int] Amount of data to be inlined during TX operations. Improves latency. diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 84fc9d533c..8cd2bc04f7 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -46,6 +46,18 @@ /* Device parameter to enable RX completion queue compression. */ #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en" +/* Device parameter to enable Multi-Packet Rx queue. */ +#define MLX5_RX_MPRQ_EN "mprq_en" + +/* Device parameter to configure log 2 of the number of strides for MPRQ. */ +#define MLX5_RX_MPRQ_LOG_STRIDE_NUM "mprq_log_stride_num" + +/* Device parameter to limit the size of memcpy'd packet for MPRQ. */ +#define MLX5_RX_MPRQ_MAX_MEMCPY_LEN "mprq_max_memcpy_len" + +/* Device parameter to set the minimum number of Rx queues to enable MPRQ. */ +#define MLX5_RXQS_MIN_MPRQ "rxqs_min_mprq" + /* Device parameter to configure inline send. */ #define MLX5_TXQ_INLINE "txq_inline" @@ -241,6 +253,7 @@ mlx5_dev_close(struct rte_eth_dev *dev) priv->txqs = NULL; } mlx5_flow_delete_drop_queue(dev); + mlx5_mprq_free_mp(dev); mlx5_mr_release(dev); if (priv->pd != NULL) { assert(priv->ctx != NULL); @@ -444,6 +457,14 @@ mlx5_args_check(const char *key, const char *val, void *opaque) } if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0) { config->cqe_comp = !!tmp; + } else if (strcmp(MLX5_RX_MPRQ_EN, key) == 0) { + config->mprq.enabled = !!tmp; + } else if (strcmp(MLX5_RX_MPRQ_LOG_STRIDE_NUM, key) == 0) { + config->mprq.stride_num_n = tmp; + } else if (strcmp(MLX5_RX_MPRQ_MAX_MEMCPY_LEN, key) == 0) { + config->mprq.max_memcpy_len = tmp; + } else if (strcmp(MLX5_RXQS_MIN_MPRQ, key) == 0) { + config->mprq.min_rxqs_num = tmp; } else if (strcmp(MLX5_TXQ_INLINE, key) == 0) { config->txq_inline = tmp; } else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0) { @@ -486,6 +507,10 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs) { const char **params = (const char *[]){ MLX5_RXQ_CQE_COMP_EN, + MLX5_RX_MPRQ_EN, + MLX5_RX_MPRQ_LOG_STRIDE_NUM, + MLX5_RX_MPRQ_MAX_MEMCPY_LEN, + MLX5_RXQS_MIN_MPRQ, MLX5_TXQ_INLINE, MLX5_TXQS_MIN_INLINE, MLX5_TXQ_MPW_EN, @@ -667,6 +692,11 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, unsigned int tunnel_en = 0; unsigned int swp = 0; unsigned int verb_priorities = 0; + unsigned int mprq = 0; + unsigned int mprq_min_stride_size_n = 0; + unsigned int mprq_max_stride_size_n = 0; + unsigned int mprq_min_stride_num_n = 0; + unsigned int mprq_max_stride_num_n = 0; int idx; int i; struct mlx5dv_context attrs_out = {0}; @@ -753,6 +783,9 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, */ #ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT attrs_out.comp_mask |= MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS; +#endif +#ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT + attrs_out.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ; #endif mlx5_glue->dv_query_device(attr_ctx, &attrs_out); if (attrs_out.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) { @@ -771,6 +804,33 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, if (attrs_out.comp_mask & MLX5DV_CONTEXT_MASK_SWP) swp = attrs_out.sw_parsing_caps.sw_parsing_offloads; DRV_LOG(DEBUG, "SWP support: %u", swp); +#endif +#ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT + if (attrs_out.comp_mask & MLX5DV_CONTEXT_MASK_STRIDING_RQ) { + struct mlx5dv_striding_rq_caps mprq_caps = + attrs_out.striding_rq_caps; + + DRV_LOG(DEBUG, "\tmin_single_stride_log_num_of_bytes: %d", + mprq_caps.min_single_stride_log_num_of_bytes); + DRV_LOG(DEBUG, "\tmax_single_stride_log_num_of_bytes: %d", + mprq_caps.max_single_stride_log_num_of_bytes); + DRV_LOG(DEBUG, "\tmin_single_wqe_log_num_of_strides: %d", + mprq_caps.min_single_wqe_log_num_of_strides); + DRV_LOG(DEBUG, "\tmax_single_wqe_log_num_of_strides: %d", + mprq_caps.max_single_wqe_log_num_of_strides); + DRV_LOG(DEBUG, "\tsupported_qpts: %d", + mprq_caps.supported_qpts); + DRV_LOG(DEBUG, "device supports Multi-Packet RQ"); + mprq = 1; + mprq_min_stride_size_n = + mprq_caps.min_single_stride_log_num_of_bytes; + mprq_max_stride_size_n = + mprq_caps.max_single_stride_log_num_of_bytes; + mprq_min_stride_num_n = + mprq_caps.min_single_wqe_log_num_of_strides; + mprq_max_stride_num_n = + mprq_caps.max_single_wqe_log_num_of_strides; + } #endif if (RTE_CACHE_LINE_SIZE == 128 && !(attrs_out.flags & MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP)) @@ -821,6 +881,13 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, .inline_max_packet_sz = MLX5_ARG_UNSET, .vf_nl_en = 1, .swp = !!swp, + .mprq = { + .enabled = 0, /* Disabled by default. */ + .stride_num_n = RTE_MAX(MLX5_MPRQ_STRIDE_NUM_N, + mprq_min_stride_num_n), + .max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN, + .min_rxqs_num = MLX5_MPRQ_MIN_RXQS, + }, }; len = snprintf(name, sizeof(name), PCI_PRI_FMT, @@ -986,6 +1053,22 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, DRV_LOG(WARNING, "Rx CQE compression isn't supported"); config.cqe_comp = 0; } + config.mprq.enabled = config.mprq.enabled && mprq; + if (config.mprq.enabled) { + if (config.mprq.stride_num_n > mprq_max_stride_num_n || + config.mprq.stride_num_n < mprq_min_stride_num_n) { + config.mprq.stride_num_n = + RTE_MAX(MLX5_MPRQ_STRIDE_NUM_N, + mprq_min_stride_num_n); + DRV_LOG(WARNING, + "the number of strides" + " for Multi-Packet RQ is out of range," + " setting default value (%u)", + 1 << config.mprq.stride_num_n); + } + config.mprq.min_stride_size_n = mprq_min_stride_size_n; + config.mprq.max_stride_size_n = mprq_max_stride_size_n; + } eth_dev = rte_eth_dev_allocate(name); if (eth_dev == NULL) { DRV_LOG(ERR, "can not allocate rte ethdev"); diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index b2ed99087b..c4c962b92d 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -102,6 +102,16 @@ struct mlx5_dev_config { unsigned int l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */ unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */ unsigned int swp:1; /* Tx generic tunnel checksum and TSO offload. */ + struct { + unsigned int enabled:1; /* Whether MPRQ is enabled. */ + unsigned int stride_num_n; /* Number of strides. */ + unsigned int min_stride_size_n; /* Min size of a stride. */ + unsigned int max_stride_size_n; /* Max size of a stride. */ + unsigned int max_memcpy_len; + /* Maximum packet size to memcpy Rx packets. */ + unsigned int min_rxqs_num; + /* Rx queue count threshold to enable MPRQ. */ + } mprq; /* Configurations for Multi-Packet RQ. */ unsigned int max_verbs_prio; /* Number of Verb flow priorities. */ unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */ unsigned int ind_table_max_size; /* Maximum indirection table size. */ @@ -154,6 +164,7 @@ struct priv { unsigned int txqs_n; /* TX queues array size. */ struct mlx5_rxq_data *(*rxqs)[]; /* RX queues. */ struct mlx5_txq_data *(*txqs)[]; /* TX queues. */ + struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */ struct rte_eth_rss_conf rss_conf; /* RSS configuration. */ struct rte_intr_handle intr_handle; /* Interrupt handler. */ unsigned int (*reta_idx)[]; /* RETA index table. */ diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h index 72e80af261..51124cdcc5 100644 --- a/drivers/net/mlx5/mlx5_defs.h +++ b/drivers/net/mlx5/mlx5_defs.h @@ -95,4 +95,22 @@ */ #define MLX5_UAR_OFFSET (1ULL << 32) +/* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */ +#define MLX5_MPRQ_STRIDE_NUM_N 4U + +/* Two-byte shift is disabled for Multi-Packet RQ. */ +#define MLX5_MPRQ_TWO_BYTE_SHIFT 0 + +/* + * Minimum size of packet to be memcpy'd instead of being attached as an + * external buffer. + */ +#define MLX5_MPRQ_MEMCPY_DEFAULT_LEN 128 + +/* Minimum number Rx queues to enable Multi-Packet RQ. */ +#define MLX5_MPRQ_MIN_RXQS 12 + +/* Cache size of mempool for Multi-Packet RQ. */ +#define MLX5_MPRQ_MP_CACHE_SZ 32 + #endif /* RTE_PMD_MLX5_DEFS_H_ */ diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 898b2e5e5b..c52ec6fb70 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -525,6 +525,7 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev) }; if (dev->rx_pkt_burst == mlx5_rx_burst || + dev->rx_pkt_burst == mlx5_rx_burst_mprq || dev->rx_pkt_burst == mlx5_rx_burst_vec) return ptypes; return NULL; @@ -1182,6 +1183,8 @@ mlx5_select_rx_function(struct rte_eth_dev *dev) rx_pkt_burst = mlx5_rx_burst_vec; DRV_LOG(DEBUG, "port %u selected Rx vectorized function", dev->data->port_id); + } else if (mlx5_mprq_enabled(dev)) { + rx_pkt_burst = mlx5_rx_burst_mprq; } return rx_pkt_burst; } diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h index 05a6828981..0cf370cd73 100644 --- a/drivers/net/mlx5/mlx5_prm.h +++ b/drivers/net/mlx5/mlx5_prm.h @@ -219,6 +219,21 @@ struct mlx5_mpw { } data; }; +/* WQE for Multi-Packet RQ. */ +struct mlx5_wqe_mprq { + struct mlx5_wqe_srq_next_seg next_seg; + struct mlx5_wqe_data_seg dseg; +}; + +#define MLX5_MPRQ_LEN_MASK 0x000ffff +#define MLX5_MPRQ_LEN_SHIFT 0 +#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000 +#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16 +#define MLX5_MPRQ_FILLER_MASK 0x80000000 +#define MLX5_MPRQ_FILLER_SHIFT 31 + +#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2 + /* CQ element structure - should be equal to the cache line size */ struct mlx5_cqe { #if (RTE_CACHE_LINE_SIZE == 128) diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index 63f07fdc8a..587b22fc21 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -55,7 +55,74 @@ uint8_t rss_hash_default_key[] = { const size_t rss_hash_default_key_len = sizeof(rss_hash_default_key); /** - * Allocate RX queue elements. + * Check whether Multi-Packet RQ can be enabled for the device. + * + * @param dev + * Pointer to Ethernet device. + * + * @return + * 1 if supported, negative errno value if not. + */ +inline int +mlx5_check_mprq_support(struct rte_eth_dev *dev) +{ + struct priv *priv = dev->data->dev_private; + + if (priv->config.mprq.enabled && + priv->rxqs_n >= priv->config.mprq.min_rxqs_num) + return 1; + return -ENOTSUP; +} + +/** + * Check whether Multi-Packet RQ is enabled for the Rx queue. + * + * @param rxq + * Pointer to receive queue structure. + * + * @return + * 0 if disabled, otherwise enabled. + */ +inline int +mlx5_rxq_mprq_enabled(struct mlx5_rxq_data *rxq) +{ + return rxq->strd_num_n > 0; +} + +/** + * Check whether Multi-Packet RQ is enabled for the device. + * + * @param dev + * Pointer to Ethernet device. + * + * @return + * 0 if disabled, otherwise enabled. + */ +inline int +mlx5_mprq_enabled(struct rte_eth_dev *dev) +{ + struct priv *priv = dev->data->dev_private; + uint16_t i; + uint16_t n = 0; + + if (mlx5_check_mprq_support(dev) < 0) + return 0; + /* All the configured queues should be enabled. */ + for (i = 0; i < priv->rxqs_n; ++i) { + struct mlx5_rxq_data *rxq = (*priv->rxqs)[i]; + + if (!rxq) + continue; + if (mlx5_rxq_mprq_enabled(rxq)) + ++n; + } + /* Multi-Packet RQ can't be partially configured. */ + assert(n == 0 || n == priv->rxqs_n); + return n == priv->rxqs_n; +} + +/** + * Allocate RX queue elements for Multi-Packet RQ. * * @param rxq_ctrl * Pointer to RX queue structure. @@ -63,8 +130,58 @@ const size_t rss_hash_default_key_len = sizeof(rss_hash_default_key); * @return * 0 on success, a negative errno value otherwise and rte_errno is set. */ -int -rxq_alloc_elts(struct mlx5_rxq_ctrl *rxq_ctrl) +static int +rxq_alloc_elts_mprq(struct mlx5_rxq_ctrl *rxq_ctrl) +{ + struct mlx5_rxq_data *rxq = &rxq_ctrl->rxq; + unsigned int wqe_n = 1 << rxq->elts_n; + unsigned int i; + int err; + + /* Iterate on segments. */ + for (i = 0; i <= wqe_n; ++i) { + struct mlx5_mprq_buf *buf; + + if (rte_mempool_get(rxq->mprq_mp, (void **)&buf) < 0) { + DRV_LOG(ERR, "port %u empty mbuf pool", rxq->port_id); + rte_errno = ENOMEM; + goto error; + } + if (i < wqe_n) + (*rxq->mprq_bufs)[i] = buf; + else + rxq->mprq_repl = buf; + } + DRV_LOG(DEBUG, + "port %u Rx queue %u allocated and configured %u segments", + rxq->port_id, rxq_ctrl->idx, wqe_n); + return 0; +error: + err = rte_errno; /* Save rte_errno before cleanup. */ + wqe_n = i; + for (i = 0; (i != wqe_n); ++i) { + if ((*rxq->elts)[i] != NULL) + rte_mempool_put(rxq->mprq_mp, + (*rxq->mprq_bufs)[i]); + (*rxq->mprq_bufs)[i] = NULL; + } + DRV_LOG(DEBUG, "port %u Rx queue %u failed, freed everything", + rxq->port_id, rxq_ctrl->idx); + rte_errno = err; /* Restore rte_errno. */ + return -rte_errno; +} + +/** + * Allocate RX queue elements for Single-Packet RQ. + * + * @param rxq_ctrl + * Pointer to RX queue structure. + * + * @return + * 0 on success, errno value on failure. + */ +static int +rxq_alloc_elts_sprq(struct mlx5_rxq_ctrl *rxq_ctrl) { const unsigned int sges_n = 1 << rxq_ctrl->rxq.sges_n; unsigned int elts_n = 1 << rxq_ctrl->rxq.elts_n; @@ -140,13 +257,57 @@ error: } /** - * Free RX queue elements. + * Allocate RX queue elements. + * + * @param rxq_ctrl + * Pointer to RX queue structure. + * + * @return + * 0 on success, errno value on failure. + */ +int +rxq_alloc_elts(struct mlx5_rxq_ctrl *rxq_ctrl) +{ + return mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ? + rxq_alloc_elts_mprq(rxq_ctrl) : rxq_alloc_elts_sprq(rxq_ctrl); +} + +/** + * Free RX queue elements for Multi-Packet RQ. * * @param rxq_ctrl * Pointer to RX queue structure. */ static void -rxq_free_elts(struct mlx5_rxq_ctrl *rxq_ctrl) +rxq_free_elts_mprq(struct mlx5_rxq_ctrl *rxq_ctrl) +{ + struct mlx5_rxq_data *rxq = &rxq_ctrl->rxq; + uint16_t i; + + DRV_LOG(DEBUG, "port %u Multi-Packet Rx queue %u freeing WRs", + rxq->port_id, rxq_ctrl->idx); + if (rxq->mprq_bufs == NULL) + return; + assert(mlx5_rxq_check_vec_support(rxq) < 0); + for (i = 0; (i != (1u << rxq->elts_n)); ++i) { + if ((*rxq->mprq_bufs)[i] != NULL) + mlx5_mprq_buf_free((*rxq->mprq_bufs)[i]); + (*rxq->mprq_bufs)[i] = NULL; + } + if (rxq->mprq_repl != NULL) { + mlx5_mprq_buf_free(rxq->mprq_repl); + rxq->mprq_repl = NULL; + } +} + +/** + * Free RX queue elements for Single-Packet RQ. + * + * @param rxq_ctrl + * Pointer to RX queue structure. + */ +static void +rxq_free_elts_sprq(struct mlx5_rxq_ctrl *rxq_ctrl) { struct mlx5_rxq_data *rxq = &rxq_ctrl->rxq; const uint16_t q_n = (1 << rxq->elts_n); @@ -174,6 +335,21 @@ rxq_free_elts(struct mlx5_rxq_ctrl *rxq_ctrl) } } +/** + * Free RX queue elements. + * + * @param rxq_ctrl + * Pointer to RX queue structure. + */ +static void +rxq_free_elts(struct mlx5_rxq_ctrl *rxq_ctrl) +{ + if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq)) + rxq_free_elts_mprq(rxq_ctrl); + else + rxq_free_elts_sprq(rxq_ctrl); +} + /** * Clean up a RX queue. * @@ -585,10 +761,16 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) struct ibv_cq_init_attr_ex ibv; struct mlx5dv_cq_init_attr mlx5; } cq; - struct ibv_wq_init_attr wq; + struct { + struct ibv_wq_init_attr ibv; +#ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT + struct mlx5dv_wq_init_attr mlx5; +#endif + } wq; struct ibv_cq_ex cq_attr; } attr; - unsigned int cqe_n = (1 << rxq_data->elts_n) - 1; + unsigned int cqe_n; + unsigned int wqe_n = 1 << rxq_data->elts_n; struct mlx5_rxq_ibv *tmpl; struct mlx5dv_cq cq_info; struct mlx5dv_rwq rwq; @@ -596,6 +778,7 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) int ret = 0; struct mlx5dv_obj obj; struct mlx5_dev_config *config = &priv->config; + const int mprq_en = mlx5_rxq_mprq_enabled(rxq_data); assert(rxq_data); assert(!rxq_ctrl->ibv); @@ -620,6 +803,10 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) goto error; } } + if (mprq_en) + cqe_n = wqe_n * (1 << rxq_data->strd_num_n) - 1; + else + cqe_n = wqe_n - 1; attr.cq.ibv = (struct ibv_cq_init_attr_ex){ .cqe = cqe_n, .channel = tmpl->channel, @@ -657,11 +844,11 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr); DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d", dev->data->port_id, priv->device_attr.orig_attr.max_sge); - attr.wq = (struct ibv_wq_init_attr){ + attr.wq.ibv = (struct ibv_wq_init_attr){ .wq_context = NULL, /* Could be useful in the future. */ .wq_type = IBV_WQT_RQ, /* Max number of outstanding WRs. */ - .max_wr = (1 << rxq_data->elts_n) >> rxq_data->sges_n, + .max_wr = wqe_n >> rxq_data->sges_n, /* Max number of scatter/gather elements in a WR. */ .max_sge = 1 << rxq_data->sges_n, .pd = priv->pd, @@ -675,16 +862,35 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) }; /* By default, FCS (CRC) is stripped by hardware. */ if (rxq_data->crc_present) { - attr.wq.create_flags |= IBV_WQ_FLAGS_SCATTER_FCS; - attr.wq.comp_mask |= IBV_WQ_INIT_ATTR_FLAGS; + attr.wq.ibv.create_flags |= IBV_WQ_FLAGS_SCATTER_FCS; + attr.wq.ibv.comp_mask |= IBV_WQ_INIT_ATTR_FLAGS; } #ifdef HAVE_IBV_WQ_FLAG_RX_END_PADDING if (config->hw_padding) { - attr.wq.create_flags |= IBV_WQ_FLAG_RX_END_PADDING; - attr.wq.comp_mask |= IBV_WQ_INIT_ATTR_FLAGS; + attr.wq.ibv.create_flags |= IBV_WQ_FLAG_RX_END_PADDING; + attr.wq.ibv.comp_mask |= IBV_WQ_INIT_ATTR_FLAGS; + } +#endif +#ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT + attr.wq.mlx5 = (struct mlx5dv_wq_init_attr){ + .comp_mask = 0, + }; + if (mprq_en) { + struct mlx5dv_striding_rq_init_attr *mprq_attr = + &attr.wq.mlx5.striding_rq_attrs; + + attr.wq.mlx5.comp_mask |= MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ; + *mprq_attr = (struct mlx5dv_striding_rq_init_attr){ + .single_stride_log_num_of_bytes = rxq_data->strd_sz_n, + .single_wqe_log_num_of_strides = rxq_data->strd_num_n, + .two_byte_shift_en = MLX5_MPRQ_TWO_BYTE_SHIFT, + }; } + tmpl->wq = mlx5_glue->dv_create_wq(priv->ctx, &attr.wq.ibv, + &attr.wq.mlx5); +#else + tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq.ibv); #endif - tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq); if (tmpl->wq == NULL) { DRV_LOG(ERR, "port %u Rx queue %u WQ creation failure", dev->data->port_id, idx); @@ -695,16 +901,14 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) * Make sure number of WRs*SGEs match expectations since a queue * cannot allocate more than "desc" buffers. */ - if (((int)attr.wq.max_wr != - ((1 << rxq_data->elts_n) >> rxq_data->sges_n)) || - ((int)attr.wq.max_sge != (1 << rxq_data->sges_n))) { + if (attr.wq.ibv.max_wr != (wqe_n >> rxq_data->sges_n) || + attr.wq.ibv.max_sge != (1u << rxq_data->sges_n)) { DRV_LOG(ERR, "port %u Rx queue %u requested %u*%u but got %u*%u" " WRs*SGEs", dev->data->port_id, idx, - ((1 << rxq_data->elts_n) >> rxq_data->sges_n), - (1 << rxq_data->sges_n), - attr.wq.max_wr, attr.wq.max_sge); + wqe_n >> rxq_data->sges_n, (1 << rxq_data->sges_n), + attr.wq.ibv.max_wr, attr.wq.ibv.max_sge); rte_errno = EINVAL; goto error; } @@ -739,25 +943,40 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) goto error; } /* Fill the rings. */ - rxq_data->wqes = (volatile struct mlx5_wqe_data_seg (*)[]) - (uintptr_t)rwq.buf; - for (i = 0; (i != (unsigned int)(1 << rxq_data->elts_n)); ++i) { - struct rte_mbuf *buf = (*rxq_data->elts)[i]; - volatile struct mlx5_wqe_data_seg *scat = &(*rxq_data->wqes)[i]; - + rxq_data->wqes = rwq.buf; + for (i = 0; (i != wqe_n); ++i) { + volatile struct mlx5_wqe_data_seg *scat; + uintptr_t addr; + uint32_t byte_count; + + if (mprq_en) { + struct mlx5_mprq_buf *buf = (*rxq_data->mprq_bufs)[i]; + + scat = &((volatile struct mlx5_wqe_mprq *) + rxq_data->wqes)[i].dseg; + addr = (uintptr_t)mlx5_mprq_buf_addr(buf); + byte_count = (1 << rxq_data->strd_sz_n) * + (1 << rxq_data->strd_num_n); + } else { + struct rte_mbuf *buf = (*rxq_data->elts)[i]; + + scat = &((volatile struct mlx5_wqe_data_seg *) + rxq_data->wqes)[i]; + addr = rte_pktmbuf_mtod(buf, uintptr_t); + byte_count = DATA_LEN(buf); + } /* scat->addr must be able to store a pointer. */ assert(sizeof(scat->addr) >= sizeof(uintptr_t)); *scat = (struct mlx5_wqe_data_seg){ - .addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, - uintptr_t)), - .byte_count = rte_cpu_to_be_32(DATA_LEN(buf)), - .lkey = mlx5_rx_mb2mr(rxq_data, buf), + .addr = rte_cpu_to_be_64(addr), + .byte_count = rte_cpu_to_be_32(byte_count), + .lkey = mlx5_rx_addr2mr(rxq_data, addr), }; } rxq_data->rq_db = rwq.dbrec; rxq_data->cqe_n = log2above(cq_info.cqe_cnt); rxq_data->cq_ci = 0; - rxq_data->rq_ci = 0; + rxq_data->strd_ci = 0; rxq_data->rq_pi = 0; rxq_data->zip = (struct rxq_zip){ .ai = 0, @@ -768,7 +987,7 @@ mlx5_rxq_ibv_new(struct rte_eth_dev *dev, uint16_t idx) rxq_data->cqn = cq_info.cqn; rxq_data->cq_arm_sn = 0; /* Update doorbell counter. */ - rxq_data->rq_ci = (1 << rxq_data->elts_n) >> rxq_data->sges_n; + rxq_data->rq_ci = wqe_n >> rxq_data->sges_n; rte_wmb(); *rxq_data->rq_db = rte_cpu_to_be_32(rxq_data->rq_ci); DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id, @@ -893,13 +1112,181 @@ mlx5_rxq_ibv_releasable(struct mlx5_rxq_ibv *rxq_ibv) return (rte_atomic32_read(&rxq_ibv->refcnt) == 1); } +/** + * Callback function to initialize mbufs for Multi-Packet RQ. + */ +static inline void +mlx5_mprq_buf_init(struct rte_mempool *mp, void *opaque_arg __rte_unused, + void *_m, unsigned int i __rte_unused) +{ + struct mlx5_mprq_buf *buf = _m; + + memset(_m, 0, sizeof(*buf)); + buf->mp = mp; + rte_atomic16_set(&buf->refcnt, 1); +} + +/** + * Free mempool of Multi-Packet RQ. + * + * @param dev + * Pointer to Ethernet device. + * + * @return + * 0 on success, negative errno value on failure. + */ +int +mlx5_mprq_free_mp(struct rte_eth_dev *dev) +{ + struct priv *priv = dev->data->dev_private; + struct rte_mempool *mp = priv->mprq_mp; + unsigned int i; + + if (mp == NULL) + return 0; + DRV_LOG(DEBUG, "port %u freeing mempool (%s) for Multi-Packet RQ", + dev->data->port_id, mp->name); + /* + * If a buffer in the pool has been externally attached to a mbuf and it + * is still in use by application, destroying the Rx qeueue can spoil + * the packet. It is unlikely to happen but if application dynamically + * creates and destroys with holding Rx packets, this can happen. + * + * TODO: It is unavoidable for now because the mempool for Multi-Packet + * RQ isn't provided by application but managed by PMD. + */ + if (!rte_mempool_full(mp)) { + DRV_LOG(ERR, + "port %u mempool for Multi-Packet RQ is still in use", + dev->data->port_id); + rte_errno = EBUSY; + return -rte_errno; + } + rte_mempool_free(mp); + /* Unset mempool for each Rx queue. */ + for (i = 0; i != priv->rxqs_n; ++i) { + struct mlx5_rxq_data *rxq = (*priv->rxqs)[i]; + + if (rxq == NULL) + continue; + rxq->mprq_mp = NULL; + } + return 0; +} + +/** + * Allocate a mempool for Multi-Packet RQ. All configured Rx queues share the + * mempool. If already allocated, reuse it if there're enough elements. + * Otherwise, resize it. + * + * @param dev + * Pointer to Ethernet device. + * + * @return + * 0 on success, negative errno value on failure. + */ +int +mlx5_mprq_alloc_mp(struct rte_eth_dev *dev) +{ + struct priv *priv = dev->data->dev_private; + struct rte_mempool *mp = priv->mprq_mp; + char name[RTE_MEMPOOL_NAMESIZE]; + unsigned int desc = 0; + unsigned int buf_len; + unsigned int obj_num; + unsigned int obj_size; + unsigned int strd_num_n = 0; + unsigned int strd_sz_n = 0; + unsigned int i; + + if (!mlx5_mprq_enabled(dev)) + return 0; + /* Count the total number of descriptors configured. */ + for (i = 0; i != priv->rxqs_n; ++i) { + struct mlx5_rxq_data *rxq = (*priv->rxqs)[i]; + + if (rxq == NULL) + continue; + desc += 1 << rxq->elts_n; + /* Get the max number of strides. */ + if (strd_num_n < rxq->strd_num_n) + strd_num_n = rxq->strd_num_n; + /* Get the max size of a stride. */ + if (strd_sz_n < rxq->strd_sz_n) + strd_sz_n = rxq->strd_sz_n; + } + assert(strd_num_n && strd_sz_n); + buf_len = (1 << strd_num_n) * (1 << strd_sz_n); + obj_size = buf_len + sizeof(struct mlx5_mprq_buf); + /* + * Received packets can be either memcpy'd or externally referenced. In + * case that the packet is attached to an mbuf as an external buffer, as + * it isn't possible to predict how the buffers will be queued by + * application, there's no option to exactly pre-allocate needed buffers + * in advance but to speculatively prepares enough buffers. + * + * In the data path, if this Mempool is depleted, PMD will try to memcpy + * received packets to buffers provided by application (rxq->mp) until + * this Mempool gets available again. + */ + desc *= 4; + obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n; + /* Check a mempool is already allocated and if it can be resued. */ + if (mp != NULL && mp->elt_size >= obj_size && mp->size >= obj_num) { + DRV_LOG(DEBUG, "port %u mempool %s is being reused", + dev->data->port_id, mp->name); + /* Reuse. */ + goto exit; + } else if (mp != NULL) { + DRV_LOG(DEBUG, "port %u mempool %s should be resized, freeing it", + dev->data->port_id, mp->name); + /* + * If failed to free, which means it may be still in use, no way + * but to keep using the existing one. On buffer underrun, + * packets will be memcpy'd instead of external buffer + * attachment. + */ + if (mlx5_mprq_free_mp(dev)) { + if (mp->elt_size >= obj_size) + goto exit; + else + return -rte_errno; + } + } + snprintf(name, sizeof(name), "%s-mprq", dev->device->name); + mp = rte_mempool_create(name, obj_num, obj_size, MLX5_MPRQ_MP_CACHE_SZ, + 0, NULL, NULL, mlx5_mprq_buf_init, NULL, + dev->device->numa_node, 0); + if (mp == NULL) { + DRV_LOG(ERR, + "port %u failed to allocate a mempool for" + " Multi-Packet RQ, count=%u, size=%u", + dev->data->port_id, obj_num, obj_size); + rte_errno = ENOMEM; + return -rte_errno; + } + priv->mprq_mp = mp; +exit: + /* Set mempool for each Rx queue. */ + for (i = 0; i != priv->rxqs_n; ++i) { + struct mlx5_rxq_data *rxq = (*priv->rxqs)[i]; + + if (rxq == NULL) + continue; + rxq->mprq_mp = mp; + } + DRV_LOG(INFO, "port %u Multi-Packet RQ is configured", + dev->data->port_id); + return 0; +} + /** * Create a DPDK Rx queue. * * @param dev * Pointer to Ethernet device. * @param idx - * TX queue index. + * RX queue index. * @param desc * Number of descriptors to configure in queue. * @param socket @@ -916,15 +1303,17 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, struct priv *priv = dev->data->dev_private; struct mlx5_rxq_ctrl *tmpl; unsigned int mb_len = rte_pktmbuf_data_room_size(mp); + unsigned int mprq_stride_size; struct mlx5_dev_config *config = &priv->config; /* * Always allocate extra slots, even if eventually * the vector Rx will not be used. */ - const uint16_t desc_n = + uint16_t desc_n = desc + config->rx_vec_en * MLX5_VPMD_DESCS_PER_LOOP; uint64_t offloads = conf->offloads | dev->data->dev_conf.rxmode.offloads; + const int mprq_en = mlx5_check_mprq_support(dev) > 0; tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl) + @@ -942,10 +1331,41 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, tmpl->socket = socket; if (dev->data->dev_conf.intr_conf.rxq) tmpl->irq = 1; - /* Enable scattered packets support for this queue if necessary. */ + /* + * This Rx queue can be configured as a Multi-Packet RQ if all of the + * following conditions are met: + * - MPRQ is enabled. + * - The number of descs is more than the number of strides. + * - max_rx_pkt_len plus overhead is less than the max size of a + * stride. + * Otherwise, enable Rx scatter if necessary. + */ assert(mb_len >= RTE_PKTMBUF_HEADROOM); - if (dev->data->dev_conf.rxmode.max_rx_pkt_len <= - (mb_len - RTE_PKTMBUF_HEADROOM)) { + mprq_stride_size = + dev->data->dev_conf.rxmode.max_rx_pkt_len + + sizeof(struct rte_mbuf_ext_shared_info) + + RTE_PKTMBUF_HEADROOM; + if (mprq_en && + desc >= (1U << config->mprq.stride_num_n) && + mprq_stride_size <= (1U << config->mprq.max_stride_size_n)) { + /* TODO: Rx scatter isn't supported yet. */ + tmpl->rxq.sges_n = 0; + /* Trim the number of descs needed. */ + desc >>= config->mprq.stride_num_n; + tmpl->rxq.strd_num_n = config->mprq.stride_num_n; + tmpl->rxq.strd_sz_n = RTE_MAX(log2above(mprq_stride_size), + config->mprq.min_stride_size_n); + tmpl->rxq.strd_shift_en = MLX5_MPRQ_TWO_BYTE_SHIFT; + tmpl->rxq.mprq_max_memcpy_len = + RTE_MIN(mb_len - RTE_PKTMBUF_HEADROOM, + config->mprq.max_memcpy_len); + DRV_LOG(DEBUG, + "port %u Rx queue %u: Multi-Packet RQ is enabled" + " strd_num_n = %u, strd_sz_n = %u", + dev->data->port_id, idx, + tmpl->rxq.strd_num_n, tmpl->rxq.strd_sz_n); + } else if (dev->data->dev_conf.rxmode.max_rx_pkt_len <= + (mb_len - RTE_PKTMBUF_HEADROOM)) { tmpl->rxq.sges_n = 0; } else if (offloads & DEV_RX_OFFLOAD_SCATTER) { unsigned int size = diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index 6cb177ccac..c887d550f2 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -47,6 +47,9 @@ static __rte_always_inline void rxq_cq_to_mbuf(struct mlx5_rxq_data *rxq, struct rte_mbuf *pkt, volatile struct mlx5_cqe *cqe, uint32_t rss_hash_res); +static __rte_always_inline void +mprq_buf_replace(struct mlx5_rxq_data *rxq, uint16_t rq_idx); + uint32_t mlx5_ptype_table[] __rte_cache_aligned = { [0xff] = RTE_PTYPE_ALL_MASK, /* Last entry for errored packet. */ }; @@ -1920,7 +1923,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) while (pkts_n) { unsigned int idx = rq_ci & wqe_cnt; - volatile struct mlx5_wqe_data_seg *wqe = &(*rxq->wqes)[idx]; + volatile struct mlx5_wqe_data_seg *wqe = + &((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[idx]; struct rte_mbuf *rep = (*rxq->elts)[idx]; uint32_t rss_hash_res = 0; @@ -2023,6 +2027,236 @@ skip: return i; } +void +mlx5_mprq_buf_free_cb(void *addr __rte_unused, void *opaque) +{ + struct mlx5_mprq_buf *buf = opaque; + + if (rte_atomic16_read(&buf->refcnt) == 1) { + rte_mempool_put(buf->mp, buf); + } else if (rte_atomic16_add_return(&buf->refcnt, -1) == 0) { + rte_atomic16_set(&buf->refcnt, 1); + rte_mempool_put(buf->mp, buf); + } +} + +void +mlx5_mprq_buf_free(struct mlx5_mprq_buf *buf) +{ + mlx5_mprq_buf_free_cb(NULL, buf); +} + +static inline void +mprq_buf_replace(struct mlx5_rxq_data *rxq, uint16_t rq_idx) +{ + struct mlx5_mprq_buf *rep = rxq->mprq_repl; + volatile struct mlx5_wqe_data_seg *wqe = + &((volatile struct mlx5_wqe_mprq *)rxq->wqes)[rq_idx].dseg; + void *addr; + + assert(rep != NULL); + /* Replace MPRQ buf. */ + (*rxq->mprq_bufs)[rq_idx] = rep; + /* Replace WQE. */ + addr = mlx5_mprq_buf_addr(rep); + wqe->addr = rte_cpu_to_be_64((uintptr_t)addr); + /* If there's only one MR, no need to replace LKey in WQE. */ + if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1)) + wqe->lkey = mlx5_rx_addr2mr(rxq, (uintptr_t)addr); + /* Stash a mbuf for next replacement. */ + if (likely(!rte_mempool_get(rxq->mprq_mp, (void **)&rep))) + rxq->mprq_repl = rep; + else + rxq->mprq_repl = NULL; +} + +/** + * DPDK callback for RX with Multi-Packet RQ support. + * + * @param dpdk_rxq + * Generic pointer to RX queue structure. + * @param[out] pkts + * Array to store received packets. + * @param pkts_n + * Maximum number of packets in array. + * + * @return + * Number of packets successfully received (<= pkts_n). + */ +uint16_t +mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) +{ + struct mlx5_rxq_data *rxq = dpdk_rxq; + const unsigned int strd_n = 1 << rxq->strd_num_n; + const unsigned int strd_sz = 1 << rxq->strd_sz_n; + const unsigned int strd_shift = + MLX5_MPRQ_STRIDE_SHIFT_BYTE * rxq->strd_shift_en; + const unsigned int cq_mask = (1 << rxq->cqe_n) - 1; + const unsigned int wq_mask = (1 << rxq->elts_n) - 1; + volatile struct mlx5_cqe *cqe = &(*rxq->cqes)[rxq->cq_ci & cq_mask]; + unsigned int i = 0; + uint16_t rq_ci = rxq->rq_ci; + uint16_t strd_idx = rxq->strd_ci; + struct mlx5_mprq_buf *buf = (*rxq->mprq_bufs)[rq_ci & wq_mask]; + + while (i < pkts_n) { + struct rte_mbuf *pkt; + void *addr; + int ret; + unsigned int len; + uint16_t consumed_strd; + uint32_t offset; + uint32_t byte_cnt; + uint32_t rss_hash_res = 0; + + if (strd_idx == strd_n) { + /* Replace WQE only if the buffer is still in use. */ + if (rte_atomic16_read(&buf->refcnt) > 1) { + mprq_buf_replace(rxq, rq_ci & wq_mask); + /* Release the old buffer. */ + mlx5_mprq_buf_free(buf); + } else if (unlikely(rxq->mprq_repl == NULL)) { + struct mlx5_mprq_buf *rep; + + /* + * Currently, the MPRQ mempool is out of buffer + * and doing memcpy regardless of the size of Rx + * packet. Retry allocation to get back to + * normal. + */ + if (!rte_mempool_get(rxq->mprq_mp, + (void **)&rep)) + rxq->mprq_repl = rep; + } + /* Advance to the next WQE. */ + strd_idx = 0; + ++rq_ci; + buf = (*rxq->mprq_bufs)[rq_ci & wq_mask]; + } + cqe = &(*rxq->cqes)[rxq->cq_ci & cq_mask]; + ret = mlx5_rx_poll_len(rxq, cqe, cq_mask, &rss_hash_res); + if (!ret) + break; + if (unlikely(ret == -1)) { + /* RX error, packet is likely too large. */ + ++rxq->stats.idropped; + continue; + } + byte_cnt = ret; + consumed_strd = (byte_cnt & MLX5_MPRQ_STRIDE_NUM_MASK) >> + MLX5_MPRQ_STRIDE_NUM_SHIFT; + assert(consumed_strd); + /* Calculate offset before adding up stride index. */ + offset = strd_idx * strd_sz + strd_shift; + strd_idx += consumed_strd; + if (byte_cnt & MLX5_MPRQ_FILLER_MASK) + continue; + /* + * Currently configured to receive a packet per a stride. But if + * MTU is adjusted through kernel interface, device could + * consume multiple strides without raising an error. In this + * case, the packet should be dropped because it is bigger than + * the max_rx_pkt_len. + */ + if (unlikely(consumed_strd > 1)) { + ++rxq->stats.idropped; + continue; + } + pkt = rte_pktmbuf_alloc(rxq->mp); + if (unlikely(pkt == NULL)) { + ++rxq->stats.rx_nombuf; + break; + } + len = (byte_cnt & MLX5_MPRQ_LEN_MASK) >> MLX5_MPRQ_LEN_SHIFT; + assert((int)len >= (rxq->crc_present << 2)); + if (rxq->crc_present) + len -= ETHER_CRC_LEN; + addr = RTE_PTR_ADD(mlx5_mprq_buf_addr(buf), offset); + /* Initialize the offload flag. */ + pkt->ol_flags = 0; + /* + * Memcpy packets to the target mbuf if: + * - The size of packet is smaller than mprq_max_memcpy_len. + * - Out of buffer in the Mempool for Multi-Packet RQ. + */ + if (len <= rxq->mprq_max_memcpy_len || rxq->mprq_repl == NULL) { + /* + * When memcpy'ing packet due to out-of-buffer, the + * packet must be smaller than the target mbuf. + */ + if (unlikely(rte_pktmbuf_tailroom(pkt) < len)) { + rte_pktmbuf_free_seg(pkt); + ++rxq->stats.idropped; + continue; + } + rte_memcpy(rte_pktmbuf_mtod(pkt, void *), addr, len); + } else { + rte_iova_t buf_iova; + struct rte_mbuf_ext_shared_info *shinfo; + uint16_t buf_len = consumed_strd * strd_sz; + + /* Increment the refcnt of the whole chunk. */ + rte_atomic16_add_return(&buf->refcnt, 1); + assert((uint16_t)rte_atomic16_read(&buf->refcnt) <= + strd_n + 1); + addr = RTE_PTR_SUB(addr, RTE_PKTMBUF_HEADROOM); + /* + * MLX5 device doesn't use iova but it is necessary in a + * case where the Rx packet is transmitted via a + * different PMD. + */ + buf_iova = rte_mempool_virt2iova(buf) + + RTE_PTR_DIFF(addr, buf); + shinfo = rte_pktmbuf_ext_shinfo_init_helper(addr, + &buf_len, mlx5_mprq_buf_free_cb, buf); + /* + * EXT_ATTACHED_MBUF will be set to pkt->ol_flags when + * attaching the stride to mbuf and more offload flags + * will be added below by calling rxq_cq_to_mbuf(). + * Other fields will be overwritten. + */ + rte_pktmbuf_attach_extbuf(pkt, addr, buf_iova, buf_len, + shinfo); + rte_pktmbuf_reset_headroom(pkt); + assert(pkt->ol_flags == EXT_ATTACHED_MBUF); + /* + * Prevent potential overflow due to MTU change through + * kernel interface. + */ + if (unlikely(rte_pktmbuf_tailroom(pkt) < len)) { + rte_pktmbuf_free_seg(pkt); + ++rxq->stats.idropped; + continue; + } + } + rxq_cq_to_mbuf(rxq, pkt, cqe, rss_hash_res); + PKT_LEN(pkt) = len; + DATA_LEN(pkt) = len; + PORT(pkt) = rxq->port_id; +#ifdef MLX5_PMD_SOFT_COUNTERS + /* Increment bytes counter. */ + rxq->stats.ibytes += PKT_LEN(pkt); +#endif + /* Return packet. */ + *(pkts++) = pkt; + ++i; + } + /* Update the consumer indexes. */ + rxq->strd_ci = strd_idx; + rte_io_wmb(); + *rxq->cq_db = rte_cpu_to_be_32(rxq->cq_ci); + if (rq_ci != rxq->rq_ci) { + rxq->rq_ci = rq_ci; + rte_io_wmb(); + *rxq->rq_db = rte_cpu_to_be_32(rxq->rq_ci); + } +#ifdef MLX5_PMD_SOFT_COUNTERS + /* Increment packets counter. */ + rxq->stats.ipackets += i; +#endif + return i; +} + /** * Dummy DPDK callback for TX. * diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index 74581cf9b1..a6017f0dd7 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -64,6 +64,16 @@ struct rxq_zip { uint32_t cqe_cnt; /* Number of CQEs. */ }; +/* Multi-Packet RQ buffer header. */ +struct mlx5_mprq_buf { + struct rte_mempool *mp; + rte_atomic16_t refcnt; /* Atomically accessed refcnt. */ + uint8_t pad[RTE_PKTMBUF_HEADROOM]; /* Headroom for the first packet. */ +} __rte_cache_aligned; + +/* Get pointer to the first stride. */ +#define mlx5_mprq_buf_addr(ptr) ((ptr) + 1) + /* RX queue descriptor. */ struct mlx5_rxq_data { unsigned int csum:1; /* Enable checksum offloading. */ @@ -75,19 +85,30 @@ struct mlx5_rxq_data { unsigned int elts_n:4; /* Log 2 of Mbufs. */ unsigned int rss_hash:1; /* RSS hash result is enabled. */ unsigned int mark:1; /* Marked flow available on the queue. */ - unsigned int :15; /* Remaining bits. */ + unsigned int strd_num_n:5; /* Log 2 of the number of stride. */ + unsigned int strd_sz_n:4; /* Log 2 of stride size. */ + unsigned int strd_shift_en:1; /* Enable 2bytes shift on a stride. */ + unsigned int :6; /* Remaining bits. */ volatile uint32_t *rq_db; volatile uint32_t *cq_db; uint16_t port_id; uint16_t rq_ci; + uint16_t strd_ci; /* Stride index in a WQE for Multi-Packet RQ. */ uint16_t rq_pi; uint16_t cq_ci; struct mlx5_mr_ctrl mr_ctrl; /* MR control descriptor. */ - volatile struct mlx5_wqe_data_seg(*wqes)[]; + uint16_t mprq_max_memcpy_len; /* Maximum size of packet to memcpy. */ + volatile void *wqes; volatile struct mlx5_cqe(*cqes)[]; struct rxq_zip zip; /* Compressed context. */ - struct rte_mbuf *(*elts)[]; + RTE_STD_C11 + union { + struct rte_mbuf *(*elts)[]; + struct mlx5_mprq_buf *(*mprq_bufs)[]; + }; struct rte_mempool *mp; + struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */ + struct mlx5_mprq_buf *mprq_repl; /* Stashed mbuf for replenish. */ struct mlx5_rxq_stats stats; uint64_t mbuf_initializer; /* Default rearm_data for vectorized Rx. */ struct rte_mbuf fake_mbuf; /* elts padding for vectorized Rx. */ @@ -206,6 +227,11 @@ struct mlx5_txq_ctrl { extern uint8_t rss_hash_default_key[]; extern const size_t rss_hash_default_key_len; +int mlx5_check_mprq_support(struct rte_eth_dev *dev); +int mlx5_rxq_mprq_enabled(struct mlx5_rxq_data *rxq); +int mlx5_mprq_enabled(struct rte_eth_dev *dev); +int mlx5_mprq_free_mp(struct rte_eth_dev *dev); +int mlx5_mprq_alloc_mp(struct rte_eth_dev *dev); void mlx5_rxq_cleanup(struct mlx5_rxq_ctrl *rxq_ctrl); int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, unsigned int socket, const struct rte_eth_rxconf *conf, @@ -229,6 +255,7 @@ int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx); int mlx5_rxq_releasable(struct rte_eth_dev *dev, uint16_t idx); int mlx5_rxq_verify(struct rte_eth_dev *dev); int rxq_alloc_elts(struct mlx5_rxq_ctrl *rxq_ctrl); +int rxq_alloc_mprq_buf(struct mlx5_rxq_ctrl *rxq_ctrl); struct mlx5_ind_table_ibv *mlx5_ind_table_ibv_new(struct rte_eth_dev *dev, const uint16_t *queues, uint32_t queues_n); @@ -292,6 +319,10 @@ uint16_t mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n); uint16_t mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n); +void mlx5_mprq_buf_free_cb(void *addr, void *opaque); +void mlx5_mprq_buf_free(struct mlx5_mprq_buf *buf); +uint16_t mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, + uint16_t pkts_n); uint16_t removed_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n); uint16_t removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c index 215a9e5926..0a4aed8ffa 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec.c +++ b/drivers/net/mlx5/mlx5_rxtx_vec.c @@ -275,6 +275,8 @@ mlx5_rxq_check_vec_support(struct mlx5_rxq_data *rxq) struct mlx5_rxq_ctrl *ctrl = container_of(rxq, struct mlx5_rxq_ctrl, rxq); + if (mlx5_mprq_enabled(ETH_DEV(ctrl->priv))) + return -ENOTSUP; if (!ctrl->priv->config.rx_vec_en || rxq->sges_n != 0) return -ENOTSUP; return 1; @@ -297,6 +299,8 @@ mlx5_check_vec_rx_support(struct rte_eth_dev *dev) if (!priv->config.rx_vec_en) return -ENOTSUP; + if (mlx5_mprq_enabled(dev)) + return -ENOTSUP; /* All the configured queues should support. */ for (i = 0; i < priv->rxqs_n; ++i) { struct mlx5_rxq_data *rxq = (*priv->rxqs)[i]; diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h index 64442836ba..598dc7519b 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h @@ -87,7 +87,8 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq, uint16_t n) const uint16_t q_mask = q_n - 1; uint16_t elts_idx = rxq->rq_ci & q_mask; struct rte_mbuf **elts = &(*rxq->elts)[elts_idx]; - volatile struct mlx5_wqe_data_seg *wq = &(*rxq->wqes)[elts_idx]; + volatile struct mlx5_wqe_data_seg *wq = + &((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[elts_idx]; unsigned int i; assert(n >= MLX5_VPMD_RXQ_RPLNSH_THRESH); diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c index 36b7c9e2f7..3e7c0a90f0 100644 --- a/drivers/net/mlx5/mlx5_trigger.c +++ b/drivers/net/mlx5/mlx5_trigger.c @@ -102,6 +102,9 @@ mlx5_rxq_start(struct rte_eth_dev *dev) unsigned int i; int ret = 0; + /* Allocate/reuse/resize mempool for Multi-Packet RQ. */ + if (mlx5_mprq_alloc_mp(dev)) + goto error; for (i = 0; i != priv->rxqs_n; ++i) { struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_get(dev, i); struct rte_mempool *mp; @@ -109,7 +112,8 @@ mlx5_rxq_start(struct rte_eth_dev *dev) if (!rxq_ctrl) continue; /* Pre-register Rx mempool. */ - mp = rxq_ctrl->rxq.mp; + mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ? + rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp; DRV_LOG(DEBUG, "port %u Rx queue %u registering" " mp %s having %u chunks",