From 9ec0f97e02e19c7acec5ebc9e472b8a2f4605e1f Mon Sep 17 00:00:00 2001
From: Eli Britstein
Date: Fri, 16 Oct 2020 15:51:06 +0300
Subject: [PATCH] ethdev: add tunnel offload model

rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects. The hardware model design for
high-level software objects is not trivial. Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive at the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, that will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_action_set_meta objects.
2 Apply the decap only at the last offload stage after all the
"patterns" were matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor. For example, some hardware may have a limited number
of registers while other hardware may not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable of matching on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from rte_flow_tunnel_decap_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first rule in group 0:

flow create ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3. In group 3 the packet will miss since there is no
flow to match and will be sent to the application. Application will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst is 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions set_ipv4_dst 186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapped on queue 3 with IPv4 dst=186.1.1.1.

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on. It is the
PMD's responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
};

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action *pmd_actions;
rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                          &num_pmd_actions, &error);

/**
 * 3. offload the first rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions = jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 4. after flow creation application does not need to keep the
 * tunnel action resources.
 */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                     num_pmd_actions, &error);

/**
 * 5. A partially offloaded packet missed in group 3 because there
 * was no matching rule; handle the miss
 */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = {
    eth / ipv4 dst is 172.10.10.1 / udp dst is 4789 /
    vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items, &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items,
                             num_pmd_items, &error);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein
Signed-off-by: Gregory Etelson
Acked-by: Ori Kam
Acked-by: Viacheslav Ovsiienko
---
 doc/guides/prog_guide/rte_flow.rst       |  78 +++++++++
 doc/guides/rel_notes/release_20_11.rst   |   5 +
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 427 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 7fb5ec9059..8dc048c6f4 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3131,6 +3131,84 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+rte_flow API provides the building blocks for vendor-agnostic flow
+classification offloads. The rte_flow "patterns" and "actions"
+primitives are fine-grained, thus enabling DPDK applications the
+flexibility to offload network stacks and complex pipelines.
+Applications wishing to offload tunneled traffic are required to use
+the rte_flow primitives, such as group, meta, mark, tag, and others to
+model their high-level objects. The hardware model design for
+high-level software objects is not trivial. Furthermore, an optimal
+design is often vendor-specific.
+
+When hardware offloads tunneled traffic in multi-group logic,
+partially offloaded packets may arrive at the application after they
+were modified in hardware. In this case, the application may need to
+restore the original packet headers. Consider the following sequence:
+The application decaps a packet in one group and jumps to a second
+group where it tries to match on a 5-tuple, that will miss and send
+the packet to the application. In this case, the application does not
+receive the original packet but a modified one. Also, in this case,
+the application cannot match on the outer header fields, such as VXLAN
+vni and 5-tuple.
+
+There are several possible ways to use rte_flow "patterns" and
+"actions" to resolve the issues above. For example:
+
+1 Mapping headers to hardware registers using the
+rte_flow_action_mark/rte_flow_action_tag/rte_flow_action_set_meta objects.
+
+2 Apply the decap only at the last offload stage after all the
+"patterns" were matched and the packet will be fully offloaded.
+
+Every approach has its pros and cons and is highly dependent on the
+hardware vendor. For example, some hardware may have a limited number
+of registers while other hardware may not support inner actions and
+must decap before accessing inner headers.
+
+The tunnel offload model resolves these issues. The model goals are:
+
+1 Provide a unified application API to offload tunneled traffic that
+is capable of matching on outer headers after decap.
+
+2 Allow the application to restore the outer header of partially
+offloaded packets.
+
+The tunnel offload model does not introduce new elements to the
+existing RTE flow model and is implemented as a set of helper
+functions.
+
+For the application to work with the tunnel offload API it
+has to adjust flow rules in multi-table tunnel offload in the
+following way:
+
+1 Remove explicit call to decap action and replace it with PMD actions
+obtained from rte_flow_tunnel_decap_set() helper.
+
+2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
+other rules in the tunnel offload sequence.
+
+The model requirements:
+
+Software application must initialize
+rte_flow_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by application with rte_flow_tunnel_action_decap_release() call.
+
+PMD items array obtained with rte_flow_tunnel_match() must be released
+by application with rte_flow_tunnel_item_release() call. Application can
+release PMD items and actions after the rule is created. However, if the
+application needs to create an additional rule for the same tunnel it
+will need to obtain PMD items again.
+
+Application cannot destroy rte_flow_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index a8b547d052..101f72daa0 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -115,6 +115,11 @@ New Features
   * Flow rule verification was updated to accept private PMD items and
     actions.
 
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper to flow API that
+    offloads tunneled traffic and restores missed packets.
+
 * **Updated the ethdev library to support hairpin between two ports.**
 
   New APIs are introduced to support binding / unbinding 2 ports hairpin.
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index f64c379ac2..8ddda2547f 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -239,6 +239,11 @@ EXPERIMENTAL {
 	rte_flow_shared_action_destroy;
 	rte_flow_shared_action_query;
 	rte_flow_shared_action_update;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
 
 INTERNAL {
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index b74ea5593a..380c5cae2c 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1143,3 +1143,115 @@ rte_flow_shared_action_query(uint16_t port_id,
 				   data, error);
 	return flow_err(port_id, ret, error);
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 48395284b5..a8eac4deb8 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3620,6 +3620,201 @@ rte_flow_shared_action_query(uint16_t port_id,
 			     void *data,
 			     struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * the following members are required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where packet missed */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 58f56b0262..bd5ffc0bb1 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -131,6 +131,38 @@ struct rte_flow_ops {
 		 const struct rte_flow_shared_action *shared_action,
 		 void *data,
 		 struct rte_flow_error *error);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.20.1
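
Illustration (not part of the patch): the API usage in the commit message writes rule_actions = { pmd_actions, app_actions } without showing how the two arrays are joined. The sketch below is one possible way to do it, under stated assumptions. It uses hypothetical stand-in types (sketch_action, sketch_action_type) instead of struct rte_flow_action so that it builds without DPDK headers, and it assumes the PMD array is described by an explicit count (as with num_of_actions) while the application array is END-terminated. The helper name sketch_concat_actions is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for enum rte_flow_action_type / struct rte_flow_action. */
enum sketch_action_type {
	SKETCH_ACTION_END = 0,      /* terminator, like RTE_FLOW_ACTION_TYPE_END */
	SKETCH_ACTION_PMD_PRIVATE,  /* opaque action handed out by the PMD */
	SKETCH_ACTION_JUMP,         /* application action */
};

struct sketch_action {
	enum sketch_action_type type;
	const void *conf;
};

/*
 * Build "pmd_actions / app_actions / end" in a newly allocated array.
 * pmd_actions holds exactly num_pmd entries (no terminator assumed);
 * app_actions must be END-terminated.
 * Returns NULL on allocation failure; the caller releases it with free().
 */
struct sketch_action *
sketch_concat_actions(const struct sketch_action *pmd_actions,
		      uint32_t num_pmd,
		      const struct sketch_action *app_actions)
{
	size_t num_app = 0;
	struct sketch_action *out;

	/* Count application actions up to, but not including, END. */
	while (app_actions[num_app].type != SKETCH_ACTION_END)
		num_app++;

	/* One extra slot for the END terminator; calloc zero-fills it. */
	out = calloc((size_t)num_pmd + num_app + 1, sizeof(*out));
	if (out == NULL)
		return NULL;

	if (num_pmd > 0)
		memcpy(out, pmd_actions, num_pmd * sizeof(*out));
	if (num_app > 0)
		memcpy(out + num_pmd, app_actions, num_app * sizeof(*out));
	out[num_pmd + num_app].type = SKETCH_ACTION_END;
	return out;
}
```

In a real application the combined array would be passed to rte_flow_create(), after which both the combined array and the PMD array can be released, as step 4 of the API usage describes. The same pattern would apply to prepending PMD items to the application pattern.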