app/flow-perf: add multi-core rule insertion and deletion
One of the ways to increase the insertion/deletion rate is to use
multi-threaded insertion/deletion. Thus it's needed to have support
for testing and measure those rates using flow-perf application.
Now we generate cores and distribute all flows to those cores,
and start inserting/deleting in parallel.
The app now receive the cores count to use from command line option,
then it distribute the rte_flow rules evenly between the cores, and
start inserting/deleting. Each worker will report it's own results,
and in the end the MAIN worker will report the total results for all
cores.
The total results are calculated using RULES_COUNT divided over
max time used between all cores.
Also this touches the memory area, since inserting using multiple cores
in same time the pre solution for memory is not valid, thus now we save
memory before and after each allocation for all cores. In the end we
pick the min pre memory and the max post memory from all cores.
The difference between those values represent the total memory consumed
by the total rte_flow rules from all cores, and then report the total
size of single rte_flow in byte for each port.
How to use this feature:
--cores=N
Where 1 =< N <= RTE_MAX_LCORE
Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>