git.droids-corp.org - dpdk.git/log

net/virtio-user: do not stop stopped device again

Without this change, virtio-user still works, but it will show
annoying error messages like this on shutdown:

vhost_kernel_set_backend(): VHOST_NET_SET_BACKEND fails, Operation not permitted
vhost_kernel_ioctl(): VHOST_RESET_OWNER failed: Operation not permitted

Fixes: e3b434818bbb ("net/virtio-user: support kernel vhost")
Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
Cc: stable@dpdk.org
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/vhost: fix parameters string

Add the missing params to the param string.

Fixes: 39cac2adcad0 ("net/vhost: add client option")
Fixes: 4ce97c6f6b4f ("net/vhost: add an option to enable dequeue zero copy")
Fixes: 447e0d379756 ("net/vhost: add parameter to enable IOMMU feature")
Fixes: 6d6e95cec455 ("net/vhost: add parameter to enable postcopy")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/virtio: drop duplicated reset method

Drop the duplicated reset() method in virtio_pci_ops. Currently
vtpci_reset() is implemented on set_status() and get_status()
directly. The reset() method in virtio_pci_ops isn't used and
its implementation in the legacy device isn't right.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

net/mlx5: add 128B padding of Rx completion entry

A PMD parameter (rxq_cqe_pad_en) is added to enable 128B padding of CQE on
RX side. The size of CQE is aligned with the size of a cacheline of the
core. If cacheline size is 128B, the CQE size is configured to be 128B even
though the device writes only 64B data on the cacheline. This is to avoid
unnecessary cache invalidation by device's two consecutive writes on to one
cacheline. However in some architecture, it is more beneficial to update
entire cacheline with padding the rest 64B rather than striding because
read-modify-write could drop performance a lot. On the other hand, writing
extra data will consume more PCIe bandwidth and could also drop the maximum
throughput. It is recommended to empirically set this parameter. Disabled
by default.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: fix flow counters deletion in Verbs

The Flow counters created with Verbs are erroneously destroyed
in Flow remove function (flow_verbs_remove()). Counter Verbs
handles stored in the translated rule buffer become invalid.
If rule is reapplied with these invalid counter handles the
driver hangs.

The counter should be destroyed with Verbs in the Flow destroy
function. The Flow remove function should keep counters intact.

Fixes: 60bd8c9747e8 ("net/mlx5: add count flow action")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: fix detection and error for multiple item layers

1. The check for the Eth item was wrong. causing an error with
flow rules like:

flow create 0 ingress pattern eth / vlan vid is 13 / ipv4 / gre / eth /
vlan vid is 15 / end actions drop / end

2. align all error messages.

3. align multiple item layers check.

Fixes: 23c1d42c7138 ("net/mlx5: split flow validation to dedicated function")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix bit width of flow items

Apply the changes from commit c744f6b1b969 ("net/mlx5: fix bit width of
item and action flags") in some places that were overlooked.

Fixes: 0ddd11437a9a ("net/mlx5: fix bit width of item and action flags")
Fixes: 23c1d42c7138 ("net/mlx5: split flow validation to dedicated function")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>

net/mlx5: use pkg-config to handle SUSE libmnl

SUSE decided to install the libmnl include file in a non-standard
place: /usr/include/libmnl/libmnl/libmnl.h

This was probably a mistake by the SUSE package maintainer,
but hard to get fixed. Workaround the problem by pkg-config to find
the necessary include directive for libmnl.

Fixes: 20b71e92ef8e ("net/mlx5: lay groundwork for switch offloads")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Luca Boccassi <bluca@debian.org>

net/i40e: fix offload not supported mask

Just as the name I40E_TX_OFFLOAD_NOTSUP_MASK indicates, it should be the
mask of unsupported features (either not in PKT_TX_OFFLOAD_MASK or in
I40E_TX_OFFLOAD_MASK), however, xor will not get desired result here,
assume bit 0 of PKT_TX_OFFLOAD_MASK and I40E_TX_OFFLOAD_MAKS are 0 which
means corresponding feature is not supported in both sides, then we get
value of bit 0 of I40E_TX_OFFLOAD_NOTSUP_MASK which is 0 via xor, it
implies that it is supported which doesn't meet our expectation.

Correct it by a NOT-AND operation.

Fixes: 3f33e643e5c6 ("net/i40e: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

net/ixgbe: enable detach from secondary

Since we have enabled the hotplug mechanism for multi-process, it's not
necessary to return -EPERM when try detaches a device from a secondary
process.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>

examples/fips_validation: fix build

The example was not added to the Makefile and there are some
compilation errors:

examples/fips_validation/main.c: In function ‘prepare_aead_op’:
error: control reaches end of non-void function
examples/fips_validation/main.c: In function ‘prepare_auth_op’:
error: control reaches end of non-void function

Fixes: 3d0fad56b74a ("examples/fips_validation: add crypto FIPS application")
Fixes: f64adb6714e0 ("examples/fips_validation: support HMAC parsing")
Fixes: 4aaad2995e13 ("examples/fips_validation: support GCM parsing")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

ring/c11: move atomic load of head above the loop

In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop as upon failure the old_head will be updated,
another load is costly and not necessary.

This helps a little on the latency,about 1~5%.

Test result with the patch(two cores):
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30

Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jia He <justin.he@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

ring/c11: synchronize load and store of the tail

Synchronize the load-acquire of the tail and the store-release
within update_tail, the store release ensures all the ring operations,
enqueue or dequeue, are seen by the observers on the other side as soon
as they see the updated tail. The load-acquire is needed here as the
data dependency is not a reliable way for ordering as the compiler might
break it by saving to temporary values to boost performance.
When computing the free_entries and avail_entries, use atomic semantics
to load the heads and tails instead.

The patch was benchmarked with test/ring_perf_autotest and it decreases
the enqueue/dequeue latency by 5% ~ 27.6% with two lcores, the real gains
are dependent on the number of lcores, depth of the ring, SPSC or MPMC.
For 1 lcore, it also improves a little, about 3 ~ 4%.
It is a big improvement, in case of MPMC, with two lcores and ring size
of 32, it saves latency up to (3.26-2.36)/3.26 = 27.6%.

This patch is a bug fix, while the improvement is a bonus. In our analysis
the improvement comes from the cacheline pre-filling after hoisting load-
acquire from _atomic_compare_exchange_n up above.

The test command:
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=\
1024 -- -i

Test result with this patch(two cores):
SP/SC bulk enq/dequeue (size: 8): 5.86
MP/MC bulk enq/dequeue (size: 8): 10.15
SP/SC bulk enq/dequeue (size: 32): 1.94
MP/MC bulk enq/dequeue (size: 32): 2.36

In comparison of the test result without this patch:
SP/SC bulk enq/dequeue (size: 8): 6.67
MP/MC bulk enq/dequeue (size: 8): 13.12
SP/SC bulk enq/dequeue (size: 32): 2.04
MP/MC bulk enq/dequeue (size: 32): 3.26

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jia He <justin.he@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

examples/fips_validation: support CCM parsing

Added enablement for CCM parser, to allow the
application to parser the ccm request files and to validate all
test types supported.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>

examples/fips_validation: support CMAC parsing

Added enablement for CMAC parser, to allow the
application to parser the cmac request files and to validate all
test types supported.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>

examples/fips_validation: support GCM parsing

Added enablement for GCM parser, to allow the
application to parser the GCM request file and to validate all
tests supported.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>

examples/fips_validation: support TDES parsing

Added enablement for TDES parser, to allow the
application to parser the TDES request files and to validate all
test types supported.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>

examples/fips_validation: support HMAC parsing

Added enablement for HMAC parser, to allow the
application to parser the hmac request files and to validate all
tests supported

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>

examples/fips_validation: support AES parsing

Added enablement for AES-CBC parser, to allow the
application to parser the aes request file and to validate all
test types supported.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>

examples/fips_validation: add crypto FIPS application

Added FIPS application into the examples to allow
users to use a simple sample app to validate
their systems and be able to get FIPS certification.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>

compress/isal: fix uncleared compression states

Fixing uncleared states of compression & decompression engines post op.

Fixes: 788e748d3845 ("compress/isal: support chained mbufs")
Fixes: dc49e6aa4879 ("compress/isal: add ISA-L compression functionality")
Fixes: 7bf4f0630af6 ("compress/isal: add ISA-L decomp functionality")
Cc: stable@dpdk.org
Signed-off-by: Lee Daly <lee.daly@intel.com>

compress/qat: add log for IM buffer too small

Display trace if error returned from firmware is likely due
to intermediate buffers being too small for the compressed
output. Update documentation to explain this error case
and to clarify intermediate buffer memory usage.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>

test/compress: improve debug logs

Make clear which engine is compressing and which is decompressing
in debug output. Also add newline and print ratio = 0 if test fails.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Acked-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>

compress/qat: fix out-of-bounds write

QAT array for sgls in intermediate buffer structure
was #defined to 1, but setup code hardcoded as if 2 buffers
so causing out of bounds write. Reworked to loop correctly
using #define.

Fixes: a124830a6f00 ("compress/qat: enable dynamic huffman encoding")
Reported-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>

crypto/caam_jr: fix check before job ring freeing

Check should be on parameter uio_fd instead of
local variable job_ring

Fixes: e7a45f3cc2 ("crypto/caam_jr: add UIO specific operations")
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>

compressdev: fix op allocation

Fixed bad logic in rte_comp_op_alloc() checking return
value from rte_comp_op_raw_bulk_alloc(). This
could have resulted in a seg-fault in error case.
Made rte_comp_ob_bulk_alloc() code consistent
with rte_comp_op_alloc().

Fixes: 96086db5a369 ("compressdev: add operation management")
Cc: stable@dpdk.org
Reported-by: Sabyasachi Sengupta <sabyasg@hpe.com>
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>

compressdev: clarify usage of op structure

Add note on usage of op structure and when it can be
accessed and freed.

Fixes: 63f4bfd5328b ("compressdev: add enqueue/dequeue functions")
Cc: stable@dpdk.org
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>

test/crypto: remove redundant RSA verification

Change unit test app to check only for op->status =
RTE_CRYPTO_OP_STATUS_SUCCESS/ERROR instead of calling rsa_verify().
as the cryptodev API is expected to return error incase of data
mismatch.

Signed-off-by: Ayuj Verma <ayuj.verma@caviumnetworks.com>
Signed-off-by: Akash Saxena <akash.saxena@caviumnetworks.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>

crypto/openssl: fix RSA verify operation

In lib cryptodev, RSA verify operation inputs plain message text and
corresponding signature and expected to return
RTE_CRYPTO_OP_STATUS_SUCCESS/FAILURE on a signature match/mismatch.
Current OpenSSL PMD RSA verify implementation overrides application passed
sign input by decrypted output which isn't expected.

This patch addresses this issue in OpenSSL PMD. Now, OpenSSL PMD use
tmp buffer to pass to OpenSSL sign API and memcmp output with
original plain text to verify signature match.
Set op->status = RTE_CRYPTO_OP_STATUS_ERROR on signature mismatch.

Fixes: 3e9d6bd447fb ("crypto/openssl: add RSA and mod asym operations")
Cc: stable@dpdk.org
Signed-off-by: Ayuj Verma <ayuj.verma@caviumnetworks.com>
Signed-off-by: Akash Saxena <akash.saxena@caviumnetworks.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>

examples/ip_pipeline: fix port and table stats read

Fix the pipeline port and table stats read operation.

Fixes: 50e73d051806 ("examples/ip_pipeline: add stats read commands")
Cc: stable@dpdk.org
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Hongjun Ni <hongjun.ni@intel.com>

examples/ip_pipeline: support table rule show

Add support for the table rule show operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: support rule time read

Add support for the table rule timestamp read operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: support rule TTL stats read

Add support for the table rule TTL stats read operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: support meter stats read

Add support for the rule meter stats read operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: support rule stats read

Add support for rule stats read operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: track rules on delete default

Support table rule tracking on table rule delete default operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: track table rules on delete

Support table rule tracking on table rule delete operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Hongjun Ni <hongjun.ni@intel.com>

examples/ip_pipeline: track rules on add default

Support table rule tracking on table rule add default operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: track table rules on add bulk

Support table rule tracking on table rule add bulk operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

examples/ip_pipeline: track table rules on add

Support table rule tracking on table rule add operation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Hongjun Ni <hongjun.ni@intel.com>

examples/ip_pipeline: add rule list per table

For each pipeline table, have the master thread maintain the list of
rules that are currently stored in the table. This list allows the
master thread to handle table queries with minimal impact for the
data plane threads: requests to read the current set of table rules
are fully handled by the master thread with no involvement from
data plane threads, requests to read the per table rule moving data
(such as stats counters or timestamp associated with specific
actions) are handled by the data plane threads through plain memory
reads rather than key lookup.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Hongjun Ni <hongjun.ni@intel.com>

net/softnic: fix string copy

Use strlcpy instead of strcpy to avoid buffer overrun.

Coverity issues: 323475,323478,323514,323515
Fixes: b767f8efc8 ("net/softnic: replace pointers with arrays")
Fixes: c169b6a588 ("net/softnic: map flow attribute to pipeline table")
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>

net/softnic: fix mixing enum values

Fix mixing enum types enum rte_table_action_policer
and enum rte_mtr_policer_action for dereference of
policer action.

Coverity issue 323483, 323511
Fixes: 7e30e444c3e4 ("net/softnic: support flow meter action")
Fixes: 8a917ef88db7 ("net/softnic: update policer actions")
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

doc: add policer table details for metering application

The change adds note for previous colour in colour blind and DROP
in profile table actions. In colour blind mode only valid previous
colour is GREEN. To drop packets based on new colour one needs to
set action as DROP in profile table.

Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>

app/testpmd: fix RED byte stats

Y stands for Yellow, R stands for Red.

Fixes: 30ffb4e67ee3 ("app/testpmd: add commands traffic metering and policing")
Cc: stable@dpdk.org
Signed-off-by: Leah Tekoa <leah@ethernitynet.com>

app/testpmd: fix shaper profile parameters

As struct rte_tm_shaper_params defined, the command line of
testpmd should include committed and peak parameters, but
right now the command line doesn't identify whether it's
committed or peak parameter. This patch identifies and
adds the clarify definition

Fixes: bddc2f40b594 ("app/testpmd: add commands for shaper and wred profiles")
Cc: stable@dpdk.org
Signed-off-by: Rosen Xu <rosen.xu@intel.com>

mem: add thread unsafe version for DMA mask check

During memory initialization calling rte_mem_check_dma_mask
leads to a deadlock because memory_hotplug_lock is locked by a
writer, the current code in execution, and rte_memseg_walk
tries to lock as a reader.

This patch adds a thread_unsafe version which will call the final
function specifying the memory_hotplug_lock does not need to be
acquired. The patch also modified rte_mem_check_dma_mask as a
intermediate step which will call the final function as before,
implying memory_hotplug_lock will be acquired.

PMDs should always use the version acquiring the lock with the
thread_unsafe one being just for internal EAL memory code.

Fixes: 223b7f1d5ef6 ("mem: add function for checking memseg IOVA")
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

mem: use DMA mask check for legacy memory

If a device reports addressing limitations through a dma mask,
the IOVAs for mapped memory needs to be checked out for ensuring
correct functionality.

Previous patches introduced this DMA check for main memory code
currently being used but other options like legacy memory and the
no hugepages option need to be also considered.

This patch adds the DMA check for those cases.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

malloc: modify error message for DMA mask check

If DMA mask checks shows mapped memory out of the supported range
specified by the DMA mask, nothing can be done but return an error
an report the error. This can imply the app not being executed at
all or precluding dynamic memory allocation once the app is running.
In any case, we can advice the user to force IOVA as PA if currently
IOVA being VA and user being root.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

bus/pci: avoid call to DMA mask check

Calling rte_mem_check_dma_mask when memory has not been initialized
yet is wrong. This patch use rte_mem_set_dma_mask instead.

Once memory initialization is done, the dma mask set will be used
for checking memory mapped is within the specified mask.

Fixes: fe822eb8c565 ("bus/pci: use IOVA DMA mask check when setting IOVA mode")
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

mem: add function for setting DMA mask

This patch adds the possibility of setting a dma mask to be used
once the memory initialization is done.

This is currently needed when IOVA mode is set by PCI related
code and an x86 IOMMU hardware unit is present. Current code calls
rte_mem_check_dma_mask but it is wrong to do so at that point
because the memory has not been initialized yet.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

mem: rename DMA mask check with proper prefix

Current name rte_eal_check_dma_mask does not follow the naming
used in the rest of the file.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

malloc: fix DMA mask check

The param needs to be the maskbits and not the mask.

Fixes: 223b7f1d5ef6 ("mem: add function for checking memseg IOVA")
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

doc: fix PDF build

PDF build was failing in the howto guides
found the weird character causing the issue

Fixes: 6e9270eab112 ("doc: add telemetry how-to")
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>

eal: fix build with gcc 9.0

build error:
In function ‘eal_plugin_add’,
    .../lib/librte_eal/common/eal_common_options.c:225:2:
    error: ‘strncpy’ output may be truncated copying 4095 bytes from a
           string of length 4095 [-Werror=stringop-truncation]
    strncpy(solib->name, path, PATH_MAX-1);

strncpy may result a not null-terminated string,
replaced it with strlcpy

Fixes: f9a08f650211 ("eal: add support for shared object drivers")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

bus/dpaa: fix build with gcc 9.0

build error:
In function ‘fman_if_init’,
    .../drivers/bus/dpaa/base/fman/fman.c:186:2:
    error: ‘strncpy’ output may be truncated copying 4095 bytes from a
           string of length 4095 [-Werror=stringop-truncation]
    strncpy(__if->node_path, dpa_node->full_name, PATH_MAX - 1);

strncpy may result a not null-terminated string,
replaced it with strlcpy

Fixes: 5b22cf744689 ("bus/dpaa: introducing FMan configurations")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

crypto/scheduler: fix build with gcc 8.2

build_error:

drivers/crypto/scheduler/scheduler_pmd.c: In function ‘parse_name_arg’:
drivers/crypto/scheduler/scheduler_pmd.c:372:2: error: ‘strncpy’
specified bound 64 equals destination size [-Werror=stringop-truncation]
strncpy(params->name, value, RTE_CRYPTODEV_NAME_MAX_LEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
strncpy may result a not null-terminated string,
replaced it with strlcpy

Fixes: 503e9c5afb38 ("crypto/scheduler: register as vdev driver")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

eal: fix error string function

errno_autotest testcase were failed since
commit 5d7b673d5fd6 ("mk: build with _GNU_SOURCE defined by default")
RTE>>errno_autotest
rte_strerror: 'Unknown error 11',
strerror: 'Resource temporarily unavailable'
Test Failed

There are two different version of strerror_t() based on
_GNU_SOURCE definition.

/* XSI-compliant */
int strerror_r(int errnum, char *buf, size_t buflen);

/* GNU-specific */
char *strerror_r(int errnum, char *buf, size_t buflen);

Since the GNU-specific version returns char* the exiting "if"
condition around the strerror_r fails.

Switching back to XSI-compliant version to allow

a) Portable strerror_r() usage as musl c library uses
non GNU speficic version
https://git.musl-libc.org/cgit/musl/tree/src/string/strerror_r.c

b) Based on strerror_r(3) man page, it is possible that GNU-specific
version need not use char *buf to fill error message instead it
can use the immutable static string from the library and return it.

note from strerror_r(3) man page:

The GNU-specific strerror_r() returns a pointer to a string containing
the error message. This may be either a pointer to a string that the
function stores in buf, or a pointer to some (immutable)
static string (in which case buf is unused).

Fixes: 5d7b673d5fd6 ("mk: build with _GNU_SOURCE defined by default")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

mk: disable gcc AVX512F support

This is a workaround to prevent a crash, which might be caused by
optimization of newer gcc (7.3.0) on Intel Skylake.

This disables AVX512F support of gcc by adding -mno-avx512f if it is
disabled in DPDK (CONFIG_RTE_ENABLE_AVX512=n).

This does not apply to the meson build as that doesn't have such an option
but always enable AVX512F whenever supported.

Bugzilla ID: 97
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

examples/vm_power: respect maximum CPUs

The vm_power_manager app was not respecting the POWER_MGR_MAX_CPUS
during initialisation, so if there were more CPUs than this value (64),
it would lead to buffer overruns of there were more then 64 cores in
the system.

Added in a check during init and un-init to only initialise up to
lcore_id 63.

This raises the question as to why not simply increase the value of
POWER_MGR_MAX_CPUS. Well, it's not that simple, as many of the APIs take
a uint64_t as a parameter for the core mask, and this will not work for
cores greater than 63. So some work needs to be done in the future to
remove this limitation. For now we'll fix the memory corruption.

Also, the patch that this fixes says "allow greater than 64 cores" but
that's not across the entire application, it's only for the out-of-band
monitoring. I'll add a notice for an API change in the next release to
clean this up, i.e. depricate any API calls that use masks.

Fixes: 6453b9284b64 ("examples/vm_power: allow greater than 64 cores")
Cc: stable@dpdk.org
Signed-off-by: David Hunt <david.hunt@intel.com>

examples/multi_process: add sigint handler to server

add sigint handler in the server application to stop and close ports

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>

devtools: add explicit warnings for forbidden tokens

Replace the content of warning in the forbidden tokens script
from using the searched regex into using explicit messages

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>

eal/linux: handle UIO read failure in interrupt handler

If a device is unplugged while an interrupt is pending, the
read call to the uio device to remove it from the poll wait list
can fail resulting in it being continually polled forever. This
change checks for the read failing and if so, unregisters the device
as an interrupt source and causes the wait list to be rebuilt.

This race has been reported and observed in production.

Fixes: 0a45657a6794 ("pci: rework interrupt handling")
Cc: stable@dpdk.org
Signed-off-by: Brian Russell <brussell@brocade.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>

net/vmxnet3: fix hot-unplug

The vmxnet3 driver can't call back into dev_close(), and possibly
dev_stop(), in dev_uninit(). When dev_uninit() is called, anything
that those routines would want to clean up has already been released.
Further, for complete cleanup, it is necessary to release any of the
queue resources during dev_close().
This allows a vmxnet3 device to be hot-unplugged without leaking
queues.
Also set RTE_ETH_DEV_CLOSE_REMOVE on close so that the port resources
can be deallocated.
Return EBUSY if remove is called before stop.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver implementation")
Cc: stable@dpdk.org
Signed-off-by: Brian Russell <brussell@brocade.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>

net/virtio: register/unregister intr handler on start/stop

Register and unregister the virtio interrupt handler when the device is
started and stopped. This allows a virtio device to be hotplugged or
unplugged.

Fixes: c1f86306a026 ("virtio: add new driver")
Cc: stable@dpdk.org
Signed-off-by: Brian Russell <brussell@brocade.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

eal: fix memory leak on multi-process hotplug rollback

Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

drivers: remove useless constructor headers

A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

devtools: check wrong svg include in guides

Including svg files with the svg extension is a common mistake:
.. figure:: example.svg
must be
.. figure:: example.*
So it will work also when building pdf doc with figures converted
to png files.

A check is added in checkpatches.sh.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>

bus/pci: fix resource mapping override

When scanning an already plugged device, the virtual address
of mapped PCI resource in rte_pci_device will be overridden
with 0, that may cause driver does not work correctly.
The fix is not to update any rte_pci_device's field if the being
scanned device's driver is already probed.

Bugzilla ID: 85
Fixes: c752998b5e2e ("pci: introduce library and driver")
Cc: stable@dpdk.org
Reported-by: Geoffrey Lv <geoffrey.lv@gmail.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>

eal: fix IPC memory leak on device hotplug

rte_mp_request_sync() says that the caller is responsible
for freeing one of its parameters afterwards. EAL didn't
do that, causing a memory leak.

Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

doc: fix PDF build

Don't use .svg extension on ..figure references. PDF versions of
the documents use .png images generated from the .svg images.

Fixes: 0ba610ca1d17 ("net/mvpp2: document MTR and TM usage")
Signed-off-by: Dan Gora <dg@adax.com>

version: 18.11-rc1

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

test/metrics: add unit tests for metrics library

Unit testcases are added for metrics library
Added metrics unit test to autotest list
Updated meson build file
Updated MAINTAINERSHIP for metrics unit test

Signed-off-by: Hari Kumar Vemula <hari.kumarx.vemula@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Remy Horton <remy.horton@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>

test/pmd_ring: restructure and cleanup

Divided main test to smaller logical tests.
Registered with UT framework.
Added cleanup of the resources else ring creation fails
during consecutive test runs.
Freed the allocated mempool, rings and uninitalized the drivers.

Signed-off-by: Chaitanya Babu Talluri <tallurix.chaitanya.babu@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>

test/timer: reduce duration for race condition case

Reduced test duration for timer_racecond unit test.
This wil help to receive quicker test results.

Signed-off-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>

test: clean up on exit

The test application didn't call rte_eal_cleanup() on exit, which
caused leftover hugepages and memory leaks when running secondary
processes. Fix this by calling rte_eal_cleanup() on exit.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
Tested-by: Reshma Pattan <reshma.pattan@intel.com>

test: disable alarm autotest in FreeBSD

Disabled the alarm_autotest UT in FreeBSD
Interrupts are not supported in FreeBSD.
Alarm API depends on interrupts, so disabled alarm test on FreeBSD.

Signed-off-by: Pallantla Poornima <pallantlax.poornima@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>

examples/service_cores: check cores before run

The service core samples has varied profiles created to run on specified
lcore count. The patch adds the check before each run, to ensure
example has sufficent lcores to be added as service cores on given run
profile. If sufficent cores are not found, the run is skipped with user
notification.

Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

examples/ipv4_multicast: enable multicast promiscuous

This example has not been enable for receiving multicast
packet, so it will drop multicast packet. Users must send packet
with ether MAC destination address the same as pf port MAC address,
in order to forward packet successfully, but this is an example
for forwarding ipv4 multicastpacket. So calling function
rte_eth_promiscuous_enable() or rte_eth_allmulticast_enable() can
enable promiscuous mode of all multicast packet. And also, DPDK has
rte API function of rte_eth_dev_set_mc_addr_list() for setting
specific multicast filter table for specific multicast IP address,
but this example do not support this configuration, so it need to
be enable multicast promiscuous mode instead.

Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Tested-by: Dong Wang <dong1.wang@intel.com>

lib: reduce global variable usage

Some global variables can be eliminated, since they are not part of
public interface, it is free to remove them.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

fix global variable issues

Various fixes related to the global variable usage.

Fixes: 43e610bb8565 ("compress/octeontx: introduce octeontx zip PMD")
Fixes: c378f084d6e3 ("compress/octeontx: add device setup ops")
Fixes: b43ebc65aada ("compress/octeontx: create private xform")
Fixes: b1ce8ebd97ba ("eventdev: add PMD callbacks for eth Rx adapter")
Fixes: 3810ae435783 ("eventdev: add interrupt driven queues to Rx adapter")
Fixes: fefed3d1e62c ("enic: new driver")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

drivers: prefix global variables with module name

Some global variables are defined with generic names, add component name
as prefix to variables to prevent collusion with application variables.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Tianfei Zhang <tianfei.zhang@intel.com>

add missing static keyword to globals

Some global variables can indeed be static, add static keyword to them.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

bus/pci: propagate probing error codes

In a couple of places we check its error code against -EEXIST,
but this function returned either -1, 0, or 1.

This gets critical when hotplugging a device in secondary
process, while the same device is already plugged in the
primary. Failing to "hotplug" it in the primary will cause
the secondary to fail as well.

Fixes: e9d159c3d534 ("eal: allow probing a device again")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>

vfio: fix interrupt unregister for hotplug notifier

This function is documented to return the number of unregistered
callbacks or negative numbers on error, but pci_vfio checks for
ret != 0 to detect failures. Not anymore.

Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

vfio: share default container in multi-process

So far each process in MP used to have a separate container
and relied on the primary process to register all memsegs.

Mapping external memory via rte_vfio_container_dma_map()
in secondary processes was broken, because the default
(process-local) container had no groups bound. There was
even no way to bind any groups to it, because the container
fd was deeply encapsulated within EAL.

This patch introduces a new SOCKET_REQ_DEFAULT_CONTAINER
message type for MP synchronization, makes all processes
within a MP party use a single default container, and hence
fixes rte_vfio_container_dma_map() for secondary processes.

From what I checked this behavior was always the same, but
started to be invalid/insufficient once mapping external
memory was allowed.

While here, fix up the comment on rte_vfio_get_container_fd().
This function always opens a new container, never reuses
an old one.

Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Cc: stable@dpdk.org
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

vfio: fix read of freed memory on getting container fd

We were reading some memory just after freeing it.

Fixes: 83a73c5fef66 ("vfio: use generic multi-process channel")
Cc: stable@dpdk.org
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

vfio: cleanup getting group fd

Factor out duplicated code.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

vfio: check if group fd is already open

Always attempt to find already opened fd for an iommu
group as subsequent attempts to open it will fail.

There's no public API to check if a group was already
bound and has a container, so rte_vfio_container_group_bind()
shouldn't fail in such case.

Fixes: ea2dc1066870 ("vfio: add multi container support")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

bus/pci: compare kernel driver instead of interrupt handler

Invoking the right pci read/write functions is based on interrupt
handler type. However, this is not configured for secondary processes
precluding to use those functions.

This patch fixes the issue using the driver name the device is bound
to instead.

Fixes: 632b2d1deeed ("eal: provide functions to access PCI config")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

net/virtio: fix PCI config error handling

In virtio_read_caps and vtpci_msix_detect, rte_pci_read_config returns
the number of bytes read from PCI config or < 0 on error.
If less than the expected number of bytes are read then log the
failure and return rather than carrying on with garbage.

Fixes: 6ba1f63b5ab0 ("virtio: support specification 1.0")
Cc: stable@dpdk.org
Signed-off-by: Brian Russell <brussell@brocade.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

bus/pci: harmonize return value of config read

On Linux, rte_pci_read_config on success returns the number of read
bytes, but on BSD it returns 0.
Document the return values, and have BSD behave as Linux does.

At least one case (bnx2x PMD) treats 0 as an error, so the change
makes sense also for that.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

eal: force IOVA to a particular mode

This patch uses EAL option "--iova-mode" to force the IOVA mode to a
particular value. There exists virtual devices that are not directly
attached to the PCI bus, and therefore the auto detection of the IOVA
mode based on probing the PCI bus and IOMMU configuration may not
report the required addressing mode. Using the EAL option permits the
mode to be explicitly configured in this scenario.

Signed-off-by: Eric Zhang <eric.zhang@windriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>

eal: add --iova-mode option

In the case of user don't want to use bus iova scheme and want
to override.

For that, adding EAL option --iova-mode=<string> where valid input
string is 'pa' or 'va'.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Eric Zhang <eric.zhang@windriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

vfio: fix sPAPR IOMMU mapping

Commit 73a639085938 ("vfio: allow to map other memory regions")
introduced a bug in sPAPR IOMMU mapping. The commit removed necessary
ioctl with VFIO_IOMMU_SPAPR_REGISTER_MEMORY. Also, vfio_spapr_map_walk
should call vfio_spapr_dma_do_map instead of vfio_spapr_dma_mem_map.

Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Cc: stable@dpdk.org
Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>

net/nfp: support IOVA VA mode

NFP can handle IOVA as VA. It requires to check those IOVAs
being in the supported range what is done during initialization.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

net/nfp: check hugepage IOVA based on DMA mask

NFP devices can not handle DMA addresses requiring more than
40 bits. This patch uses rte_dev_check_dma_mask with 40 bits
and avoids device initialization if memory out of NFP range.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

bus/pci: use IOVA DMA mask check when setting IOVA mode

Currently the code precludes IOVA mode if IOMMU hardware reports
less addressing bits than necessary for full virtual memory range.

Although VT-d emulation currently only supports 39 bits, it could
be iovas for allocated memlory being within that supported range.
This patch allows IOVA mode in such a case adding a call to
rte_eal_check_dma_mask using the reported addressing bits by the
IOMMU hardware.

Indeed, memory initialization code has been modified for using lower
virtual addresses than those used by the kernel for 64 bits processes
by default, and therefore memsegs iovas can use 39 bits or less for
most systems. And this is likely 100% true for VMs.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

bus/pci: check IOMMU addressing limitation just once

Current code checks if IOMMU hardware reports enough addressing
bits for using IOVA mode but it repeats the same check for any
PCI device present. This is not necessary because the IOMMU hardware
is the same for all of them.

This patch only checks the IOMMU using first PCI device found.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

mem: use address hint for mapping hugepages

Linux kernel uses a really high address as starting address for
serving mmaps calls. If there exist addressing limitations and
IOVA mode is VA, this starting address is likely too high for
those devices. However, it is possible to use a lower address in
the process virtual address space as with 64 bits there is a lot
of available space.

This patch adds an address hint as starting address for 64 bits
systems and increments the hint for next invocations. If the mmap
call does not use the hint address, repeat the mmap call using
the hint address incremented by page size.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>