Tiwei Bie [Thu, 25 Oct 2018 09:46:58 +0000 (17:46 +0800)]
net/virtio: drop duplicated reset method
Drop the duplicated reset() method in virtio_pci_ops. Currently
vtpci_reset() is implemented on set_status() and get_status()
directly. The reset() method in virtio_pci_ops isn't used and
its implementation in the legacy device isn't right.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Yongseok Koh [Thu, 25 Oct 2018 06:24:00 +0000 (06:24 +0000)]
net/mlx5: add 128B padding of Rx completion entry
A PMD parameter (rxq_cqe_pad_en) is added to enable 128B padding of CQE on
RX side. The size of CQE is aligned with the size of a cacheline of the
core. If cacheline size is 128B, the CQE size is configured to be 128B even
though the device writes only 64B of data to the cacheline. This avoids
unnecessary cache invalidation caused by two consecutive device writes to
one cacheline. However, on some architectures it is more beneficial to
update the entire cacheline, padding the remaining 64B, rather than
striding, because read-modify-write can reduce performance considerably.
On the other hand, writing extra data consumes more PCIe bandwidth and can
also reduce the maximum throughput. It is recommended to set this
parameter empirically. Disabled
by default.
The Flow counters created with Verbs are erroneously destroyed
in Flow remove function (flow_verbs_remove()). Counter Verbs
handles stored in the translated rule buffer become invalid.
If the rule is reapplied with these invalid counter handles, the
driver hangs.
The counter should be destroyed with Verbs in the Flow destroy
function. The Flow remove function should keep counters intact.
SUSE decided to install the libmnl include file in a non-standard
place: /usr/include/libmnl/libmnl/libmnl.h
This was probably a mistake by the SUSE package maintainer,
but it is hard to get fixed. Work around the problem by using pkg-config
to find the necessary include directive for libmnl.
Xiaolong Ye [Fri, 26 Oct 2018 06:33:14 +0000 (14:33 +0800)]
net/i40e: fix offload not supported mask
As the name I40E_TX_OFFLOAD_NOTSUP_MASK indicates, it should be the
mask of unsupported features (either not in PKT_TX_OFFLOAD_MASK or in
I40E_TX_OFFLOAD_MASK). However, xor does not give the desired result
here. Assume bit 0 of both PKT_TX_OFFLOAD_MASK and I40E_TX_OFFLOAD_MASK
is 0, which means the corresponding feature is not supported on either
side. Xor then yields 0 for bit 0 of I40E_TX_OFFLOAD_NOTSUP_MASK, which
implies that the feature is supported and does not meet our expectation.
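The bit-0 case described above can be reproduced with a small sketch. The mask names and 4-bit values below are hypothetical and chosen only for illustration; they are not the real DPDK definitions, and the sketch only shows the xor behaviour, not the fix that was applied.

```c
#include <stdint.h>

/* Hypothetical 4-bit masks, for illustration only; not the real DPDK values. */
#define GENERIC_TX_OFFLOAD_MASK UINT64_C(0xE) /* bits 1-3 defined, bit 0 not */
#define DRIVER_TX_OFFLOAD_MASK  UINT64_C(0x6) /* driver supports bits 1-2 */

/* XOR of the two masks: a bit that is clear in both comes out as 0. */
static inline uint64_t notsup_mask_xor(void)
{
	return GENERIC_TX_OFFLOAD_MASK ^ DRIVER_TX_OFFLOAD_MASK;
}

/* Returns 1 if the XOR mask flags the given bit as unsupported, else 0. */
static inline int xor_reports_unsupported(unsigned int bit)
{
	return (int)((notsup_mask_xor() >> bit) & 1);
}
```

With these values bit 3 (generic only) is flagged, while bit 0 (present in neither mask) is not flagged, which is the case the message objects to.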
Qi Zhang [Thu, 25 Oct 2018 02:48:57 +0000 (10:48 +0800)]
net/ixgbe: enable detach from secondary
Since we have enabled the hotplug mechanism for multi-process, it's not
necessary to return -EPERM when trying to detach a device from a secondary
process.
Thomas Monjalon [Mon, 5 Nov 2018 17:37:21 +0000 (18:37 +0100)]
examples/fips_validation: fix build
The example was not added to the Makefile and there are some
compilation errors:
examples/fips_validation/main.c: In function ‘prepare_aead_op’:
error: control reaches end of non-void function
examples/fips_validation/main.c: In function ‘prepare_auth_op’:
error: control reaches end of non-void function
Fixes: 3d0fad56b74a ("examples/fips_validation: add crypto FIPS application") Fixes: f64adb6714e0 ("examples/fips_validation: support HMAC parsing") Fixes: 4aaad2995e13 ("examples/fips_validation: support GCM parsing") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Gavin Hu [Fri, 2 Nov 2018 11:21:28 +0000 (19:21 +0800)]
ring/c11: move atomic load of head above the loop
In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop. Upon failure the old_head is updated anyway, so
another load is costly and not necessary.
This helps latency a little, by about 1~5%.
Test result with the patch (two cores):
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30
Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Jia He <justin.he@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
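A minimal sketch of the hoisted load, assuming the GCC/Clang __atomic builtins. The function name reserve_slots is hypothetical, and the free-space computation of the real ring code is left out; only the load-once-then-CAS pattern is shown.

```c
#include <stdint.h>

/*
 * Reserve n slots by advancing *head, loading the old head only once.
 * On failure, __atomic_compare_exchange_n() writes the current value of
 * *head into old_head, so the loop body does not need to reload it.
 * Returns the head value the reservation started from.
 */
static inline uint32_t reserve_slots(uint32_t *head, uint32_t n)
{
	uint32_t old_head = __atomic_load_n(head, __ATOMIC_RELAXED);
	uint32_t new_head;

	do {
		new_head = old_head + n;
	} while (!__atomic_compare_exchange_n(head, &old_head, new_head,
			0, __ATOMIC_RELAXED, __ATOMIC_RELAXED));

	return old_head;
}
```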
Gavin Hu [Fri, 2 Nov 2018 11:21:27 +0000 (19:21 +0800)]
ring/c11: synchronize load and store of the tail
Synchronize the load-acquire of the tail with the store-release
within update_tail. The store-release ensures that all the ring
operations, enqueue or dequeue, are seen by the observers on the other
side as soon as they see the updated tail. The load-acquire is needed
here because a data dependency is not a reliable way of ordering: the
compiler might break it by saving to temporary values to boost
performance.
When computing the free_entries and avail_entries, use atomic semantics
to load the heads and tails instead.
The patch was benchmarked with test/ring_perf_autotest and it decreases
the enqueue/dequeue latency by 5% ~ 27.6% with two lcores. The real gains
depend on the number of lcores, the depth of the ring, and SPSC or MPMC.
For 1 lcore, it also improves a little, about 3 ~ 4%.
It is a big improvement: in the case of MPMC, with two lcores and a ring
size of 32, it saves latency up to (3.26-2.36)/3.26 = 27.6%.
This patch is a bug fix, while the improvement is a bonus. In our analysis
the improvement comes from the cacheline pre-filling after hoisting the
load-acquire up above __atomic_compare_exchange_n.
The test command:
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=\
1024 -- -i
Test result with this patch (two cores):
SP/SC bulk enq/dequeue (size: 8): 5.86
MP/MC bulk enq/dequeue (size: 8): 10.15
SP/SC bulk enq/dequeue (size: 32): 1.94
MP/MC bulk enq/dequeue (size: 32): 2.36
For comparison, the test result without this patch:
SP/SC bulk enq/dequeue (size: 8): 6.67
MP/MC bulk enq/dequeue (size: 8): 13.12
SP/SC bulk enq/dequeue (size: 32): 2.04
MP/MC bulk enq/dequeue (size: 32): 3.26
Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Jia He <justin.he@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
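The store-release/load-acquire pairing on the tail can be sketched with a single-slot mailbox standing in for the ring. The struct and function names are hypothetical, and the GCC/Clang __atomic builtins are assumed; the real ring code is more involved.

```c
#include <stdint.h>

/* Hypothetical single-slot mailbox standing in for a ring. */
struct mailbox {
	uint32_t data; /* payload written before the tail is published */
	uint32_t tail; /* published index; 0 means nothing published yet */
};

/* Producer: write the payload, then publish it with a store-release. */
static inline void publish(struct mailbox *m, uint32_t value)
{
	m->data = value;
	__atomic_store_n(&m->tail, 1, __ATOMIC_RELEASE);
}

/*
 * Consumer: load the tail with acquire semantics. Once the new tail is
 * observed, the payload written before the store-release is visible too.
 * Returns 1 and fills *out on success, 0 if nothing was published.
 */
static inline int try_consume(struct mailbox *m, uint32_t *out)
{
	if (__atomic_load_n(&m->tail, __ATOMIC_ACQUIRE) == 0)
		return 0;
	*out = m->data;
	return 1;
}
```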
Fiona Trahe [Wed, 31 Oct 2018 21:46:57 +0000 (21:46 +0000)]
compress/qat: add log for IM buffer too small
Display trace if error returned from firmware is likely due
to intermediate buffers being too small for the compressed
output. Update documentation to explain this error case
and to clarify intermediate buffer memory usage.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Fiona Trahe [Wed, 31 Oct 2018 00:39:54 +0000 (00:39 +0000)]
compress/qat: fix out-of-bounds write
The QAT array for SGLs in the intermediate buffer structure
was #defined to 1, but the setup code was hardcoded as if there were
2 buffers, causing an out-of-bounds write. Reworked to loop correctly
using the #define.
Fixes: a124830a6f00 ("compress/qat: enable dynamic huffman encoding") Reported-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
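The pattern of looping over the #define instead of hardcoding the count can be sketched as follows. NUM_SGLS, struct im_buffer and setup_im_buffer are hypothetical names for illustration, not the QAT PMD's own definitions.

```c
#include <stddef.h>

/* Hypothetical names and sizes, for illustration only. */
#define NUM_SGLS 1 /* the array size is set in exactly one place */

struct im_buffer {
	void *sgl[NUM_SGLS];
};

/*
 * Fill every slot by looping up to the #define rather than writing a
 * hardcoded number of entries. Returns the number of slots written.
 */
static inline size_t setup_im_buffer(struct im_buffer *b, void *mem)
{
	size_t i;

	for (i = 0; i < NUM_SGLS; i++)
		b->sgl[i] = mem;
	return i;
}
```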
Fiona Trahe [Sat, 27 Oct 2018 00:43:07 +0000 (01:43 +0100)]
compressdev: fix op allocation
Fixed bad logic in rte_comp_op_alloc() when checking the return
value from rte_comp_op_raw_bulk_alloc(). This
could have resulted in a seg-fault in the error case.
Made the rte_comp_op_bulk_alloc() code consistent
with rte_comp_op_alloc().
Akash Saxena [Thu, 25 Oct 2018 10:01:01 +0000 (10:01 +0000)]
test/crypto: remove redundant RSA verification
Change the unit test app to check only for op->status =
RTE_CRYPTO_OP_STATUS_SUCCESS/ERROR instead of calling rsa_verify(),
as the cryptodev API is expected to return an error in case of a data
mismatch.
Akash Saxena [Thu, 25 Oct 2018 10:00:56 +0000 (10:00 +0000)]
crypto/openssl: fix RSA verify operation
In lib cryptodev, the RSA verify operation takes the plain message text
and the corresponding signature as input and is expected to return
RTE_CRYPTO_OP_STATUS_SUCCESS/FAILURE on a signature match/mismatch.
The current OpenSSL PMD RSA verify implementation overwrites the sign
input passed by the application with the decrypted output, which isn't
expected.
This patch addresses this issue in the OpenSSL PMD. The OpenSSL PMD now
passes a tmp buffer to the OpenSSL sign API and memcmp's the output with
the original plain text to verify the signature match.
Set op->status = RTE_CRYPTO_OP_STATUS_ERROR on signature mismatch.
For each pipeline table, have the master thread maintain the list of
rules that are currently stored in the table. This list allows the
master thread to handle table queries with minimal impact for the
data plane threads: requests to read the current set of table rules
are fully handled by the master thread with no involvement from
data plane threads, requests to read the per table rule moving data
(such as stats counters or timestamp associated with specific
actions) are handled by the data plane threads through plain memory
reads rather than key lookup.
Vipin Varghese [Tue, 30 Oct 2018 03:56:15 +0000 (09:26 +0530)]
doc: add policer table details for metering application
The change adds a note about the previous colour in colour blind mode and
DROP in profile table actions. In colour blind mode the only valid
previous colour is GREEN. To drop packets based on the new colour one
needs to set the action to DROP in the profile table.
Rosen Xu [Mon, 22 Oct 2018 08:46:40 +0000 (16:46 +0800)]
app/testpmd: fix shaper profile parameters
As defined by struct rte_tm_shaper_params, the command line of
testpmd should include committed and peak parameters, but
right now the command line doesn't identify whether a parameter is
committed or peak. This patch identifies them and
adds a clear definition.
Fixes: bddc2f40b594 ("app/testpmd: add commands for shaper and wred profiles") Cc: stable@dpdk.org Signed-off-by: Rosen Xu <rosen.xu@intel.com>
During memory initialization calling rte_mem_check_dma_mask
leads to a deadlock because memory_hotplug_lock is locked by a
writer, the current code in execution, and rte_memseg_walk
tries to lock as a reader.
This patch adds a thread_unsafe version which calls the final
function, specifying that memory_hotplug_lock does not need to be
acquired. The patch also modifies rte_mem_check_dma_mask into an
intermediate step which calls the final function as before,
implying memory_hotplug_lock will be acquired.
PMDs should always use the version acquiring the lock with the
thread_unsafe one being just for internal EAL memory code.
Fixes: 223b7f1d5ef6 ("mem: add function for checking memseg IOVA") Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
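The locked/thread_unsafe split can be sketched with a toy lock. Everything here is hypothetical: the lock is a C11 atomic_flag standing in for memory_hotplug_lock, the check itself is a placeholder, and the locked variant returns -1 where a real blocking lock would deadlock.

```c
#include <stdatomic.h>

/* Toy non-recursive lock standing in for memory_hotplug_lock. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

/* Final function: the caller states whether the lock is already held. */
static int check_common(int value, int lock_already_held)
{
	int ret;

	if (!lock_already_held && atomic_flag_test_and_set(&toy_lock))
		return -1; /* a real blocking lock would deadlock here */
	ret = (value >= 0) ? 0 : 1; /* placeholder for the real check */
	if (!lock_already_held)
		atomic_flag_clear(&toy_lock);
	return ret;
}

/* Public variant: takes the lock itself. */
static inline int check_locked(int value)
{
	return check_common(value, 0);
}

/* Internal variant: for callers that already hold the lock. */
static inline int check_thread_unsafe(int value)
{
	return check_common(value, 1);
}
```

Calling check_locked() while the lock is held fails in this toy model, while check_thread_unsafe() completes, which mirrors why the internal memory code needs the second variant.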
If a device reports addressing limitations through a dma mask,
the IOVAs for mapped memory need to be checked to ensure
correct functionality.
Previous patches introduced this DMA check for main memory code
currently being used but other options like legacy memory and the
no hugepages option need to be also considered.
If the DMA mask check shows mapped memory out of the supported range
specified by the DMA mask, nothing can be done but return an error
and report it. This can imply the app not being executed at
all or precluding dynamic memory allocation once the app is running.
In any case, we can advise the user to force IOVA as PA if IOVA is
currently VA and the user is root.
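A minimal sketch of such a range check, under the assumptions stated in the comment. The function name iova_range_fits is hypothetical and this is not the EAL implementation; it only shows how an address range can be tested against a number of addressing bits.

```c
#include <stdint.h>

/*
 * Return 1 if the range [iova, iova + len) is addressable with maskbits
 * address bits, 0 otherwise. Assumes 0 < maskbits < 64, len > 0, and that
 * iova + len does not wrap around.
 */
static inline int iova_range_fits(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t out_of_range = ~((UINT64_C(1) << maskbits) - 1);
	uint64_t last = iova + len - 1;

	return ((iova | last) & out_of_range) == 0;
}
```

For a device limited to 40 bits, a segment ending exactly at the 2^40 boundary passes, while one crossing it does not.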
This patch adds the possibility of setting a dma mask to be used
once the memory initialization is done.
This is currently needed when IOVA mode is set by PCI related
code and an x86 IOMMU hardware unit is present. Current code calls
rte_mem_check_dma_mask but it is wrong to do so at that point
because the memory has not been initialized yet.
Ferruh Yigit [Fri, 2 Nov 2018 19:06:06 +0000 (19:06 +0000)]
eal: fix build with gcc 9.0
build error:
In function ‘eal_plugin_add’,
.../lib/librte_eal/common/eal_common_options.c:225:2:
error: ‘strncpy’ output may be truncated copying 4095 bytes from a
string of length 4095 [-Werror=stringop-truncation]
strncpy(solib->name, path, PATH_MAX-1);
strncpy may result in a string that is not null-terminated;
replaced it with strlcpy
Fixes: f9a08f650211 ("eal: add support for shared object drivers") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ferruh Yigit [Fri, 2 Nov 2018 19:06:05 +0000 (19:06 +0000)]
bus/dpaa: fix build with gcc 9.0
build error:
In function ‘fman_if_init’,
.../drivers/bus/dpaa/base/fman/fman.c:186:2:
error: ‘strncpy’ output may be truncated copying 4095 bytes from a
string of length 4095 [-Werror=stringop-truncation]
strncpy(__if->node_path, dpa_node->full_name, PATH_MAX - 1);
strncpy may result in a string that is not null-terminated;
replaced it with strlcpy
Jerin Jacob [Sat, 3 Nov 2018 14:58:53 +0000 (14:58 +0000)]
crypto/scheduler: fix build with gcc 8.2
build_error:
drivers/crypto/scheduler/scheduler_pmd.c: In function ‘parse_name_arg’:
drivers/crypto/scheduler/scheduler_pmd.c:372:2: error: ‘strncpy’
specified bound 64 equals destination size [-Werror=stringop-truncation]
strncpy(params->name, value, RTE_CRYPTODEV_NAME_MAX_LEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
strncpy may result in a string that is not null-terminated;
replaced it with strlcpy
Fixes: 503e9c5afb38 ("crypto/scheduler: register as vdev driver") Cc: stable@dpdk.org Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
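Since strlcpy() is not available in every C library, the behaviour relied on by these three fixes can be sketched with snprintf(). The helper name copy_terminated is hypothetical; it is a stand-in with strlcpy()-like semantics, not the function used by the patches.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper with strlcpy()-like behaviour, built on snprintf():
 * copies at most size - 1 bytes, always null-terminates when size > 0, and
 * returns the length of src so the caller can detect truncation.
 */
static inline size_t copy_terminated(char *dst, const char *src, size_t size)
{
	return (size_t)snprintf(dst, size, "%s", src);
}
```

Unlike strncpy() with a bound equal to the destination size, the destination is terminated even when the source is too long, and a return value of at least size signals truncation.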
Jerin Jacob [Fri, 2 Nov 2018 08:11:23 +0000 (08:11 +0000)]
eal: fix error string function
The errno_autotest test case has been failing since
commit 5d7b673d5fd6 ("mk: build with _GNU_SOURCE defined by default")
RTE>>errno_autotest
rte_strerror: 'Unknown error 11',
strerror: 'Resource temporarily unavailable'
Test Failed
There are two different versions of strerror_r(), depending on the
_GNU_SOURCE definition.
/* GNU-specific */
char *strerror_r(int errnum, char *buf, size_t buflen);
/* XSI-compliant */
int strerror_r(int errnum, char *buf, size_t buflen);
Since the GNU-specific version returns char *, the existing "if"
condition around strerror_r fails.
Switch back to the XSI-compliant version because:
a) It allows portable strerror_r() usage, as the musl C library uses
the non-GNU-specific version
https://git.musl-libc.org/cgit/musl/tree/src/string/strerror_r.c
b) Based on the strerror_r(3) man page, the GNU-specific version
need not fill char *buf with the error message; instead it
can return an immutable static string from the library.
note from strerror_r(3) man page:
The GNU-specific strerror_r() returns a pointer to a string containing
the error message. This may be either a pointer to a string that the
function stores in buf, or a pointer to some (immutable)
static string (in which case buf is unused).
Fixes: 5d7b673d5fd6 ("mk: build with _GNU_SOURCE defined by default") Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
David Hunt [Wed, 31 Oct 2018 11:50:32 +0000 (11:50 +0000)]
examples/vm_power: respect maximum CPUs
The vm_power_manager app was not respecting POWER_MGR_MAX_CPUS
during initialisation, so if there were more CPUs in the system than
this value (64), it would lead to buffer overruns.
Added a check during init and un-init to only initialise up to
lcore_id 63.
This raises the question as to why not simply increase the value of
POWER_MGR_MAX_CPUS. Well, it's not that simple, as many of the APIs take
a uint64_t as a parameter for the core mask, and this will not work for
cores greater than 63. So some work needs to be done in the future to
remove this limitation. For now we'll fix the memory corruption.
Also, the patch that this fixes says "allow greater than 64 cores" but
that's not across the entire application, it's only for the out-of-band
monitoring. I'll add a notice for an API change in the next release to
clean this up, i.e. deprecate any API calls that use masks.
Fixes: 6453b9284b64 ("examples/vm_power: allow greater than 64 cores") Cc: stable@dpdk.org Signed-off-by: David Hunt <david.hunt@intel.com>
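Why a uint64_t core mask stops at core 63 can be sketched as follows. MAX_MASK_CORES and core_mask_set are hypothetical names for illustration, not part of the vm_power_manager code.

```c
#include <stdint.h>

#define MAX_MASK_CORES 64 /* a uint64_t mask has one bit per core, 0..63 */

/*
 * Set the bit for core_id in *mask. Returns 0 on success, or -1 when
 * core_id cannot be represented (shifting a 64-bit value by 64 or more
 * is undefined behaviour in C, so the check must come first).
 */
static inline int core_mask_set(uint64_t *mask, unsigned int core_id)
{
	if (core_id >= MAX_MASK_CORES)
		return -1;
	*mask |= UINT64_C(1) << core_id;
	return 0;
}
```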
Luca Boccassi [Wed, 31 Oct 2018 18:39:45 +0000 (18:39 +0000)]
eal/linux: handle UIO read failure in interrupt handler
If a device is unplugged while an interrupt is pending, the
read call to the uio device to remove it from the poll wait list
can fail resulting in it being continually polled forever. This
change checks for the read failing and if so, unregisters the device
as an interrupt source and causes the wait list to be rebuilt.
This race has been reported and observed in production.
Fixes: 0a45657a6794 ("pci: rework interrupt handling") Cc: stable@dpdk.org Signed-off-by: Brian Russell <brussell@brocade.com> Signed-off-by: Luca Boccassi <bluca@debian.org>
Luca Boccassi [Wed, 31 Oct 2018 18:39:44 +0000 (18:39 +0000)]
net/vmxnet3: fix hot-unplug
The vmxnet3 driver can't call back into dev_close(), and possibly
dev_stop(), in dev_uninit(). When dev_uninit() is called, anything
that those routines would want to clean up has already been released.
Further, for complete cleanup, it is necessary to release any of the
queue resources during dev_close().
This allows a vmxnet3 device to be hot-unplugged without leaking
queues.
Also set RTE_ETH_DEV_CLOSE_REMOVE on close so that the port resources
can be deallocated.
Return EBUSY if remove is called before stop.
Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver implementation") Cc: stable@dpdk.org Signed-off-by: Brian Russell <brussell@brocade.com> Signed-off-by: Luca Boccassi <bluca@debian.org>
Luca Boccassi [Wed, 31 Oct 2018 18:39:43 +0000 (18:39 +0000)]
net/virtio: register/unregister intr handler on start/stop
Register and unregister the virtio interrupt handler when the device is
started and stopped. This allows a virtio device to be hotplugged or
unplugged.
Fixes: c1f86306a026 ("virtio: add new driver") Cc: stable@dpdk.org Signed-off-by: Brian Russell <brussell@brocade.com> Signed-off-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thomas Monjalon [Sun, 28 Oct 2018 10:47:47 +0000 (11:47 +0100)]
drivers: remove useless constructor headers
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Thomas Monjalon [Wed, 31 Oct 2018 16:28:42 +0000 (17:28 +0100)]
devtools: check wrong svg include in guides
Including svg files with the svg extension is a common mistake:
.. figure:: example.svg
must be
.. figure:: example.*
So it will work also when building pdf doc with figures converted
to png files.
A check is added in checkpatches.sh.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
Qi Zhang [Tue, 30 Oct 2018 15:27:20 +0000 (23:27 +0800)]
bus/pci: fix resource mapping override
When scanning an already plugged device, the virtual address
of the mapped PCI resource in rte_pci_device is overridden
with 0, which may cause the driver to not work correctly.
The fix is not to update any of rte_pci_device's fields if the driver
of the device being scanned is already probed.
Unit testcases are added for metrics library
Added metrics unit test to autotest list
Updated meson build file
Updated MAINTAINERSHIP for metrics unit test
Divided main test to smaller logical tests.
Registered with UT framework.
Added cleanup of the resources, as otherwise ring creation fails
during consecutive test runs.
Freed the allocated mempool and rings and uninitialized the drivers.
Anatoly Burakov [Thu, 31 May 2018 16:14:02 +0000 (17:14 +0100)]
test: clean up on exit
The test application didn't call rte_eal_cleanup() on exit, which
caused leftover hugepages and memory leaks when running secondary
processes. Fix this by calling rte_eal_cleanup() on exit.
Vipin Varghese [Fri, 12 Oct 2018 13:14:03 +0000 (18:44 +0530)]
examples/service_cores: check cores before run
The service core samples have varied profiles created to run on a
specified lcore count. The patch adds a check before each run to ensure
the example has sufficient lcores to be added as service cores for the
given run profile. If sufficient cores are not found, the run is skipped
with a user notification.
Signed-off-by: Vipin Varghese <vipin.varghese@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This example has not been enabled to receive multicast
packets, so it drops them. Users must send packets
with the Ethernet destination MAC address equal to the PF port MAC
address for them to be forwarded successfully, but this is an example
for forwarding IPv4 multicast packets. Calling
rte_eth_promiscuous_enable() or rte_eth_allmulticast_enable()
enables reception of all multicast packets. DPDK also has the
API function rte_eth_dev_set_mc_addr_list() for setting a
specific multicast filter table for specific multicast IP addresses,
but this example does not support that configuration, so it needs to
enable multicast promiscuous mode instead.
Signed-off-by: Wei Zhao <wei.zhao1@intel.com> Tested-by: Dong Wang <dong1.wang@intel.com>
Darek Stojaczyk [Fri, 26 Oct 2018 07:54:59 +0000 (09:54 +0200)]
bus/pci: propagate probing error codes
In a couple of places we check its error code against -EEXIST,
but this function returned either -1, 0, or 1.
This gets critical when hotplugging a device in secondary
process, while the same device is already plugged in the
primary. Failing to "hotplug" it in the primary will cause
the secondary to fail as well.
Fixes: e9d159c3d534 ("eal: allow probing a device again") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Darek Stojaczyk [Wed, 24 Oct 2018 10:11:31 +0000 (12:11 +0200)]
vfio: fix interrupt unregister for hotplug notifier
This function is documented to return the number of unregistered
callbacks or negative numbers on error, but pci_vfio checks for
ret != 0 to detect failures. Not anymore.
Darek Stojaczyk [Wed, 3 Oct 2018 12:39:25 +0000 (14:39 +0200)]
vfio: share default container in multi-process
So far each process in MP used to have a separate container
and relied on the primary process to register all memsegs.
Mapping external memory via rte_vfio_container_dma_map()
in secondary processes was broken, because the default
(process-local) container had no groups bound. There was
even no way to bind any groups to it, because the container
fd was deeply encapsulated within EAL.
This patch introduces a new SOCKET_REQ_DEFAULT_CONTAINER
message type for MP synchronization, makes all processes
within a MP party use a single default container, and hence
fixes rte_vfio_container_dma_map() for secondary processes.
From what I checked this behavior was always the same, but
started to be invalid/insufficient once mapping external
memory was allowed.
While here, fix up the comment on rte_vfio_get_container_fd().
This function always opens a new container, never reuses
an old one.
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Alejandro Lucero [Thu, 25 Oct 2018 10:49:28 +0000 (11:49 +0100)]
bus/pci: compare kernel driver instead of interrupt handler
Invoking the right PCI read/write functions is based on the interrupt
handler type. However, this is not configured for secondary processes,
which precludes using those functions.
This patch fixes the issue by using the name of the driver the device is
bound to instead.
Fixes: 632b2d1deeed ("eal: provide functions to access PCI config") Cc: stable@dpdk.org Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Brian Russell [Tue, 28 Aug 2018 10:12:40 +0000 (11:12 +0100)]
net/virtio: fix PCI config error handling
In virtio_read_caps and vtpci_msix_detect, rte_pci_read_config returns
the number of bytes read from PCI config or < 0 on error.
If fewer than the expected number of bytes are read then log the
failure and return rather than carrying on with garbage.
Fixes: 6ba1f63b5ab0 ("virtio: support specification 1.0") Cc: stable@dpdk.org Signed-off-by: Brian Russell <brussell@brocade.com> Signed-off-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
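The short-read check can be sketched against a fake config space. fake_cfg, fake_read_config and read_exact are hypothetical and exist only to make the sketch self-contained; the reader merely follows the return convention described above (bytes read, or a negative value on error).

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical 4-byte config space, used only to make the sketch runnable. */
static const unsigned char fake_cfg[4] = { 0x11, 0x22, 0x33, 0x44 };

/*
 * Hypothetical reader following the convention described above:
 * returns the number of bytes read, or a negative value on error.
 */
static inline int fake_read_config(void *buf, size_t len, size_t offset)
{
	if (offset >= sizeof(fake_cfg))
		return -1;
	if (len > sizeof(fake_cfg) - offset)
		len = sizeof(fake_cfg) - offset; /* short read */
	memcpy(buf, fake_cfg + offset, len);
	return (int)len;
}

/* Read exactly len bytes; 0 on success, -1 on error or short read. */
static inline int read_exact(void *buf, size_t len, size_t offset)
{
	int ret = fake_read_config(buf, len, offset);

	return (ret < 0 || (size_t)ret != len) ? -1 : 0;
}
```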
Luca Boccassi [Tue, 28 Aug 2018 10:12:39 +0000 (11:12 +0100)]
bus/pci: harmonize return value of config read
On Linux, rte_pci_read_config on success returns the number of read
bytes, but on BSD it returns 0.
Document the return values, and have BSD behave as Linux does.
At least one case (bnx2x PMD) treats 0 as an error, so the change
makes sense also for that.
Signed-off-by: Luca Boccassi <bluca@debian.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Eric Zhang [Wed, 3 Oct 2018 20:53:13 +0000 (16:53 -0400)]
eal: force IOVA to a particular mode
This patch uses EAL option "--iova-mode" to force the IOVA mode to a
particular value. There exist virtual devices that are not directly
attached to the PCI bus, and therefore the auto detection of the IOVA
mode based on probing the PCI bus and IOMMU configuration may not
report the required addressing mode. Using the EAL option permits the
mode to be explicitly configured in this scenario.
Signed-off-by: Eric Zhang <eric.zhang@windriver.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>
Commit 73a639085938 ("vfio: allow to map other memory regions")
introduced a bug in sPAPR IOMMU mapping. The commit removed necessary
ioctl with VFIO_IOMMU_SPAPR_REGISTER_MEMORY. Also, vfio_spapr_map_walk
should call vfio_spapr_dma_do_map instead of vfio_spapr_dma_mem_map.
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: stable@dpdk.org Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
NFP devices cannot handle DMA addresses requiring more than
40 bits. This patch uses rte_dev_check_dma_mask with 40 bits
and avoids device initialization if memory is out of the NFP range.
bus/pci: use IOVA DMA mask check when setting IOVA mode
Currently the code precludes IOVA mode if the IOMMU hardware reports
fewer addressing bits than necessary for the full virtual memory range.
Although VT-d emulation currently only supports 39 bits, the IOVAs
for allocated memory could still be within that supported range.
This patch allows IOVA mode in such a case by adding a call to
rte_eal_check_dma_mask using the addressing bits reported by the
IOMMU hardware.
Indeed, memory initialization code has been modified for using lower
virtual addresses than those used by the kernel for 64 bits processes
by default, and therefore memsegs iovas can use 39 bits or less for
most systems. And this is likely 100% true for VMs.
bus/pci: check IOMMU addressing limitation just once
Current code checks if IOMMU hardware reports enough addressing
bits for using IOVA mode but it repeats the same check for any
PCI device present. This is not necessary because the IOMMU hardware
is the same for all of them.
This patch only checks the IOMMU using the first PCI device found.
The Linux kernel uses a really high address as the starting address for
serving mmap calls. If there are addressing limitations and
IOVA mode is VA, this starting address is likely too high for
those devices. However, it is possible to use a lower address in
the process virtual address space, as with 64 bits there is a lot
of available space.
This patch adds an address hint as the starting address for 64-bit
systems and increments the hint for subsequent invocations. If the mmap
call does not use the hint address, the mmap call is repeated using
the hint address incremented by the page size.