git.droids-corp.org - dpdk.git/log

pdump: check missing home environment variable

inside pdump_get_socket_path(), getenv can return
a NULL pointer if the match for SOCKET_PATH_HOME is
not found in the environment. NULL check is added to
return -1 immediately. Since pdump_get_socket_path()
returns -1 now, wherever this function is called
there the return value is checked and error message
is logged.

Coverity issue: 127344, 127347

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>

pdump: fix default socket path

SOCKET_PATH_HOME is to specify environment variable "HOME",
so it should not contain "/pdump_sockets" in the macro.
So removed "/pdump_sockets" from SOCKET_PATH_HOME and
SOCKET_PATH_VAR_RUN. New changes will create pdump sockets under
/var/run/.dpdk/pdump_sockets for root users and
under HOME/.dpdk/pdump_sockets for non root users.
Changes are done in pdump_get_socket_path() to accommodate
new socket path changes.

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>

app/test: avoid freeing mbufs twice in qat test

Test_multi_session was freeing mbufs used in the multiple sessions
created and setting obuf to NULL after it, but ibuf was not being
set to NULL, and therefore, it was being freed again (ibuf and obuf
are pointing at the same address), in the ut_teardown() function.

Fixes: 1b9cb73ecef1 ("app/test: fix qat autotest failure")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: fix PCI class probing

The PCI test was failing because some fake devices had no PCI class.

Fixes: 1dbba1650c89 ("app/test: remove real PCI ids")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

app/test: avoid freeing mbuf twice

In cryptodev tests, when input and output buffers were the same,
the mbuf was being freed twice, causing refcnt_atomic to be negative.

Fixes: 202d375c60bc ("app/test: add cryptodev unit and performance tests")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

app/test: fix build with icc

Icc complains about variable may be used without setting.

Fixes: 97fe6461c7cbfb ("app/test: add SNOW 3G performance test")

Signed-off-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
Acked-by: John Griffin <john.griffin@intel.com>

port: fix build without KNI

Commit 9fc37d1c071c is missing a conditional in the dependencies,
causing builds to fail when KNI is not enabled:
    == Build lib/librte_port
      LD librte_port.so.3
    /usr/bin/ld: cannot find -lrte_kni
    collect2: error: ld returned 1 exit status

Fixes: 9fc37d1c071c ("port: support KNI")

Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

kni: fix build with gcc 6.1

Using gcc 6.1, in some cases, kni fails to compile
because of unused variables:

lib/librte_eal/linuxapp/kni/ixgbe_main.c:82:19:
error: ‘ixgbe_copyright’
defined but not used [-Werror=unused-const-variable=]

lib/librte_eal/linuxapp/kni/ixgbe_main.c:62:19:
error: ‘ixgbe_driver_string’
defined but not used [-Werror=unused-const-variable=]

Fixes: 3fc5ca2f6352 ("kni: initial import")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

mk: fix parallel build of test resources

The build was failing sometimes when building with multiple
parallel jobs:
    # rm build/build/app/test/*res*
    # make -j6
    objcopy: 'resource.tmp': No such file

The reason is that each resource was built from the same temporary file.
The failure is seen because of a race condition when removing the
temporary file after each resource creation.
It also means that some resources may be created from the wrong source.

The fix is to have a different input file for each resource.
The source file is not directly used because it may have a long path
which is used by objcopy to name the symbols after some transformations.
When linking a tar resource, the input file is already in the current
directory. The hard case is for simply linked resources.
The trick is to create a symbolic link of the source file if it is not
already in the current build directory.
Then there is a replacement of dot by an underscore to predict the
symbol names computed by objcopy which must be redefined.

There is an additional change for the test_resource_c which is both
a real source file and a test resource. An intermediate file
test_resource.res is created to avoid compiling resource.c from the
wrong directory through a symbolic link.

Fixes: 1e9e0a6270 ("app/test: fix resource creation with objcopy on FreeBSD")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

hash: add scalable multi-writer insertion with Intel TSX

This patch introduced scalable multi-writer Cuckoo Hash insertion
based on a split Cuckoo Search and Move operation using Intel
TSX. It can do scalable hash insertion with 22 cores with little
performance loss and negligible TSX abortion rate.

* Added an extra rte_hash flag definition to switch default single writer
  Cuckoo Hash behavior to multiwriter.
    - If HTM is available, it would use hardware feature for concurrency.
    - If HTM is not available, it would fall back to spinlock.

* Created a rte_cuckoo_hash_x86.h file to hold all x86-arch related
  cuckoo_hash functions. And rte_cuckoo_hash.c uses compile time flag to
  select x86 file or other platform-specific implementations. While HTM check
  is still done at runtime (same idea with
  RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT)

* Moved rte_hash private struct definitions to rte_cuckoo_hash.h, to allow
  rte_cuckoo_hash_x86.h or future platform dependent functions to include.

* Following new functions are created for consistent names when new platform
  TM support are added.
    - rte_hash_cuckoo_move_insert_mw_tm: do insertion with bucket movement.
    - rte_hash_cuckoo_insert_mw_tm: do insertion without bucket movement.

* One extra multi-writer test case is added.

Signed-off-by: Wei Shen <wei1.shen@intel.com>
Signed-off-by: Sameh Gobriel <sameh.gobriel@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

mbuf: fix dump format

Do not add 0x when using %p in format strings to avoid dump messages
with double 0x0x, e.g.,

  dump mbuf at 0x0x7fac7b17c800, phys=17b17c880, buf_len=2176
    pkt_len=2064, ol_flags=0, nb_segs=1, in_port=255
    segment at 0x0x7fac7b17c800, data=0x0x7fac7b17c8f0, data_len=2064

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>

mbuf: use default mempool handler from config

By default, the mempool ops used for mbuf allocations is a multi
producer and multi consumer ring. We could imagine a target (maybe some
network processors?) that provides an hardware-assisted pool
mechanism. In this case, the default configuration for this architecture
would contain a different value for RTE_MBUF_DEFAULT_MEMPOOL_OPS.

Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

app/test: add mempool handler

Create a minimal custom mempool handler and check that it
passes basic mempool autotests.

Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

mempool: support handler operations

Until now, the objects stored in a mempool were internally stored in a
ring. This patch introduces the possibility to register external handlers
replacing the ring.

The default behavior remains unchanged, but calling the new function
rte_mempool_set_ops_byname() right after rte_mempool_create_empty() allows
the user to change the handler that will be used when populating
the mempool.

This patch also adds a set of default ops (function callbacks) based
on rte_ring.

Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

net/virtio-user: fix 32-bit build

The compilation for 32-bit fails when CONFIG_RTE_VIRTIO_USER is enabled:

  drivers/net/virtio/virtio_user_ethdev.c:84:47:
    error: format ‘%llu’ expects argument of type ‘long long unsigned int’,
    but argument 5 has type ‘size_t {aka unsigned int}’

Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

net/i40e: support NSH packet type

NSH packet can be recognized by Intel X710/XL710 series.
This patch enables the new packet type.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Yulong Pei <yulong.pei@intel.com>
Acked-by: Zhe Tao <zhe.tao@intel.com>

mbuf: add NSH packet type

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Zhe Tao <zhe.tao@intel.com>

ethdev: fix doxygen formatting

This commit fixes some functions missing in API documentation.

Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>

ethdev: align device structure with cache line

Elements of struct rte_eth_dev used in the fast path.
Make struct rte_eth_dev cache aligned to avoid the cases where
rte_eth_dev elements share the same cache line with other structures.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

ethdev: add RSS RETA size constant 256

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

ethdev: add tunnel and port RSS offload types

- added VXLAN, GENEVE and NVGRE tunnel flow types
- added PORT flow type for accounting physical/virtual
port or channel number in flow creation

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

net/virtio: fix used index retrieved only once

In the following loop:
    while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
            ...
    }
There is no external function call or any explict memory barrier
in the loop, the re-read of used->idx might be optimized and only
be retrieved once.

Use of voaltile normally should be prohibited, and access_once
is Linux kernel's style to handle this issue; Once we have that
macro in DPDK, we could change to that style.

virtio_recv_mergable_pkts might also have the same issue, so fix
it as well.

Fixes: 823ad647950a ("virtio: support multiple queues")
Fixes: 13ce5e7eb94f ("virtio: mergeable buffers")

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio: fix crash on querying xstats

Trying to access xstats_names after "if (xstats_names == NULL)" is
obviously wrong, which would result to a crash while running "show
port xstats 0" in testpmd with virtio PMD.

The fix is straightforward; just reverse the check.

Fixes: baf91c395b18 ("net/virtio: fetch extended statistics with integer ids")

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: check hugepage fstat error

Value returned from fstat is not checked for errors before being used.
This patch fixes following coverity issue.

    static uint64_t
    get_blk_size(int fd)
    {
     struct stat stat;

     fstat(fd, &stat);
     return (uint64_t)stat.st_blksize;
    >>>  CID 107103 (#1 of 1): Unchecked return value from library
         (CHECKED_RETURN)
    >>>  check_return: Calling fstat(fd, &stat) without checking
         return value.
    >>>  This library function may fail and return an error code.

Fixes: 8f972312b8f4 ("vhost: support vhost-user")

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: unmap log memory on cleanup

Fixes memory leak on QEMU migration.

Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: fix leak of file descriptors

While migration of vhost-user device QEMU allocates memfd
to store information about dirty pages and sends fd to
vhost-user process.

File descriptor for this memory should be closed to prevent
"Too many open files" error for vhost-user process after
some amount of migrations.

Ex.:
# ls /proc/<ovs-vswitchd pid>/fd/ -alh
total 0
root qemu  .
root qemu  ..
root qemu  0 -> /dev/pts/0
root qemu  1 -> pipe:[1804353]
root qemu  10 -> socket:[1782240]
root qemu  100 -> /memfd:vhost-log (deleted)
root qemu  1000 -> /memfd:vhost-log (deleted)
root qemu  1001 -> /memfd:vhost-log (deleted)
root qemu  1004 -> /memfd:vhost-log (deleted)
[...]
root qemu  996 -> /memfd:vhost-log (deleted)
root qemu  997 -> /memfd:vhost-log (deleted)

ovs-vswitchd.log:
|WARN|punix:ovs-vswitchd.ctl: accept failed: Too many open files

Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio-user: handle control queue in driver

In virtio-user driver, when notify ctrl-queue, invoke API of
virtio-user device emulation to handle ctrl-q command.

Besides, multi-queue requires ctrl-queue and ctrl-queue will be
enabled automatically when multi-queue is specified.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio-user: add multiple queues in device emulation

The main purpose of this patch is to enable multi-queue. But
multi-queue requires ctrl-queue so that driver can send how many
queues will be enabled through ctrl-queue messages.

So we partially implement ctrl-queue to handle control command
with class of VIRTIO_NET_CTRL_MQ and with cmd of
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to handle mq support. This patch
provides a function, virtio_user_handle_cq(), for driver to handle
ctrl-queue messages.

Besides, multi-queue requires VIRTIO_NET_F_MQ and VIRTIO_NET_F_CTRL_VQ
are enabled when we do feature negotiation.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio-user: add multiple queues in vhost-user adapter

This patch mainly adds method in vhost user adapter to communicate
enable/disable queues messages with vhost user backend, aka,
VHOST_USER_SET_VRING_ENABLE.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio-user: add virtual device

Add a new virtual device named virtio-user, which can be used just like
eth_ring, eth_null, etc. To reuse the code of original virtio, we do
some adjustment in virtio_ethdev.c, such as remove key _static_ of
eth_virtio_dev_init() so that it can be reused in virtual device; and
we add some check to make sure it will not crash.

Configured parameters include:
  - queues (optional, 1 by default), number of queue pairs, multi-queue
    not supported for now.
  - cq (optional, 0 by default), not supported for now.
  - mac (optional), random value will be given if not specified.
  - queue_size (optional, 256 by default), size of virtqueues.
  - path (madatory), path of vhost user.

When enable CONFIG_RTE_VIRTIO_USER (enabled by default), the compiled
library can be used in both VM and container environment.

Examples:
path_vhost=<path_to_vhost_user> # use vhost-user as a backend

sudo ./examples/l2fwd/build/l2fwd -c 0x100000 -n 4 \
    --socket-mem 0,1024 --no-pci --file-prefix=l2fwd \
    --vdev=virtio-user0,mac=00:01:02:03:04:05,path=$path_vhost -- -p 0x1

Known issues:
- Control queue and multi-queue are not supported yet.
- Cannot work with --huge-unlink.
- Cannot work with no-huge.
- Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
   hugepages.
- Root privilege is a must (mainly becase of sorting hugepages according
   to physical address).
- Applications should not use file name like HUGEFILE_FMT ("%smap_%d").
- Cannot work with vhost-net backend.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio-user: add new virtual PCI driver

This patch implements another new instance of struct virtio_pci_ops to
drive the virtio-user virtual device. Instead of rd/wr ioport or PCI
configuration space, this virtual pci driver will rd/wr the virtual
device struct virtio_user_hw, and when necessary, invokes APIs provided
by device emulation later to start/stop the device.

  ----------------------
  | ------------------ |
  | | virtio driver  | |----> (virtio_user_ethdev.c)
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate | |
  | |                | |
  | | vhost adapter  | |
  | ------------------ |
  ----------------------
            |
            |
            |
   ------------------
   | vhost backend  |
   ------------------

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio-user: add device emulation layer

Few device emulation layer functions are added for virtio driver to
call:
  - virtio_user_start_device()
  - virtio_user_stop_device()
  - virtio_user_dev_init()
  - virtio_user_dev_uninit()

These functions will get called by virtio driver, and they call vhost
adapter layer functions to implement the functionality.

All stats related to virtual user device as logged in virtio_user_dev
structure.

  ----------------------
  | ------------------ |
  | | virtio driver  | |
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate |-|----> (virtio_user_dev.c, virtio_user_dev.h)
  | |                | |
  | | vhost adapter  | |
  | ------------------ |
  ----------------------
            |
            |
            |
   ------------------
   | vhost backend  |
   ------------------

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio-user: add vhost-user adapter layer

This patch provides vhost adapter layer implementation. Two main
help functions are provided to upper layer (device emulation):
  - vhost_user_setup(), to set up vhost user backend;
  - vhost_user_sock(), to talk with vhost user backend.

  ----------------------
  | ------------------ |
  | | virtio driver  | |
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate | |
  | |                | |
  | | vhost adapter  |-|----> (vhost_user.c)
  | ------------------ |
  ----------------------
            |
            | -------------- --> (vhost-user protocol)
            |
   ------------------
   | vhost backend  |
   ------------------

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio: allow virtual address to fill vring descriptors

This patch is related to how to calculate relative address for vhost
backend.

The principle is that: based on one or multiple shared memory regions,
vhost maintains a reference system with the frontend start address,
backend start address, and length for each segment, so that each
frontend address (GPA, Guest Physical Address) can be translated into
vhost-recognizable backend address. To make the address translation
efficient, we need to maintain as few regions as possible. In the case
of VM, GPA is always locally continuous. But for some other case, like
virtio-user, GPA continuous is not guaranteed, therefore, we use virtual
address here.

It basically means:
  a. when set_base_addr, VA address is used;
  b. when preparing RX's descriptors, VA address is used;
  c. when transmitting packets, VA is filled in TX's descriptors;
  d. in TX and CQ's header, VA is used.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio: hide vring address check inside PCI ops

This patch moves phys addr check from virtio_dev_queue_setup
to pci ops. To make that happen, make sure virtio_ops.setup_queue
return the result if we pass through the check.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: fix null pointer dereference

Return value of function get_device() is not checking before
dereference. Fix this problem by adding checking condition.

Coverity issue: 119262

Fixes: 77d20126b4c2 ("vhost-user: handle message to enable vring")

Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: remove concurrent enqueue

All other DPDK PMDs doesn't support concurrent receiving or sending
packets to the same queue. The upper application should deal with
this, normally through queue and core bindings.

Due to historical reason, vhost internally supports concurrent lockless
enqueuing packets to the same virtio queue through costly cmpset operation.
This patch removes this internal lockless implementation and should improve
performance a bit.

Luckily DPDK OVS doesn't rely on this behavior.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio: fix crash when no devargs

We skip kernel managed virtio devices, if it isn't whitelisted.
Before checking if the virtio device is whitelisted, check if devargs
is specified.

Fixes: ac5e1d838dc1 ("virtio: skip error when probing kernel managed device")

Reported-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: arrange struct fields for better cache sharing

The ifname[] field takes so much space, that it seperates some frequently
used fields into different caches, say, features and broadcast_rarp.

This patch moves all those fields that will be accessed frequently in Rx/Tx
together (before the ifname[] field) to let them share one cache line.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>

vhost: optimize dequeue for small packets

A virtio driver normally uses at least 2 desc buffers for Tx: the
first for storing the header, and the others for storing the data.

Therefore, we could fetch the first data desc buf before the main
loop, and do the copy first before the check of "are we done yet?".
This could save one check for small packets that just have one data
desc buffer and need one mbuf to store it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>

vhost: pre update used ring for Tx and Rx

Pre update and update used ring in batch for Tx and Rx at the stage
while fetching all avail desc idx. This would reduce some cache misses
and hence, increase the performance a bit.

Pre update would be feasible as guest driver will not start processing
those entries as far as we don't update "used->idx". (I'm not 100%
certain I don't miss anything, though).

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>

net/vhost: add client option

Add client option to vhost pmd, to let it act as the vhost-user client.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

examples/vhost: add client option

Add --client option to let vhost-switch acts as the client.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: workaround stale vring base

When DPDK app crashes (or quits, or gets killed), a restart of DPDK
app would get stale vring base from QEMU. That would break the kernel
virtio net completely, making it non-work any more, unless a driver
reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a much saner (and may not the most accurate) vring base
from used->idx. That would work because:

- there is a memory barrier between updating used ring entries and
  used->idx. So, even though we crashed at updating the used ring
  entries, it will not cause any issue, as the guest driver will not
  process those stale used entries, for used-idx is not updated yet.

- DPDK process vring in order, that means a crash may just lead some
  packet retransmission for Tx and drop for Rx.

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>

vhost: add reconnect ability

Allow reconnecting on failure by default when:

- DPDK app starts first and QEMU (as the server) is not started yet.
  Without reconnecting, DPDK app would simply fail on vhost-user
  registration.

- QEMU restarts, say due to OS reboot.
  Without reconnecting, you can't re-establish the connection without
  restarting DPDK app.

This patch make it work well for both above cases. It simply creates
a new thread, and keep trying calling "connect()", until it succeeds.

The reconnect could be disabled when RTE_VHOST_USER_NO_RECONNECT flag
is set.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: add vhost-user client mode

Add a new paramter (flags) to rte_vhost_driver_register(). DPDK
vhost-user acts as client mode when RTE_VHOST_USER_CLIENT flag
is set. The flags would also allow future extensions without
breaking the API (again).

The rest is straingfoward then: allocate a unix socket, and
bind/listen for server, connect for client.

This extension is for vhost-user only, therefore we simply quit
and report error when any flags are given for vhost-cuse.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: rename structs for enabling client mode

DPDK vhost-user just acts as server so far, so, using a struct named
as "vhost_server" is okay. However, if we add client mode, it doesn't
make sense any more. Here renames it to "vhost_user_socket".

There was no obvious wrong about "connfd_ctx", but I think it's obviously
better to rename it to "vhost_user_connection", as it does represent
a connection, a connection between the backend (DPDK) and the frontend
(QEMU).

Similarly, few more renames are taken, such as "vserver_new_vq_conn"
to "vhost_user_new_connection".

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

vhost: make buffer vector for scatter Rx local

Array of buf_vector's is just an array for temporary storing information
about available descriptors. It used only locally in virtio_dev_merge_rx()
and there is no reason for that array to be shared.

Fix that by allocating local buf_vec inside virtio_dev_merge_rx().

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: make virtio header length per device

Virtio net header length is set per device, but not per queue. So, there
is no reason to store it in vhost_virtqueue struct, instead, we should
store it in virtio_net struct, to make one copy only.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: reserve few more space for future extension

"virtio_net_device_ops" is the only left open struct that an application
can access, therefore, it's the only place that might introduce potential
ABI break in future for extension.

So, do some reservation for it. 5 should be pretty enough, considering
that we have barely touched it for a long while. Another reason to
choose 5 is for cache alignment: 5 makes the struct 64 bytes for 64 bit
machine.

With this, it's confidence to say that we might be able to be free from
the ABI violation forever.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: remove virtio-net.h

It barely has anything useful there, just 2 functions prototype. Here
move them to vhost-net.h, and delete it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: remove unnecessary fields

The "reserved" field in virtio_net and vhost_virtqueue struct is not
necessary any more. We now expose virtio_net device with a number "vid".

This patch also removes the "priv" field: all fields are priviate now:
application can't access it now. The only way that we could still access
it is to expose it by a function, but I doubt that's needed or worthwhile.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: hide internal code

We are now safe to move all those internal structs/macros/functions to
vhost-net.h, to hide them from external access.

This patch also breaks long lines and removes some redundant comments.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: export device id as the interface to applications

With all the previous prepare works, we are just one step away from
the final ABI refactoring. That is, to change current API to let them
stick to vid instead of the old virtio_net dev.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: remove dependency on device private field

This change could let us avoid the dependency of "virtio_net"
struct, to prepare for the ABI refactoring.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: export queue free entries

The new API rte_vhost_avail_entries() is actually a rename of
rte_vring_available_entries(), with the "vring" to "vhost" name
change to keep the consistency of other vhost exported APIs.

This change could let us avoid the dependency of "virtio_net"
struct, to prepare for the ABI refactoring.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: export interface name

Introduce a new API rte_vhost_get_ifname() to export the ifname to
application.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: export number of queues

Introduce a new API rte_vhost_get_queue_num() to export the number of
queues.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: export numa node

Introduce a new API rte_vhost_get_numa_node() to get the numa node
from which the virtio_net struct is allocated.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: move cuse only struct to cuse

vhost cuse is now the last reference of the vhost_device_ctx struct;
move it there, and do a rename to "vhost_cuse_device_ctx", to make
it clear that it's "cuse only".

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: get device by device id only

get_device() just needs vid, so pass vid as the parameter only.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: rename device id variable

I failed to figure out what does "fh" mean here for a long while.
The only guess I could have had is "file handle". So, you get the
point that it's not well named.

I then figured it out that "fh" is derived from the fuse lib, and
my above guess is right. However, device_fh represents a virtio
net device ID. Therefore, here I rename it to vid (Virtio-net device
ID, or Vhost device ID; choose one you prefer) to make it easier for
understanding.

This name (vid) then will be considered to the only interface to
applications. That's another reason to do the rename: it's our
interface, make it more understandable.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

examples/vhost: make a copy of virtio device id

Make a copy of virtio device id (device_fh) from the virtio_net struct,
so that we could have less dependency on the virtio_net struct.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: declare device id as int

device_fh repsents the device id for a specific virtio net device.
Firstly, "int" would be big enough: we don't need 64 bit. Secondly,
this could let us avoid the ugly "%" PRIu64 ".." stuff.

And since ctx.fh is derived from device_fh, declare it as int, too.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: set/reset device flags internally

It does not make sense to ask the application to set/unset the flag
VIRTIO_DEV_RUNNING (that used internal only) at new_device()/
destroy_device() callback.

Instead, it should be set after new_device() succeeds and reset before
destroy_device() is invoked inside vhost lib. This patch fixes it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

vhost: declare backend with int type

It's an fd; so define it as "int", which could also save the unncessary
(int) case.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>

examples/vhost: fix corrupted vdev tailq list

There are two tailq lists, one for logging all vhost devices, another
one for logging vhost devices distributed on a specific core. However,
there is just one tailq entry, named "next", to chain the two list,
which is wrong and could result to a corrupted tailq list, that the
tailq list might always be non-empty: the entry is still there even
after you have invoked TAILQ_REMOVE several times.

Fix it by introducing two tailq entries, one for each list.

Fixes: 45657a5c6861 ("examples/vhost: use tailq to link vhost devices")

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio: split Rx/Tx queue

We keep a common vq structure, containing only vq related fields,
and then split others into RX, TX and control queue respectively.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
[Jianfeng Tan: found and fixed 2 bugs]
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

net/virtio: check mbuf is direct when using any layout

The commit dd856dfcb9e74 introduced an optimization that prepends virtio
header to mbuf data. It can be used when the tx mbuf is writeable, so we
need to check that the mbuf is direct (i.e. it embeds its own data).

Fixes: dd856dfcb9e7 ("virtio: use any layout on Tx")

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

table: fix next hop table entry type

Change type of nht field from uint32_t to uint8_t and increase max of
next hops.

nht_entry and nht should be declared as uint8_t because
entry_size is in bytes and is given as a parameter to compute
the position in nht array.

Fixes: dc81ebbacaeb ("lpm: extend IPv4 next hop field")

Signed-off-by: Michal Kobylinski <michalx.kobylinski@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

examples/ip_pipeline: support KNI

1. add KNI support to the IP Pipeline sample Application
2. some bug fix
3. update doc
4. add config file with two KNI interfaces connected using
a Linux kernel bridge

Signed-off-by: WeiJie Zhuang <zhuangwj@gmail.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

port: support KNI

add KNI port type to the packet framework

Signed-off-by: WeiJie Zhuang <zhuangwj@gmail.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

kni: fix build with gcc 6

This commit fixes build errors triggered due misleading indentation.

Fixes: b9ee370557f1 ("kni: update kernel driver ethtool baseline")
Fixes: 3fc5ca2f6352 ("kni: initial import")

Signed-off-by: Anupam Kapoor <anupam.kapoor@gmail.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

igb_uio: fix build on CentOS 6.8

Following compile error observed with CentOS 6.8, which uses kernel
kernel-devel-2.6.32-642.el6.x86_64:

In function 'igbuio_msix_mask_irq':
error: 'PCI_MSIX_ENTRY_CTRL_MASKBIT' undeclared

Reported-by: Thiago Martins <thiagocmartinsc@gmail.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

examples/l2fwd-keepalive: fix memory leak

Fixes memory leaks detected by Coverity. These are due to ephemeral
memory allocations not being freed when errors occur.

Coverity issue: 127349

Fixes: e2aae1c1ced9 ("ethdev: remove name from extended statistic fetch")

Signed-off-by: Remy Horton <remy.horton@intel.com>

app/testpmd: fix memory leaks after xstats errors

Fixes memory leaks detected by Coverity. These are due to ephemeral
memory allocations not being freed when errors occur.

Coverity issue: 127348

Fixes: e2aae1c1ced9 ("ethdev: remove name from extended statistic fetch")

Signed-off-by: Remy Horton <remy.horton@intel.com>

qat: fix probing

The class id is not filled and makes probing to fail.
Updated the code to use RTE_PCI_DEVICE which fills
the class id with a wildcard value.

Fixes: 701c8d80c820 ("pci: support class id probing")

Signed-off-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

doc: update IPsec sample guide

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: support transport mode

IPSec transport mode support.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: support IPv6

Support IPSec IPv6 allowing IPv4/IPv6 traffic in IPv4 or IPv6 tunnel.

We need separate Routing (LPM) and SP (ACL) tables for IPv4 and IPv6,
but a common SA table.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: rename SP config

Modify the default SP config variables names to be consistent with SA.

The resulting naming convention is that variables with suffixes _out/_in
are the default for ep0 and the reverse for ep1.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: fix no SA found case

The application only ASSERTS that an SA is not NULL (only when debugging
is enabled) without properly dealing with the case of not having an SA
for the processed packet.

Behavior should be such as if no SA is found, drop the packet.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: rework processing loop

Rework implementation moving from function pointers approach, where each
function implements very specific functionality, to a generic function
approach.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: add debug build option

Add support for building the application with DEBUG=1.
This option adds the compiler stack protection flag and enables extra
output in the application.

Also remove unnecessary VPATH setup.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: fix stack smashing

Building the application with -O3 and -fstack-protection (default in
Ubuntu) results in the following error:

*** stack smashing detected ***: ./build/ipsec-secgw terminated

The error is caused by storing an 8B value in a 4B variable.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

examples/ipsec-secgw: fix esp padding check

Current code fails to correctly check padding sequence for inbound
packets.
Padding sequence starts on 1 but it checks for 0.

Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

app/test: refactor SNOW 3G and KASUMI tests

SNOW3G and KASUMI unit tests are very similar and
they were using duplicated code, so this commit
refactor and remove some of the duplicated functions.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: add SNOW 3G UEA2 with offset

With the new libsso library, buffers can be encrypted/decrypted,
providing an offset in bits, so an extra unit test has been
added to cover this case.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: add bit-level SNOW 3G UIA2

Snow3G PMD supports now buffers that are non byte multiple,
so tests to cover this case have been added.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: add out-of-place cases for SNOW 3G

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: fix buffer lengths for SNOW 3G

No padding was added in the input buffers for snow3G tests,
due to a wrong calculation of the length (should be multiple
of the block size). This fix takes into account the case
where the length is not byte multiple.

Fixes: 8bdf665fe6c0 ("app/test: add SNOW 3G")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: use new bit-level memcmp macro

Instead of modifying the content of the buffers, to compare them
at bit-level, use the new macro TEST_ASSERT_BUFFERS_ARE_EQUAL_BIT,
which does not make any modifications in the buffers.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

crypto/snow3g: add missing feature flags

The underlying libsso library support SSE4.1 instruction set,
so feature flags of the crypto device must be updated
to reflect this.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

crypto/snow3g: support bit-level operations

Underlying libsso_snow3g library now supports bit-level
operations, so PMD has been updated to allow them.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

crypto/snow3g: define IV and digest length constants

In order to avoid using magic numbers, macros for
the IV and digest lengths for Snow3G have been added.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

doc: update build instructions for libsso_snow3g

With the library update, the way to compile the library
has changed, so documentation reflects this change.
Also, the patch to fix the compilation issues present with gcc > 5.0
has been removed, as the issues have been fixed in the library.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

crypto/snow3g: rename libsso reference due to library update

The underlying libsso library that SNOW3G PMD uses has been updated,
so now it is called libsso_snow3g. Also, the path to the library
has been renamed to reflect this changes (now called LIBSSO_SNOW3G_PATH).

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: add KASUMI crypto

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

app/test: add new buffer comparison macros

In order to compare buffers with length and offset in bits,
new macros have been created, which use the previous compare function
to compare full bytes and then, compare first and last bytes of
each buffer separately.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>

crypto/kasumi: add driver for KASUMI library

Added new SW PMD which makes use of the libsso_kasumi SW library,
which provides wireless algorithms KASUMI F8 and F9
in software.

This PMD supports cipher-only, hash-only and chained operations
("cipher then hash" and "hash then cipher") of the following
algorithms:
- RTE_CRYPTO_SYM_CIPHER_KASUMI_F8
- RTE_CRYPTO_SYM_AUTH_KASUMI_F9

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Deepak Kumar Jain <deepak.k.jain@intel.com>