X-Git-Url: http://git.droids-corp.org/?a=blobdiff_plain;f=doc%2Fguides%2Frawdevs%2Fntb.rst;h=2bb115d13f991dcb49ea533df12b494b527594d1;hb=f16662885472d33570b564e62427199d733be363;hp=0a61ec03d2f041fa1352adf75e9d69a0b6d528e9;hpb=62012a76811ec1344231fee57a056bd1f5ab9dde;p=dpdk.git

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 0a61ec03d2..2bb115d13f 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -14,28 +14,22 @@ allocation for the peer to access and read/write allocated memory from peer.
 Also, the PMD allows to use doorbell registers to notify the peer and share
 some information by using scratchpad registers.
 
-BIOS setting on Intel Skylake
------------------------------
+BIOS setting on Intel Xeon
+--------------------------
 
-Intel Non-transparent Bridge needs special BIOS setting. Since the PMD only
-supports Intel Skylake platform, introduce BIOS setting here. The referencce
-is https://www.intel.com/content/dam/support/us/en/documents/server-products/Intel_Xeon_Processor_Scalable_Family_BIOS_User_Guide.pdf
+Intel Non-transparent Bridge needs special BIOS settings. The reference for
+Skylake is https://www.intel.com/content/dam/support/us/en/documents/server-products/Intel_Xeon_Processor_Scalable_Family_BIOS_User_Guide.pdf
 
 - Set the needed PCIe port as NTB to NTB mode on both hosts.
-- Enable NTB bars and set bar size of bar 23 and bar 45 as 12-29 (2K-512M)
-  on both hosts. Note that bar size on both hosts should be the same.
+- Enable NTB bars and set the bar size of bar 23 and bar 45 to 12-29 (4K-512M)
+  on both hosts (for Ice Lake, bar size can be set to 12-51, namely 4K-2PB).
+  Note that the bar size should be the same on both hosts.
 - Disable split bars for both hosts.
 - Set crosslink control override as DSD/USP on one host, USD/DSP on
   another host.
 - Disable PCIe PII SSC (Spread Spectrum Clocking) for both hosts. This
   is a hardware requirement.
 
-Build Options
---------------
-
-- ``CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV`` (default ``y``)
-
-   Toggle compilation of the ``ntb`` driver.
 
 Device Setup
 ------------
@@ -45,8 +39,110 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Prerequisites
+-------------
+
+The NTB PMD needs the kernel PCI driver to support write combining (WC) to
+get good performance; without WC, performance can be over 10 times lower.
+There are two ways to enable WC.
+
+- If the igb_uio driver is used, load it with the ``wc_activate=1`` flag.
+
+.. code-block:: console
+
+   insmod igb_uio.ko wc_activate=1
+
+- Enable WC manually for the NTB device's Bar 2 and Bar 4 (mapped memory).
+  The reference is https://www.kernel.org/doc/html/latest/x86/mtrr.html.
+  Get the bar base addresses using ``lspci -vvv -s ae:00.0 | grep Region``.
+
+.. code-block:: console
+
+   # lspci -vvv -s ae:00.0 | grep Region
+   Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+   Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+   Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+
+Then use the following commands to enable WC.
+
+.. code-block:: console
+
+   echo "base=0x39bfa0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+   echo "base=0x39bfc0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+
+The resulting MTRR table:
+
+.. code-block:: console
+
+   # cat /proc/mtrr
+   reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
+   reg01: base=0x07f000000 ( 2032MB), size=   16MB, count=1: uncachable
+   reg02: base=0x39bfa0000000 (60553728MB), size=  512MB, count=1: write-combining
+   reg03: base=0x39bfc0000000 (60554240MB), size=  512MB, count=1: write-combining
+
+To disable WC for these regions, use the following commands.
+
+.. code-block:: console
+
+   echo "disable=2" >> /proc/mtrr
+   echo "disable=3" >> /proc/mtrr
+
+Ring Layout
+-----------
+
+Since reads and writes of the remote system's memory go through the PCI bus,
+a remote read is much more expensive than a remote write. Thus, enqueue and
+dequeue on the NTB ring should avoid remote reads. The ring layout for NTB
+is as follows:
+
+- Ring Format::
+
+   desc_ring:
+
+      0               16                                              64
+      +---------------------------------------------------------------+
+      |                        buffer address                         |
+      +---------------+-----------------------------------------------+
+      | buffer length |                      resv                     |
+      +---------------+-----------------------------------------------+
+
+   used_ring:
+
+      0               16              32
+      +---------------+---------------+
+      | packet length |     flags     |
+      +---------------+---------------+
+
+- Ring Layout::
+
+      +------------------------+   +------------------------+
+      | used_ring              |   | desc_ring              |
+      | +---+                  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   | ---> | buffer | <+---+-|   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+                  |   | +---+                  |
+      |  ...                   |   |  ...                   |
+      |                        |   |                        |
+      |            +---------+ |   |            +---------+ |
+      |            | tx_tail | |   |            | rx_tail | |
+      | System A   +---------+ |   | System B   +---------+ |
+      +------------------------+   +------------------------+
+                    <---------traffic---------
+
+- Enqueue and Dequeue
+  Based on this ring layout, enqueue reads rx_tail (which the peer writes
+  into local memory) to learn how many free buffers are available, and
+  writes used_ring and tx_tail to tell the peer which buffers are filled.
+  Dequeue reads tx_tail to learn how many packets have arrived, and
+  writes desc_ring and rx_tail to tell the peer about the newly allocated
+  buffers.
+  In this way, only remote writes happen and remote reads are avoided,
+  which gives better performance.
+
 Limitation
 ----------
 
-- The FIFO hasn't been introduced and will come in 19.11 release.
-- This PMD only supports Intel Skylake platform.
+- This PMD only supports Intel Skylake and Ice Lake platforms.
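A quick arithmetic check of the BIOS bar-size ranges, assuming (as the pairs 12 to 4K and 29 to 512M in the patch suggest) that the BIOS value is the base-2 logarithm of the BAR size in bytes. The helper ``bar_size_str`` and the check function ``bar_size_example`` are hypothetical, not part of DPDK. Under this reading, the Ice Lake upper bound of 51 corresponds to 2^51 bytes, i.e. 2P.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render 2^exp bytes with binary suffixes
 * (K = 2^10, M = 2^20, ..., P = 2^50). Assumes the BIOS "bar size" value
 * is the log2 of the BAR size in bytes. Returns snprintf's result, or -1
 * if exp is beyond the suffix table. */
int bar_size_str(char *buf, size_t len, unsigned exp)
{
    static const char units[] = "BKMGTPE";
    unsigned idx = exp / 10;

    if (idx >= sizeof units - 1)
        return -1;
    return snprintf(buf, len, "%llu%c", 1ULL << (exp % 10), units[idx]);
}

/* Check the ranges quoted in the BIOS settings. */
void bar_size_example(void)
{
    char s[16];

    bar_size_str(s, sizeof s, 12);
    assert(strcmp(s, "4K") == 0);   /* lower bound on both platforms */
    bar_size_str(s, sizeof s, 29);
    assert(strcmp(s, "512M") == 0); /* Skylake upper bound */
    bar_size_str(s, sizeof s, 51);
    assert(strcmp(s, "2P") == 0);   /* Ice Lake upper bound */
}
```

Calling ``bar_size_example()`` from a test ``main`` runs the three checks; each step of 10 in the exponent moves one binary unit, so only ``exp % 10`` contributes to the leading number.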
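The ``/proc/mtrr`` lines in the Prerequisites section follow directly from the lspci Region output. A minimal sketch, with hypothetical helper names ``mtrr_range_ok`` and ``mtrr_wc_line``: it encodes the x86 variable-range MTRR rule (4 KiB granularity, power-of-two size, base aligned to that size) that the kernel checks when a line is written to ``/proc/mtrr``, then formats the ``base=... size=... type=write-combining`` string. The 512M BARs in the example satisfy the rule, so one MTRR per BAR is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Variable-range MTRRs need a power-of-two size (at least one 4 KiB page)
 * and a base that is aligned to that size. */
bool mtrr_range_ok(uint64_t base, uint64_t size)
{
    if (size < 4096 || (size & (size - 1)) != 0)
        return false;
    return (base & (size - 1)) == 0;
}

/* Hypothetical helper: build the string to echo into /proc/mtrr for one
 * BAR. Returns snprintf's result, or -1 if one MTRR cannot cover it. */
int mtrr_wc_line(char *buf, size_t len, uint64_t base, uint64_t size)
{
    if (!mtrr_range_ok(base, size))
        return -1;
    return snprintf(buf, len, "base=0x%llx size=0x%llx type=write-combining",
                    (unsigned long long)base, (unsigned long long)size);
}

/* Region 2 from the lspci output: base 39bfa0000000, size 512M. */
void mtrr_example(void)
{
    char line[80];

    assert(mtrr_wc_line(line, sizeof line, 0x39bfa0000000ULL, 512ULL << 20) > 0);
    assert(strcmp(line,
                  "base=0x39bfa0000000 size=0x20000000 type=write-combining") == 0);
}
```

The lspci size (``512M``) converts to ``512ULL << 20``, i.e. ``0x20000000``, which is the value in the ``echo`` commands.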
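The Ring Format diagram can be transcribed into two C structs. This is a hypothetical transcription: the names ``desc_entry`` and ``used_entry``, and the split of the 48 reserved bits into two fields, are assumptions and not necessarily the PMD's own definitions. The field order matches the diagram's bit numbering on a little-endian host such as x86, and the static assertions pin the 16-byte and 4-byte entry sizes the diagram implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One desc_ring entry: a 64-bit buffer address, then a 16-bit buffer
 * length followed by 48 reserved bits (16 bytes in total). */
struct desc_entry {
    uint64_t addr;   /* row 1, bits 0-63: buffer address */
    uint16_t len;    /* row 2, bits 0-15: buffer length */
    uint16_t resv1;  /* row 2, bits 16-63: reserved */
    uint32_t resv2;
};

/* One used_ring entry: 16-bit packet length and 16-bit flags. */
struct used_entry {
    uint16_t len;    /* bits 0-15: packet length */
    uint16_t flags;  /* bits 16-31: flags */
};

static_assert(sizeof(struct desc_entry) == 16, "two 64-bit rows");
static_assert(offsetof(struct desc_entry, len) == 8, "length starts row 2");
static_assert(sizeof(struct used_entry) == 4, "one 32-bit row");

/* Bytes occupied by a ring of n entries. */
size_t desc_ring_bytes(size_t n) { return n * sizeof(struct desc_entry); }
size_t used_ring_bytes(size_t n) { return n * sizeof(struct used_entry); }
```

With this layout the transmitting side writes one 4-byte used_ring entry per packet, while each buffer posted by the receiving side costs one 16-byte desc_ring entry.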
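The Enqueue and Dequeue description can be modelled in a single process to show that neither side ever reads the other's memory. Reading the diagram as traffic flowing from System B to System A, ``struct rx_host_mem`` stands for memory on System A (buffers, used_ring, tx_tail) and ``struct tx_host_mem`` for memory on System B (desc_ring, rx_tail). This is a hypothetical model, not the PMD's code: all names, ``RING_SIZE``, ``BUF_SIZE``, and the use of a buffer index in place of a bus address are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u              /* hypothetical; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)
#define BUF_SIZE 64u              /* hypothetical buffer size in bytes */

/* System A memory: the transmitter writes it remotely, the receiver reads it locally. */
struct rx_host_mem {
    uint8_t buf[RING_SIZE][BUF_SIZE]; /* buffer i is posted in slot i */
    uint16_t used_len[RING_SIZE];     /* used_ring: packet length per slot */
    uint16_t tx_tail;                 /* packets filled so far */
};

/* System B memory: the receiver writes it remotely, the transmitter reads it locally. */
struct tx_host_mem {
    uint16_t desc_buf[RING_SIZE];     /* desc_ring: buffer index, not an address */
    uint16_t rx_tail;                 /* buffers posted so far */
};

struct rx_state { uint16_t posted, consumed; }; /* receiver-private counters */
struct tx_state { uint16_t filled; };           /* transmitter-private counter */

/* Receiver posts up to n empty buffers: remote writes to desc_ring and rx_tail. */
unsigned rx_post(struct rx_state *rx, struct tx_host_mem *peer, unsigned n)
{
    unsigned free_slots = RING_SIZE - (uint16_t)(rx->posted - rx->consumed);
    unsigned i;

    if (n > free_slots)
        n = free_slots;
    for (i = 0; i < n; i++, rx->posted++)
        peer->desc_buf[rx->posted & RING_MASK] = rx->posted & RING_MASK;
    peer->rx_tail = rx->posted;       /* publish the tail last */
    return n;
}

/* Transmitter sends one packet: local reads of rx_tail and desc_ring, remote
 * writes of the payload, used_ring and tx_tail. Returns 1 if sent, else 0. */
int tx_enqueue(struct tx_state *tx, const struct tx_host_mem *local,
               struct rx_host_mem *peer, const void *pkt, uint16_t len)
{
    unsigned slot;

    if (len > BUF_SIZE || (uint16_t)(local->rx_tail - tx->filled) == 0)
        return 0;                     /* too large, or no posted buffer free */
    slot = tx->filled & RING_MASK;
    memcpy(peer->buf[local->desc_buf[slot]], pkt, len);
    peer->used_len[slot] = len;
    tx->filled++;
    peer->tx_tail = tx->filled;       /* publish the tail last */
    return 1;
}

/* Receiver takes one packet using local reads only. Returns its length,
 * 0 if nothing has arrived, or -1 if out is too small. */
int rx_dequeue(struct rx_state *rx, const struct rx_host_mem *local,
               void *out, size_t cap)
{
    unsigned slot;
    uint16_t len;

    if ((uint16_t)(local->tx_tail - rx->consumed) == 0)
        return 0;
    slot = rx->consumed & RING_MASK;
    len = local->used_len[slot];
    if (len > cap)
        return -1;
    memcpy(out, local->buf[slot], len);
    rx->consumed++;
    return len;
}

/* Walk-through: nothing can be sent until the receiver posts buffers. */
void ntb_ring_example(void)
{
    struct rx_host_mem a;             /* System A memory */
    struct tx_host_mem b;             /* System B memory */
    struct rx_state rx = {0, 0};
    struct tx_state tx = {0};
    char out[BUF_SIZE];

    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    assert(tx_enqueue(&tx, &b, &a, "hi", 2) == 0);     /* no buffer posted */
    assert(rx_post(&rx, &b, RING_SIZE + 4) == RING_SIZE);
    assert(tx_enqueue(&tx, &b, &a, "hi", 2) == 1);
    assert(rx_dequeue(&rx, &a, out, sizeof out) == 2);
    assert(memcmp(out, "hi", 2) == 0);
    assert(rx_dequeue(&rx, &a, out, sizeof out) == 0); /* nothing else */
}
```

The tails are free-running 16-bit counters that are only masked when indexing, so the counts stay correct across wraparound as long as ``RING_SIZE`` is a power of two. A real cross-host implementation would also need a write barrier (for example DPDK's ``rte_wmb()``) between writing the payload or descriptors and publishing the tail.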