show devices status and to bind them to a suitable kernel driver. They will
appear under the category of "Misc (rawdev) devices".
+Prerequisites
+-------------
+
+The NTB PMD needs a kernel PCI driver that supports write combining (WC)
+to get better performance. The performance difference is more than 10 times.
+There are two ways to enable WC.
+
+- Insert the igb_uio module with the ``wc_active=1`` flag if the igb_uio driver is used.
+
+.. code-block:: console
+
+ insmod igb_uio.ko wc_active=1
+
+- Enable WC manually for the NTB device's BAR 2 and BAR 4 (mapped memory).
+ See https://www.kernel.org/doc/html/latest/x86/mtrr.html for reference.
+ Get the BAR base addresses using ``lspci -vvv -s ae:00.0 | grep Region``.
+
+.. code-block:: console
+
+ # lspci -vvv -s ae:00.0 | grep Region
+ Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+ Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+ Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+
+Use the following commands to enable WC.
+
+.. code-block:: console
+
+ echo "base=0x39bfa0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+ echo "base=0x39bfc0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+
+Check the result:
+
+.. code-block:: console
+
+ # cat /proc/mtrr
+ reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
+ reg01: base=0x07f000000 ( 2032MB), size= 16MB, count=1: uncachable
+ reg02: base=0x39bfa0000000 (60553728MB), size= 512MB, count=1: write-combining
+ reg03: base=0x39bfc0000000 (60554240MB), size= 512MB, count=1: write-combining
+
+To disable WC for these regions, use the following commands.
+
+.. code-block:: console
+
+ echo "disable=2" >> /proc/mtrr
+ echo "disable=3" >> /proc/mtrr
+
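The ``size`` value written to ``/proc/mtrr`` is the BAR size in bytes, in hexadecimal (512M is ``0x20000000``). As a minimal sketch, the conversion from a ``Region`` line of the ``lspci`` output to an MTRR entry could be scripted as below. The function name ``region_to_mtrr`` is hypothetical and not part of DPDK or any tool, and the sketch assumes the ``lspci`` line format shown above.

```shell
# Hypothetical helper (not part of DPDK): build an MTRR entry from one
# "Region" line printed by lspci, e.g.
#   Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
region_to_mtrr() {
    # Extract the hexadecimal base address that follows "Memory at".
    base=$(printf '%s\n' "$1" | sed -n 's/.*Memory at \([0-9a-fA-F]*\) .*/\1/p')
    # Extract the size as "<number> <unit>", e.g. "512 M".
    size=$(printf '%s\n' "$1" | sed -n 's/.*\[size=\([0-9]*\)\([KMG]\)\].*/\1 \2/p')
    # Split number and unit into $1 and $2.
    set -- $size
    case "$2" in
        K) bytes=$(( $1 * 1024 )) ;;
        M) bytes=$(( $1 * 1024 * 1024 )) ;;
        G) bytes=$(( $1 * 1024 * 1024 * 1024 )) ;;
        *) return 1 ;;   # unrecognized line format
    esac
    # Print the entry in the form accepted by /proc/mtrr.
    printf 'base=0x%s size=0x%x type=write-combining\n' "$base" "$bytes"
}
```

The printed line can then be reviewed and appended to ``/proc/mtrr`` by hand. Only the lines for BAR 2 and BAR 4 should be passed to the function, for example by filtering the ``lspci`` output with ``grep -E 'Region (2|4)'``.
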
Ring Layout
-----------
+------------------------+ +------------------------+
<---------traffic---------
+- Enqueue and Dequeue
+ Based on this ring layout, enqueue reads rx_tail to learn how many free
+ buffers are available, and writes used_ring and tx_tail to tell the peer
+ which buffers are filled with data.
+ Dequeue reads tx_tail to learn how many packets have arrived, and
+ writes desc_ring and rx_tail to tell the peer about the newly allocated
+ buffers.
+ In this way, only remote writes happen and remote reads are avoided,
+ which gives better performance.
+
Limitation
----------