Hi
I'm trying to measure DPDK forwarding throughput on the physical ports (using 64-byte UDP packets), but the result seems very low. My test setup is shown below:
+─────────────────────────────+
│ 82599ES 10-Gigabit SFI/SFP+ │
+───+────+──────────+────+────+
    │ p0 │          │ p1 │
    +────+          +────+
      ^                ^
      |                |
      v                v
    +────+          +────+
    │ p0 │          │ p1 │
+───+────+──────────+────+────+
│  NAPATECH Adapter - 2 port  │
+─────────────────────────────+
I isolated the DPDK cores (the master lcore and the forwarding lcores, together with their hyper-threading siblings) from the Linux scheduler according to the output of dpdk/tools/cpu_layout.py, set iommu=pt and intel_iommu=off, and allocated 1G hugepages on both NUMA nodes as follows:
$ cat /sys/devices/system/node/node*/meminfo | fgrep Huge
Node 0 AnonHugePages: 137216 kB
Node 0 HugePages_Total: 3
Node 0 HugePages_Free: 2
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 227328 kB
Node 1 HugePages_Total: 2
Node 1 HugePages_Free: 1
Node 1 HugePages_Surp: 0
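For completeness, the 1G pool per node and the hugetlbfs mount can also be checked directly (standard sysfs paths, shown here only as a sanity check on my side):
$ grep . /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
$ mount | grep -i huge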
$ ./cpu_layout.py
cores = [0, 1, 2, 8, 9, 10]
sockets = [0, 1]
Socket 0 Socket 1
-------- --------
Core 0 [0, 12] [6, 18]
Core 1 [1, 13] [7, 19]
Core 2 [2, 14] [8, 20]
Core 8 [3, 15] [9, 21]
Core 9 [4, 16] [10, 22]
Core 10 [5, 17] [11, 23]
$ cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt intel_iommu=off default_hugepagesz=1G hugepagesz=1G hugepages=5 isolcpus=1,2,3,4,5,7,8,9,10,11,13,14,15,16,17,19,20,21,22,23"
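For reference, the -c 0xFBEFBE coremask passed to testpmd below matches that isolcpus list exactly (everything except lcores 0, 6, 12 and 18, which are left to Linux). Rebuilding the mask from the list, purely as a sanity check:
$ mask=0; for c in 1 2 3 4 5 7 8 9 10 11 13 14 15 16 17 19 20 21 22 23; do mask=$((mask|1<<c)); done; printf '0x%X\n' $mask
0xFBEFBE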
After that, I started io forwarding (fwd) in the testpmd application as shown below:
$ sudo ./testpmd -c 0xFBEFBE -n2 -v -m 1024M -- --burst=512 -i --rxq 1 --txq 1 --rxd 64 --txd 64
testpmd> set nbcore 10
Number of forwarding cores set to 10
testpmd> show config fwd
io packet forwarding - ports=2 - cores=2 - streams=2 - NUMA support disabled, MP over anonymous pages disabled
Logical Core 2 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
Logical Core 3 (socket 0) forwards packets on 1 streams:
RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
testpmd> start
io packet forwarding - CRC stripping disabled - packets/burst=512
nb forwarding cores=10 - nb forwarding ports=2
RX queues=1 - RX desc=64 - RX free threshold=32
RX threshold registers: pthresh=8 hthresh=8 wthresh=0
TX queues=1 - TX desc=64 - TX free threshold=32
TX threshold registers: pthresh=32 hthresh=0 wthresh=0
TX RS bit threshold=32 - TXQ flags=0xf01
testpmd> clear port stats all
NIC statistics for port 0 cleared
NIC statistics for port 1 cleared
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 716315071 RX-missed: 0 RX-bytes: 42978906308
RX-errors: 5229181298
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
############################################################################
######################## NIC statistics for port 1 ########################
RX-packets: 0 RX-missed: 0 RX-bytes: 0
RX-errors: 0
RX-nombuf: 0
TX-packets: 704317861 TX-errors: 0 TX-bytes: 42259067720
############################################################################
According to the NAPATECH output, the DPDK forwarding rate is ~4200 Mbps (~6 Mpps), which is very low compared to Intel's benchmark documents. The RX-errors counter is also increasing rapidly. Moreover, EAL reports "PCI device 0000:60:00.1 on NUMA socket -1" in the testpmd output shown below.
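In case it helps with the NUMA question, the kernel's own view of the NIC's NUMA node can be read from sysfs (standard attribute; a -1 there would mean the platform simply does not report a proximity domain for that slot):
$ cat /sys/bus/pci/devices/0000:60:00.0/numa_node /sys/bus/pci/devices/0000:60:00.1/numa_node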
Do you have any idea where the problem originates? What would you suggest, and how should I proceed to track down the performance problem?
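In the meantime I will also try a run where the burst size does not exceed the descriptor ring size (currently it is burst=512 against 64-entry rings), roughly along these lines, with all other flags unchanged:
$ sudo ./testpmd -c 0xFBEFBE -n2 -v -m 1024M -- --burst=64 -i --rxq 1 --txq 1 --rxd 512 --txd 512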
Thanks in advance
- Volkan
argela@argela-HP-Z800-Workstation:~/ovs_dpdk/dpdk/app/test-pmd/build/app$ sudo ./testpmd -c 0xFBEFBE -n2 -v -m 1024M -- --burst=512 -i --rxq 1 --txq 1 --rxd 64 --txd 64
[sudo] password for argela:
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 8 on socket 0
EAL: Detected lcore 4 as core 9 on socket 0
EAL: Detected lcore 5 as core 10 on socket 0
EAL: Detected lcore 6 as core 0 on socket 1
EAL: Detected lcore 7 as core 1 on socket 1
EAL: Detected lcore 8 as core 2 on socket 1
EAL: Detected lcore 9 as core 8 on socket 1
EAL: Detected lcore 10 as core 9 on socket 1
EAL: Detected lcore 11 as core 10 on socket 1
EAL: Detected lcore 12 as core 0 on socket 0
EAL: Detected lcore 13 as core 1 on socket 0
EAL: Detected lcore 14 as core 2 on socket 0
EAL: Detected lcore 15 as core 8 on socket 0
EAL: Detected lcore 16 as core 9 on socket 0
EAL: Detected lcore 17 as core 10 on socket 0
EAL: Detected lcore 18 as core 0 on socket 1
EAL: Detected lcore 19 as core 1 on socket 1
EAL: Detected lcore 20 as core 2 on socket 1
EAL: Detected lcore 21 as core 8 on socket 1
EAL: Detected lcore 22 as core 9 on socket 1
EAL: Detected lcore 23 as core 10 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 24 lcore(s)
EAL: RTE Version: 'RTE 2.2.0-rc2'
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0xc0000000 bytes
EAL: Virtual area found at 0x7f4700000000 (size = 0xc0000000)
EAL: Ask a virtual area of 0x80000000 bytes
EAL: Virtual area found at 0x7f4640000000 (size = 0x80000000)
EAL: Requesting 1 pages of size 1024MB from socket 0
EAL: Requesting 1 pages of size 1024MB from socket 1
EAL: TSC frequency is ~2664050 KHz
EAL: Master lcore 1 is ready (tid=c504e900;cpuset=[1])
EAL: lcore 13 is ready (tid=be947700;cpuset=[13])
EAL: lcore 14 is ready (tid=be146700;cpuset=[14])
EAL: lcore 8 is ready (tid=c094b700;cpuset=[8])
EAL: lcore 17 is ready (tid=bc943700;cpuset=[17])
EAL: lcore 2 is ready (tid=c3150700;cpuset=[2])
EAL: lcore 3 is ready (tid=c294f700;cpuset=[3])
EAL: lcore 4 is ready (tid=c214e700;cpuset=[4])
EAL: lcore 11 is ready (tid=bf148700;cpuset=[11])
EAL: lcore 16 is ready (tid=bd144700;cpuset=[16])
EAL: lcore 22 is ready (tid=ba93f700;cpuset=[22])
EAL: lcore 21 is ready (tid=bb140700;cpuset=[21])
EAL: lcore 23 is ready (tid=ba13e700;cpuset=[23])
EAL: lcore 19 is ready (tid=bc142700;cpuset=[19])
EAL: lcore 15 is ready (tid=bd945700;cpuset=[15])
EAL: lcore 10 is ready (tid=bf949700;cpuset=[10])
EAL: lcore 9 is ready (tid=c014a700;cpuset=[9])
EAL: lcore 5 is ready (tid=c194d700;cpuset=[5])
EAL: lcore 7 is ready (tid=c114c700;cpuset=[7])
EAL: lcore 20 is ready (tid=bb941700;cpuset=[20])
EAL: PCI device 0000:60:00.0 on NUMA socket -1
EAL: probe driver: 8086:10fb rte_ixgbe_pmd
EAL: PCI memory mapped at 0x7f4740000000
EAL: Trying to map BAR 4 that contains the MSI-X table. Trying offsets: 0x40000000000:0x0000, 0x1000:0x3000
EAL: PCI memory mapped at 0x7f4740081000
PMD: eth_ixgbe_dev_init(): MAC: 2, PHY: 12, SFP+: 3
PMD: eth_ixgbe_dev_init(): port 0 vendorID=0x8086 deviceID=0x10fb
EAL: PCI device 0000:60:00.1 on NUMA socket -1
EAL: probe driver: 8086:10fb rte_ixgbe_pmd
EAL: PCI memory mapped at 0x7f4740084000
EAL: Trying to map BAR 4 that contains the MSI-X table. Trying offsets: 0x40000000000:0x0000, 0x1000:0x3000
EAL: PCI memory mapped at 0x7f4740105000
PMD: eth_ixgbe_dev_init(): MAC: 2, PHY: 15, SFP+: 6
PMD: eth_ixgbe_dev_init(): port 1 vendorID=0x8086 deviceID=0x10fb
Interactive-mode selected
Configuring Port 0 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f471533cd40 hw_ring=0x7f471533d180 dma_addr=0x1d533d180
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f471532c640 sw_sc_ring=0x7f471532c300 hw_ring=0x7f471532c980 dma_addr=0x1d532c980
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=0).
Port 0: 00:1B:21:65:42:FC
Configuring Port 1 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f471531bc40 hw_ring=0x7f471531c080 dma_addr=0x1d531c080
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f471530b540 sw_sc_ring=0x7f471530b200 hw_ring=0x7f471530b880 dma_addr=0x1d530b880
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=1).
Port 1: 00:1B:21:65:42:FD
Checking link statuses...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Port 1 Link Up - speed 10000 Mbps - full-duplex
Done