| author | Jesper Dangaard Brouer <brouer@redhat.com> | 2013-12-11 22:27:43 +0100 | 
|---|---|---|
| committer | Jesper Dangaard Brouer <brouer@redhat.com> | 2013-12-11 22:27:43 +0100 | 
| commit | c3602a995b21e8133c7f4fd1fb1e7e21b6a844f1 | |
| tree | f4d896119c1db980bc97429649f71f6d5463e47d /flowtop/Makefile | |
| parent | b168f5f60a9cc4490b499a067da5273b0f40337c | |
trafgen: speedup TX only path by avoiding kernel packet_rcv() call

The trafgen tool is used in a pktgen-style, transmit-only scenario.
While running trafgen, we discovered a performance bottleneck in
the kernel: it stalled on a lock in packet_rcv().  This call is
unnecessary for trafgen, given its transmit-only nature.

This packet_rcv() call can easily be avoided by instructing the
RAW/PF_PACKET socket not to listen to any protocols (by passing
zero as the protocol argument when creating the socket).
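
As an illustration of the idea only (not the actual trafgen code), the
protocol choice could be sketched as below.  The helper names
pf_packet_protocol() and open_pf_packet_socket() are hypothetical; the
behaviour of a zero protocol argument is as described in packet(7).

```c
#include <arpa/inet.h>      /* htons() */
#include <assert.h>         /* static_assert */
#include <linux/if_ether.h> /* ETH_P_ALL */
#include <stdbool.h>
#include <sys/socket.h>     /* socket(), PF_PACKET, SOCK_RAW */

/* Zero is only usable as "no protocol" because ETH_P_ALL is non-zero. */
static_assert(ETH_P_ALL != 0, "ETH_P_ALL must differ from 0");

/*
 * Hypothetical helper: pick the protocol argument for a PF_PACKET socket.
 * With 0 the socket is not registered for any protocol, so the kernel has
 * no reason to hand frames to it.  htons(ETH_P_ALL) is the usual
 * "receive everything" setting.
 */
int pf_packet_protocol(bool tx_only)
{
	return tx_only ? 0 : htons(ETH_P_ALL);
}

/*
 * Hypothetical helper: open the socket.  Returns the fd, or -1 with errno
 * set.  Opening a PF_PACKET socket requires CAP_NET_RAW.
 */
int open_pf_packet_socket(bool tx_only)
{
	return socket(PF_PACKET, SOCK_RAW, pf_packet_protocol(tx_only));
}
```

A transmit-only caller would use open_pf_packet_socket(true) and then bind
the socket to the outgoing interface as usual before sending.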

The performance gain is huge: peak throughput increases from
approx. 2 Mpps to 12 Mpps, basically causing trafgen to scale with
the number of CPUs.

The following tests were run on a 2xCPU E5-2650 with Intel 10Gbit/s ixgbe:

Trafgen using the sendto() syscall via parameter -t0:

 *  # CPUs --  *with*    --  *without* packet_rcv() call
 *  1 CPU  ==  1,232,244 --  1,236,144 pkts/sec
 *  2 CPUs ==  1,592,720 --  2,593,620 pkts/sec
 *  3 CPUs ==  1,635,623 --  3,692,216 pkts/sec
 *  4 CPUs ==  1,567,768 --  4,102,866 pkts/sec
 *  5 CPUs ==  1,700,270 --  5,151,489 pkts/sec
 *  6 CPUs ==  1,762,392 --  6,124,512 pkts/sec
 *  7 CPUs ==  1,850,139 --  7,120,496 pkts/sec
 *  8 CPUs ==  1,770,909 --  8,058,710 pkts/sec
 *  9 CPUs ==  1,721,072 --  8,963,192 pkts/sec
 * 10 CPUs ==  1,359,157 --  9,584,535 pkts/sec
 * 11 CPUs ==  1,175,520 -- 10,498,038 pkts/sec
 * 12 CPUs ==  1,075,867 -- 11,189,292 pkts/sec
 * 13 CPUs ==  1,012,602 -- 12,048,836 pkts/sec
 * [...]
 * 20 CPUs ==  1,030,446 -- 11,202,449 pkts/sec

Trafgen using mmap() TX tpacket_v2 (default):

 *  # CPUs --  *with*    --  *without* packet_rcv() call
 *  1 CPU  ==    920,682 --    927,984 pkts/sec
 *  2 CPUs ==  1,607,940 --  2,061,406 pkts/sec
 *  3 CPUs ==  1,668,488 --  2,979,463 pkts/sec
 *  4 CPUs ==  1,423,066 --  3,169,565 pkts/sec
 *  5 CPUs ==  1,507,708 --  3,910,756 pkts/sec
 *  6 CPUs ==  1,555,616 --  4,625,844 pkts/sec
 *  7 CPUs ==  1,560,961 --  5,298,441 pkts/sec
 *  8 CPUs ==  1,596,092 --  6,000,465 pkts/sec
 *  9 CPUs ==  1,575,139 --  6,722,130 pkts/sec
 * 10 CPUs ==  1,311,676 --  7,114,202 pkts/sec
 * 11 CPUs ==  1,157,650 --  7,859,399 pkts/sec
 * 12 CPUs ==  1,060,366 --  8,491,004 pkts/sec
 * 13 CPUs ==  1,012,956 --  9,269,761 pkts/sec
 * [...]
 * 20 CPUs ==    955,716 --  8,653,947 pkts/sec
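
To put the two tables in perspective, the gain and the scaling can be
computed directly from the measured rates.  The following is a small
sketch with hypothetical helper names (speedup() and
scaling_efficiency()); it only does arithmetic on the numbers listed
above.

```c
#include <assert.h>

/*
 * Gain from avoiding packet_rcv(): ratio of the rate measured without the
 * call to the rate measured with it, at the same CPU count.
 */
double speedup(long with_rcv, long without_rcv)
{
	assert(with_rcv > 0);
	return (double)without_rcv / (double)with_rcv;
}

/*
 * Scaling efficiency: rate at ncpus CPUs relative to ncpus times the
 * single-CPU rate.  1.0 would be perfectly linear scaling.
 */
double scaling_efficiency(long rate_n, long rate_1, int ncpus)
{
	assert(rate_1 > 0 && ncpus > 0);
	return (double)rate_n / ((double)rate_1 * (double)ncpus);
}
```

With the 13 CPU rows, speedup(1012602, 12048836) is about 11.9 for
sendto() and speedup(1012956, 9269761) is about 9.15 for mmap().
scaling_efficiency(12048836, 1236144, 13) is about 0.75 for sendto()
without the packet_rcv() call.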

It is fairly strange that the mmap() version runs slower than the
sendto() version.  This is likely another performance problem related
to mmap(), which seems worth fixing.

Note that the speed of the mmap() version can be improved by reducing
the default --ring-size to around 1-2 MiB.  But this does not fix the
general trend in mmap() performance.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
