summaryrefslogtreecommitdiff
path: root/bpf.h
diff options
context:
space:
mode:
authorJesper Dangaard Brouer <brouer@redhat.com>2013-12-11 22:27:43 +0100
committerJesper Dangaard Brouer <brouer@redhat.com>2013-12-11 22:27:43 +0100
commitc3602a995b21e8133c7f4fd1fb1e7e21b6a844f1 (patch)
treef4d896119c1db980bc97429649f71f6d5463e47d /bpf.h
parentb168f5f60a9cc4490b499a067da5273b0f40337c (diff)
trafgen: speedup TX only path by avoiding kernel packet_rcv() call
The tool trafgen is used in a pktgen style transmit only scenario. We discovered a performance bottleneck in the kernel, when running trafgen, where the kernel stalled on a lock in packet_rcv(). This call is unnecessary for trafgen given its transmit only nature. This packet_rcv() call can, easily be avoided by instructing the RAW/PF_PACKET socket, to not listen to any protocols (by passing protocol argument zero, when creating the socket). The performance gain is huge, increasing performance from approx max 2Mpps to 12Mpps, basically causing trafgen to scale with the number of CPUs. Following tests were run on a 2xCPU E5-2650 with Intel 10Gbit/s ixgbe: Trafgen using sendto() syscall via parameter -t0: * # CPUs -- *with* -- *without* packet_rcv() call * 1 CPU == 1,232,244 -- 1,236,144 pkts/sec * 2 CPUs == 1,592,720 -- 2,593,620 pkts/sec * 3 CPUs == 1,635,623 -- 3,692,216 pkts/sec * 4 CPUs == 1,567,768 -- 4,102,866 pkts/sec * 5 CPUs == 1,700,270 -- 5,151,489 pkts/sec * 6 CPUs == 1,762,392 -- 6,124,512 pkts/sec * 7 CPUs == 1,850,139 -- 7,120,496 pkts/sec * 8 CPUs == 1,770,909 -- 8,058,710 pkts/sec * 9 CPUs == 1,721,072 -- 8,963,192 pkts/sec * 10 CPUs == 1,359,157 -- 9,584,535 pkts/sec * 11 CPUs == 1,175,520 -- 10,498,038 pkts/sec * 12 CPUs == 1,075,867 -- 11,189,292 pkts/sec * 13 CPUs == 1,012,602 -- 12,048,836 pkts/sec * [...] * 20 CPUs == 1,030,446 -- 11,202,449 pkts/sec Trafgen using mmap() TX tpacket_v2 (default) * # CPUs -- *with* -- *without* packet_rcv() call * 1 CPU == 920,682 -- 927,984 pkts/sec * 2 CPUs == 1,607,940 -- 2,061,406 pkts/sec * 3 CPUs == 1,668,488 -- 2,979,463 pkts/sec * 4 CPUs == 1,423,066 -- 3,169,565 pkts/sec * 5 CPUs == 1,507,708 -- 3,910,756 pkts/sec * 6 CPUs == 1,555,616 -- 4,625,844 pkts/sec * 7 CPUs == 1,560,961 -- 5,298,441 pkts/sec * 8 CPUs == 1,596,092 -- 6,000,465 pkts/sec * 9 CPUs == 1,575,139 -- 6,722,130 pkts/sec * 10 CPUs == 1,311,676 -- 7,114,202 pkts/sec * 11 CPUs == 1,157,650 -- 7,859,399 pkts/sec * 12 CPUs == 1,060,366 -- 8,491,004 pkts/sec * 13 CPUs == 1,012,956 -- 9,269,761 pkts/sec * [...] * 20 CPUs == 955,716 -- 8,653,947 pkts/sec It is fairly strange that the mmap() version runs slower than the sendto() version. This is likely another performance problem related to mmap() which seems worth fixing. Note, that the mmap() version speed can be improved by reducing the default --ring-size to around 1-2 MiB. But this does not fix general trend with mmap() performance. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Diffstat (limited to 'bpf.h')
0 files changed, 0 insertions, 0 deletions
b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff92fb65ec2588: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff92fb65ec2598: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff92fb65ec25a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff92fb65ec25b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff92fb65ec25c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff92fb65ec25d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff92fb65ec25e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. Redzone ffff92fb65ec25f8: bb bb bb bb bb bb bb bb ........ Padding ffff92fb65ec2738: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 3 PID: 180 Comm: kworker/3:2 Tainted: G BU 4.10.0-rc6-patser+ #5039 Hardware name: /NUC5PPYB, BIOS PYBSWCEL.86A.0031.2015.0601.1712 06/01/2015 Workqueue: events intel_atomic_helper_free_state [i915] Call Trace: dump_stack+0x4d/0x6d print_trailer+0x20c/0x220 free_debug_processing+0x1c6/0x330 ? drm_atomic_state_default_clear+0xf7/0x1c0 [drm] __slab_free+0x48/0x2e0 ? drm_atomic_state_default_clear+0xf7/0x1c0 [drm] kfree+0x159/0x1a0 drm_atomic_state_default_clear+0xf7/0x1c0 [drm] ? drm_atomic_state_clear+0x30/0x30 [drm] intel_atomic_state_clear+0xd/0x20 [i915] drm_atomic_state_clear+0x1a/0x30 [drm] __drm_atomic_state_free+0x13/0x60 [drm] intel_atomic_helper_free_state+0x5d/0x70 [i915] process_one_work+0x260/0x4a0 worker_thread+0x2d1/0x4f0 kthread+0x127/0x130 ? process_one_work+0x4a0/0x4a0 ? kthread_stop+0x120/0x120 ret_from_fork+0x29/0x40 FIX kmalloc-128: Object at 0xffff92fb65ec2578 not freed Fixes: 3b24f7d67581 ("drm/atomic: Add struct drm_crtc_commit to track async updates") Fixes: 9626014258a5 ("drm/fence: add in-fences support") Cc: <stable@vger.kernel.org> # v4.8+ Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1485854725-27640-1-git-send-email-maarten.lankhorst@linux.intel.com
Diffstat (limited to 'include/dt-bindings/mfd/st-lpc.h')