Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2018-01-22 | ring: use xzmalloc_aligned | Tobias Klauser | 1 | -2/+1 | |
Use xzmalloc_aligned instead of open-coding it. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> | |||||
2016-04-27 | ring: Remove unused parameter sock from setup_ring_layout_generic() | Tobias Klauser | 1 | -1/+1 | |
setup_ring_layout_generic() takes an "int sock" parameter but never uses it. Remove it to prevent -Wunused-parameter warnings. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> | |||||
2015-10-29 | ring, bind_ring_generic: no need to nullify members twice | Daniel Borkmann | 1 | -7/+2 | |
We already do a memset beforehand, so there is no need to set members to null twice. Just some minor cleanup. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> | |||||
2015-10-29 | ring: alloc_ring_frames_generic make types size_t | Daniel Borkmann | 1 | -3/+2 | |
Let's make i and num size_t; there's no particular reason for them to be int. At least i is used to set up iov_base offsets. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> | |||||
2015-10-29 | ring: Simplify calculation of number of frames in a tpacket ring | Tobias Klauser | 1 | -4/+1 | |
The number of frames in a tpacket ring (ring->layout.tp_frame_nr) is currently calculated as: `tp_frame_nr = tp_block_size / tp_frame_size * tp_block_nr`. Substituting tp_block_nr with 'size / tp_block_size' (as calculated in the line above), we get: `tp_frame_nr = tp_block_size / tp_frame_size * (size / tp_block_size)` and realize that we can omit tp_block_size as it cancels out, leading to: `tp_frame_nr = 1 / tp_frame_size * (size / 1) = size / tp_frame_size`. Adjust the calculation in setup_ring_layout_generic() accordingly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> | |||||
2015-10-29 | ring: Move generic code for ring layout setup to own function | Tobias Klauser | 1 | -0/+21 | |
Initialization of the ring->layout members is the same for RX and TX rings. Instead of duplicating the code in setup_rx_ring_layout() and setup_tx_ring_layout(), create a new function setup_ring_layout_generic() which is called from the former two. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> | |||||
2013-12-11 | trafgen: speedup TX only path by avoiding kernel packet_rcv() call | Jesper Dangaard Brouer | 1 | -2/+5 | |
The tool trafgen is used in a pktgen-style, transmit-only scenario. We discovered a performance bottleneck in the kernel when running trafgen, where the kernel stalled on a lock in packet_rcv(). This call is unnecessary for trafgen given its transmit-only nature. The packet_rcv() call can easily be avoided by instructing the RAW/PF_PACKET socket not to listen to any protocols (by passing zero as the protocol argument when creating the socket). The performance gain is huge, increasing performance from a maximum of approx. 2 Mpps to 12 Mpps, basically causing trafgen to scale with the number of CPUs. The following tests were run on a 2xCPU E5-2650 with Intel 10Gbit/s ixgbe: Trafgen using sendto() syscall via parameter -t0: * # CPUs -- *with* -- *without* packet_rcv() call * 1 CPU == 1,232,244 -- 1,236,144 pkts/sec * 2 CPUs == 1,592,720 -- 2,593,620 pkts/sec * 3 CPUs == 1,635,623 -- 3,692,216 pkts/sec * 4 CPUs == 1,567,768 -- 4,102,866 pkts/sec * 5 CPUs == 1,700,270 -- 5,151,489 pkts/sec * 6 CPUs == 1,762,392 -- 6,124,512 pkts/sec * 7 CPUs == 1,850,139 -- 7,120,496 pkts/sec * 8 CPUs == 1,770,909 -- 8,058,710 pkts/sec * 9 CPUs == 1,721,072 -- 8,963,192 pkts/sec * 10 CPUs == 1,359,157 -- 9,584,535 pkts/sec * 11 CPUs == 1,175,520 -- 10,498,038 pkts/sec * 12 CPUs == 1,075,867 -- 11,189,292 pkts/sec * 13 CPUs == 1,012,602 -- 12,048,836 pkts/sec * [...] * 20 CPUs == 1,030,446 -- 11,202,449 pkts/sec Trafgen using mmap() TX tpacket_v2 (default): * # CPUs -- *with* -- *without* packet_rcv() call * 1 CPU == 920,682 -- 927,984 pkts/sec * 2 CPUs == 1,607,940 -- 2,061,406 pkts/sec * 3 CPUs == 1,668,488 -- 2,979,463 pkts/sec * 4 CPUs == 1,423,066 -- 3,169,565 pkts/sec * 5 CPUs == 1,507,708 -- 3,910,756 pkts/sec * 6 CPUs == 1,555,616 -- 4,625,844 pkts/sec * 7 CPUs == 1,560,961 -- 5,298,441 pkts/sec * 8 CPUs == 1,596,092 -- 6,000,465 pkts/sec * 9 CPUs == 1,575,139 -- 6,722,130 pkts/sec * 10 CPUs == 1,311,676 -- 7,114,202 pkts/sec * 11 CPUs == 1,157,650 -- 7,859,399 pkts/sec * 12 CPUs == 1,060,366 -- 8,491,004 pkts/sec * 13 CPUs == 1,012,956 -- 9,269,761 pkts/sec * [...] * 20 CPUs == 955,716 -- 8,653,947 pkts/sec It is fairly strange that the mmap() version runs slower than the sendto() version. This is likely another performance problem related to mmap() which seems worth fixing. Note that the speed of the mmap() version can be improved by reducing the default --ring-size to around 1-2 MiB, but this does not fix the general trend with mmap() performance. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> | |||||
2013-07-13 | misc: fix multiple NULL pointer sparse warnings | Daniel Borkmann | 1 | -1/+1 | |
Those are fixes for the following warnings: pcap_mm.c:119:29: warning: Using plain integer as NULL pointer pcap_mm.c:141:29: warning: Using plain integer as NULL pointer ring.c:24:31: warning: Using plain integer as NULL pointer flowtop.c:1114:22: warning: Using plain integer as NULL pointer ifpps.c:1133:29: warning: Using plain integer as NULL pointer Signed-off-by: Daniel Borkmann <dborkman@redhat.com> | |||||
2013-05-31 | ring: setup frame structure for v2/v3 in a generic way | Daniel Borkmann | 1 | -6/+5 | |
Prepare for TPACKET_V3 by setting up the frame structure transparently, so that we do not need to change much in the netsniff-ng/trafgen code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> | |||||
2013-05-31 | ring: move duplicate/generic code parts from rx/tx into ring.c | Daniel Borkmann | 1 | -0/+64 | |
We do not want to maintain duplicate code, so move this into a separate file and name those *_generic() helpers. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> |
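
The frame-count simplification described in the 2015-10-29 commit "ring: Simplify calculation of number of frames in a tpacket ring" can be illustrated with a minimal sketch. The function names `frame_nr_old()` and `frame_nr_new()` are hypothetical and do not appear in ring.c; only the two formulas are taken from the commit message. The sketch also assumes that `size` is a multiple of `tp_block_size` and that `tp_block_size` is a multiple of `tp_frame_size`, since the cancellation in the commit message is only exact under integer division when both hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: frame count as computed before the change,
 * tp_frame_nr = tp_block_size / tp_frame_size * tp_block_nr,
 * with tp_block_nr = size / tp_block_size. */
size_t frame_nr_old(size_t size, size_t block_size, size_t frame_size)
{
	assert(block_size != 0 && frame_size != 0);

	size_t block_nr = size / block_size;

	return block_size / frame_size * block_nr;
}

/* Hypothetical helper: frame count after the change,
 * tp_frame_nr = size / tp_frame_size. */
size_t frame_nr_new(size_t size, size_t frame_size)
{
	assert(frame_size != 0);

	return size / frame_size;
}
```

With the illustrative values `size = 1048576`, `block_size = 16384` and `frame_size = 2048`, both helpers return 512. If `size` is not a multiple of the block size (for example `size = 12000`, `block_size = 4096`, `frame_size = 2048`), integer truncation makes them differ (4 versus 5), so the equivalence depends on the alignment assumption stated above.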