Age | Commit message (Collapse) | Author | Files | Lines |
|
The number of CPUs is stored in ctx.cpus which is unsigned int, so use
unsigned int consistently when using CPU number. Negative CPU numbers
wont occur anyhow.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Use zmalloc()'ed buffer or statically sized array to replace usage of
variable sized arrays. Found by the sparse static checker.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Pushed last commit a bit too fast. The help text described that
the single letter option for "--no-sock-mem" were "-m", which is
wrong the right option is "-A" (only help txt, correct in the code
and manual).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
|
|
The default behavior of trafgen is to boost the systems default
socket memory limits during a testrun. This behaviour does not
always result in improved performance, because this will stress
the kernels SLAB allocators unnecessary.
Introducing an option "--no-sock-mem" that instruct trafgen
to not perform any socket memory adjustments.
Fixes: #130
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
|
|
Instead of having to perform the individual steps to initialize a ring
and open coding them in multiple places, provide convenience functions
to do all at once. This has the nice side effect of allowing to make
most of these *_{rx,tx}_ring() functions static in their respective
module.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
It will be set later on depending on command line option (or panic()
out) and it is initialized to 0 by a memset() before anyways.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
The mm_len member of struct ring is of type size_t, but in the code
paths leading to set it, unsigned int is used. In circumstances where
unsigned int is 32 bit and size_t is 64 bit, this could lead to an
integer overflow, which causes an improper ring size being mmap()'ed in
mmap_ring_generic().
In order to prevent this, consistently use size_t to store the ring
size, since this is also what mmap() takes as its `length' parameter.
This now allows to specify ring sizes larger than 4 GiB for both
netsniff-ng and trafgen (fixes #90).
Reported-by: Jon Schipp <jonschipp@gmail.com>
Reported-by: Michał Purzyński <michalpurzynski1@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
The -k/--kernel-pull option got useless with commit c139e80 ("trafgen:
remove timer-based trigger model"). Instead of entirely removing it and
thus possibly breaking people's scripts, still accept it as an option,
but warn the user about it. We might want to remove the option in a
future release.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
If setting an unsigned long variable, use strtoul() instead of strtol().
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
commit c10621e ("trafgen: keep a small initial ring buffer size to
reduce cache-misses") reduced default ring buffer size to 196KiB,
but on my big machines with 10Gbit/s this size is too small.
Increase default ring-size to 512 KiB, yield the best results,
without increasing ring buffer size too much, this fixes #120.
Single CPU results from my E5-2630 CPU with intel ixgbe/82599.
(Cmd: trafgen --cpp --dev eth8 --conf udp_example01.trafgen --cpu 1)
* 769,440 pkts/sec -- default ring-size 196 KiB
* 1,417,908 pkts/sec -- ring-size 500 KiB
Going above CPUs L3 cache size which is (15Mb)
* 1,236,580 pkts/sec -- ring-size 20000KiB
The mmap'ed ring buffer is now faster than using sendto().
For comparison, not using the ring-buffer, by using option "-t0":
* 1,381,364 (with qdisc bypass)
And using the qdisc code path in the kernel (enable via
parameter "--qdisc-path")
* 1,227,772 pkts/sec (with --qdisc-path)
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
|
|
Heavily reduce the initial ring buffer size for trafgen to just 196KiB,
so that we can heavily reduce the number of cache-misses related to
cache references. People can still overwrite this setting later on via
command line option for their specific architecture if they wish.
Perhaps the RX_RING should be checked as well in netsniff-ng though the
use case there is slightly different.
Before:
Performance counter stats for 'trafgen -i blub -o dummy0 -n100000000':
137,765,493,346 instructions:k # 0.82 insns per cycle
167,438,826,578 cycles:k # 0.000 GHz
59,508,315 branch-misses:k
361 context-switches:k
6 cpu-migrations:k
134,751,541 cache-misses:k # 85.019 % of all cache refs
158,495,358 cache-references:k
755 kmem:kmem_cache_alloc
15.139458202 seconds time elapsed
After:
Performance counter stats for 'trafgen -i blub -o dummy0 -n100000000':
137,889,782,650 instructions:k # 0.92 insns per cycle
150,239,185,971 cycles:k # 0.000 GHz
71,583,573 branch-misses:k
423 context-switches:k
7 cpu-migrations:k
60,239 cache-misses:k # 0.073 % of all cache refs
82,502,468 cache-references:k
740 kmem:kmem_cache_alloc
12.028787964 seconds time elapsed
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
perf reports in mmap case a huge number of kmem_cache_alloc's
which seem to stem from triggering signals from kernel to user
application, against dummy device:
Performance counter stats for 'trafgen -i blub -o dummy0 -n100000000 -k100': <-- mmap case
175,837 kmem:kmem_cache_alloc
14.758900522 seconds time elapsed
Performance counter stats for 'trafgen -i blub -o dummy0 -n100000000 -k100 -t0': <-- non-mmap case
707 kmem:kmem_cache_alloc
15.591667364 seconds time elapsed
It seems not to case significant number of cache-misses, but
it's better to switch to a direct trigger when we cannot fill
new frames anymore. After this patch, we see a similar number
of kmem_cache_alloc's as in the non-mmap case. This basically
renders the kpull interval useless, we can optionally remove
it if we don't care about people's scripts. ;-)
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Handle all termination signals that we're allowed to handle (SIGKILL
can't be handled) in order to exit gracefully in any regular termination
case.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Follow commit bed9b6bb ("netsniff-ng: Don't modify optarg/argv").
We shouldn't modify optarg (and thus argv) since it's e.g. used to
display the commandline string in `ps'. Since strtoul() reads until it
encounters the first non-numeric character and ignores the rest, we can
just revert from setting a NULL byte after the numeric part of the
string.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Since Linux 3.14, the kernel supports a socket option PACKET_QDISC_BYPASS,
which trafgen enables by default. That allow us to bypass the kernels
normal qdisc (traffic control) layer.
An option -q, --qdisc-path is added to allow enabling the qdisc path
explicity, useful for testing purposes.
This will be avail in kernels >= 3.14 via
commit d346a3fae3 (packet: introduce PACKET_QDISC_BYPASS socket option).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
|
|
The tool trafgen is used in a pktgen style transmit only scenario.
We discovered a performance bottleneck in the kernel, when
running trafgen, where the kernel stalled on a lock in
packet_rcv(). This call is unnecessary for trafgen given its
transmit only nature.
This packet_rcv() call can, easily be avoided by instructing the
RAW/PF_PACKET socket, to not listen to any protocols (by passing
protocol argument zero, when creating the socket).
The performance gain is huge, increasing performance from approx
max 2Mpps to 12Mpps, basically causing trafgen to scale with
the number of CPUs.
Following tests were run on a 2xCPU E5-2650 with Intel 10Gbit/s ixgbe:
Trafgen using sendto() syscall via parameter -t0:
* # CPUs -- *with* -- *without* packet_rcv() call
* 1 CPU == 1,232,244 -- 1,236,144 pkts/sec
* 2 CPUs == 1,592,720 -- 2,593,620 pkts/sec
* 3 CPUs == 1,635,623 -- 3,692,216 pkts/sec
* 4 CPUs == 1,567,768 -- 4,102,866 pkts/sec
* 5 CPUs == 1,700,270 -- 5,151,489 pkts/sec
* 6 CPUs == 1,762,392 -- 6,124,512 pkts/sec
* 7 CPUs == 1,850,139 -- 7,120,496 pkts/sec
* 8 CPUs == 1,770,909 -- 8,058,710 pkts/sec
* 9 CPUs == 1,721,072 -- 8,963,192 pkts/sec
* 10 CPUs == 1,359,157 -- 9,584,535 pkts/sec
* 11 CPUs == 1,175,520 -- 10,498,038 pkts/sec
* 12 CPUs == 1,075,867 -- 11,189,292 pkts/sec
* 13 CPUs == 1,012,602 -- 12,048,836 pkts/sec
* [...]
* 20 CPUs == 1,030,446 -- 11,202,449 pkts/sec
Trafgen using mmap() TX tpacket_v2 (default)
* # CPUs -- *with* -- *without* packet_rcv() call
* 1 CPU == 920,682 -- 927,984 pkts/sec
* 2 CPUs == 1,607,940 -- 2,061,406 pkts/sec
* 3 CPUs == 1,668,488 -- 2,979,463 pkts/sec
* 4 CPUs == 1,423,066 -- 3,169,565 pkts/sec
* 5 CPUs == 1,507,708 -- 3,910,756 pkts/sec
* 6 CPUs == 1,555,616 -- 4,625,844 pkts/sec
* 7 CPUs == 1,560,961 -- 5,298,441 pkts/sec
* 8 CPUs == 1,596,092 -- 6,000,465 pkts/sec
* 9 CPUs == 1,575,139 -- 6,722,130 pkts/sec
* 10 CPUs == 1,311,676 -- 7,114,202 pkts/sec
* 11 CPUs == 1,157,650 -- 7,859,399 pkts/sec
* 12 CPUs == 1,060,366 -- 8,491,004 pkts/sec
* 13 CPUs == 1,012,956 -- 9,269,761 pkts/sec
* [...]
* 20 CPUs == 955,716 -- 8,653,947 pkts/sec
It is fairly strange that the mmap() version runs slower than the
sendto() version. This is likely another performance problem related
to mmap() which seems worth fixing.
Note, that the mmap() version speed can be improved by reducing the
default --ring-size to around 1-2 MiB. But this does not fix general
trend with mmap() performance.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
|
|
Also make these options available to trafgen.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
If we loose carrier, don't bother about panic'ing, but continue sending!
Only exit in case we're doing a smoke test.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Add ability to set IGP in time units in seconds, milliseconds,
microseconds, and nanoseconds by appending a postfix to --gap e.g. --gap
100ms. Also, update the man page and trafgen.zsh to reflect the
changes.
[Fix whitespaces, coding style and minor wording changes -- tklauser]
Signed-off-by: Jon Schipp <jonschipp@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Get rid of inner loop in fast path. This simplifies code and readability.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Combine multiple likely conditions into one.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Fix the following compiler warnings that occur when building with "-W
-Wall -Wextra":
trafgen.c: In function ‘timer_elapsed’:
trafgen.c:136:31: warning: unused parameter ‘number’ [-Wunused-parameter]
trafgen.c: In function ‘apply_csum16’:
trafgen.c:346:36: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c: In function ‘dump_trafgen_snippet’:
trafgen.c:415:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c: In function ‘xmit_smoke_probe’:
trafgen.c:490:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:506:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:509:56: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:512:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c: In function ‘__wait_and_sum_others’:
trafgen.c:721:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c: In function ‘__correct_global_delta’:
trafgen.c:743:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:761:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c: In function ‘xmit_packet_precheck’:
trafgen.c:797:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:816:47: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c: In function ‘main_loop’:
trafgen.c:837:17: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c: In function ‘main’:
trafgen.c:1047:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:1065:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:1078:43: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
trafgen.c:1090:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Compiling with "-W -Wall -Wextra" reveals the following warnings in
mac80211.c:
mac80211.c: In function ‘nl80211_init’:
mac80211.c:78:67: warning: unused parameter ‘device’ [-Wunused-parameter]
mac80211.c: In function ‘nl80211_wait_handler’:
mac80211.c:106:48: warning: unused parameter ‘msg’ [-Wunused-parameter]
mac80211.c: In function ‘nl80211_error_handler’:
mac80211.c:115:54: warning: unused parameter ‘nla’ [-Wunused-parameter]
mac80211.c:117:12: warning: unused parameter ‘arg’ [-Wunused-parameter]
mac80211.c: In function ‘nl80211_del_mon_if’:
mac80211.c:181:72: warning: unused parameter ‘device’ [-Wunused-parameter]
Fix them by either marking them as unused (where we need to conform to
library APIs or remove them alltogether (for our own APIs). For the
function leave_rfmon_mac80211() the according users (netsniff-ng and
trafgen) are also changed.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
The variable `i' in apply_counter(), apply_randomizer() and
apply_csum16() is only set once, so use the original parameter
`counter_id' instead. Rename `counter_id' to `id' in order to keep it
short.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
For systems with >8 CPUs it might be i) annoying or ii) uninteresting
to print CPU time statistics. So introduce a switch that can skip this
at exit.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Those are the last occurences of warnings like:
netsniff-ng.c:697:48: warning: Using plain integer as NULL pointer
netsniff-ng.c:726:48: warning: Using plain integer as NULL pointer
...
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Commit bf9232fb6 ("trafgen: setup_shared_var: fix couple of things")
forgot to amend the remaining changes in 'len' usage. So lets fix
them right now.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
sparse reported that the variable seed can be made static:
trafgen.c:110:14: warning: symbol 'seed' was not declared. Should it be static?
While at it, also rename a local variable 'seed' to not overlap
contexts.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
1) Fix a couple of sparse warnings:
trafgen.c:382:27: error: cannot size expression
trafgen.c:391:33: error: cannot size expression
trafgen.c:393:33: error: cannot size expression
trafgen.c:401:25: error: cannot size expression
2) Use MAP_FAILED instead of (void *) -1
3) No need to cast to void * on mmap(2)
4) Use NULL instead of 0 as mmap(2)'s first argument
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
In order to be able to better track regressions or to give support,
let us track the Git id as well in version information. This makes
the ``--version'' switch actually useful.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Took quite a while to git bisect the cause for the wrong TCP checksum
in the -e example. It turned out that commit bf43e1993c7037 ("trafgen:
lexer: return original string if no shellcode") "broke" it, since
before that commit the TCP checksum from -e example was correct and
afterwards not anymore. Well, it didn't break it. What was happening
here is that with this fix above, the packet got 1 byte longer since
the first character of the example string is not omitted anymore,
therefore the checksum got wrong. Fix this by fixing the IP total
length of the packet in the -e and man page example. The UDP example
from the man page still works well if csumudp() is used, so not
affected of this.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
It seems not critical at this point, but lets check it for all offsets
here as well, and mark this check as unlikely to happen.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
In the current situation, it can happen when we set -n1, that no packet
at all will be scheduled. This is due to the case that nearbyint() will
for e.g. 2 cpus round to 0 each, and since in __correct_global_delta()
we only correct a total delta when a particular CPU is allowed to tx
packets (means already has a num > 0), then we correct the delta on the
first such CPU. Switch to using round(), so that on 0.5 it will be round
to the next higher int, and fix the check to >= 0 in __correct_global_delta()
so that a CPU could also get a 0 share of packets. I did a couple of tests
with different -n params and cpu(..) configs and this seems to fix that.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Finally eliminate xutils.{c,h} and move the rest to epoll2.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Add an extra file for signal handling functions.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Remove them from xutils, and add them to socket management.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Move them out of xutils, so that we can maintain them separately.
Also simplify things a bit.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Again, also to be able to maintain this more easily.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Rename xio to ioops (io-ops) and boil its include files down to a
minimum.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Break this out so that we only need to have sigint non-static where
it is really needed.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Break out all string handling functions and lockme stuff in order
to further eliminate the big code blob in xutils, so that it can
be easier maintained.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Break out IRQ functionality from xutils, simplify it, and
save + restore IRQ affinity list.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Prepare TPACKET_V3 for allowing to transparently setting up the
frame structure such that we do not need to change much in the
netsniff-ng/trafgen code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
Include long version string into tools when called with --version.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
|
|
The entire packet is zeroed using memset() three lines above, thus there
is no need to set icmp->code and icmp->checksum to 0 explicitely again.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
The version() function was missed in the previous commit 785fe152
("trafgen: Add __noreturn attribute to exiting functions"), so add
__noreturn to it now.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Add the __noreturn attribute to all functions which wont return
but call die() themselves to exit().
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
Replace "on default" by "by default", make it a bit more clear what the
seed in the -E/--seed option is for and mention exit after display of
information on --version and --help.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|