Work that relates to netsniff-ng and how we differ from it:
///////////////////////////////////////////////////////////

ntop
 * W: http://www.ntop.org/

The ntop project offers zero-copy for network packets. Is this approach
significantly different from what is already built into the Linux kernel?
Most likely not. In both cases packets are memory mapped between both
address spaces. The biggest difference is that with netsniff-ng you get this
for free, without modifying your kernel, since it uses the kernel's RX_RING
and TX_RING functionality. Unfortunately this is not really mentioned on
ntop's website. Surely for promotional reasons. For many years the ntop
project has lived next to the Linux kernel. Attempts have been made to
integrate it [1], but discussions got stuck and both sides seem to have no
interest in it anymore, e.g. [2]. Therefore, if you want to use ntop, you
depend on ntop's modified drivers, which are maintained outside of the Linux
kernel's mainline tree. Thus, they will not provide you with the latest
improvements. Also, the Linux kernel's PF_PACKET is maintained by a much
bigger audience and is probably better reviewed and optimized. Therefore, we
also decided to go with the Linux kernel's variant. So to keep it short:
both approaches are zero-copy, both have similar performance (given their
technical similarities, anyone who tells you otherwise is not telling the
truth) and we are using the kernel's built-in variant to reach a broader
audience.

  [1] http://lists.openwall.net/netdev/2009/10/14/37
  [2] http://www.spinics.net/lists/netfilter-devel/msg20212.html

tcpdump
 * W: http://www.tcpdump.org/

tcpdump is probably the oldest and most famous packet analyzer. It is based
on libpcap and in fact the same team that maintains tcpdump also maintains
libpcap. It has been ported to many more architectures and operating systems
than netsniff-ng. However, we don't aim to rebuild or clone tcpdump.
We rather focus on achieving a higher capturing speed by carefully tuning
and optimizing our code. That doesn't mean that the tcpdump people do not
take care of it. It just means that we don't have additional layers of
abstraction for being as portable as possible. This already gives us a
smaller code footprint. Also, by default we perform some system tuning, such
as remapping the NIC's IRQ affinity, which tcpdump probably would never do
due to its generic nature. By generic, we mean serving as many different
user groups as possible. We rather aim at serving users with high-speed
needs. That way, they have less manual work to do since it's already
performed in the background. Next to this, we also aim at being a useful
networking toolkit rather than only an analyzer. So many other tools are
provided, such as trafgen for traffic generation.

Wireshark/tshark
 * W: http://www.wireshark.org/

Probably we could tell you the same as in the previous section. We guess it
is safe to say that Wireshark might have the best protocol dissector out
there. However, this is not a free lunch. You pay for it with a performance
degradation, which is quite expensive. It is also based on libpcap (we are
not) and it comes with a graphical user interface, whereas we rather aim at
being used somewhere on a server or middle-box site where you only have
access to a shell, for instance. Also, offline analysis of /large/ pcap
files might even let it hang for a long time. Here netsniff-ng performs
better, also in capturing pcaps. Again, we furthermore aim at being a
toolkit rather than only an analyzer.

libpcap
 * W: http://www.tcpdump.org/

Prize question: why don't you rely on libpcap? The answer is quite simple.
We started developing netsniff-ng with its zero-copy capabilities back in
2009 when libpcap was still doing packet copies between address spaces.
Since the API to the Linux kernel was quite simple, we felt more comfortable
using it directly and bypassing this additional layer of libpcap code. Today
we feel good about this decision, because since the TX_RING functionality
was added to the Linux kernel we have a clean integration of both RX_RING
and TX_RING. libpcap on the other hand was designed for capturing and not
for transmission of network packets. Therefore, it only uses RX_RING on
systems where it's available, but no TX_RING functionality. This would have
resulted in a mess in our code. Additionally, with netsniff-ng, one is able
to do more fine-grained tuning of those rings.

Why didn't you wrap netsniff-ng around your own library just like tcpdump
and libpcap? Because we are ignorant. If you design a library, then you have
to design it well right from the beginning. A library would be a crappy one
if it ever changes its API. Or, if it changes its API, then it has to keep
its old one for the sake of being backwards compatible. Otherwise no trust
among its users or developers can be achieved. Further, by keeping this long
tail of deprecated functions your code will become bloated over time. We
wanted to keep the freedom of doing large-scale refactoring of our code
without having to maintain a stable API to the outer world. This is the
whole story behind it. If you desperately need our internal functionality,
you can still feel free to copy our code as long as your derived code
complies with the GPL version 2.0. So no need to whine. ;-)
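The RX_RING interface referred to in the libpcap section can be sketched
roughly as follows. This is a minimal illustration under stated assumptions,
not netsniff-ng's actual code: the helper names demo_ring_layout() and
demo_map_rx_ring() are hypothetical, the ring uses the kernel's basic
struct tpacket_req layout, and the geometry checks only mirror the commonly
documented rules (frame size aligned to TPACKET_ALIGNMENT, block size a
multiple of the page size, frame count equal to frames per block times block
count).

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill in the ring geometry and reject combinations
 * that the kernel would refuse. Returns 0 on success, -1 otherwise.
 */
static int demo_ring_layout(struct tpacket_req *req, unsigned int frame_size,
			    unsigned int frames_per_block, unsigned int block_nr)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned int block_size = frame_size * frames_per_block;

	assert(req != NULL);
	if (page <= 0 || frame_size == 0 || frames_per_block == 0 || block_nr == 0)
		return -1;
	if (frame_size % TPACKET_ALIGNMENT != 0)
		return -1;	/* frames must be TPACKET_ALIGNMENT aligned */
	if (block_size % (unsigned int)page != 0)
		return -1;	/* blocks must be a multiple of the page size */

	memset(req, 0, sizeof(*req));
	req->tp_frame_size = frame_size;
	req->tp_block_size = block_size;
	req->tp_block_nr = block_nr;
	req->tp_frame_nr = frames_per_block * block_nr;
	return 0;
}

/*
 * Hypothetical helper: ask the kernel for an RX_RING on an already opened
 * PF_PACKET socket and map it into user space. Returns MAP_FAILED on error.
 */
static void *demo_map_rx_ring(int fd, const struct tpacket_req *req)
{
	size_t len;

	assert(req != NULL);
	len = (size_t)req->tp_block_size * req->tp_block_nr;
	if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
		return MAP_FAILED;
	/* The kernel and the process now share the same packet memory. */
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

The socket passed to demo_map_rx_ring() would have to be a PF_PACKET socket,
which normally requires CAP_NET_RAW. Frame size, frames per block and block
count are the knobs meant by "tuning of those rings" here; the values a real
capture should use depend on the traffic and are not suggested by this
sketch.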
percpu-refcount: fix reference leak during percpu-atomic transition
percpu_ref_tryget() and percpu_ref_tryget_live() should return "true" IFF
they acquire a reference. But the return value from
atomic_long_inc_not_zero() is a long and may have high bits set, e.g.
PERCPU_COUNT_BIAS, and the return value of the tryget routines is bool, so
the reference may actually be acquired but the routines return "false",
which results in a reference leak since the caller assumes it does not need
to do a corresponding percpu_ref_put().

This was seen when performing CPU hotplug during I/O, as hangs in
blk_mq_freeze_queue_wait where percpu_ref_kill (blk_mq_freeze_queue_start)
raced with percpu_ref_tryget (blk_mq_timeout_work).

Sample stack trace:

  __switch_to+0x2c0/0x450
  __schedule+0x2f8/0x970
  schedule+0x48/0xc0
  blk_mq_freeze_queue_wait+0x94/0x120
  blk_mq_queue_reinit_work+0xb8/0x180
  blk_mq_queue_reinit_prepare+0x84/0xa0
  cpuhp_invoke_callback+0x17c/0x600
  cpuhp_up_callbacks+0x58/0x150
  _cpu_up+0xf0/0x1c0
  do_cpu_up+0x120/0x150
  cpu_subsys_online+0x64/0xe0
  device_online+0xb4/0x120
  online_store+0xb4/0xc0
  dev_attr_store+0x68/0xa0
  sysfs_kf_write+0x80/0xb0
  kernfs_fop_write+0x17c/0x250
  __vfs_write+0x6c/0x1e0
  vfs_write+0xd0/0x270
  SyS_write+0x6c/0x110
  system_call+0x38/0xe0

Examination of the queue showed a single reference (no PERCPU_COUNT_BIAS,
and __PERCPU_REF_DEAD, __PERCPU_REF_ATOMIC set) and no requests. However,
conditions at the time of the race are count of PERCPU_COUNT_BIAS + 0 and
__PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC set.

The fix is to make the tryget routines use an actual boolean internally
instead of the atomic long result truncated to an int.

Fixes: e625305b3907 percpu-refcount: make percpu_ref based on longs instead of ints
Link: https://bugzilla.kernel.org/show_bug.cgi?id=190751
Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Reviewed-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e625305b3907 ("percpu-refcount: make percpu_ref based on longs instead of ints")
Cc: stable@vger.kernel.org # v3.18+
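The truncation described in this commit message can be reproduced in a small
user-space model. This is a sketch under stated assumptions and not the
kernel code: DEMO_COUNT_BIAS and all demo_* functions are hypothetical
stand-ins, the inc-not-zero helper is assumed to report success by returning
the previous (wide) counter value, and uint32_t is used in place of int so
that the narrowing conversion is well defined in portable C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PERCPU_COUNT_BIAS: only the top bit is set. */
#define DEMO_COUNT_BIAS (UINT64_C(1) << 63)

/*
 * Hypothetical stand-in for an inc-not-zero primitive that signals success
 * by returning the previous counter value instead of a plain 0 or 1.
 */
static uint64_t demo_inc_not_zero(uint64_t *count)
{
	uint64_t old = *count;

	if (old != 0)
		*count = old + 1;
	return old;
}

/* Buggy variant: the wide result is narrowed before it becomes a bool. */
static bool demo_tryget_truncating(uint64_t *count)
{
	uint32_t ret = (uint32_t)demo_inc_not_zero(count);

	return ret != 0;	/* the high bits are already gone here */
}

/* Fixed variant: the result becomes a boolean while still full width. */
static bool demo_tryget_bool(uint64_t *count)
{
	bool ret = demo_inc_not_zero(count) != 0;

	return ret;
}

/* Walks through the leak: a reference is taken but reported as not taken. */
static void demo_show_leak(void)
{
	uint64_t count = DEMO_COUNT_BIAS;

	assert(!demo_tryget_truncating(&count));	/* caller sees failure */
	assert(count == DEMO_COUNT_BIAS + 1);		/* yet the count went up */

	count = DEMO_COUNT_BIAS;
	assert(demo_tryget_bool(&count));		/* caller sees success */
	assert(count == DEMO_COUNT_BIAS + 1);
}
```

In the model, a caller of demo_tryget_truncating() that is told "false"
never issues the matching put, which corresponds to the leaked reference
that kept blk_mq_freeze_queue_wait from completing.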