.
To compile this driver as a module, choose M here: the
module will be called usbcore.
if USB
source "drivers/usb/core/Kconfig"
source "drivers/usb/mon/Kconfig"
source "drivers/usb/wusbcore/Kconfig"
source "drivers/usb/host/Kconfig"
source "drivers/usb/renesas_usbhs/Kconfig"
source "drivers/usb/class/Kconfig"
source "drivers/usb/storage/Kconfig"
source "drivers/usb/image/Kconfig"
source "drivers/usb/usbip/Kconfig"
endif
source "drivers/usb/mtu3/Kconfig"
source "drivers/usb/musb/Kconfig"
source "drivers/usb/dwc3/Kconfig"
source "drivers/usb/dwc2/Kconfig"
source "drivers/usb/chipidea/Kconfig"
source "drivers/usb/isp1760/Kconfig"
comment "USB port drivers"
if USB
config USB_USS720
tristate "USS720 parport driver"
depends on PARPORT
select PARPORT_NOT_PC
---help---
This driver is for USB parallel port adapters that use the Lucent
Technologies USS-720 chip. These cables are plugged into your USB
port and provide USB compatibility to peripherals designed with
parallel port interfaces.
The chip has two modes: automatic mode and manual mode. In automatic
mode, it looks to the computer like a standard USB printer. Only
printers may be connected to the USS-720 in this mode. The generic
USB printer driver ("USB Printer support", above) may be used in
that mode, and you can say N here if you want to use the chip only
in this mode.
Manual mode is not limited to printers, any parallel port
device should work. This driver utilizes manual mode.
Note however that some operations are three orders of magnitude
slower than on a PCI/ISA Parallel Port, so timing critical
applications might not work.
Say Y here if you own an USS-720 USB->Parport cable and intend to
connect anything other than a printer to it.
To compile this driver as a module, choose M here: the
module will be called uss720.
source "drivers/usb/serial/Kconfig"
source "drivers/usb/misc/Kconfig"
source "drivers/usb/atm/Kconfig"
endif # USB
source "drivers/usb/phy/Kconfig"
source "drivers/usb/gadget/Kconfig"
config USB_LED_TRIG
bool "USB LED Triggers"
depends on LEDS_CLASS && LEDS_TRIGGERS
select USB_COMMON
help
This option adds LED triggers for USB host and/or gadget activity.
Say Y here if you are working on a system with led-class supported
LEDs and you want to use them as activity indicators for USB host or
gadget.
config USB_ULPI_BUS
tristate "USB ULPI PHY interface support"
select USB_COMMON
help
UTMI+ Low Pin Interface (ULPI) is specification for a commonly used
USB 2.0 PHY interface. The ULPI specification defines a standard set
of registers that can be used to detect the vendor and product which
allows ULPI to be handled as a bus. This module is the driver for that
bus.
The ULPI interfaces (the buses) are registered by the drivers for USB
controllers which support ULPI register access and have ULPI PHY
attached to them. The ULPI PHY drivers themselves are normal PHY
drivers.
ULPI PHYs provide often functions such as ADP sensing/probing (OTG
protocol) and USB charger detection.
To compile this driver as a module, choose M here: the module will
be called ulpi.
endif # USB_SUPPORT
=nds-private-remove&id=b91e1302ad9b80c174a4855533f7e3aa2873355e'>net/can
parent | 2d706e790f0508dff4fb72eca9b4892b79757feb (diff) |
mm: optimize PageWaiters bit use for unlock_page()
In commit 62906027091f ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.
However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.
On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd. The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.
On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result. However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.
So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too. And architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.
This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.
The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.
So this introduces the new architecture primitive
clear_bit_unlock_is_negative_byte();
and adds the trivial implementation for x86. We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better. According to Nick, Power has the same hickup x86 has, for
example, but some other architectures may not even care.
All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test". After this, it's down to
0.66%, so just a quarter of the cost it used to be.
(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>