Nice piece of Guava concurrency infrastructure in the latest release:
A striped Lock/Semaphore/ReadWriteLock. This offers the underlying lock striping similar to that of ConcurrentHashMap in a reusable form, and extends it for semaphores and read-write locks. Conceptually, lock striping is the technique of dividing a lock into many stripes, increasing the granularity of a single lock and allowing independent operations to lock different stripes and proceed concurrently, instead of creating contention for a single lock.
The guarantee provided by this class is that equal keys lead to the same lock (or semaphore), i.e. if (key1.equals(key2)) then striped.get(key1) == striped.get(key2) (assuming Object.hashCode() is correctly implemented for the keys). Note that if key1 is not equal to key2, it is not guaranteed that striped.get(key1) != striped.get(key2); the elements might nevertheless be mapped to the same lock. The lower the number of stripes, the higher the probability of this happening.
Prior to this class, one might be tempted to use Map
, where K represents the task. This maximizes concurrency by having each unique key mapped to a unique lock, but also maximizes memory footprint. On the other extreme, one could use a single lock for all tasks, which minimizes memory footprint but also minimizes concurrency. Instead of choosing either of these extremes, Striped allows the user to trade between required concurrency and memory footprint. For example, if a set of tasks are CPU-bound, one could easily create a very compact Striped of availableProcessors() * 4 stripes, instead of possibly thousands of locks which could be created in a Map structure.
‘In this article I have collected a list of options related to GC tuning in JVM. This is not a comprehensive list, I have only collected options which I use in practice (or at least understand why I may want to use them). Compared to previous version a few useful diagnostic options was added. Additionally section for G1 specific options was introduced.’
Contains these millisecond estimates for highly-contended inter-thread signalling when incrementing a 64-bit counter in java:
One Thread300Undoubtedly not realistic for a lot of cases, but it’s still useful for order-of-magnitude estimates of locking cost. Bottom line: don’t lock if you can avoid it, even with ‘volatile’ or AtomicFoo types.
One Thread with Memory Barrier4,700
One Thread with CAS5,700
Two Threads with CAS18,000
One Thread with Lock10,000
Two Threads with Lock118,000
Firstly, this is 3 orders of magnitude greater latency than what I illustrated in the previous article using just memory barriers to signal between threads. This cost comes about because the kernel needs to get involved to arbitrate between the threads for the lock, and then manage the scheduling for the threads to awaken when the condition is signalled. The one-way latency to signal a change is pretty much the same as what is considered current state of the art for network hops between nodes via a switch. It is possible to get ~1µs latency with InfiniBand and less than 5µs with 10GigE and user-space IP stacks. Secondly, the impact is clear when letting the OS choose what CPUs the threads get scheduled on rather than pinning them manually. I’ve observed this same issue across many use cases whereby Linux, in default configuration for its scheduler, will greatly impact the performance of a low-latency system by scheduling threads on different cores resulting in cache pollution. Windows by default seems to make a better job of this.
nice write-up. nginx, Rails, RabbitMQ, MySQL, Cassandra, Elastic Search, HAProxy