I have been reading The Art of Multiprocessor Programming 1 recently. Chapter 7 shows a way to implement mutex locks using registers supporting test-and-set operation. This documents my understanding of the locks and the stuff I learned when trying to implement the same in Java and Rust.
When programming with multiple threads, we sometimes require a mechanism to ensure that at every moment, "critical section" of the code is run at most by one thread. Other threads wishing to run this section of code must wait for the first thread to finish. This is called Mutual Exclusion. To achieve this, all the threads follow the below pattern:
L1. Let other threads know we are entering the critical section L2. +------------------+ | | | Critical Section | | | +------------------+ L3. Let other threads know we are done with the critical section
The part in
L1 is usually called "locking" or "acquiring a lock". The part in
L2 is called "unlocking" or "releasing the lock".
To implement locking, the threads need a way to communicate. One mode of communication is reading and writing to shared memory. There are some famous implementations like Peterson's algorithm 2, Filter lock, and Bakery Lock 3. An inherent problem with this mode of communication is that, within this model, we need to know in advance the maximum number of threads.
The situation can be improved by adding more ways to communicate beyond reading and writing to shared memory. One such operation is test-and-set. Performing this operation on a memory location atomically returns its current value and replaces it with
true. In Java, we can create an
var and call
var.getAndSet(true) to simulate the test-and-set operation.
Using this operation, locking and unlocking can be done in the following way -
The shared variable
occupied is initially set to
false. To acquire the lock, every thread waits till the variable is
false and atomically, sets the variable to
true. Since this is atomic because of the magic of the test-and-set instruction, no other thread will see the old value
false till this thread is done with the critical section. When it's indeed done, the variable is simply set back to
Implementing the same lock in Rust only involves simulating the test-and-set instruction. We can use Rust's AtomicBool present in the standard library. Rust's equivalent of Java's
swap. Surprisingly, this operation takes two arguments - the new value and an Ordering argument.
When reading about
Ordering, I realized that Java's
getAndSet does two distinct operations -
It atomically sets the value and returns the old value
Makes sure the instructions in critical sections are not executed before acquiring the lock
While the second point may look like a spurious requirement, compilers and hardware routinely reorder the instructions as long as they don't change the output in a single thread. In case of a single thread, it doesn't make any difference if a few instructions from the critical section are reordered to before the lock acquisition. So, the compiler and the hardware are free to do so. This will cause problems because we use the locks only because we want mutual exclusion between the threads running the critical section. If that guarantee is violated, we are not providing mutual exclusion. So, Java providing the second guarantee is indeed a good thing.
While Rust provides the first guarantee as-is, it takes a different approach with the second one. It asks us if we need this guarantee at all. We've to let Rust know our answer using the
Ordering option -
Ordering::Relaxedmeans we don't need the second guarantee
Ordering::SeqCstmeans the ordering present in the program is guaranteed. I think this is effectively Java's equivalent
Ordering::Releaseare guarantees that complement each other.
From what I understand, and
Acquireload guarantees that no subsequent instructions are reordered before this instruction and a
Releasestore guarantees that no previous instructions are reordered after this instruction.
We want the instructions of critical section to remain between the lock and unlock instructions. So, they shouldn't be reordered before the lock. So, we've to use
Acquire guarantee when locking. They shouldn't also be reordered after the unlock. So, we should use the
Release guarantee when unlocking. The final code becomes -
I'm still trying to wrap my mind around the crazy things that can happen in the world of multiprocessor programming. Let me know if I'm mistaken in any of these.
While test-and-set locks solve the problem of mutual exclusion, they have a pretty serious problem - all the threads repeatedly write to a variable. To maintain cache coherence, a core must propagate these writes to all the other cores. What is worse - these writes repeatedly write
true. So, these writes don't even change the value of the memory location but spam the interconnect's limited bandwidth.
The Art of Multiprocessor Programming by Maurice Herlihy, Nir Shavit