# Test-and-set locks and Rust's Ordering

## Aim

I have been reading The Art of Multiprocessor Programming [1] recently. Chapter 7 shows a way to implement mutex locks using registers that support the test-and-set operation. This post documents my understanding of these locks and what I learned while implementing them in Java and Rust.

## Test-and-set locks

When programming with multiple threads, we sometimes need a mechanism to ensure that, at every moment, a "critical section" of the code is run by at most one thread. Other threads wishing to run this section of code must wait for the first thread to finish. This is called mutual exclusion. To achieve it, all the threads follow the pattern below:

```
L1. Let other threads know we are entering the critical section

L2. +------------------+
    |                  |
    | Critical Section |
    |                  |
    +------------------+

L3. Let other threads know we are done with the critical section
```


The part in L1 is usually called "locking" or "acquiring a lock". The part in L3 is called "unlocking" or "releasing the lock".

To implement locking, the threads need a way to communicate. One mode of communication is reading and writing to shared memory. There are some famous implementations like Peterson's algorithm [2], the Filter lock, and the Bakery lock [3]. An inherent problem with this mode of communication is that, within this model, we need to know the maximum number of threads in advance.

The situation can be improved by adding more ways to communicate beyond reading and writing to shared memory. One such operation is test-and-set. Performing this operation on a memory location atomically returns its current value and replaces it with true. In Java, we can create an AtomicBoolean variable var and call var.getAndSet(true) to simulate the test-and-set operation.

Using this operation, locking and unlocking can be done in the following way -

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TASLock {
    AtomicBoolean occupied = new AtomicBoolean(false);

    public void lock() {
        while (occupied.getAndSet(true)) {}
    }

    public void unlock() {
        occupied.set(false);
    }
}
```

The shared variable occupied is initially set to false. To acquire the lock, every thread waits until the variable is false and atomically sets it to true. Because the test-and-set instruction makes this check-and-write atomic, no other thread can observe the old value false until this thread is done with the critical section. When it's indeed done, the variable is simply set back to false.

### Implementation in Rust

Implementing the same lock in Rust only involves simulating the test-and-set instruction. We can use the AtomicBool type from Rust's standard library. Rust's equivalent of Java's getAndSet is swap. Surprisingly, this operation takes two arguments: the new value and an Ordering argument.
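To convince myself that swap really behaves like getAndSet, here's a minimal single-threaded sketch (the function name is mine; Relaxed is enough here since there is no second thread, so no inter-thread ordering to establish):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Returns the results of two successive swaps on a fresh flag.
fn swap_twice() -> (bool, bool) {
    let flag = AtomicBool::new(false);
    // swap atomically stores the new value and returns the old one,
    // just like Java's getAndSet.
    let first = flag.swap(true, Ordering::Relaxed);
    let second = flag.swap(true, Ordering::Relaxed);
    (first, second)
}

fn main() {
    let (first, second) = swap_twice();
    println!("first: {first}, second: {second}"); // first: false, second: true
}
```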

When reading about Ordering, I realized that Java's getAndSet does two distinct operations -

• It atomically sets the value and returns the old value

• It makes sure the instructions in the critical section are not executed before the lock is acquired

While the second point may look like a spurious requirement, compilers and hardware routinely reorder instructions as long as the reordering doesn't change the result for a single thread. For a single thread, it makes no difference if a few instructions from the critical section are moved to before the lock acquisition, so the compiler and the hardware are free to do so. But this breaks the very reason we use locks: mutual exclusion between the threads running the critical section. If that guarantee is violated, we are not providing mutual exclusion. So, Java providing the second guarantee is indeed a good thing.

While Rust provides the first guarantee as-is, it takes a different approach with the second one: it asks us whether we need this guarantee at all. We let Rust know our answer through the Ordering argument:

• Choosing Ordering::Relaxed means we don't need the second guarantee

• Choosing Ordering::SeqCst means the ordering present in the program is guaranteed. I think this is effectively Java's equivalent

• Ordering::Acquire and Ordering::Release provide guarantees that complement each other.

From what I understand, an Acquire load guarantees that no subsequent instructions are reordered to before it, and a Release store guarantees that no previous instructions are reordered to after it.
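The classic way to see this pairing in action is a two-thread message-passing sketch (my own example, not from the book): one thread writes a payload and raises a flag with a Release store; the other spins on the flag with Acquire loads, and once it sees the flag set, the Release/Acquire pair guarantees the payload write is visible to it.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn run() -> u64 {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);    // the payload
        READY.store(true, Ordering::Release); // nothing above moves below this
    });
    let consumer = thread::spawn(|| {
        while !READY.load(Ordering::Acquire) {} // nothing below moves above this
        DATA.load(Ordering::Relaxed)
    });
    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    println!("received: {}", run()); // received: 42
}
```

Note that a passing run doesn't prove the ordering guarantee (reorderings show up rarely and only on some hardware); the guarantee comes from the Release/Acquire semantics, not from testing.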

We want the instructions of the critical section to stay between the lock and unlock calls. They shouldn't be reordered to before the lock, so we use the Acquire guarantee when locking. They also shouldn't be reordered to after the unlock, so we use the Release guarantee when unlocking. The final code becomes:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct TASLock {
    busy: AtomicBool,
}

impl TASLock {
    fn new() -> Self {
        Self {
            busy: AtomicBool::new(false),
        }
    }

    fn lock(&self) {
        while self.busy.swap(true, Ordering::Acquire) {}
    }

    fn unlock(&self) {
        self.busy.store(false, Ordering::Release);
    }
}
```

I'm still trying to wrap my mind around the crazy things that can happen in the world of multiprocessor programming. Let me know if I'm mistaken in any of these.

## Problems with test-and-set locks

While test-and-set locks solve the problem of mutual exclusion, they have a pretty serious problem: every waiting thread repeatedly writes to the same variable. To maintain cache coherence, a core must propagate these writes to all the other cores. What's worse, these writes repeatedly store true, so they don't even change the value at the memory location, yet they still spam the interconnect's limited bandwidth.
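Chapter 7 of the book [1] follows up with a mitigation for exactly this, the test-and-test-and-set lock: spin on a plain read first (which is served from the local cache and generates no coherence traffic while the lock is held), and only attempt the write-causing swap once the lock looks free. A sketch of that idea in the same style as the code above:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct TTASLock {
    busy: AtomicBool,
}

impl TTASLock {
    fn new() -> Self {
        Self { busy: AtomicBool::new(false) }
    }

    fn lock(&self) {
        loop {
            // First "test": read-only spin, no coherence traffic.
            while self.busy.load(Ordering::Relaxed) {}
            // Then "test-and-set": another thread may have won the
            // race since our read, so retry on failure.
            if !self.busy.swap(true, Ordering::Acquire) {
                return;
            }
        }
    }

    fn unlock(&self) {
        self.busy.store(false, Ordering::Release);
    }
}

fn main() {
    let lock = TTASLock::new();
    lock.lock();
    // critical section would go here
    lock.unlock();
    println!("lock acquired and released");
}
```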

## References

[1] The Art of Multiprocessor Programming by Maurice Herlihy, Nir Shavit

[2] https://en.wikipedia.org/wiki/Peterson's_algorithm

[3] https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm