Kernel Synchronization II
Concept of race conditions in operating systems and the importance of atomicity in ensuring deterministic program execution. Learn how to mitigate race conditions to achieve reliable parallel programming.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Kernel Synchronization II David Ferry, Chris Gill, Brian Kocoloski, Marion Sudvarg, James Orr CSE 422S - Operating Systems Organization Washington University in St. Louis St. Louis, MO 63143 1
Review: Race Conditions Recall the problem from the last lecture, if X is a regular integer and two threads want to increment its value; the final value should be X+2 Thread 1: X++ Thread 2: X++ CSE 422S Operating Systems Organization 2
Review: Race Conditions Problem: if X is a regular integer and is accessed via Load/Store instructions, we have a race condition Thread 1: load X X = X + 1 store X Thread 2: load X X = X + 1 store X What are possible values for X after both threads have finished? X + 1 X + 2 CSE 422S Operating Systems Organization 3
Review: Race Conditions E.g., considering the following sequence of operations where threads 1 and 2 execute concurrently on two different cores Thread 1: load X Thread 2: load X X = X + 1 X = X + 1 store X finish time store X CSE 422S Operating Systems Organization 4
Review: Race Conditions A race condition exists if the output (maybe even state) of a program is non-deterministic Relies on the sequence or timing of events that are not programmatically controlled Parallel programs that execute deterministically need to be free of race conditions How might we go about resolving the problem in the previous example? CSE 422S Operating Systems Organization 5
Atomicity We need to have stronger atomicity guarantees on the execution of multiple semantically separate operations Why the term atomic An atom was (originally) thought to be an indivisible particle The term atomic should be thought of as guaranteeing that a specified sequence of operations which we say is made atomic either fully executes without interruption or does not execute at all CSE 422S Operating Systems Organization 6
Atomicity Thread 1: load X X = X + 1 store X Thread 2: load X X = X + 1 store X CSE 422S Operating Systems Organization 7
Atomicity Thread 1: start_atomic(){ load X X = X + 1 store X end_atomic() Thread 2: start_atomic(){ load X X = X + 1 store X end_atomic() Assume we have constructs start_atomic() and end_atomic() that make the load/increment/store sequence of instructions to be atomic Thus, we assume the operations of loading X, adding 1 to it, and storing its new value is a single indivisible operation CSE 422S Operating Systems Organization 8
Mechanisms to Enforce Atomicity Hardware-specific mechanisms Test and set Compare and exchange ARM exclusive monitors Portable Linux kernel mechanisms Atomic Variables Spin Locks Mutexes Readers/Writer Locks Sequential Locks Read Copy Update (RCU) CSE 422S Operating Systems Organization 9
Test and set Semantics (remember: the whole thing is made atomic by hardware, and the value the variable HAD is returned) int test_and_set(int * v, int to) { int old = *v; *v = to; return old; } CSE 422S Operating Systems Organization 10
Compare-and-Exchange Semantics (again, hardware makes the whole thing atomic, and the value the variable HAD is returned) int compare_and_exchange(int * v, int from, int to) { int old = *v; if (old == from) { *v = to; } return old; } CSE 422S Operating Systems Organization 11
Atomicity on ARM ARM does not provide either of these instructions It provides multiple instructions that are used with a monitor to provide synchronization ldrex: load a word from memory strex: conditional store of a word to memory CSE 422S Operating Systems Organization 12
Semantics of ldrex/strex LDREX R1, p ADD R1, 1 STREX R2, R1, p // (p <- memory address) i.e., Load a word from memory address p into register R1 Increment register R1 by 1 Conditionally store the value to p Determine whether the value of p was stored by checking the return value stored in register R2 What if the return failed? Go back and retry the whole thing CSE 422S Operating Systems Organization 13
Synchronization Mechanisms in Linux Atomic instructions are hardware-dependent. To maintain portability, the Linux kernel provides multiple synchronization mechanisms to achieve atomicity of sequences of actions. 1. Atomic Variables 2. Spin Locks 3. Mutexes (sometimes called sleeping locks) 4. Semaphores 5. Readers/Writer Locks 6. Sequential Locks 7. Read-Copy-Update CSE 422S Operating Systems Organization 14
Which Mechanisms to Use and When Atomic variables - When shared data fits inside a single word of memory, or a lock- free synchronization protocol can be implemented Spin locks - Very short lock durations and/or very low overhead requirements Mutexes - Long lock duration and/or process needs to sleep while holding lock Semaphores Allows for simultaneous access to critical section of up to n threads Readers/Writer Locks Critical sections where multiple readers can execute concurrently Sequential Locks Special application of spin locks to readers/writer locks that favors writers Read-Copy-Update (RCU) - Modern locking approach that may improve performance (link provided on course website) CSE 422S Operating Systems Organization 15
The BKL is No Longer Used The Big Kernel Lock (BKL) was the first lock introduced to the kernel Described in LKD pp 198-199 Locked the entire kernel Provides correctness, but very slow New uses are prohibited Gradually replaced with finer-grained locking schemes Entirely removed in Linux 2.6.39 CSE 422S Operating Systems Organization 16
Atomic Variables Opaque type atomic_t Allows for portable implementations of common atomic operations, e.g. atomic_set() atomic_add() atomic_add_return() See LKD pp. 178 A good building block, but a bit low-level, so other higher-level abstractions are useful CSE 422S Operating Systems Organization 17
Spin Locks Conceptually, a spin lock can be thought of as an integer variable that can have only two values: 0: unlocked 1: locked How might a spin lock address our race condition problem? To acquire a spin lock, a thread must wait until its value is 0 (unlocked), then set it to 1 (locked) To release a spin lock, set the value to 0 (unlocked) CSE 422S Operating Systems Organization 18
Spin Locks DEFINE_SPINLOCK(lock); // Thread 1: spin_lock(&lock) load X X = X + 1 store X spin_unlock(&lock) Thread 2: spin_lock(&lock) load X X = X + 1 store X spin_unlock(&lock) CSE 422S Operating Systems Organization 19
Spin Locks Semantics of locking a spin lock: void spin_lock(spinlock_t * lock) { while (lock->value == 1); lock->value = 1; } void spin_unlock(spinlock_t * lock) { lock->value = 0; } CSE 422S Operating Systems Organization 20
Spin Locks void spin_lock(spinlock_t * lock) { while (lock->value == 1); lock->value = 1; } void spin_unlock(spinlock_t * lock) { lock->value = 0; } Sounds great . or does it? Does anyone see any issues? What happens if multiple threads invoke spin_lock() at the exact same time? CSE 422S Operating Systems Organization 21
Dependence on Atomic Instructions We re back to the same problem we ve just shifted the race condition from the integer X to the integer that is used to implement the spin lock The solution relies on the hardware support for atomic operations we talked about earlier Because this support is hardware specific, the specific instructions used to provide atomicity vary on different architectures x86: test_and_set(), compare_and_exchange(), etc. ARM: ldrex, strex CSE 422S Operating Systems Organization 22
Test and set How to implement spin locks with test-and-set? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (test_and_set(&(lock->value), ?) == ?); } void spin_unlock(spinlock_t * lock) { assert(test_and_set(&(lock->value), ?) == ?); } CSE 422S Operating Systems Organization 23
Test and set How to implement spin locks with test-and-set? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (test_and_set(&(lock->value), LOCKED) == LOCKED); } void spin_unlock(spinlock_t * lock) { assert(test_and_set(&(lock->value), ?) == ?); } CSE 422S Operating Systems Organization 24
Test and set How to implement spin locks with test-and-set? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (test_and_set(&(lock->value), LOCKED) == LOCKED); } void spin_unlock(spinlock_t * lock) { assert(test_and_set(&(lock->value), UNLOCKED) == LOCKED); } CSE 422S Operating Systems Organization 25
Compare-and-Exchange How to implement spin_locks with compare- and-exchange? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (compare_and_exchange(&(lock->value), UNLOCKED, LOCKED) == LOCKED); } void spin_unlock(spinlock_t * lock) { assert(compare_and_exchange(&(lock->value), LOCKED, UNLOCKED) == LOCKED); } CSE 422S Operating Systems Organization 26
Spin Locks vs Mutexes Thread 1 Thread 2 Acquires spin lock Tries to acquire spin lock < spins (does not sleep) > Acquires spin lock <enters critical section> Releases spin lock <enters critical section> Releases spin lock CSE 422S Operating Systems Organization 27
Spin Locks vs Mutexes Thread 1 Thread 2 Acquires mutex Tries to acquire mutex < mutex not available, goes to sleep> < woken up > Acquires mutex <enters critical section> Releases mutex <enters critical section> Releases mutex Must also wake up processes waiting to acquire the mutex CSE 422S Operating Systems Organization 28
Sleeping while holding Locks Sleeping while holding a lock should be avoided (unless absolutely necessary) Delays other processes Deadlock risk: is it guaranteed to wake up (at least eventually) and release the lock? A process cannot sleep holding a spin lock Calling schedule() while holding a spin lock will trigger a kernel oops() Required to release lock, then sleep or, use a mutex instead of a spin lock CSE 422S Operating Systems Organization 29
Spin Locks vs Mutexes What to use (LKD pp. 197) Requirement Low overhead locking Solution Spin lock Short lock hold time Spin lock Long lock hold time Mutex Need to lock from interrupt context Spin lock Need to sleep while holding lock Mutex CSE 422S Operating Systems Organization 30
Spin Locks in Interrupts LKD pp 185: You must disable local interrupts before obtaining a spin lock in an interrupt handler to prevent deadlock. This is no longer true: local interrupts are now disabled in top-half interrupt handling by default in Linux kernel. Lock functions that disable interrupts (e.g. spin_lock_irq) are still relevant in bottom- half handlers that share locks with top- halves (we ll cover top half / bottom half processing later in the semester) CSE 422S Operating Systems Organization 31
Semaphores Allows for concurrent access to critical section Value initialized to number of threads that can enter down() decrements semaphore s value and acquires lock up() increments semaphore s value and releases lock If semaphore s value == 0, a thread calling down() sleeps and enters a queue, head of queue awakened when up() called again down() Critical Section up() Semaphore with initial value of 1 acts as a mutex CSE 422S Operating Systems Organization 32
Motivation for Readers/Writer Locks A common kernel problem: Multiple threads share a data structure. Some are reading Some are writing Shared data Reads and writes should not interfere! CSE 422S Operating Systems Organization 33
Synchronization Design Tradeoffs All synchronization methods need to balance the needs of readers and writers: Lots of reads Few writes Balanced Reads and writes Lots of writes Few reads Reads tend to be more common Synchronization can prevent concurrency Readers/writer locks Mutual exclusion Or it can allow concurrency at the expense of overheads: Lock free / wait free algorithms Transactional memory CSE 422S Operating Systems Organization 34
Readers/Writer Locks All readers can execute concurrently with no locking or synchronization Writers are mutually exclusive with all readers and all other writers Use case? In a linked list, inserting/deleting nodes (writers) vs. iterating through existing nodes (readers) Implemented in Linux kernel as rwlock_t in include/linux/rwlock.h CSE 422S Operating Systems Organization 35
Sequential Locks Specialized kernel implementation of readers/ writer locking. Favors writers over readers. Provides a spin lock to enforce mutual exclusion on writers. Defines an additional sequence variable incremented by writers. A reader checks sequence variable at beginning and end of read. If changed, must restart read. Pending writers might (though do not always, as claimed on LKD pp 200!) force readers to continually retry. CSE 422S Operating Systems Organization 36
Read-Copy-Update (RCU) Modern locking approach that may improve performance Allows concurrent execution from both readers and writers A data structure is accessed via a global pointer A writer: 1. Copies the data structure 2. Modifies the new copy 3. Updates the global pointer to point to the new structure 4. Sleeps until no more readers are in old structure 5. Removes old structure Link provided on course website to detailed description Note: Remember, Linux is always evolving. Since description was written, hlist_for_each_entry_rcu() now only requires single pointer (https://elixir.bootlin.com/linux/v5.4.42/source/include/linux/rculist.h#L635) CSE 422S Operating Systems Organization 37