Thread Synchronization Mechanisms in pthread Programs

Explore the importance of thread synchronization, mutexes, and condition variables in pthread programming to ensure deterministic outcomes and avoid race conditions. Learn about typical types of synchronization and discover how signal ordering can control the execution of threads. Additionally, consider potential issues in a pthread example code and the need for atomic operations in threaded programs.

  • Thread Synchronization
  • Mutex
  • Condition Variables
  • pthread Programming
  • Parallel Execution




Presentation Transcript


  1. Thread Synchronization. Mutex; condition variables. There are other synchronization mechanisms in pthread as well. Reading: APUE 11.6.

  2. Thread Synchronization. Threads in mm_pthread.c and pi_pthread.c have very minor interactions: all computations are independent, which is essential for parallel execution. Dependencies in a program can cause problems in parallel execution. Compare an independent loop with a dependent one (a parallel version of the first is sketched after this slide):

      for (i=0; i<100; i++)   /* independent: iterations can run in any order */
          a[i] = 0;

      for (i=0; i<100; i++)   /* dependent: iteration i+1 needs the result of iteration i */
          a[i+1] = a[i]+1;
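
To make the contrast concrete, here is a minimal sketch, not from the course examples, of how the independent loop could be split across pthreads; the thread count and the slicing scheme are illustrative assumptions. The dependent loop cannot be split this way, because each iteration needs the previous one's result.

      #include <pthread.h>

      #define N 100
      #define NTHREADS 4

      int a[N];

      /* Each thread zeroes its own slice of a[]; no element is touched
       * by two threads, so the iterations safely run in parallel. */
      void *init_slice(void *arg) {
          int id = *(int *)arg;
          int chunk = N / NTHREADS;
          int lo = id * chunk;
          int hi = (id == NTHREADS - 1) ? N : lo + chunk;
          for (int i = lo; i < hi; i++)
              a[i] = 0;
          return NULL;
      }

      int main(void) {
          pthread_t tid[NTHREADS];
          int ids[NTHREADS];
          for (int i = 0; i < NTHREADS; i++) {
              ids[i] = i;
              pthread_create(&tid[i], NULL, init_slice, &ids[i]);
          }
          for (int i = 0; i < NTHREADS; i++)
              pthread_join(tid[i], NULL);
          return 0;
      }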

  3. Thread Synchronization. Most threaded programs have threads that interact with one another, typically by sharing access to variables. Multiple concurrent reads: OK. Multiple concurrent writes: not OK, the outcome is non-deterministic. One write with concurrent reads: not OK, the outcome is non-deterministic. We need to make sure that the outcome is deterministic. Synchronization allows concurrent access to variables while removing non-deterministic outcomes by enforcing the order of thread execution.

  4. Thread Synchronization. Typical types of synchronization. Mutual exclusion (mutex in pthread). Without a mutex, the two inserts can interleave:

      Thread 1: insert A into tree
      Thread 2: insert B into tree

With a mutex, each insert runs to completion before the other starts:

      Thread 1: lock(tree); insert A into tree; unlock(tree);
      Thread 2: lock(tree); insert B into tree; unlock(tree);

  5. Thread Synchronization. Signal (ordering the execution of threads; condition variables). Consider splitting a dependent loop across three threads:

      Thread 1: for (i=0;  i<25; i++) a(i+1) = a(i)+1;
      Thread 2: for (i=25; i<50; i++) a(i+1) = a(i)+1;
      Thread 3: for (i=50; i<75; i++) a(i+1) = a(i)+1;

This is only correct if each chunk starts after the previous one finishes. With signaling (a condition-variable sketch follows this slide):

      Thread 1: for (i=0; i<25; i++) a(i+1) = a(i)+1; then signal "a(25) ready".
      Thread 2: wait for "a(25) ready"; for (i=25; i<50; i++) a(i+1) = a(i)+1; then signal "a(50) ready".
      Thread 3: wait for "a(50) ready"; for (i=50; i<75; i++) a(i+1) = a(i)+1;
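
The "signal ready / wait for ready" steps can be realized with the condition variables introduced later in these slides. The sketch below is one way to do it; the helper names signal_ready and wait_ready are ours, not from the course examples.

      #include <pthread.h>

      int a[76];
      int ready = 0;   /* highest index known to be computed */
      pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
      pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

      /* Announce that a[idx] is now valid. */
      void signal_ready(int idx) {
          pthread_mutex_lock(&m);
          ready = idx;
          pthread_cond_broadcast(&cv);   /* waiters check different indices */
          pthread_mutex_unlock(&m);
      }

      /* Block until a[idx] has been computed by a predecessor thread. */
      void wait_ready(int idx) {
          pthread_mutex_lock(&m);
          while (ready < idx)
              pthread_cond_wait(&cv, &m);
          pthread_mutex_unlock(&m);
      }

      /* Thread 2's body: wait for a[25], compute a[26..50], announce a[50]. */
      void *thread2(void *arg) {
          wait_ready(25);
          for (int i = 25; i < 50; i++)
              a[i+1] = a[i] + 1;
          signal_ready(50);
          return NULL;
      }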

  6. A pthread Example (example1.c)

      int counter = 0;

      void *thread_producer(void *arg) {
          int val;
          /* produce a product */
          counter++;
          return NULL;
      }

Could there be any problem in this code?

  7. An Example (example1.c)

      int counter = 0;

      void *thread_producer(void *arg) {
          int val;
          /* produce a product */
          counter++;   /* this may not be atomic */
          return NULL;
      }

Most constructs in a high-level language are not atomic! They need to be made atomic explicitly in a threaded program. Solution: mutex. (A driver that exposes the race follows this slide.)
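
To see the race in action, one can run the increment many times in several threads. This driver is our illustration, not the course's example1.c; the thread and iteration counts are arbitrary.

      #include <pthread.h>
      #include <stdio.h>

      #define NTHREADS 4
      #define NITER 1000000

      int counter = 0;

      void *thread_producer(void *arg) {
          for (int i = 0; i < NITER; i++)
              counter++;               /* load, add, store: not atomic */
          return NULL;
      }

      int main(void) {
          pthread_t tid[NTHREADS];
          for (int i = 0; i < NTHREADS; i++)
              pthread_create(&tid[i], NULL, thread_producer, NULL);
          for (int i = 0; i < NTHREADS; i++)
              pthread_join(tid[i], NULL);
          /* Updates are lost when two threads read the same old value
           * before either writes back, so the total is usually smaller. */
          printf("counter = %d (expected %d)\n", counter, NTHREADS * NITER);
          return 0;
      }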

  8. Mutex Variables. Mutex is an abbreviation for mutual exclusion. Mutexes are the primary means of implementing thread synchronization and of protecting shared data when multiple concurrent writes are possible. A mutex variable acts like a lock: multiple threads can try to lock a mutex, but only one will succeed; the other threads are blocked until the owning thread unlocks that mutex.

  9. Mutex Variables. A typical sequence in the use of a mutex is as follows: create and initialize a mutex variable; several threads attempt to lock the mutex; only one succeeds, and that thread owns the mutex; the owner thread performs some set of actions; the owner unlocks the mutex; another thread acquires the mutex and repeats the process; finally, the mutex is destroyed.

  10. Mutex Operations.

      Creation:
          pthread_mutex_t my = PTHREAD_MUTEX_INITIALIZER;
          pthread_mutex_init(pthread_mutex_t *mutex, pthread_mutexattr_t *attr);
      Destruction:
          pthread_mutex_destroy(pthread_mutex_t *mutex);
      Locking and unlocking:
          pthread_mutex_lock(pthread_mutex_t *mutex);
          pthread_mutex_trylock(pthread_mutex_t *mutex);
          pthread_mutex_unlock(pthread_mutex_t *mutex);
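
Unlike pthread_mutex_lock, pthread_mutex_trylock never blocks: it returns 0 if it acquired the mutex and EBUSY otherwise. A short sketch of its use; the function name try_critical_section is ours, not from the course examples.

      #include <pthread.h>
      #include <stdio.h>

      pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

      /* Try to take the lock without blocking; fall back if it is busy. */
      void try_critical_section(void) {
          if (pthread_mutex_trylock(&m) == 0) {
              /* ... critical section: this thread owns the mutex ... */
              pthread_mutex_unlock(&m);
          } else {
              /* The mutex was held by another thread; instead of
               * blocking, do something else and retry later. */
              printf("mutex busy, doing other work\n");
          }
      }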

  11. Mutex Example (example3.c)

      int counter = 0;
      pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

      void *thread_func(void *arg) {
          int val;
          /* protected by mutex */
          pthread_mutex_lock(&mutex);
          val = counter;
          counter = val + 1;
          pthread_mutex_unlock(&mutex);
          return NULL;
      }

How about making mutex a local variable? (example2.c)

  12. Condition Variables. Waiting and signaling on condition variables. Routines: pthread_cond_wait(condition, mutex) blocks the thread until the specified condition is signaled; it should be called with the mutex locked, automatically releases the mutex while it waits, and when it returns (the condition was signaled) the mutex is locked again. pthread_cond_signal(condition) wakes up one thread waiting on the condition variable; it should be called with the mutex locked, and the mutex must be unlocked afterwards. pthread_cond_broadcast(condition) is used when multiple threads may be blocked on the condition.

  13. Condition Variables for Signaling. Think of the producer-consumer problem. Producers and consumers run in separate threads; the producer produces data and the consumer consumes data. The producer has to inform the consumer when data is available; the consumer has to inform the producer when buffer space is available.

  14. Without Condition Variables

  15.

      /* Globals */
      int data_avail = 0;
      pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;

      void *producer(void *arg) {
          pthread_mutex_lock(&data_mutex);
          /* produce data and insert it into the queue */
          data_avail = 1;
          pthread_mutex_unlock(&data_mutex);
      }

  16.

      void *consumer(void *arg) {
          while (!data_avail)
              ;   /* do nothing, keep looping!! (busy wait) */
          pthread_mutex_lock(&data_mutex);
          /* extract data from the queue */
          if (queue is empty)
              data_avail = 0;
          pthread_mutex_unlock(&data_mutex);
          consume_data();
      }

See example5.c.

  17. With Condition Variables

  18.

      int data_avail = 0;
      pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
      pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;

      void *producer(void *arg) {
          pthread_mutex_lock(&data_mutex);
          /* produce data and insert it into the queue */
          data_avail = 1;
          pthread_cond_signal(&data_cond);
          pthread_mutex_unlock(&data_mutex);
      }

  19.

      void *consumer(void *arg) {
          pthread_mutex_lock(&data_mutex);
          while (!data_avail) {
              /* sleep on the condition variable */
              pthread_cond_wait(&data_cond, &data_mutex);
          }
          /* woken up: extract data from the queue */
          if (queue is empty)
              data_avail = 0;
          pthread_mutex_unlock(&data_mutex);
          consume_data();
      }

See example6.c. A compilable sketch follows this slide.
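
Here is one way to flesh the two fragments out into a compilable program. The single-slot buffer in place of a real queue is our simplification, and this sketch is not necessarily identical to the course's example6.c. Because the slot holds one item, the producer must also wait (for the slot to empty); sharing one condition variable for both directions is safe here only because there is exactly one producer and one consumer.

      #include <pthread.h>
      #include <stdio.h>

      int data;                    /* single-slot "queue" (our simplification) */
      int data_avail = 0;
      pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
      pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;

      void *producer(void *arg) {
          for (int i = 1; i <= 5; i++) {
              pthread_mutex_lock(&data_mutex);
              while (data_avail)             /* wait until the slot is empty */
                  pthread_cond_wait(&data_cond, &data_mutex);
              data = i;                      /* produce data, insert into "queue" */
              data_avail = 1;
              pthread_cond_signal(&data_cond);
              pthread_mutex_unlock(&data_mutex);
          }
          return NULL;
      }

      void *consumer(void *arg) {
          for (int i = 1; i <= 5; i++) {
              pthread_mutex_lock(&data_mutex);
              while (!data_avail)            /* sleep on the condition variable */
                  pthread_cond_wait(&data_cond, &data_mutex);
              int val = data;                /* extract data from "queue" */
              data_avail = 0;
              pthread_cond_signal(&data_cond);
              pthread_mutex_unlock(&data_mutex);
              printf("consumed %d\n", val);  /* consume outside the lock */
          }
          return NULL;
      }

      int main(void) {
          pthread_t p, c;
          pthread_create(&p, NULL, producer, NULL);
          pthread_create(&c, NULL, consumer, NULL);
          pthread_join(p, NULL);
          pthread_join(c, NULL);
          return 0;
      }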

  20. Master-Worker Design Pattern. Distributing computation among threads is an essential issue in parallel computing. For some problems, we can do this by deriving the sub-problem domain for each thread from nprocs and myid. This works well for physical simulations (3D domains) or matrix operations, but it is not trivial for non-Cartesian domains, e.g., a shared tree. The master-worker paradigm is another common design pattern for distributing computation among threads; it is a more generic approach.

  21. Master-Worker Design Pattern. Master: usually the main thread/process; it performs the sequential part of the program, does the bookkeeping, initializes the workers, and computes and distributes the work among the worker threads. Workers: get tasks from the master, perform the heavy computation in parallel, and return results to the master until there is no more work to do.

  22. Master-Worker pi. Worker: receives a piece of work (lower, upper, index) and performs the loop.

      for (;;) {
          getwork(&lower, &upper, &index);   /* get values for lower, upper, and index */
          if (lower == -1)
              pthread_exit(NULL);            /* no more work */
          sum = 0.0;                         /* reset the partial sum for this segment */
          for (i = lower; i < upper; i++) {  /* do the work */
              x = h * ((double)i - 0.5);
              sum += 4.0 / (1.0 + x*x);
          }
          psum[index] = sum;                 /* store the result */
      }

  23. Master-Worker pi. The master computes the pieces of work and stores them in a data structure: gLower[] and gUpper[] hold the bounds of each segment (the values can be anything as long as the whole iteration range is covered); total_work is the total number of work segments; cur_index is the next segment to be worked on. Access to the global work data structure needs to be protected: work must be assigned to one thread at a time, so use a mutex.

  24. Master-Worker pi. Access to the global work data structure is protected by a mutex:

      getwork(int *lower, int *upper, int *index)
      {
          pthread_mutex_lock(&work);
          if (cur_index < total_work) {
              *lower = gLower[cur_index];
              *upper = gUpper[cur_index];
              *index = cur_index;
              cur_index = cur_index + 1;   /* remove the work */
          } else {
              *lower = *upper = *index = -1;
          }
          pthread_mutex_unlock(&work);
      }

  25. Master-Worker pi. Putting it all together: see pi_masterworker.c (a reconstruction follows this slide). The master-worker paradigm is very commonly used in parallel and distributed computing, and it has many variations. With a master-worker design, one can distribute the computation in any way, which allows better load balancing. A parallel program is load balanced when all threads do a similar amount of computation (or all threads finish at a similar time). Is the nprocs/myid approach load balanced? What can we do if we want to do better?
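
The sketch below is our reconstruction of how the pieces from the previous slides might fit together; it is not necessarily identical to pi_masterworker.c. The names follow the slides (gLower, gUpper, cur_index, total_work, getwork, psum); the segment count, thread count, and the main() driver are our assumptions.

      #include <pthread.h>
      #include <stdio.h>

      #define N 1000000                      /* total iterations */
      #define NWORK 100                      /* number of work segments */
      #define NTHREADS 4

      int gLower[NWORK], gUpper[NWORK];      /* bounds of each segment */
      int cur_index = 0, total_work = NWORK;
      double psum[NWORK];                    /* one partial sum per segment */
      double h;
      pthread_mutex_t work = PTHREAD_MUTEX_INITIALIZER;

      void getwork(int *lower, int *upper, int *index) {
          pthread_mutex_lock(&work);
          if (cur_index < total_work) {
              *lower = gLower[cur_index];
              *upper = gUpper[cur_index];
              *index = cur_index;
              cur_index = cur_index + 1;     /* remove the work */
          } else {
              *lower = *upper = *index = -1; /* no work left */
          }
          pthread_mutex_unlock(&work);
      }

      void *worker(void *arg) {
          int lower, upper, index;
          for (;;) {
              getwork(&lower, &upper, &index);
              if (lower == -1)
                  return NULL;
              double sum = 0.0;              /* partial sum for this segment */
              for (int i = lower; i < upper; i++) {
                  double x = h * ((double)i - 0.5);
                  sum += 4.0 / (1.0 + x * x);
              }
              psum[index] = sum;
          }
      }

      int main(void) {
          pthread_t tid[NTHREADS];
          int seg = N / NWORK;
          h = 1.0 / (double)N;
          /* Master: compute the work segments; i runs 1..N so that
           * (i - 0.5) * h hits the midpoints of the N subintervals. */
          for (int k = 0; k < NWORK; k++) {
              gLower[k] = k * seg + 1;
              gUpper[k] = (k + 1) * seg + 1;
          }
          for (int k = 0; k < NTHREADS; k++)
              pthread_create(&tid[k], NULL, worker, NULL);
          for (int k = 0; k < NTHREADS; k++)
              pthread_join(tid[k], NULL);
          double pi = 0.0;                   /* combine the partial sums */
          for (int k = 0; k < NWORK; k++)
              pi += psum[k];
          printf("pi is approximately %.10f\n", pi * h);
          return 0;
      }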

  26. A Multi-threaded Application: the Traveling Salesman Problem (TSP). Input: a list of cities, a matrix of distances between them, and a starting city. Goal: find the shortest tour in which all cities are visited exactly once. TSP is an example of a well-known NP-complete problem.

  27. The Branch-and-Bound Algorithm for TSP. Initialization: go from the starting city to each of the remaining cities, and put each resulting partial path into a priority queue, ordered by its current length. Then, repeatedly: take the head element out of the priority queue, extend it by each of the remaining cities, and put each resulting partial path back into the priority queue.

  28. Finding the Solution. When a complete path is found, check whether it is better than the current best tour, and update the best tour if needed. When the priority queue is empty, the best path has been found.

  29. Bound. Once a complete tour is found, an upper bound on the length of the shortest tour is known. There is no need to explore partial paths that are already longer than the bound; remove them from the queue.

  30. Sequential TSP (bounding omitted):

      init_q();
      init_best();
      while ((p = de_queue()) != NULL) {
          for each expansion by one city {
              q = add_city(p);
              if (complete(q)) { update_best(q); }
              else { en_queue(q); }
          }
      }

  31. Parallel TSP. Options: have each thread do one expansion; have each thread do the expansion of one partial path; have each thread do the expansion of multiple partial paths. All are correct; it is a matter of performance/granularity.

  32. Parallel TSP. Thread i:

      while ((p = de_queue()) != NULL) {
          for each expansion by one city {
              q = add_city(p);
              if (complete(q)) { update_best(q); }
              else { en_queue(q); }
          }
      }

  33. Synchronization in Parallel TSP. In de_queue: wait if the queue is empty; the program is done when all threads are waiting in de_queue. In en_queue: signal that the queue is no longer empty. The queue (en_queue and de_queue) and the best path are shared and should be protected by a mutex. A sketch follows.
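
One way to realize this synchronization, hedged heavily: the sorted linked list standing in for the priority queue, the waiting counter, and the done flag are our assumptions, not code from the course. The key idea is that a thread returning NULL from de_queue means the queue is empty and every other thread is also idle, so no more work can appear. The best path would be protected by its own mutex inside update_best, which is omitted here.

      #include <pthread.h>
      #include <stddef.h>

      #define NTHREADS 4

      typedef struct path {
          double length;               /* current length of the partial tour */
          struct path *next;
          /* cities visited so far, etc., omitted */
      } path_t;

      static path_t *pq_head = NULL;   /* sorted list standing in for the priority queue */
      static int waiting = 0;          /* threads currently idle in de_queue */
      static int done = 0;             /* set when the search is exhausted */
      pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
      pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;

      static void pq_insert(path_t *p) {   /* keep the list sorted by length */
          path_t **pp = &pq_head;
          while (*pp && (*pp)->length <= p->length)
              pp = &(*pp)->next;
          p->next = *pp;
          *pp = p;
      }

      void en_queue(path_t *p) {
          pthread_mutex_lock(&q_mutex);
          pq_insert(p);
          pthread_cond_signal(&q_cond);    /* the queue is no longer empty */
          pthread_mutex_unlock(&q_mutex);
      }

      /* Returns NULL only when the queue is empty AND all NTHREADS
       * threads are idle in de_queue: no one can produce more work. */
      path_t *de_queue(void) {
          pthread_mutex_lock(&q_mutex);
          waiting++;
          while (pq_head == NULL && !done) {
              if (waiting == NTHREADS) {   /* everyone is idle: terminate */
                  done = 1;
                  pthread_cond_broadcast(&q_cond);
              } else {
                  pthread_cond_wait(&q_cond, &q_mutex);
              }
          }
          waiting--;
          path_t *p = NULL;
          if (!done) {                     /* remove the shortest partial path */
              p = pq_head;
              pq_head = p->next;
          }
          pthread_mutex_unlock(&q_mutex);
          return p;
      }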
