Parallel Algorithms for Efficient Computation

Delve into the world of parallel algorithms, exploring theory and practical applications. Learn how code execution interfaces with hardware, the significance of analyzing work and span, and the intricacies of scheduling parallel computation. Discover the concepts of greedy scheduling, work-stealing schedulers, and more for optimizing computational efficiency in parallel processing.

  • Parallel algorithms
  • Efficient computation
  • Greedy scheduler
  • Work-stealing scheduler
  • Parallel processing


Presentation Transcript


  1. Parallel Algorithms: Theory and Practice. CS260 Lecture 13, Yan Gu. Scheduling Parallel Computation.

  2. parallel_for (int i=0; i<n; i++) a[i] = f(a[i]);
     How is your code actually executed on hardware? Why analyze work and span?

  3. parallel_for (int i=0; i<n; i++) a[i] = f(a[i]);

  4. parallel_for (int i=0; i<n; i++) a[i] = f(a[i]);
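
Before turning to the DAG view, it helps to see what a parallel_for can compile down to. The sketch below is one plausible lowering into binary fork-join, written with OpenCilk-style cilk_spawn/cilk_sync (the lecture cites the Cilk implementation); the helper name parallel_for_sketch is illustrative, not the runtime's actual API.

    #include <cilk/cilk.h>

    // One plausible lowering of parallel_for into binary fork-join:
    // split the index range recursively, yielding a DAG of depth O(log n)
    // on top of the per-iteration work.
    template <typename F>
    void parallel_for_sketch(int lo, int hi, F f) {
        if (hi - lo <= 0) return;                     // empty range
        if (hi - lo == 1) { f(lo); return; }          // single iteration
        int mid = lo + (hi - lo) / 2;
        cilk_spawn parallel_for_sketch(lo, mid, f);   // left half: may be stolen
        parallel_for_sketch(mid, hi, f);              // right half: runs locally
        cilk_sync;                                    // join both halves
    }

    // Usage mirroring the slide:
    //   parallel_for_sketch(0, n, [&](int i) { a[i] = f(a[i]); });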

  5. Treat the computation as a DAG

     Function SUM(A)
       if |A| = 1 then return A(1)
       in parallel:
         a = SUM(first half of A)
         b = SUM(second half of A)
       return a + b
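
A direct transcription of this pseudocode into Cilk-style C++ might look as follows (a sketch: the element type int64_t and the half-open index range are my choices, not the slide's):

    #include <cilk/cilk.h>
    #include <cstdint>
    #include <cstddef>

    // Divide-and-conquer SUM over a[lo, hi): work O(n), span O(log n).
    // Each cilk_spawn is a fork node in the DAG; cilk_sync is the join.
    int64_t SUM(const int64_t* a, size_t lo, size_t hi) {
        if (hi - lo == 1) return a[lo];           // |A| = 1: return the element
        size_t mid = lo + (hi - lo) / 2;
        int64_t x = cilk_spawn SUM(a, lo, mid);   // first half, in parallel
        int64_t y = SUM(a, mid, hi);              // second half, on this worker
        cilk_sync;                                // wait for the spawned half
        return x + y;
    }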

  6. Greedy scheduler. IDEA: Do as much as possible on every step.

  7. Greedy scheduler. IDEA: Do as much as possible on every step. (P = 3 in the figure.) Either execute P operations

  8. Greedy scheduler. IDEA: Do as much as possible on every step. (P = 3 in the figure.) Either execute P operations, or reduce the span by 1. Hence T_P ≤ W/P + D.
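
The two cases give the classic greedy (Brent-style) bound; a short derivation, writing W for work, D for span, and P for the number of processors:

    % Call a step "complete" if all P processors execute an operation,
    % and "incomplete" otherwise.
    \[
      \#\text{complete steps} \le W/P
      \quad\text{(each complete step consumes $P$ of the $W$ operations)}
    \]
    \[
      \#\text{incomplete steps} \le D
      \quad\text{(each incomplete step shortens the remaining span by 1)}
    \]
    \[
      T_P \;\le\; \frac{W}{P} + D
    \]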

  9. Greedy scheduler is impractical:
     • Assumes processors/threads run in lockstep
     • Big overhead in context switching
     • Different operations have very different costs

  10. Work-stealing scheduler (full details in 6.172: Performance Engineering of Software Systems, covering the Cilk implementation):
      • If a processor spawns two tasks at a FORK, it continues executing one of the spawned subtasks and pushes the other to the front of its own queue.
      • If a processor completes a task, it tries to pull the next task from the front of its own queue.
      • If a processor finds its own queue empty, it selects another processor uniformly at random and steals a task from the end of the victim's queue (retrying elsewhere if the attempt fails).
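
The three rules above can be pictured as one double-ended queue per worker. The sketch below is a deliberately simplified, mutex-based illustration (a real runtime such as Cilk uses a lock-free THE-protocol deque; Task, WorkerDeque, and the method names are illustrative, not Cilk's API):

    #include <deque>
    #include <mutex>
    #include <optional>
    #include <functional>

    using Task = std::function<void()>;

    class WorkerDeque {
        std::deque<Task> q;
        std::mutex m;
    public:
        // Rule 1: at a FORK, the owner keeps one subtask and pushes
        // the other onto the FRONT of its own deque.
        void push_front(Task t) {
            std::lock_guard<std::mutex> g(m);
            q.push_front(std::move(t));
        }
        // Rule 2: on finishing a task, the owner pops from the FRONT
        // (LIFO order, which preserves locality).
        std::optional<Task> pop_front() {
            std::lock_guard<std::mutex> g(m);
            if (q.empty()) return std::nullopt;
            Task t = std::move(q.front()); q.pop_front();
            return t;
        }
        // Rule 3: an idle thief steals from the BACK of a random victim,
        // taking the oldest (typically largest) task.
        std::optional<Task> steal_back() {
            std::lock_guard<std::mutex> g(m);
            if (q.empty()) return std::nullopt;   // failed steal: retry elsewhere
            Task t = std::move(q.back()); q.pop_back();
            return t;
        }
    };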

  11. Work-stealing scheduler. [Figure: snapshot of P = 3 processors executing the DAG, each with its own deque.]

  12. Work-stealing scheduler. [Figure: a later snapshot of the same execution (P = 3).]

  13. parallel_for (int i=0; i<n; i++) a[i] = f(a[i]);

  14. Overhead of work-stealing scheduler. Bound on the number of steals (whp): O(PD). [Figure: P = 3 processors with their deques.]

  15. Overhead of work-stealing scheduler. The number of steals (whp): O(PD). Running time (whp): T_P = (W + O(PD)) / P = W/P + O(D).
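
One way to read this bound (a sketch of the accounting; W = work, D = depth, P = processors): on every time step each processor either executes an operation of the DAG or makes a steal attempt, so the total number of processor-steps is the work plus the number of steal attempts:

    \[
      P \cdot T_P \;=\; W + \#\text{steal attempts}
      \;=\; W + O(PD) \quad\text{whp},
    \]
    \[
      T_P \;=\; \frac{W + O(PD)}{P} \;=\; \frac{W}{P} + O(D).
    \]
    % The W/P term is perfect speedup; O(D) is the scheduling overhead.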

  16. Proof sketch. Consider one specific path in the DAG: how many steals are needed for this path to finish? We want to show that O(PD) steal attempts are sufficient to steal D tasks whp.

  17. Simplest case: steals are attempted one by one. Each steal attempt has probability 1/(P−1) of stealing a given task. Chernoff bound: for n independent random variables in {0, 1}, let X be their sum and μ = E[X]; then for any 0 < δ < 1, Pr[X ≤ (1 − δ)μ] ≤ e^(−δ²μ/2).

  18. Simplest case: steals are attempted one by one. Each steal attempt has probability 1/(P−1) of stealing the task we need. Suppose we make 2(P−1)(D + ln(1/ε)) steal attempts; the expected number of successful steals is then μ = 2(D + ln(1/ε)). If we end up with fewer than D successful steals, take δ = (μ − D)/μ, so that (1 − δ)μ = D; the Chernoff bound Pr[X ≤ (1 − δ)μ] ≤ e^(−δ²μ/2) gives Pr[X < D] ≤ e^(−(μ−D)²/(2μ)) ≤ e^(−ln(1/ε)) = ε.
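
Filling in the algebra behind the last step (a short verification, writing b = ln(1/ε) for brevity):

    \[
      \mu = 2(D + b), \qquad \mu - D = D + 2b,
    \]
    \[
      (\mu - D)^2 = D^2 + 4Db + 4b^2 \;\ge\; 4b(D + b) \;=\; 2\mu b,
    \]
    \[
      \Pr[X < D] \;\le\; e^{-(\mu-D)^2/(2\mu)} \;\le\; e^{-b} \;=\; \varepsilon.
    \]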

  19. Overhead of work-stealing scheduler. The number of steals (whp): O(PD). Running time (whp): T_P = (W + O(PD)) / P = W/P + O(D).

  20. Successful steals can be expensive. The number of steals (whp) is O(PD), but each successful steal involves physical communication between two processors and can lead to considerably more cache misses. Coarsening will not increase #SuccSteal.
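
Coarsening here means grouping iterations into grains so that each task does enough work to amortize scheduling costs; a sketch building on the earlier parallel_for lowering (the grain-size constant and helper name are illustrative, not from the lecture):

    #include <cilk/cilk.h>

    // Coarsened variant: recursion stops once a range holds at most
    // GRAIN iterations, which are then run sequentially. Fewer, larger
    // tasks mean less fork overhead, while the O(PD) steal bound still
    // applies to the (now shallower) DAG.
    constexpr int GRAIN = 2048;   // illustrative grain size

    template <typename F>
    void parallel_for_coarse(int lo, int hi, F f) {
        if (hi - lo <= GRAIN) {              // small enough: run serially
            for (int i = lo; i < hi; ++i) f(i);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        cilk_spawn parallel_for_coarse(lo, mid, f);
        parallel_for_coarse(mid, hi, f);
        cilk_sync;
    }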

  21. Design and Analysis of Parallel Algorithms (summary):
      • Cost measures: work W, depth D, I/O cost Q (sequential / random)
      • Parallelism for work: W/D
      • Time for I/O: max(…)
      • Number of steals: O(PD)
      • Most combinatorial algorithms are I/O-bottlenecked
