Matrix Multiplication: Theory and Practice in Parallel Algorithms

parallel algorithms theory and practice n.w
1 / 37
Embed
Share

Explore the concepts of matrix multiplication, solving recurrences using the Master Theorem, and applications solvable in parallel. Learn about standard iterative matrix multiplication algorithms and how to compute cells in parallel for efficient processing. Stay updated on homework deadlines and access course materials for further learning.

  • Matrix Multiplication
  • Parallel Algorithms
  • Recurrences
  • Master Theorem
  • Iterative Algorithms

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Parallel Algorithms: Theory and Practice CS260 CS260 Lecture 3 Yihan Sun Lecture 3 Yihan Sun * Some of the slides are from MIT 6.712, 6.886 and CMU 15-853.

  2. Last Lecture Two scan algorithms Divide-and-conquer Reduce the problem size Two scan algorithms Computational models PRAM Fork-join: N-way vs. binary Atomic primitives Computational models Implement parallel algorithms using C++ I ll ask for your feedback (about) every two lectures, you can be prepared for that Implement parallel algorithms using C++ I ll ask for your feedback (about) every two lectures, you can be prepared for that 2

  3. In this lecture Matrix multiplication Solve recurrences using Master Theorem Some applications that can be solved in parallel Matrix multiplication Solve recurrences using Master Theorem Some applications that can be solved in parallel 3

  4. Homework 1 The deadline has been extended to 27 Code submitted to If you want to use the late days, let me know The deadline has been extended to 27th Code submitted to iLearn If you want to use the late days, let me know thJan Jan 5 more days 5 more days iLearn After this lecture, you ll be able to solve all the problems in Homework 1 After this lecture, you ll be able to solve all the problems in Homework 1 Course Website: https://www.cs.ucr.edu/~yihans/teaching/palgo.html Under my UCRhomepage 4

  5. Matrix Multiplication 5

  6. Matrix Multiplication Consider standard iterative matrix Consider standard iterative matrix- -multiplication algorithm multiplication algorithm := Z X Y Where ?, ?, and ? are ? ? matrices for i = 1 to N do for j = 1 to N do for k = 1 to N do Z[i][j] += X[i][k] * Y[k][j] (?3) computation in RAM model.

  7. for i = 1 to N do for j = 1 to N do for k = 1 to N do Z[i][j] += X[i][k] * Y[k][j] Matrix Multiplication Each Z[ Compute each cell in Z in parallel ?(?3) work, ?(?) depth Each Z[i i][j] is computed independently ][j] is computed independently par_for i = 1 to N do par_for j = 1 to N do for k = 1 to N do Z[i][j] += X[i][k] * Y[k][j] Compute each cell in ? in parallel using a parallel reduce ?(?3) work, ? log? depth par_for i = 1 to N do par_for j = 1 to N do par_for k = 1 to N do temp[k] = X[i][k] * Y[k][j]; Z[i][j] = parallel_reduce(temp, N); 7

  8. Recursive Matrix Multiplication Compute 8 submatrix products recursively Z11:= X11Y11+ X12Y21 Z12:= X11Y12+ X12Y22 Z21:= X21Y11+ X22Y21 Z22:= X21Y12+ X22Y21 Z11 Z21Z22 Z12 X11 X21X22Y21 X12 Y11 Y12 Y22 := 8 8- -way divide In binary forking: ? ? = ? ??+ ?? way divide- -and In binary forking: and- -conquer conquer ? ? ? ? - - ??? ? is from binary forking. It is It is ? ? for arbitrary for arbitrary- -way forking is from binary forking. way forking ? ? = ? ???? + ?

  9. Matrix Multiplication ? ? ? 2 ? ? = ?2+ 8? ? ? = ? ??+ ?? Omit constant: ? ? = ?2+ 8? Omit constant: 2 ? 2 ? 4 = ?2+ 8 + 8? ? 2 2 2 ? 2 ? 4 ? 8 = ?2+ 8 + 64 + 8? 2 ? 2 ? 4 ? 2 ? 4 ? 4 ? 8 ? = + 8? = (log2? rounds) = ?2+ 2?2+ 4?2+ + 8log2? = ?(?3) 2 ? = + 8? 2 ? 2log2? ? ? ? ? = ? ???? + ? ? ? = ?(log2?) 9

  10. How to solve a recurrence in general? 10

  11. Solving recurrences Master Theorem The Master Method for solving divide recurrences applies to the recurrences in the form: ? ? = ?? Where ? ? is a constant when The Master Method for solving divide- -and recurrences applies to the recurrences in the form: and- -conquer conquer ? ? + ? ? Where ? ?,? ?, and is a constant when ? is a constant , and ? is asymptotically positive. is a constant is asymptotically positive. 11

  12. ? ? Recursion Tree: ? ? = ?? + ?(?) 1 task, each ?(?) ?(?) ? ?(?/?) ? tasks, each ?(? ?(?/?) ?(?/?) ?) ? ? = ? = log?? ?(?/?2) ?(?/?2) ?(?/?2) ?2 tasks, each ?(? ?2) Idea: Compare ?log?? with ?(?) ?(?(1)) ?log?? tasks, each ?(1) = (?log??) 12

  13. ? ? Recursion Tree: ? ? = ?? ?log??> ?(?) + ?(?) Case 1, ? ? = ?(??), where ? < ????? e.g., when log?? = 3 but ? ? = ?2 GEOMETRICALLY INCREASING (LEAF- DOMINATED) ? ? = (?log??) 1 task, each ?(?) ? ? ?? = ? ?? ? ? ? ? ? ? ???? ? tasks, each ?(? ?? ? ??= ?) ? = ? = log?? ? 2 ? ?2 ?2 ? ?2 ? ?? ?2? ?? = ?2 tasks, each ?(? ?2) log?? terms Grow geometrically, common ratio is ? ??> 1 ?log?? tasks, each ?(?(1)) = (?log??) log?? ? ?? ?log??= ?? 13

  14. ? ? Recursion Tree: ? ? = ?? ?log?? ?(?) + ?(?) Case 2, ? ? = ?(???????), where ? = ????? e.g., when log?? = 1 and ? ? = ?log? ARITHMETICALLY INCREASING ? ? = (?log??log?+1?) 1 task, each ?(?) ? ? = ??logk? ? ? = ? ? ? log?? ?? = ? ? ? tasks, each ?(? ? ?? ??log?? ? ?2 ?2 ? ?? ?) ? ?(?) ? ?2 ? = ? = log?? ? log?? ?2? ?2 tasks, each ?(? ? ?2) 2 ??log?? = ?2 ?(?) log?? terms Same order of magnitude, ? ? = (??log?+1?) ?log?? tasks, each ?(?(1)) = (?log??) log?? ? ?? ?log??= ?? 14

  15. ? ? Recursion Tree: ? ? = ?? ?log??< ?(?) + ?(?) Case 3, ? ? = ?(??), where ? > ????? e.g., when log?? = 2 but ? ? = ?3 1 task, each ?(?) Generally case 3 does not imply anything. But if ?(?) satisfies the regularity condition that ??(?/?) ??(?) for some constant ? < 1, then ? tasks, each ?(? ?) GEOMETRICALLY DECREASING (ROOT- DOMINATED) ? ? = (?(?)) ? = ? = log?? ?2 tasks, each ?(? ?2) ?log?? tasks, each ?(1) = (?log??) E.g., when ? ? = ?? where ? > log??. ? ? = (??) 15

  16. ? ? Recursion Tree: ? ? = ?? + ?(?) 1 task, each ?(?) ?(?) ? ?(?/?) ? tasks, each ?(? ?(?/?) ?(?/?) ?) ? ? = ? = log?? ?(?/?2) ?(?/?2) ?(?/?2) ?2 tasks, each ?(? ?2) You can always analyze a specific algorithm using the tree ?(?(1)) ?log?? tasks, each ?(1) = (?log??) 16

  17. Master Theorem ? ?+ ?(?), where and constant ?,? > ? Solve Let Case 1: Case 2: ? ? ? ???? Case 3: ?(? ? ) Solve ? ? = ?? Let ? = ????? and constant Case 1: ? ? = ?(?? ?) ? ? = ?(??) Case 2: ? ? = ?(???????) ? ? = ? ??????+?? = , where ? ? and and ? > ? Case 3: ? ? = ?(??+?) and regularity condition and regularity condition ? ? = 17

  18. Matrix multiplication ? ?+ ?(?), where Solve ? ? = ?? ? ? and ? > ? Let ? = ????? and constant ?,? > ? Case 1: ? ? = ?(?? ?) ? ? = ?(??) Case 2: ? ? = ?(???????) ? ? = ?(??????+??) Case 3: ? ? = ?(??+?) and regularity condition ? ? = ?(? ? ) ? ? ? ? = ? ??+ ?? ? = 8,? = 2,log?? = 3,? ? = ?2 ? ? < ?3 => Case 1, ? ? = (?3) ? ? ? ? = ? ???? + ? ? = 1,? = 2,log?? = 0,? ? = log? ? ? = ?0log1? => Case 2 with ? = 1, ? ? = log?+1? = (log2?) 18

  19. Master Theorem Quiz ? ?+ ?(?), where Solve ? ? = ?? ? ? and ? > ? Let ? = ????? and constant ?,? > ? Case 1: ? ? = ?(?? ?) ? ? = ?(??) Case 2: ? ? = ?(???????) ? ? = ?(??????+??) Case 3: ? ? = ?(??+?) and regularity condition ? ? = ?(? ? ) ? ?+ ? ? ? = ?? ? = 4,? = 2,? = logb? = 2 . ? ? = ?1 case 1: ? ? = (?2) ? ? = ?? ? = 4,? = 2,? = logb? = 2 . ? ? = ?2log0? case 2: ? ? = (?2log?) ? ? = ?? ? = 4,? = 2,? = logb? = 2 . ? ? = ?3 And regularity condition holds case 3: ? ? = (?3) ? ?+ ?? ? ?+ ?? 19

  20. Use recurrence to compute work and depth Case 1: S = execute S2 after finishing S1 Case 1: S = execute S2 after finishing S1 ? ? = ? ?1 + ? ?2 ? ? = ? ?1 + ?(?2) S1; S2; Case 2: S = execute S2 and S1 in parallel Case 2: S = execute S2 and S1 in parallel ? ? = ? ?1 + ? ?2 ? ? = max ? ?1 + ? ?2 In parallel: S1; S2; 20

  21. Flatten algorithm 21

  22. Flatten Algorithm Given a nested sequence (i.e., one 1 Input: ?[?] (? = 1..?) stores the head pointer of ? arrays, ? ? size of array ?, ? ? [?] is the j-th element in array ?[?] Output: array ? concatenating all arrays ?[?] flatten([5,1,2], [3,1],[9,3,5]) = [5,1,2,3,1,9,3,5] Given a nested sequence (i.e., ? 1 1- -D arrays), output them in one 1- -D array D arrays), output them in D array is the 22

  23. Flatten(A, n) { parallel_for (i = 0 to n) S[i] = |A[i]|; offset = scan_exclusive(S, n); parallel_for (i = 0 to n) { off = offset[i]; parallel_for (j = 0 to S[i]) B[off+j]=A[i][j]; } return B;} Flatten algorithm Prefix sum Prefix sum Array size Array size 0 0 3 3 8 8 3 3 5 5 2 2 3 3 4 4 a0 a1 a2 b0 b1 b2 b3 b4 Element b2 is at location 3+2=5 c0 c1 Element e1 is at location 13+1=14 10 13 10 13 d0 d1 d2 Work: ?(?) Depth: ?(log?) ? is the total size of the arrays e0 e1 e2 e3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 a0 a1 a2 b0 b1 b2 b3 b4 c0 c1 d0 d1 d2 e0 e1 e2 e3 23

  24. Constructing a Tableau 24

  25. atcacacb Edit Distance: Dynamic Programming acab Given two strings A and B, the edit distance of A and B is the least number of edits (inserts and deletes) to modify A to B S[ S[i i][j] is the edit distance of A[1..i] and B[1..j] S[ S[i i][j] = min of S[i-1][j-1] if A[i]==B[j]: convert A[1..i-1] to B[1..j-1] and keep A[i] since A[i]=B[j] S[i-1][j]+1: convert A[1..i-1] to B[1..j] and delete A[i] S[i][j-1]+1: convert A[1..i] to B[1..j-1] and insert B[j] at the end Given two strings A and B, the edit distance of A and B is the least number of edits (inserts and deletes) to modify A to B ][j] is the edit distance of A[1..i] and B[1..j] ][j] = min of Case 1: just convert atcaca to aca (if A[i]==B[j]) S[i][j]=S[i-1][j-1] A[1..i]= B[1..j]= atcacac acac Take the minimum of them Case 2: convert atcaca to acac, and delete the last c in A S[i][j]=S[i-1][j]+1 A[1..i]= B[1..j]= atcacac acac Case 3: convert atcacac to aca, and insert a c at the end of A S[i][j]=S[i][j-1]+1 A[1..i]= B[1..j]= atcacac acac

  26. Edit Distance: Dynamic Programming S[ S[i i][j] = min of S[i-1][j-1] if A[i]==B[j] S[i-1][j]+1 S[i][j-1]+1 B ][j] = min of Note: can be filled in any order as long as the cells to the left and above are filled. Can follow path back through matrix to construct edits. Note: can be filled in any order as long as the cells to the left and above are filled. Can follow path back through matrix to construct edits. a t c a c a c 0 1 2 3 4 5 6 7 insert delete common t 1 2 1 2 3 4 5 6 c 2 3 2 1 2 3 4 5 A a 3 2 3 2 1 2 3 4 t 4 3 2 3 2 3 4 5

  27. Edit Distance: Dynamic Programming Sequential code: for i = 1 to n M[i,1] = i; for j = 1 to m M[1,j] = j; Sequential code: for i = 2 to n for j = 2 to m if (A[i] == B[j]) M[i,j] = M[i-1,j-1]; else M[i,j] = 1 + min(M[i-1,j],M[i,j-1]); 15-853 Page27

  28. Constructing a Tableau Fill in an A[ Fill in an ? ? tableau A[i i][j]=f(A[ tableau ?, where ][j- -1], A[i , where 1][j], A[i- -1][j Dynamic Programming Longest common subsequence Edit distance ][j]=f(A[i i][j 1], A[i- -1][j], A[i 1][j- -1]) Dynamic Programming 1]) 28

  29. Constructing a Tableau ? 2 Work: ? ? = 4? Master Theorem case 1: ? = 4,? = 2,log?? = 2, ? ? = (1) ? ? = (?2) + (1) 29

  30. Constructing a Tableau ? 2 Depth: ? ? = 3? Master Theorem case 1: ? = 3,? = 2, ? ? = (1) ? ? = ?log23 (?1.59) + (1) 30

  31. Constructing a Tableau: 4-way divide-and-conquer Work: Depth: Work: ? ? = ?(??) Depth: ? ? ?(??.??) ? ? ? ?= ?(??.??) Parallelism: Parallelism: 31

  32. Constructing a Tableau ? 3 Work: ? ? = 9? Master Theorem case 1: ? = 9,? = 3,log?? = 2, ? ? = (1) ? ? = (?2) + (1) 32

  33. Constructing a Tableau ? 3 Depth: ? ? = 5? Master Theorem case 1: ? = 5,? = 3, ? ? = (1) ? ? = ?log35 (?1.47) + (1) 33

  34. Constructing a Tableau: 9-way divide-and-conquer Work: Depth: Work: ? ? = ?(??) Depth: ? ? ?(??.??) ? ? ? ?= ?(??.??) Parallelism: Parallelism: Nine-way divide-and-conquer has about (?0.12) more parallelism than four-way divide-and-conquer, but it exhibits less cache locality. How good can we do? 34

  35. Filtering/packing 35

  36. Parallel filtering / packing Given an array output an array Given an array ? of elements and a predicate function output an array ? with elements in of elements and a predicate function ?, , with elements in ? that satisfy that satisfy ? ? ? = ???? ?? ? ?? ??? ????? ?? ? ?? ???? 4 2 9 3 6 5 7 11 10 8 ? = 9 3 5 7 11 ? = 36

  37. Parallel filtering / packing How can we know the length of Count the number of red elements parallel reduce ?(?) work and ?(log?) depth How can we know the length of ? in parallel? in parallel? 4 2 9 3 6 5 7 11 10 8 ? = 0 0 1 1 0 1 1 1 0 0 37

Related


More Related Content