
PRAM and Parallel Computing: Exploring High-Performance Computing Concepts
Delve into the world of PRAM and parallel computing with Dr. Jie Liu from Western Oregon University. Learn about advanced algorithms, Amdahl's Law, and multi-core programming as you discover the fastest computers and innovative technologies in the field.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
PRAM and Parallel Computing Jie Liu, Ph.D. Professor Computer Science Division Western Oregon University Monmouth, Oregon, USA liuj@wou.edu
outline The fastest computers The PRAM model The O(1) algorithm that finds the max An elegant parallel merge sorting algorithm A practical parallel sorting algorithm we developed Amdahl s Law and Gustafson-Barsis Law Technologies we should pay attention Q&A session
Multi-Core Programming Sequential Parallel 4
PRAM 5
More About PRAM Each PRAM processor can either Perform the prescribed operation (the same for all processors), Carry out an I/O operation, Idle, or Activate another processor So, it takes n processors to activate another n processors, then we have 2n active processors Now two questions What happens if two processors write to the same memory location? How many steps does it take to activate p processors 6
Handling Writing Conflicts in PRAM EREW (Exclusive Read Exclusive Write) CREW (Concurrent Read Exclusive Write) CRCW (Concurrent Read Concurrent Write) Common all the values are the same Arbitrary pick a value and set it Priority the processors with the highest priority is the winner A multi-core computer is which one of the above? 7
Activating n Processors What is the complexity O(log n)? It forms a binomal tree 8
Finding Max in a constant time O(1) Input: an array of n integers arrA[0..n-1] Output: the largest of numbers in arrA[0..n-1] Global variable arrB[0..n-1], i, j Assume the computer is a CRCW/Common with n2 CPUs FindignMax(arrA[0..n-1]) { 1. for all where 0 <= i < n-1 2. arrB[i] = 1 3. for all where 0 <= i, j < n-1 4. if (arrA[i] < arrA[j]) 5. arrB[i] = 0 6. for all where 0 <= i < n-1 7. if (arrB[i] = 1) 8. return arrA[i] } 0 , ip ip, j , i p 0 9
Finding Max how does it work After line 2, every B[i] is 1 for all where 0 <= i < n-1 arrB[i] = 1 for all where 0 <= i, j < n-1 if (arrA[i] < arrA[j]) arrB[i] = 0 for all where 0 <= i < n-1 if (arrB[i] = 1) return arrA[i] Write a 0 to B[i] if A[i] is smaller than any element in A because it is CRCW/Common 1. 2. 3. 4. 5. 6. 7. 8. 0 , ip ip, j ip ip, , i p 0 j 10
Finding Max questions How to do it sequentially, and what is the complexity then? How to do it in parallel, and what is the complexity? How many processors are needed? On the PRAM, what is the min amount of time required to run the algorithm, assuming only is activated? What is the cost? Cost of a parallel algorithm is defined to be the Number of processors X execution time For our algorithm, the cost is O(n2), or even O(n2 log n) while the sequential one is O(n), so ours is NOT cost optimal 0 P 11
Merging Two Sorted Arrays The problem: n is an even number. An array of size n stores two sorted sequence of integers of size n/2, we need to merge the two sorted segment in O(log (n)) steps. 12
Merging Two Sorted Arrays (2) The sequential approach uses two yardsticks and has no concurrency to exploit Calling for new algorithms Key idea: if we know there are k elements smaller than A[i], we can copy A[i] to A[k] in one step. If i<n/2, then there are i -1 elements smaller than A[i] (assuming array is 1 based). Now how can we find the number of elements in the second half of A that is also smaller than A[i] binary search (a log (n) algorithm)! The sequential algorithm identifies a spot and find the element to occupy the spot. The parallel algorithm find identifies an element and find the spot it needs to occupy 13
Merging Two Sorted Arrays In Parallel //A[1] to A[n/2] and A[n/2 +1] to A[n] are two sorted sections MergeArray(A[1..n]) { int x, low, high, index ) , , ( 1 - n 2 1 p ... ... p p spawn ip for all where 1 <= i <= n // The lower half search the upper half, the upper half search for the lower half { high = 1 // assuming it is the upper half low = n/2 If i <= (n/2) { high = n low = (n/2) + 1} x = A[i] // perform binary search Repeat { index = If x < A[index] high = index 1 else low = index + 1 } until low > high A[high + I n/2] = x } low+ ( / ) 2 high } 14
A practical parallel sorting algorithm Sorting on a real shared memory parallel computers has its uniqueness The entire array is accessible The number of processors is much much less than the number of elements Generally there must be some partitioning of data Data move distance is irrelevant to costs We developed a practical algorithms also used this Move to idea My students called our algorithm the Jie- Sort, I call it the J-Sort 15
J-Sort through an example The array 5 17 42 3 9 22 51 26 15 32 19 99 Marking S1 1 1 0 1 1 1 0 0 1 0 1 0 Prefix Sum S1 1 2 2 3 4 5 5 5 6 6 7 7 Marking S2 0 0 1 0 0 0 1 1 0 1 0 1 Prefix Sum S2 0 0 1 1 1 1 2 3 3 4 4 5 Partitioned Array 5 17 3 9 22 15 17 42 52 26 32 99 What if you have only 4 processors 16
When fix the number of processors First, divide in to the p = 4 chunks Find sizes of the S1 and S2 for each chunk Perform prefix sum on size arrays, Copy the elements
J-Sort is cost optimal We proved that Which means the cost is O(n log n), which is the cost of merge sort, and the lower bound of comparison bases sorting algorithms, so J-Sort is cost optimal!!!
Amdahls Law and Gustafson-Barsis Law Amdahl s Law: Let s be the fraction of operations in a computation that must be performed sequentially, where 0 s 1. The maximum speedup achievable by a parallel computer with p processors performing the computation is 1 + 1 ( / ) s s p Gustafson-Barsis s Law: Given a parallel program solving a problem using p processors, let s denote the fraction of the total execution performed sequentially. The maximum speedup achievable by this program is 1 ( + ) * p p s These two laws contradict with each other. How can we explain this contradiction? 20
10 technologies we should pay attention 5G + Cloud + As a service model Big Data/BI/ML + deep learning DBMS for analytics Autonomous Vehicles Block chain Artificial Intelligence Virtual & Augmented Reality Internet of Things Parallel Processing Mobile software development Android surpassed Windows and is the most popular OS 21