
Understanding PRAM and VLSI Models in Parallel Computing
Explore the concepts of PRAM (Parallel Random Access Machines) and VLSI models in parallel computing, covering topics such as machine size, synchronization, memory access, communication, and application to SIMD and MIMD machines. Learn about memory update operations and the synchronization overhead. Prof. V.B. More from MET's Institute of Engineering in Nashik provides insights into these models.
Presentation Transcript
PRAM and VLSI Models
Prof. V. B. More, MET's IOE, BKC, Nashik
PRAM: Parallel Random Access Machines
An arbitrary but finite number of processors can access any value in an arbitrarily large shared memory in a single time step. Processors may execute different instruction streams, but they work synchronously.
Model: The machine size n can be arbitrarily large. The machine is synchronous at the instruction level: each processor executes its own series of instructions, and the entire machine operates on a basic time step (cycle). Within each cycle, each processor executes exactly one operation or stays idle. Instructions can fetch operands from memory, perform an ALU operation on the data, and store the result back in memory.
All processors implicitly synchronize on each cycle, and the synchronization overhead is zero. Communication is done through reading and writing shared variables. Memory may be accessed as UMA, NUMA, CREW, EREW, or CRCW, with a defined policy for resolving conflicts.
The PRAM model can be applied to the SIMD class of machines if all processors execute identical instructions in the same cycle, or to the MIMD class if processors execute different instructions. Load imbalance is the only overhead in the PRAM model.
[Figure: processors P1, P2, ..., Pn connected to a shared memory. The tightly synchronized processing elements operate on a synchronized read-memory, compute, write-memory cycle.]
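The lock-step read-memory, compute, write-memory cycle can be sketched as a small simulation. This is a minimal illustration, not from the slides: the `pram_cycle` function, the processor "programs", and the example memory contents are invented for the sketch. Reads all see the pre-cycle memory snapshot and writes are applied together, modeling the implicit per-cycle synchronization with zero overhead.

```python
# Minimal simulation of one synchronous PRAM cycle: every processor
# reads from the same pre-cycle memory snapshot, computes, and all
# writes are applied together. Each processor here touches a distinct
# cell, so the step is EREW-safe (no access conflicts).

def pram_cycle(memory, processors):
    """Run one lock-step read-compute-write cycle over shared memory."""
    snapshot = list(memory)                   # read phase: frozen view
    writes = []
    for pid, program in processors:
        addr, value = program(pid, snapshot)  # compute phase
        writes.append((addr, value))
    for addr, value in writes:                # write phase, applied at once
        memory[addr] = value
    return memory

# Example: n processors each double their own cell in a single cycle.
n = 4
memory = [1, 2, 3, 4]
processors = [(i, lambda pid, mem: (pid, 2 * mem[pid])) for i in range(n)]
print(pram_cycle(memory, processors))   # [2, 4, 6, 8]
```

Because all reads use the snapshot, the result is the same regardless of the order in which processors are visited, which is exactly the property the synchronous model guarantees.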
Four memory update operations are possible:
Exclusive read (ER): at most one processor is allowed to read from any memory location in each cycle.
Exclusive write (EW): at most one processor is allowed to write into a memory location at a time.
Concurrent read (CR): multiple processors are allowed to read the same information from the same memory cell in the same cycle.
Concurrent write (CW): allows simultaneous writes to the same memory location. Here a policy must be defined to resolve write conflicts.
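The concurrent-write conflict policies can be illustrated with a short sketch. The policy names (common, arbitrary, priority) come from the standard PRAM literature, not from these slides, and the `resolve_cw` helper is invented for the example:

```python
# Sketch of three standard CW conflict-resolution policies
# (names from the PRAM literature; helper invented for illustration).

def resolve_cw(writes, policy):
    """writes: list of (processor_id, value) aimed at one memory cell."""
    if policy == "common":
        # All writers must attempt the same value.
        values = {v for _, v in writes}
        assert len(values) == 1, "common-CW requires identical values"
        return values.pop()
    if policy == "arbitrary":
        # Any single writer succeeds; here, simply the first listed.
        return writes[0][1]
    if policy == "priority":
        # The lowest-numbered processor wins.
        return min(writes)[1]
    raise ValueError(f"unknown policy: {policy}")

writes = [(3, 30), (1, 10), (2, 20)]
print(resolve_cw(writes, "arbitrary"))  # 30 (first writer in the list)
print(resolve_cw(writes, "priority"))   # 10 (processor 1 wins)
```

Whichever policy is chosen, the point is that exactly one value ends up in the cell, so the machine's behavior stays deterministic (or at least well-defined) despite the conflict.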
Four most important variations of the PRAM model:
EREW PRAM (Exclusive Read Exclusive Write): any memory location can be accessed only once in any one step, blocking other processors from accessing the same memory location simultaneously.
CREW PRAM (Concurrent Read Exclusive Write): any memory location may be read any number of times during a single step, but a write takes place only once per step.
ERCW PRAM (Exclusive Read Concurrent Write): concurrent writes to the same memory location are allowed, but reads are exclusive.
CRCW PRAM (Concurrent Read Concurrent Write): both concurrent reads and concurrent writes to the same memory location are allowed.
EREW and CRCW are the most popular models among algorithm designers.
Q. Which model is faster?
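One way to see the speed difference between the models is broadcasting: under concurrent read, one shared cell reaches all n processors in a single step, while under exclusive read the value must be spread by recursive doubling, taking about log2(n) steps. The sketch below is an illustration invented for this point, not an algorithm from the slides:

```python
# Under CR, broadcasting one value to n processors takes 1 step.
# Under ER, each cell may be read at most once per step, so the set of
# cells holding the value can at best double each step: ceil(log2(n))
# steps in total. (Illustrative sketch, not from the slides.)

def erew_broadcast_steps(n, value):
    """Spread `value` to n cells using only exclusive reads; return step count."""
    cells = [value] + [None] * (n - 1)
    steps = 0
    have = 1                          # cells that already hold the value
    while have < n:
        new = min(have, n - have)     # each holder feeds one new cell
        for i in range(new):
            cells[have + i] = cells[i]   # distinct source per reader: ER-safe
        have += new
        steps += 1
    assert all(c == value for c in cells)
    return steps

print(erew_broadcast_steps(8, 42))    # 3  == log2(8)
print(erew_broadcast_steps(1000, 7))  # 10 == ceil(log2(1000))
```

This is the usual answer to the slide's question: CRCW is the most powerful (fastest) model and EREW the weakest, and simulating a CRCW step on an EREW machine costs an O(log n) slowdown.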
VLSI Complexity Model
Parallel computers depend largely on the use of VLSI chips to fabricate major components such as processor arrays, memory arrays, and large-scale switching networks. Due to rapid advances in VLSI technology, computer architects are trying to implement parallel algorithms directly in hardware.
The AT² model
Let A be the chip area, T the latency for completing a given computation, and s the problem size. Then
A × T² >= O(f(s))
The chip area A is a measure of the chip's complexity. The latency T is the time required from when the inputs are applied until all outputs are produced for a single problem instance.
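The bound expresses an area-time tradeoff: for a fixed problem size s, shrinking the chip area forces a longer latency. The numbers below are invented for illustration, assuming f(s) = s² and a constant of 1:

```python
import math

# Illustration of the AT^2 tradeoff, assuming f(s) = s^2 with constant 1.
# At the bound, T = sqrt(f(s) / A): quartering the area doubles the time.

def min_latency(area, s):
    """Smallest latency permitted by A * T^2 = s^2 for the given area."""
    return math.sqrt(s**2 / area)

s = 1024
print(min_latency(4096, s))   # 16.0 time units
print(min_latency(1024, s))   # 32.0 -- a quarter of the area, twice the time

# The product A * T^2 stays pinned at f(s) along the tradeoff curve.
assert 4096 * min_latency(4096, s)**2 == s**2
```

Any real design must sit on or above this curve; designs below it would violate the lower bound.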
Memory bound on chip area A: many computations are memory-bound because they need to process large data sets (fig. a). Each bit flows through a unit area of the horizontal chip slice; therefore the chip area A bounds the number of memory bits stored on the chip.
I/O bound on volume A·T: data flow through the chip for a period of time T, so the number of input bits cannot exceed the volume A·T (fig. a).
Bisection communication bound √A·T: the bisection area represents the data flow between the two halves of the chip circuit during time T. The cross section √A·T limits the communication bandwidth of the computation (fig. b).
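The bisection bound follows from a simple counting argument: at most about √A wires can cross the cut through the chip's shorter dimension, and each carries one bit per time unit, so the information I exchanged between the halves in time T satisfies I <= √A·T, which squares to A·T² >= I². The numbers below are invented for illustration:

```python
import math

# Bisection-bandwidth counting argument with invented numbers:
# at most sqrt(A) wires cross the bisection, one bit each per time unit.

A = 256                       # chip area in unit squares
T = 100                       # time units
bandwidth = math.sqrt(A)      # bits per time unit across the bisection
max_info = bandwidth * T      # total bits exchanged between the halves
print(max_info)               # 1600.0 bits at most

# Squaring I <= sqrt(A) * T gives the AT^2 form of the same bound.
assert A * T**2 >= max_info**2
```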
[Figure a: memory-limited bound on the chip area A, and I/O-limited bound represented by the volume A·T. Figure b: communication-limited bound on the bisection √A·T.]