Overview of Lectures on Parallel and Distributed Computing by Dr. Wei Chen

Explore a series of lectures on parallel and distributed computing by Dr. Wei Chen, covering topics such as parallel computational models, algorithm design, distributed systems, and more. Discover the importance of parallel computing in solving complex problems and overcoming CPU limitations.

Presentation Transcript


  1. Lectures on Parallel and Distributed Computing Dr. Wei Chen, Professor, Tennessee State University, August 2013

  2. Outline Lecture 1: Introduction to parallel computing; Lecture 2: Parallel computational models; Lecture 3: Parallel algorithm design and analysis; Lecture 4: Distributed-memory programming with PVM/MPI; Lecture 5: Shared-memory programming with OpenMP; Lecture 6: Shared-memory programming with GPU; Lecture 7: Introduction to distributed systems; Lecture 8: Synchronous network algorithms; Lecture 9: Asynchronous shared-memory/network algorithms

  3. References: (1) Lectures 1-3: Joseph JaJa, An Introduction to Parallel Algorithms, Addison-Wesley, 1992. (2) Lectures 4-6: Peter S. Pacheco, An Introduction to Parallel Programming, Morgan Kaufmann, 2011. (3) Lectures 7-8: Nancy A. Lynch, Distributed Algorithms, Morgan Kaufmann, 1996.

  4. Lecture 1: Introduction to Parallel Computing

  5. Why parallel computing Problems with large computational complexity: computationally hard problems (e.g., NP-complete problems) require exponential computing time. Problems with large input size: quantum chemistry, statistical mechanics, relativistic physics, astrophysics, fluid mechanics, biology, genetic engineering, and so on. For example, simulating 10^-8 seconds of the reaction between a protein molecule and water molecules takes about 1 hour on a current computer, and simulating 1 hour of the reaction is estimated to take on the order of 10^14 years.

  6. Why parallel computing Physical limitation of CPU computational power: over the past 50 years, CPU speed doubled roughly every 2.5 years, but there is a physical limitation: the speed of light is 3 × 10^8 m/s, so the CPU clock rate is expected to be limited to about 10 GHz. To solve computationally hard problems: parallel processing, DNA computers, quantum computers.
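
A rough worked version of the slide's argument (my reconstruction: the slide only gives the speed of light and the 10 GHz figure, and the chip size of a few centimetres is an assumption I am adding):

```latex
% Back-of-the-envelope bound: a signal must be able to cross a chip of
% diameter d (assumed here to be about 3 cm) within one clock period.
\[
  f \;\lesssim\; \frac{c}{d}
    \;=\; \frac{3 \times 10^{8}\ \text{m/s}}{3 \times 10^{-2}\ \text{m}}
    \;=\; 10^{10}\ \text{Hz}
    \;=\; 10\ \text{GHz}
\]
```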

  7. What is parallel computing Using a number of processors to process one task, speeding up the processing by distributing it to the processors. (Figure: one problem divided among multiple processors.)
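
The "speeding up" can be made quantitative with the usual definition of speedup (a standard textbook definition, not stated on this slide; the symbols T_1 and T_p are mine):

```latex
% Speedup of a p-processor execution over the sequential execution.
\[
  S(p) \;=\; \frac{T_{1}}{T_{p}}, \qquad 1 \le S(p) \le p,
  \quad \text{with } S(p) = p \text{ being ideal (linear) speedup.}
\]
```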

  8. Classification of parallel computers Two kinds of classification: 1. Flynn's classification: SISD (Single Instruction stream, Single Data stream); MISD (Multiple Instruction stream, Single Data stream); SIMD (Single Instruction stream, Multiple Data stream); MIMD (Multiple Instruction stream, Multiple Data stream). 2. Classification by memory type: shared memory; distributed memory.

  9. Flynn's classification (1) SISD (Single Instruction, Single Data) computer: the von Neumann single-processor computer. (Figure: one control unit and one processor connected to memory, with a single instruction stream and a single data stream.)

  10. Flynn's classification (2) MISD (Multiple Instruction, Single Data) computer: all processors share a common memory, have their own control units, and execute their own instructions on the same data. (Figure: several processors, each with its own control unit and instruction stream, operating on a single data stream from a shared memory.)

  11. Flynn's classification (3) SIMD (Single Instruction, Multiple Data) computer: processors execute the same instruction on different data. Operations of the processors are synchronized by a global clock. (Figure: one control unit broadcasting a single instruction stream to several processors, each with its own data stream, connected through a shared memory or interconnection network.)
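
A small illustration of the SIMD idea (my sketch, not from the slides): the same operation is applied to every element of an array, and a SIMD machine or vector unit can perform these iterations in lockstep. The array size and values are arbitrary, and the `omp simd` pragma is only a hint that is ignored by compilers without OpenMP support.

```c
#include <stdio.h>

#define N 8

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* One instruction (an addition) applied to many data elements:
       a SIMD machine executes these iterations in lockstep. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```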

  12. Flynn's classification (4) MIMD (Multiple Instruction, Multiple Data) computer: processors have their own control units and execute different instructions on different data. Operations of the processors are executed asynchronously most of the time. It is also called a distributed computing system. (Figure: several processors, each with its own control unit, instruction stream, and data stream, connected through a shared memory or interconnection network.)

  13. Classification by memory types (1) 1. Parallel computers with a shared common memory: communication is based on the shared memory. For example, consider the case where processor i sends some data to processor j: first, processor i writes the data to the shared memory, then processor j reads the data from the same address of the shared memory. (Figure: several processors connected to one shared memory.)
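
A minimal OpenMP sketch of this write-then-read style of communication (my illustration; OpenMP is the subject of Lecture 5 in the outline, and the thread numbering and the value 42 are arbitrary):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int shared_cell = 0;                 /* the shared memory cell */

    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();

        if (id == 0)                     /* "processor i" writes the data */
            shared_cell = 42;

        #pragma omp barrier              /* makes the write visible before any read */

        if (id == 1)                     /* "processor j" reads the same address */
            printf("thread 1 read %d from the shared cell\n", shared_cell);
    }
    return 0;
}
```

Compile with an OpenMP-capable compiler, e.g. `gcc -fopenmp`.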

  14. Classification by memory types (2) Features of parallel computers with a shared common memory: programming is easy; exclusive control is necessary for accesses to the same memory cell; realization is difficult when the number of processors is large. (Reason: the number of processors that can be connected to the shared memory is limited by physical factors such as the size and voltage of the units and the latency of memory accesses.)
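
A minimal sketch of the exclusive-control point (again my illustration, using OpenMP; the thread and iteration counts are arbitrary): without the atomic directive, concurrent increments of the same memory cell can be lost.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    long counter = 0;                    /* shared memory cell updated by every thread */

    #pragma omp parallel num_threads(4)
    for (int i = 0; i < 100000; i++) {
        #pragma omp atomic               /* exclusive control: one update at a time */
        counter++;
    }

    printf("counter = %ld (expected 400000)\n", counter);
    return 0;
}
```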

  15. Classification by memory types (3) 2. Parallel computers with distributed memory: communication is one-to-one, based on an interconnection network. For example, consider the case where processor i sends data to processor j: first, processor i issues a send command such as "send xxx to processor j", then processor j obtains the data with a receive command. (Figure: several processors, each with its own local memory, connected by an interconnection network.)
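
A minimal MPI sketch of this one-to-one (point-to-point) style (my illustration; MPI is the subject of Lecture 4 in the outline, and the ranks, message tag 0, and value 42 are arbitrary):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = 0;
    if (rank == 0) {                      /* "processor i" issues the send command */
        data = 42;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {               /* "processor j" gets the data with a receive command */
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d from process 0\n", data);
    }

    MPI_Finalize();
    return 0;
}
```

Build and run with, e.g., `mpicc` and `mpirun -np 2`.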

  16. Classification by memory types (4) Features of parallel computers with distributed memory: there are various architectures of interconnection networks (generally, the degree of connectivity is not large); programming is more difficult than with a shared common memory because communication is one-to-one; it is easy to increase the number of processors.

  17. Types of parallel computers with distributed memory Complete connection type: any two processors are connected. Features: strong communication ability, but not practical (each processor has to be connected to many other processors). Mesh connection type: processors are connected as a two-dimensional lattice. Features: each processor is connected to only a few others; it is easy to increase the number of processors; the distance between processors can be large, so processor-to-processor communication can become a bottleneck. (Figures: a complete graph on processors P1-P5 and a 4 × 4 mesh of processors P0,0-P3,3.)

  18. Types of parallel computers with distributed memory Hypercube connection type: processors are connected as a hypercube; each processor has a binary label, and two processors are connected if and only if their labels differ in exactly one bit. Features: small distance between processors (log n); balanced communication load because of its symmetric structure; easy to increase the number of processors. (Figure: a 16-node, 4-dimensional hypercube with nodes labeled 0000 through 1111.)
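
A small sketch of the labeling rule (my illustration): flipping each bit of a node's binary label with XOR enumerates exactly its log2 n neighbors. The example node 6 (0110) and dimension 4 are chosen to match the 16-node hypercube in the figure.

```c
#include <stdio.h>

/* Print the neighbors of a node in a d-dimensional hypercube.
   Two nodes are connected iff their binary labels differ in exactly one bit,
   so the neighbors are obtained by flipping each of the d bits in turn. */
static void print_neighbors(unsigned node, unsigned d) {
    for (unsigned bit = 0; bit < d; bit++)
        printf("node %u <-> node %u (bit %u flipped)\n",
               node, node ^ (1u << bit), bit);
}

int main(void) {
    print_neighbors(6u, 4);   /* node 0110 in the 16-node, 4-dimensional hypercube */
    return 0;
}
```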

  19. Types of parallel computers with distributed memory Other connection types: tree connection, butterfly connection, bus connection. Criteria for selecting an interconnection network: a small diameter (the largest distance between processors) for small communication delay; a symmetric structure so that the number of processors can be increased easily. The choice of interconnection network depends on the application, the ability of the processors, the upper bound on the number of processors, and other factors.
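
For the three topologies discussed above, the diameters with n processors work out as follows (my summary; the mesh formula assumes a square lattice of side sqrt(n)):

```latex
% Diameter = the largest distance between any two of the n processors.
\begin{align*}
  \text{complete connection:} \quad & 1 \\
  \text{2-D mesh } (\sqrt{n} \times \sqrt{n}): \quad & 2(\sqrt{n} - 1) \\
  \text{hypercube:} \quad & \log_2 n
\end{align*}
```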

  20. Real parallel processing systems (1) An early parallel computer (ILLIAC IV): built in 1972; SIMD type with distributed memory, consisting of 64 processors; a modified mesh connection type, equipped with a common data bus, a common control bus, and one control unit. (Figure: the processing elements arranged as a mesh, labeled P0,0 through P3,3.)

  21. Real parallel processing systems (2) Parallel computers of the 1990s. Shared-common-memory type: workstations, SGI Origin 2000, and others with 2-8 processors. Distributed-memory type:

     Name     Maker        Processors   Processing type   Network type
     CM-2     TM           65536        SIMD              hypercube
     CM-5     TM           1024         MIMD              fat tree
     nCUBE2   NCUBE        8192         MIMD              hypercube
     iWarp    CMU, Intel   64           MIMD              2D torus
     Paragon  Intel        4096         MIMD              2D torus
     SP-2     IBM          512          MIMD              HP switch
     AP1000   Fujitsu      1024         MIMD              2D torus
     SR2201   Hitachi      1024         MIMD              crossbar

  22. Real parallel processing systems (3) Deep Blue: developed by IBM for playing chess only; defeated the world chess champion; based on the general-purpose parallel computer SP-2 with a generalized-hypercube interconnection network. (Figure: 32 RS/6000 nodes, each with its own memory, connected by the interconnection network; within a node, the RS/6000 processor, memory, a bus interface, and 8 Deep Blue chess chips share a Micro Channel bus.)

  23. K Computer (Fujitsu) Architecture: 88,128 SPARC64 VIIIfx 2.0 GHz 8-core processors; 864 cabinets, each with 96 compute nodes and 6 I/O nodes; 6-dimensional Tofu torus interconnect; Linux-based enhanced operating system; open-source Open MPI library; 12.6 MW; 10.51 petaflops; ranked #1 in 2011. (Images: Keisoku-Fujitsu.jpg; https://asc.llnl.gov/computing_resources/bluegenel/images/torus.jpg)

  24. Titan (Oak Ridge National Laboratory) Architecture: 18,688 AMD Opteron 6274 16-core CPUs; GPU-based acceleration; Cray Linux; 8.2 MW; 17.59 petaflops; torus topology; ranked #1 in 2012. (Image: the cabinets that make up Titan.)

  25. Tianhe-1A (NUDT) Architecture: 14,336 Xeon X5670 processors; 7,168 Nvidia Tesla M2050 GPUs; 2,048 FeiTeng-1000 SPARC-based processors; 4.7 petaflops; 112 computer cabinets and 8 I/O cabinets; 11-dimensional hypercube topology with InfiniBand QDR/DDR; Linux; ranked #2 in 2011.

  26. Exercises Compare the top three supercomputers in the world.
