Exploring Computer Architecture: CISC vs. RISC, Pipelining, and More

a bit of computer architecture n.w

Dive into the world of computer architecture, comparing complex instruction set computing (CISC) and reduced instruction set computing (RISC) designs. Explore concepts like pipelining, hazards, memory hierarchy, and branch prediction to enhance your understanding of modern computing systems.

  • Computer Architecture
  • CISC vs RISC
  • Pipelining
  • Memory Hierarchy
  • Branch Prediction


Presentation Transcript


  1. A bit of computer architecture

  2. Improve our computer by making it faster/smaller/more dense. Then:
     • CISC vs RISC
     • Pipelining: hazards, branch prediction
     • Memory hierarchy, multi-level cache
     • Superscalar, in-order / out-of-order

  3. CISC vs RISC
     CISC:
     • Emphasis on hardware
     • Includes multi-clock complex instructions
     • Memory-to-memory: "LOAD" and "STORE" incorporated in instructions
     • Small code sizes, high cycles per second
     • Transistors used for storing complex instructions
     RISC:
     • Emphasis on software
     • Single-clock, reduced instructions only
     • Register-to-register: "LOAD" and "STORE" are independent instructions
     • Low cycles per second, large code sizes
     • Spends more transistors on memory registers

  4. Pipelining
     • All objects go through the same set of stages
     • No sharing of resources between stages
     • Propagation delay through all stages is equal
     • Scheduling of all transactions is independent

  5. Pipelining
     https://www.youtube.com/watch?v=doJpguZFTe0
     https://www.youtube.com/watch?v=eVRdfl4zxfI
     Time/program = instructions/program × cycles/instruction × time/cycle
     • Instructions/program: source code, compiler, ISA
     • Cycles/instruction: microarchitecture and ISA
     • Time/cycle: base technology, microarchitecture

  6. Pipelining, hazards
     Data hazards: it's about data dependency.
     • RAW - read after write
     • WAR - write after read
     • WAW - write after write (out of order)

  7. Some considerations regarding pipelining:
     a) dependencies among stages
     b) pc++: which instruction is the next instruction?

  8. Pipelining, hazards
     RAW (Read After Write):
       R2 <- R1 + R3
       R4 <- R2 + R3
     The 1st instruction calculates a value to be saved in register R2; the 2nd instruction wants to use this value to compute a result for register R4.

  9. Pipelining, hazards
     WAR (Write After Read):
       R4 <- R1 + R5
       R5 <- R1 + R2
     If the 2nd instruction can finish first, R5 can't be stored before the first instruction reads it. (Say what??? Hang on a moment: e.g. concurrent execution.)

  10. Pipelining, hazards
     WAW (Write After Write):
       R2 <- R4 + R7
       R2 <- R1 + R3
     I2 tries to write an operand before it is written by I1; also a concurrent-execution issue.

  11. Pipelining, hazards
     RAW is obvious. WAR and WAW come into play with superscalar architectures; leave them on the back burner for now.
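The three hazard types above reduce to comparing the destination and source registers of an instruction pair. A minimal sketch, not from the slides; the `(destination, sources)` tuple encoding is an assumed toy representation for illustration:

```python
# Classify the data hazard between two instructions, where i1 issues
# before i2 and each instruction is (destination register, set of
# source registers). Toy encoding, for illustration only.

def classify_hazard(i1, i2):
    d1, s1 = i1
    d2, s2 = i2
    hazards = []
    if d1 in s2:
        hazards.append("RAW")  # i2 reads a value i1 writes
    if d2 in s1:
        hazards.append("WAR")  # i2 overwrites a value i1 still needs to read
    if d1 == d2:
        hazards.append("WAW")  # both write the same register; order matters
    return hazards

# Slide 8's example: R2 <- R1 + R3 ; R4 <- R2 + R3
print(classify_hazard(("R2", {"R1", "R3"}), ("R4", {"R2", "R3"})))  # ['RAW']
```

Running the slide 9 and 10 pairs through the same function reports WAR and WAW respectively.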

  12. Pipelining, hazards
     Say you have six execution stages with times of 50ns, 50ns, 60ns, 60ns, 50ns, 50ns. What is the instruction latency? (How long does it take to execute one instruction?)

  13. Without pipelining
     50 + 50 + 60 + 60 + 50 + 50 = 320ns
     How long to execute 100 instructions? 320ns * 100 = 32000ns

  14. With pipelining
     Add pipeline stages 1-1 with our execution stages. All stages must have the same length.
     Clock skew: the same timing signal arrives at different components at slightly different times.

  15. Pipelining
     What is the instruction latency on the pipelined machine, assuming a clock skew of 5ns?
     Max stage length = 60ns; clock skew = 5ns; length of each stage = 65ns; instruction latency = 65ns.
     How long to execute 100 instructions?

  16. Pipelining
     65ns * 6 + 65ns * 99 = 390ns + 6435ns = 6825ns
     What speedup did we get?

  17. Pipelining
     Speedup = avg instruction time without pipelining / avg instruction time with pipelining = 32000ns / 6825ns = 4.69
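The arithmetic from slides 12-17 can be replayed in a few lines; this sketch just re-derives the slides' numbers:

```python
# Replay the slides' pipelining arithmetic: six stages, 5ns clock skew,
# 100 instructions.
stages = [50, 50, 60, 60, 50, 50]  # stage times in ns
n = 100
skew = 5

# Without pipelining, one instruction traverses every stage in sequence.
unpipelined = sum(stages)          # 320 ns per instruction
total_without = unpipelined * n    # 32000 ns for 100 instructions

# With pipelining, every stage is stretched to the slowest stage plus skew.
cycle = max(stages) + skew         # 65 ns
# The first instruction fills the pipe (6 cycles); after that, one
# instruction completes per cycle.
total_with = cycle * len(stages) + cycle * (n - 1)  # 390 + 6435 = 6825 ns

print(total_without, total_with, round(total_without / total_with, 2))
# 32000 6825 4.69
```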

  18. Pipelining
     What happens when I increase the number of pipeline stages? More pipeline stages = faster.

  19. Pipelining
     So have an infinite number of pipeline stages, and your code will run in zero time. Done!

  20. Pipelining

  21. Pipelining So where is the crossover point? In modern systems, in the 30-50 range Pipeline depth vs number of stages

  22. Pipelining, branch predictions
     What happens when a prediction is incorrect? The pipeline must be flushed, which is expensive.

  23. Pipelining, branch predictions
     Dynamic branch prediction: implemented in hardware; the prediction changes as the program runs. A common algorithm is persistence.

  24. Pipelining, branch predictions Static Branch Prediction Analyze the source code Predictions are fixed for the run-time life of the program
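One common hardware realization of the "persistence" idea from slide 23 is a 2-bit saturating counter: it takes two mispredictions in a row to flip the prediction, so a single loop exit doesn't derail an otherwise-taken branch. A hedged sketch; the 2-bit counter is a standard textbook scheme, not something the slides spell out:

```python
# 2-bit saturating counter: states 0-1 predict "not taken",
# states 2-3 predict "taken". Each actual outcome nudges the state by one.

class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly "taken"

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch: taken 8 times, one loop exit, then taken 8 more times.
p = TwoBitPredictor()
outcomes = [True] * 8 + [False] + [True] * 8
correct = 0
for taken in outcomes:
    if p.predict() == taken:
        correct += 1
    p.update(taken)

print(f"{correct}/{len(outcomes)} correct")  # 16/17: only the exit mispredicts
```

16 of 17 correct is about 94%, in line with the "~90%+" accuracy claimed on the summary slide.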

  25. Memory hierarchy

  26. Memory hierarchy
     A 2GHz processor accessing 100ns DRAM can execute 800 instructions for 1 memory access.
     Physical size affects latency: bigger = slower (signals have further to travel, more fan-out).
     Think about that: 800 instructions can execute while waiting for memory. What are the implications w.r.t. pipelining and branch prediction?

  27. Memory hierarchy
     But it's only useful if you're actually storing data the processor needs.

  28. Memory hierarchy
     Common access patterns: loops; function call/return; array access; scalar access during an array-access loop.

  29. Memory hierarchy
     How do we exploit the patterns?
     • Temporal locality: if I access it now, I will probably access it again in the near future.
     • Spatial locality: if I access it now, I will probably access nearby locations in the near future.
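One way to see spatial locality pay off is to simulate a tiny direct-mapped cache and compare a sequential array walk against a large-stride walk. All parameters here (4 lines, 4-word blocks, the stride of 16) are made up for illustration:

```python
# Tiny direct-mapped cache simulator: 4 lines, 4-word blocks.
# Counts hits for a sequence of word addresses.
LINES, BLOCK_WORDS = 4, 4

def count_hits(addresses):
    lines = [None] * LINES              # block number held by each line
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_WORDS     # which block this word lives in
        line = block % LINES            # direct mapped: modulo, as on slide 31
        if lines[line] == block:
            hits += 1
        else:
            lines[line] = block         # miss: fetch the whole block
    return hits

sequential = list(range(32))            # walking an array word by word
strided = [i * 16 for i in range(32)]   # big stride: a new block every access
print(count_hits(sequential), count_hits(strided))  # 24 0
```

The sequential walk misses once per block and then hits on the block's remaining three words; the strided walk never reuses a fetched block, so spatial locality buys it nothing.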

  30. Memory hierarchy, cache hierarchy

  31. Cache mapping
     • Direct mapped: use modulo, like a hash table: 12 % 8 = 4
     • 2-way set associative: allow a block to map to a group of locations: 12 % 4 = 0
     • Fully associative: a block can go anywhere

  32. 2-way associative: 8-byte blocks, 4-line cache. Tag: check all possibilities in parallel.
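Slide 31's modulo examples, spelled out for block number 12. The 8-line and 4-set sizes come from the slides; the rest of the framing is illustrative:

```python
# Where can block 12 live?
block = 12

# Direct mapped, 8 lines: exactly one legal line.
direct_line = block % 8    # line 4

# 2-way set associative, 4 sets: one set, either of its 2 ways.
set_index = block % 4      # set 0

print(direct_line, set_index)  # 4 0

# Fully associative: any line. The tag must then be compared against
# every line, which is why the hardware checks all candidates in
# parallel (slide 32).
```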

  33. Superscalar
     The lower bound is 1 cycle per instruction (CPI = 1.0). Or is it? How do we get (or try to get) CPI < 1.0?

  34. Superscalar
     Superscalar processors execute multiple instructions at the same time. In-order vs out-of-order.

  35. Superscalar Simple superscalar pipeline (from Wikipedia)

  36. Summary for our RE needs:
     • CISC vs RISC
     • Pipelining: more stages = faster (to a point, ~20 ish)
     • Data hazards: RAW (the obvious one); WAR, WAW (out of order)
     • Stall the pipeline, or transmit the info up the pipe
     • Flush is expensive, hence try to predict
     • Static prediction: look at the code; dynamic: keep track while it executes
     • Predictions are surprisingly accurate (~90%+)
     • Superscalar architectures
     • Memory hierarchy: levels of cache, replacement algorithms, direct mapped, associative
