Computer Systems Design & Architecture: Multi-Cycle CPU & Single-Cycle Implementation

cs161 design and architecture of computer systems n.w
1 / 33
Embed
Share

Explore the design and architecture of computer systems focusing on multi-cycle CPU design and single-cycle implementation. Learn about the steps of each instruction, cycle time calculation, and the transition towards a multicycle approach for complex instruction sets like floating point operations.

  • Computer Systems
  • Architecture
  • Multi-Cycle CPU
  • Single-Cycle
  • Instruction Steps

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CS161 Design and Architecture of Computer Systems Multi-Cycle CPU Design

  2. Single Cycle Implementation Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns) PCSrc 1 M u x Add ALU result 0 4 Add Shift left 2 RegWrite Instruction [25 21] Read register 1 Read register 2 Read data 1 MemWrite Read address PC Instruction [20 16] MemtoReg ALUSrc Zero Instruction [31 0] Read data 2 1 ALU result ALU 1 Read data Write register Write data 1 Address M u x M u x M u x Instruction memory Instruction [15 11] Registers 0 0 Data memory 0 Write data RegDst 16 32 Sign extend Instruction [15 0] ALU control MemRead Instruction [5 0] ALUOp

  3. Single Cycle Steps of each instruction Inst. Type Functional Units Used Instruction fetch Instruction fetch Instruction fetch Instruction fetch Instruction fetch Register read Register read Register read Register read ALU Register write Memory access Memory access R-type ALU Register write Load ALU Store ALU Branch Jump

  4. Single Cycle How long is the cycle? Inst. % Inst. Type Inst. Mem. Reg. File (read) 1 ALU (s) Data Mem. Reg. File (write) 1 Total 44 2 2 0 6 ns R-type 24 2 1 2 2 1 8 ns Load 12 2 1 2 2 0 7 ns Store 2 1 2 0 0 5 ns Branch 18 2 0 0 0 0 2 ns Jump 2 The cycle time must accommodate the longest operation: lw. Cycle time = 8 ns but the CPI = 1. If we can accommodate variable number of cycles for each instruction and a cycle time of 1ns. CPI = 6*44% + 8*24% + 7*12% + 5*18% + 2*2% = 6.3

  5. Where we are headed Single Cycle Problems: what if we had a more complicated instruction like floating point? waste of area One Solution: use a smaller cycle time have different instructions take different numbers of cycles a multicycle datapath: Instruction register Data PC Address A Register # Instruction or data ALU ALUOut Memory Registers Register # Memory data register B Data Register #

  6. Multicycle Approach We will be reusing functional units ALU used to compute address and to increment PC Memory used for instruction and data Our control signals will not be determined solely by instruction e.g., what should the ALU do for a subtract instruction? We ll use a finite state machine for control

  7. Review: finite state machines Finite state machines: a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) Next state Next-state function Current state Clock Inputs Output function Outputs We ll use a Moore machine (output based only on current state)

  8. Multicycle Approach Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional internal registers PC 0 0 Instruction [25 21] Read register 1 M u x 1 M u x 1 Address Read data 1 A Instruction [20 16] Read register 2 Zero Memory ALU ALU result ALUOut 0 MemData Registers Write register Instruction [15 0] Instruction register M u x 1 Read data 2 B 0 Instruction [15 11] Write data 1M 2 4 Write data u x 0 Instruction [15 0] 3 M u x 1 Memory data register 16 32 Shift left 2 Sign extend

  9. Muticyle Datapath PC 0 0 Instruction [25 21] Read register 1 M u x 1 M u x 1 Address Read data 1 A Instruction [20 16] Read register 2 Zero Memory ALU ALU result ALUOut 0 MemData Registers Write register Instruction [15 0] Instruction register M u x 1 Read data 2 B 0 Instruction [15 11] Write data 1M 2 4 Write data u x 0 Instruction [15 0] 3 M u x 1 Memory data register 16 32 Shift left 2 Sign extend

  10. Five Execution Steps Instruction Fetch (F) Instruction Decode and Register Fetch (D) Execution, Memory Address Computation, or Branch Completion (EX) Memory Access or R-type instruction completion (M) Write-back step (W) INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

  11. Step 1: Instruction Fetch Use PC to get instruction and put it in the Instruction Register. Increment the PC by 4 and put the result back in the PC. Can be described concisely using RTL "Register-Transfer Language" IR = Memory[PC]; PC = PC + 4; Can we figure out the values of the control signals?

  12. Datapath of Multicycle Implementation PCWriteCond PCSource PCWrite ALUOp Outputs IorD ALUSrcB MemRead ALUSrcA Control MemWrite RegWrite MemtoReg Op [5 0] RegDst IRWrite 0 1 M u x Jump address [31-0] 26 28 Instruction [25 0] Shift left 2 2 Instruction [31-26] PC [31-28] PC 0 0 Instruction [25 21] Read register 1 M u x 1 M u x 1 Address Read data 1 A Instruction [20 16] Read register 2 Zero Memory ALU ALU result ALUOut 0 MemData Registers Write register Instruction [15 0] Instruction register M u x 1 Read data 2 B 0 Instruction [15 11] Write data 1M 2 4 Write data u x 0 Instruction [15 0] 3 M u x 1 Memory data register 16 32 ALU control Shift left 2 Sign extend Instruction [5 0]

  13. Step 2: Instruction Decode and Register Fetch Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2); We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)

  14. Single Cycle Implementation 14

  15. Datapath of Multicycle Implementation PCWriteCond PCSource PCWrite ALUOp Outputs IorD ALUSrcB MemRead ALUSrcA Control MemWrite RegWrite MemtoReg Op [5 0] RegDst IRWrite 0 1 M u x Jump address [31-0] 26 28 Instruction [25 0] Shift left 2 2 Instruction [31-26] PC [31-28] PC 0 0 Instruction [25 21] Read register 1 M u x 1 M u x 1 Address Read data 1 A Instruction [20 16] Read register 2 Zero Memory ALU ALU result ALUOut 0 MemData Registers Write register Instruction [15 0] Instruction register M u x 1 Read data 2 B 0 Instruction [15 11] Write data 1M 2 4 Write data u x 0 Instruction [15 0] 3 M u x 1 Memory data register 16 32 ALU control Shift left 2 Sign extend Instruction [5 0]

  16. Step 3 (instruction dependent) ALU is performing one of three functions, based on instruction type Memory Reference: ALUOut = A + sign-extend(IR[15-0]); R-type: ALUOut = A op B; Branch: if (A==B) PC = ALUOut;

  17. Datapath of Multicycle Implementation PCWriteCond PCSource PCWrite ALUOp Outputs IorD ALUSrcB MemRead ALUSrcA Control MemWrite RegWrite MemtoReg Op [5 0] RegDst IRWrite 0 1 M u x Jump address [31-0] 26 28 Instruction [25 0] Shift left 2 2 Instruction [31-26] PC [31-28] PC 0 0 Instruction [25 21] Read register 1 M u x 1 M u x 1 Address Read data 1 A Instruction [20 16] Read register 2 Zero Memory ALU ALU result ALUOut 0 MemData Registers Write register Instruction [15 0] Instruction register M u x 1 Read data 2 B 0 Instruction [15 11] Write data 1M 2 4 Write data u x 0 Instruction [15 0] 3 M u x 1 Memory data register 16 32 ALU control Shift left 2 Sign extend Instruction [5 0]

  18. Step 4 (R-type or memory-access) Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; R-type instructions finish Reg[IR[15-11]] = ALUOut; The write actually takes place at the end of the cycle on the edge

  19. Datapath of Multicycle Implementation PCWriteCond PCSource PCWrite ALUOp Outputs IorD ALUSrcB MemRead ALUSrcA Control MemWrite RegWrite MemtoReg Op [5 0] RegDst IRWrite 0 1 M u x Jump address [31-0] 26 28 Instruction [25 0] Shift left 2 2 Instruction [31-26] PC [31-28] PC 0 0 Instruction [25 21] Read register 1 M u x 1 M u x 1 Address Read data 1 A Instruction [20 16] Read register 2 Zero Memory ALU ALU result ALUOut 0 MemData Registers Write register Instruction [15 0] Instruction register M u x 1 Read data 2 B 0 Instruction [15 11] Write data 1M 2 4 Write data u x 0 Instruction [15 0] 3 M u x 1 Memory data register 16 32 ALU control Shift left 2 Sign extend Instruction [5 0]

  20. Step 5 Write-back step Reg[IR[20-16]]= MDR; What about all the other instructions?

  21. Summary: Action for R-type instructions Action for memory-reference instructions IR = Memory[PC] Action for branches Action for jumps Step name Instruction fetch PC = PC + 4 A = Reg [IR[25-21]] B = Reg [IR[20-16]] Instruction decode/register fetch ALUOut = PC + (sign-extend (IR[15-0]) << 2) ALUOut = A + sign-extend (IR[15-0]) Execution, address computation, branch/ jump completion Memory access or R-type completion ALUOut = A op B if (A ==B) then PC = ALUOut PC = PC [31-28] II (IR[25-0]<<2) Reg [IR[15-11]] = ALUOut Load: MDR = Memory[ALUOut] or Store: Memory [ALUOut] = B Load: Reg[IR[20-16]] = MDR Memory read completion

  22. Simple Questions How many cycles will it take to execute this code? Label: lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label add $t5, $t2, $t3 sw $t5, 8($t3) ... #assume not What is going on during the 8th cycle of execution? In what cycle does the actual addition of $t2 and $t3 takes place?

  23. Implementing the Control Value of control signals is dependent upon: what instruction is being executed which step is being performed Use the information we ve accumulated to specify a finite state machine specify the finite state machine graphically, or use microprogramming Implementation can be derived from specification

  24. Graphical Specification of FSM 1 0 Start Instruction Decode Instruction Fetch Op = J 8 2 6 9 Complete Branch Mem. Address Compute ALU Execute Complete Jump Op = LW 3 5 7 Mem. Access Read Mem. Access Write Complete R-type 4 Write-Back

  25. Detailed FSM Instruction decode/ register fetch Instruction fetch 0 MemRead ALUSrcA = 0 IorD = 0 IRWrite ALUSrcB = 01 ALUOp = 00 PCWrite PCSource = 00 1 ALUSrcA = 0 ALUSrcB = 11 ALUOp = 00 Start (Op = 'BEQ') (Op = R-type) (Op = 'J') (Op = 'LW') or (Op = 'SW') Memory address computation Branch completion Jump completion Execution 2 6 8 9 ALUSrcA = 1 ALUSrcB = 00 ALUOp = 01 PCWriteCond PCSource = 01 ALUSrcA = 1 ALUSrcB = 10 ALUOp = 00 ALUSrcA =1 ALUSrcB = 00 ALUOp= 10 PCWrite PCSource = 10 (Op = 'SW') (Op = 'LW') Memory access Memory access R-type completion 3 5 7 RegDst = 1 RegWrite MemtoReg = 0 MemRead IorD = 1 MemWrite IorD = 1 Write-back step 4 RegDst=0 RegWrite MemtoReg=1

  26. Finite State Machine for Control Implementation: PCWrite PCWriteCond IorD MemRead MemWrite IRWrite Control logic MemtoReg PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst Outputs NS3 NS2 NS1 NS0 Inputs Op5 Op4 Op3 Op2 Op1 Op0 S3 S2 S1 S0 Instruction register opcode field State register

  27. Inputs Op5 Op4 Op3 Op2 Op1 Op0 Outputs R-format Iw sw beq RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOpO

  28. ALUOp ALU control block ALUOp0 ALUOp1 Operation2 F3 Operation F2 Operation1 F (5 0) F1 Operation0 F0

  29. PLA Implementation Programmable Logic Array If I picked a horizontal or vertical line could you explain it? Op5 Op4 Op3 Op2 Op1 Op0 S3 S2 S1 S0 PCWrite PCWriteCond IorD MemRead MemWrite IRWrite MemtoReg PCSource1 PCSource0 ALUOp1 ALUOp0 ALUSrcB1 ALUSrcB0 ALUSrcA RegWrite RegDst NS3 NS2 NS1 NS0

  30. ROM Implementation ROM = "Read Only Memory" values of memory locations are fixed ahead of time A ROM can be used to implement a truth table if the address is m-bits, we can address 2m entries in the ROM. our outputs are the bits of data that the address points to. 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 m n m is the "heigth", and n is the "width"

  31. ROM Implementation How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 210 = 1024 different addresses) How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs ROM is 210 x 20 = 20K bits Rather wasteful, since for lots of the entries, the outputs are the same i.e., opcode is often ignored

  32. Microprogramming Control unit PCWrite PCWriteCond IorD MemRead MemWrite Microcode memory Datapath IRWrite BWrite Outputs MemtoReg PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst AddrCtl Input 1 Microprogram counter Adder Address select logic Op[5 0] Instruction register opcode field

  33. Microprogramming How all CISC ISAs were built. -store: RAM-memory with N bits/word N = number of control lines in CPU. Execution of each instruction starts IF and ID. Instruction Opcode points to a location in -store Output consecutive control words until DONE. -store is updatable -controller is = a mini-CPU running the main CPU -programming worked when CPU = multiple boards -controller on one board: much faster

Related


More Related Content