Tomasulo's Algorithm for Out-of-Order Execution in Processors

february 24 th 2021 n.w
1 / 24
Embed
Share

Dive into Tomasulo's Algorithm for out-of-order execution in processors, covering in-order pipelining, in-order and out-of-order timelines, Tomasulo datapath, and implications on precise exceptions. Learn key concepts and resources for review.

  • Tomasulo Algorithm
  • Out-of-Order Execution
  • Processor Design
  • Pipeline Stages
  • Precise Exceptions

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. FEBRUARY 24TH, 2021 EE460N LECTURE 11 EE460N LECTURE 11 SPRING 2021 UT AUSTIN 1

  2. Todays Plan Important dates Problem Set 2: March 1 Lab 3: March 7 Exam 1: March 10 Quick recap on pipelining and out-of-order execution HPS: out-of-order execution with in-order retirement Executing conditional branches in a pipeline Compile time methods Dynamic Branch Prediction 2

  3. Recap We started with LC3-b microarchitecture The processor is controlled by a state machine We execute instruction one by one, and each instruction takes a few cycles Then, we introduced in-order pipelining Process each instruction in different stages: fetch, decode, execute, write back, etc. Overlap the processing of instructions Ideal goal: execute one instruction per cycle A problem: unnecessary pipeline stalls Out-of-order execution (Tomasulo Algorithm) Assign a unique tag to the destination registers of in-process instructions Put instructions in reservation station entries Instruction execute as soon as their dependencies are ready Register values may get updated out of program order 3

  4. In-order Timeline 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Inst1: MUL R3,R1,R2 F D R3 Inst2: ADD R4,R2,R3 F D R4 Inst3: MUL R2,R6,R7 F D R2 Inst4: ADD R5,R2,R3 F D R5 Inst5: ADD R3,R3,R5 F D R3 Out-of-order Timeline 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Inst1: MUL R3,R1,R2 F D R3 Inst2: ADD R4,R2,R3 F D R4 Inst3: MUL R2,R6,R7 F D R2 Inst4: ADD R5,R2,R3 F D R5 Inst5: ADD R3,R3,R5 F D R3

  5. Tomasulo Datapath 5

  6. Resources for Reviewing Tomasulos Algorithm Recording of Dr. Patt s office hours on Canvas (under Modules) Slides for a cycle-by-cycle example on the website (under Handouts) 6

  7. Out-of-order Commit Violates Precise Exceptions Precise Exceptions Identity of the faulting instruction is known State recovered to just before the faulting instruction A simple example 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Inst1: DIV R1,R2,R3 F D Inst2: ADD R2,R3,R4 F D R2 Exceptional event detected Add takes one cycle and stores result in R2 DIV eventually takes an exception Not possible to recover R2 prior to DIV execution 7

  8. Solution: Update Register Values in Program Order HPS model: after execution finishes, put the result in a temporary buffer, copy value from the buffer to the register file in program order The buffer is called the re-order buffer (originally result buffer or value buffer) Tomasulo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Inst1: DIV R1,R2,R3 F D R1 Inst2: ADD R2,R3,R4 F D R2 HPS 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Inst1: DIV R1,R2,R3 F D R1 Inst2: ADD R2,R3,R4 F D R2 8

  9. Pipelining with hiccups Tomasulo (~1956) HPS (~1985) 9

  10. Another Problem with Pipelining: Conditional Branches 10

  11. What should the pipeline fetch after a conditional branch? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Inst1: ADD R1,R2,R3 F D R1 Inst2: BRp label F D Inst3: MUL R4,R4,R4 Inst4: label NOT R4,R4,R4 Branch finishes Fetch stage needs to know what do 11

  12. Solution 1: wait until branch executes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Inst1: ADD R1,R2,R3 F D R1 Inst2: BRp label F D Inst3: MUL R4,R4,R4 Inst4: label NOT R4,R4,R4 F D R4 Wasted fetch cycles 12

  13. Solution 2: predict the outcome of the branch 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Inst1: ADD R1,R2,R3 F D R1 Inst2: BRp label F D Inst3: MUL R4,R4,R4 Inst4: label NOT R4,R4,R4 F D R4 13

  14. In case of a wrong prediction, we need to flush the pipeline 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Inst1: ADD R1,R2,R3 F D R1 Inst2: BRp label F D Inst3: MUL R4,R4,R4 F D Inst4: label NOT R4,R4,R4 F D F D R4 14

  15. Branch Prediction Ways to deal with Conditional Branches Compile time BTFN Hint bits Predication (e.g., the THUMB IT instruction) Delayed branch Combine branches Branch Prediction LT 2bit Counter Two-level Hybrid Perceptron TAGE The HEP (Burton Smith, Donlcor (~1978)) 15

  16. BTFN (backward taken forward not taken) Always predict taken for forward branches Always predict not taken for backward branches It typically works well for loop branches LOOP ADD R0, R0, #0 BRp EXIT ... ... ADD R0, R0, #-1 BR LOOP EXIT ... LOOP ... ... ADD R0, R0, #-1 BRp LOOP Probably taken Probably not taken 16

  17. Hint Bits Use profiling to determine the typical direction of each branch Problems: Profiling relies on representativeness of training data Many branches have dynamic behavior (i.e., a static prediction is fundamentally impossible) 17

  18. Predication Avoid branches by using conditional execution of instructions Conditional execution: all instructions execute, but they can discard the result if some condition is not met With Predication With Conditional Branches Example High-level Code BRnz ELSE if (flags are positive) { ADD R0, R0, R0 ADD R1, R1, R1 } else { ADD R2, R2, R2 ADD R3, R3, R3 } ADDp R0, R0, R0 ADDp R1, R1, R1 ADDnz R2, R2, R2 ADDnz R3, R3, R3 ADD R0, R0, R0 ADD R1, R1, R1 BR MERGE ELSE ADD R2, R2, R2 ADD R3, R3, R3 MERGE ... 18

  19. Implementation of Predication in Thumb ISA In ARM, each 32-bit instruction has 4 bit of conditions per instruction In THUMB, to support predication for 16-bit instructions, IT instructions sets up conditional execution for a block of up to 4 instructions Example 1 (4 instructions) Example 2 (3 instructions) 19

  20. Other compile-time methods Delayed Branches Allow the next few instructions after the branch to execute regardless of the directions of the branch Suppose we have a two-stage pipeline (fetch & execute), we have one delay slot that we need to fill in with an independent instruction after each conditional branch Not a good idea for deep and wide pipelines of today Combine Branches Combine multiple branches into one to reduce the penalty 20

  21. Dynamic Branch Prediction At runtime, use the past behavior of branches to predict the future Main advantage: can adapt to the runtime behavior of branches 21

  22. Dynamic Branch Prediction Last time taken (Store a bit in the I Cache Tag Store) Saturating 2 bit counter (Use high bit to predict) Two-level Predictor Gshare Modification 22

  23. 23

  24. 24

Related


More Related Content