
Advanced RISC-V Processor with F Extension - Implementation Insights
"Explore the comprehensive design and implementation details of a cutting-edge RISC-V processor with F Extension, including background, modifications, and execution strategies for floating-point operations in single-cycle and multi-cycle architectures."
Presentation Transcript
RISC-V Processor with F Extension
Kathy Camenzind and Miguel Gomez
Outline
  Background
  Implementation Strategy
    Combinational
    Multicycle
    4-stage pipeline
  Microarchitecture
  Testing
    Connectal
    Compliance tests
  Results
  Notable bugs & lessons learned
Background
  RISC-V F extension
    Single-precision floating-point numbers
    Includes encodings for ±Infinity and quiet/signaling NaN
  Instructions include:
    FLW/FSW
    FMV (between integer and FP registers)
    FDIV/FADD/FMUL/FSQRT/FMADD
    FMIN/FMAX, FEQ/FLT/FLE
    FSGNJ (sign injection)
    FCVT
    FCLASS
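To make the encodings on this slide concrete, here is a minimal Python sketch of how FCLASS.S and the FSGNJ family treat a 32-bit single-precision pattern. The bit positions follow the RISC-V F extension specification; the function names (`fclass_s`, `fsgnj_s`, and so on) are hypothetical helpers for illustration and are not taken from the authors' Bluespec code.

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret a Python float as an IEEE 754 single-precision bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def fclass_s(bits: int) -> int:
    """Return the 10-bit one-hot mask that FCLASS.S writes to the integer rd."""
    sign = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        if frac == 0:
            return 1 << (0 if sign else 7)      # bit 0: -inf, bit 7: +inf
        return 1 << (9 if frac >> 22 else 8)    # bit 9: quiet NaN, bit 8: signaling NaN
    if exp == 0:
        if frac == 0:
            return 1 << (3 if sign else 4)      # bit 3: -0, bit 4: +0
        return 1 << (2 if sign else 5)          # bit 2/5: negative/positive subnormal
    return 1 << (1 if sign else 6)              # bit 1/6: negative/positive normal

def fsgnj_s(rs1: int, rs2: int) -> int:
    """FSGNJ.S: magnitude of rs1, sign of rs2."""
    return (rs1 & 0x7FFFFFFF) | (rs2 & 0x80000000)

def fsgnjn_s(rs1: int, rs2: int) -> int:
    """FSGNJN.S: magnitude of rs1, inverted sign of rs2."""
    return (rs1 & 0x7FFFFFFF) | (~rs2 & 0x80000000)

def fsgnjx_s(rs1: int, rs2: int) -> int:
    """FSGNJX.S: magnitude of rs1, sign is the XOR of both sign bits."""
    return rs1 ^ (rs2 & 0x80000000)
```

Sign injection never interprets the value, so it works on NaN and infinity patterns as well; `fsgnj_s(x, x)` is a plain register move and `fsgnjx_s(x, x)` yields the absolute value.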
Overview of Required Modifications
  Based on Lab 5 Processor
  CSR instructions
    Basic CSR reads and swaps
    FCSR (rounding mode)
  Extending the Decode and Execute stages to handle new instructions
  Integrating Bluespec's built-in FloatingPoint library
  Adding tests that use floating-point instructions
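As a rough illustration of the FCSR handling mentioned above, the following Python sketch models the register layout defined by the RISC-V specification: accrued exception flags in bits 4:0 and the dynamic rounding mode in bits 7:5. The helper names and the simplified `csrrw` swap are assumptions made for this example; the slides do not show how the authors' processor implements them.

```python
# CSR addresses defined by the RISC-V F extension.
FFLAGS_ADDR, FRM_ADDR, FCSR_ADDR = 0x001, 0x002, 0x003

# Rounding-mode encodings; 0b111 (DYN) is only valid in an instruction's rm field.
ROUNDING_MODES = {0b000: "RNE", 0b001: "RTZ", 0b010: "RDN", 0b011: "RUP", 0b100: "RMM"}
DYN = 0b111

def split_fcsr(fcsr: int) -> tuple[int, int]:
    """Split fcsr into (frm, fflags): frm is bits 7:5, fflags is bits 4:0."""
    return (fcsr >> 5) & 0b111, fcsr & 0b11111

def effective_rounding_mode(inst_rm: int, fcsr: int) -> int:
    """Use the instruction's static rm field unless it selects DYN, then use frm."""
    if inst_rm == DYN:
        return split_fcsr(fcsr)[0]
    return inst_rm

def csrrw_fcsr(fcsr: int, new_value: int) -> tuple[int, int]:
    """Model a CSRRW swap on fcsr: return (updated fcsr, old value for rd).

    Only the low 8 bits of fcsr are writable in this sketch.
    """
    return new_value & 0xFF, fcsr
```

For example, with `fcsr = 0x20` (frm = RTZ), an instruction whose rm field is DYN rounds toward zero, while one with a static rm of `0b010` rounds down regardless of `frm`.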
Implementation Strategy
  Combinational
    Executes floating-point operations in a single cycle
    Very long critical path
  Multicycle
    Executes floating-point operations in multiple cycles
    One instruction passes through the processor at a time
  4-Stage Pipeline with Bypassing
    Executes floating-point operations in multiple cycles
    Pipelining based on ThreeStageBypass from Lab 5, with an added writeback stage
  4-Stage Superscalar Pipeline with Bypassing
    Can execute multiple instructions at a time in different functional units
    Preserves commit order within integer operations and within floating-point operations
Microarchitecture: Superscalar Execute
  Control Hazards: Branch instructions resolve in one cycle in Execute
  Data Hazards
    RAW: Still handled by Scoreboard
    WAW: Completion buffer reorders same-RF writebacks
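The WAW handling described above can be sketched as a small in-order completion buffer: slots are reserved in program order, functional units may finish out of order, and results are released to the register file only from the head. This Python model is an assumption-laden illustration (one buffer per register file, hypothetical method names), not the authors' Bluespec design.

```python
from collections import deque

class CompletionBuffer:
    """Releases writebacks in the order their slots were reserved."""

    def __init__(self):
        self._slots = deque()      # each slot is [token, result or None], oldest first
        self._next_token = 0

    def reserve(self) -> int:
        """Claim a slot at issue time, in program order."""
        token = self._next_token
        self._next_token += 1
        self._slots.append([token, None])
        return token

    def complete(self, token: int, rd: int, value: int) -> None:
        """Record a finished result; functional units may call this in any order."""
        for slot in self._slots:
            if slot[0] == token:
                slot[1] = (rd, value)
                return
        raise KeyError(f"unknown token {token}")

    def drain(self) -> list:
        """Pop every finished result at the head, stopping at the first pending slot."""
        done = []
        while self._slots and self._slots[0][1] is not None:
            done.append(self._slots.popleft()[1])
        return done

# A slow FDIV followed by a fast FADD, both writing f1.
cb = CompletionBuffer()
regfile = {}
t_div, t_add = cb.reserve(), cb.reserve()
cb.complete(t_add, rd=1, value=0xADD)      # the younger instruction finishes first
early = cb.drain()                         # nothing is released yet
cb.complete(t_div, rd=1, value=0xD1F)
for rd, value in cb.drain():               # older write lands first, then the younger
    regfile[rd] = value
```

Because the younger FADD result is held until the older FDIV has written back, `f1` ends up with the FADD value, which is the outcome program order requires.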
Synthesis Results
                      Combinational   Multicycle   FourStageBypass   Superscalar
  Area (µm²)          ~190,000        ~187,000     ~368,000          ~397,000
  Critical Path (ps)  ~10,200         1,147        2,018             1,656
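One way to read the critical-path row is as an upper bound on clock frequency, since f = 1/T. The short Python sketch below does that conversion for the four designs; it assumes the reported delay is the full clock period and ignores setup time, clock skew, and other margins, so the numbers are estimates rather than results reported by the authors.

```python
# Critical-path delays in picoseconds, copied from the synthesis table.
CRITICAL_PATH_PS = {
    "Combinational": 10_200,
    "Multicycle": 1_147,
    "FourStageBypass": 2_018,
    "Superscalar": 1_656,
}

def max_clock_mhz(critical_path_ps: float) -> float:
    """Upper bound on clock frequency: f = 1 / T, with T in ps and f in MHz."""
    return 1e6 / critical_path_ps

if __name__ == "__main__":
    for design, delay_ps in CRITICAL_PATH_PS.items():
        print(f"{design:16s} {max_clock_mhz(delay_ps):7.1f} MHz")
```

Under these assumptions the combinational design tops out near 98 MHz, while the multicycle, four-stage, and superscalar designs could clock at roughly 872 MHz, 496 MHz, and 604 MHz respectively. Clock rate alone does not determine run time, because the designs also differ in how many cycles each instruction takes.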
Performance Results
  Benchmark   Multicycle         Four Stage Bypass   Four Stage Superscalar Bypass
              (Floating Point)   (Floating Point)    (Floating Point)
  towers      0.1033             0.1069              0.1069
  median      0.2768             0.2872              0.2873
  multiply    0.4763             0.3838              0.3838
  qsort       0.3817             n/a                 n/a
  vvadd       0.2229             0.2241              0.2242
Notable Bugs
  Functionally incorrect functions in FloatingPoint.bsv
  Compliance tests that were non-compliant :)
  Mostly centered around NaN operations
  Debugging
    Implemented showInst for new instructions
    Printed which stages were executing which instructions
    Printed the current state of the completion buffer
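The slides do not say which NaN operations were affected, so the following Python sketch only illustrates the kind of edge case involved. It encodes the NaN rules the RISC-V F extension specifies for FMIN.S, FEQ.S, and FLT.S; the function names and the `(result, invalid_flag)` return convention are assumptions made for this example.

```python
import struct

CANONICAL_NAN = 0x7FC00000  # the quiet NaN pattern RISC-V returns for NaN results

def to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def is_nan(bits: int) -> bool:
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

def is_snan(bits: int) -> bool:
    """A signaling NaN has the top fraction bit (bit 22) clear."""
    return is_nan(bits) and not (bits & 0x00400000)

def fmin_s(a: int, b: int) -> tuple[int, bool]:
    """FMIN.S: a lone NaN operand is ignored; two NaNs give the canonical NaN."""
    invalid = is_snan(a) or is_snan(b)
    if is_nan(a) and is_nan(b):
        return CANONICAL_NAN, invalid
    if is_nan(a):
        return b, invalid
    if is_nan(b):
        return a, invalid
    fa, fb = to_float(a), to_float(b)
    if fa == fb:                      # only differs for -0.0 vs +0.0; -0.0 is the minimum
        return (a if a >> 31 else b), invalid
    return (a if fa < fb else b), invalid

def feq_s(a: int, b: int) -> tuple[int, bool]:
    """FEQ.S is a quiet comparison: only a signaling NaN raises the invalid flag."""
    if is_nan(a) or is_nan(b):
        return 0, is_snan(a) or is_snan(b)
    return int(to_float(a) == to_float(b)), False

def flt_s(a: int, b: int) -> tuple[int, bool]:
    """FLT.S is a signaling comparison: any NaN operand raises the invalid flag."""
    if is_nan(a) or is_nan(b):
        return 0, True
    return int(to_float(a) < to_float(b)), False
```

Cases like these are easy to get wrong in both a library and a test suite, because the result and the exception flag follow different rules depending on whether a NaN is quiet or signaling.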
Conclusion / Things Learned
  Efficient pipelining is difficult
  Fixing edge cases should be a second priority; completing the bigger picture comes first
  Simulations can sometimes be unreliable
  Libraries can be incorrect