Prediction of Embedded Memory Timing Failures in Initial Floorplan Design


Learn about a study on predicting embedded memory timing failures during the initial floorplan design phase, aiming to reduce design cost and turnaround time by early detection of slack failures and addressing multiphysics effects. Explore the challenges and methodologies involved in improving timing closure and accurate slack estimation in SRAMs, leveraging multiphysics analysis to enhance design accuracy.

  • Embedded Memory
  • Timing Failures
  • Floorplan Design
  • Multiphysics Analysis
  • SRAMs




Presentation Transcript


  1. Learning-Based Prediction of Embedded Memory Timing Failures During Initial Floorplan Design
     Wei-Ting J. Chan, Kun Young Chung, Andrew B. Kahng, Nancy D. MacDonald and Siddhartha Nath
     UC San Diego / VLSI CAD Laboratory

  2. Outline
     • Motivation
     • Previous Work
     • Our Work
     • Multiphysics Analysis
     • Modeling Methodology
     • Results
     • Conclusions

  3. Early Prediction of Slack Failure in SRAMs
     • Timing closure is time-consuming and complex at advanced nodes, which significantly increases turnaround time
     • Multiphysics effects (IR drop, thermal, etc.) affect timing closure
     • Floorplanning with SRAMs is complicated
       • Creates placement and routing blockages
       • Makes timing unpredictable at the post-P&R stage
     • Early prediction of post-P&R slack can reduce design cost and turnaround time
     • Post-P&R timing estimation at the floorplan stage is challenging due to many factors
       • Wire delay must be estimated without information on spatial embedding
       • Gate delay must be estimated without information on buffering
     • No tool predicts post-P&R slack at an early design stage

  4. Single vs. Multiple Physics
     • Multiphysics STA: performing STA with more than one physics
     • Examples of multiple physics: IR, thermal, reliability, crosstalk, etc.
     • Design teams can achieve more accurate timing results by closing multiphysics analysis loops
     • But multiphysics results are non-trivial to predict in early stages
     [Figure: SRAM slack (ps) versus implementation index for SRAM #1 and SRAM #5 under No IR, Static IR, and Dynamic IR (1st to 4th loop); annotated slack differences of 29ps and 25ps]

  5. Challenge: Sensitivity of Slack to Spacing between Memories
     • The spacing (channel width) between memories is varied in steps of 10 µm
     • The difference in slack can be larger than 300ps at a spacing of 10 µm due to congestion, buffer placement, etc.
     • Slack values vary in a highly nonobvious and/or noisy manner as the spacing is changed
     [Figure: WNS of SRAMs (ns) versus SRAM pitch (µm) for five implementations (slack-1 to slack-5), annotated "Delta slack > 300ps"; floorplan inset showing blockages, sram_spacing, and the placement region for standard cells]

  6. Challenge: Sensitivity of IR Drop Map to Power Pad Locations
     • Distribution density and location choices of power pads affect the IR drop map
     • In (a), the IR map has very few IR drop hotspots because the pads are placed uniformly
     • In (b) and (c), the IR maps have more hotspots due to fewer power pads
     [Figure: IR drop maps for three power pad configurations, panels (a), (b), and (c)]

  7. Challenge: Abstraction of P&R Stages and Tool Noise
     • Modeling must comprehend multiple stages of physical design
     • Our approach: an approximate function f to estimate the combined effects of netlist, constraints, placement, clock network synthesis, routing, extraction and timing
         y = f(x)
         x = netlist, constraints, floorplan parameters
         y = slack (with and without IR)
     [Figure: design flow from gate netlist and constraints through floorplan and powerplan, placement, clock network synthesis, routing, and extraction, timing, and verification to signoff, with the modeling scope and the costly iteration loop marked]

  8. Previous Work
     • Post-P&R timing prediction from netlist
       • Adoption of physical synthesis [Alpert07]
       • Analytical buffered delay or wire models [Alpert06] [Jones94] [Vujkovic12]
       • Detection of congestion during synthesis [Clarke11]
       • Models using regression on existing synthesized designs [Karchmer12]
     • Thermal-aware delay model at floorplan [Kim12]
     • Closed-form SRAM latency model w.r.t. process variation [Yaldiz09]
     • P&R outcome prediction with machine learning
       • Defect classification using SVMs [Huang10]
       • Nonlinear ML models for CTS skew [Kahng13]
     • None of the above works answers how to avoid suboptimal decisions at the floorplanning stage

  9. Our Work
     • First to propose a modeling methodology to predict post-P&R slack values at endpoints on SRAMs at the floorplan stage
     • We extend our methodology to predict multiphysics slack values of SRAMs at the floorplan stage
     • Enables early filtering and improvement of floorplans that would lead to timing failures at the post-layout and signoff stages
     • A new implementation of the Boosting technique, based on SVMs as weak learners and a weighting strategy for negative slack outcomes to avoid critical timing failures

  10. Multiphysics Analysis Flow
      • We consider IR drop (RedHawk) and crosstalk (PTSI) in our work
      • Other multiphysics effects such as thermal and reliability will be explored in the future
      [Figure: analysis loop between timing analysis (PTSI) and IR analysis (RedHawk). File labels: .sdc, .db, .v, .spef and .lib, .def, .spef, .tech. Exchanged data: timing windows per pin (.timing) and IR drop per instance (.tcl). Temperature, reliability, and other physics are marked as not explored in this work]

  11. Floorplanning and SRAM Placement
      • Floorplans are parameterized by core width and height, SRAM spacings, surrounding space, and widths of routing channels
      [Figure: parameterized floorplan with SRAMs, buffer screens, and blockages (which emulate SRAMs); labeled dimensions are core_w, core_h, sram_w, sram_h, sram_spacing, screen_w, blockage_w, blockage_h, vc, and hc]

  12. PDN Design
      • We also parameterize PDN stripe pitches and stripe widths
      • Power ring: V = M9, H = M10 (width = 2 µm)
      • Top mesh: V = M9, H = M10
      • Secondary mesh: M6
      • Power rail: M2
      • SRAM: from M1 to M4
      • Metal layer usage: M1 to M8 signal routing; M6 local meshes; M9 and M10 top mesh and power rings
      [Figure: PDN layout showing VDD, GND, power pads, and SRAMs]

  13. Parameter Selection
      • Three categories of parameters: netlist structure, floorplan parameters, layout constraints
      • Sensitivity analysis
        • Independent sweeping of each parameter
        • Combined effects of parameters using variance inflation factor (VIF)

      Parameter                       Range of Value(s)
      Aspect ratio                    0.8~1.2
      Utilization (std cells)         40%~70%
      PDN stripe width                0.5~3.5 µm
      PDN stripe pitch                7~40 µm
      SRAM spacing (channel width)    6~24 µm
      Buffer screen width             10~16 µm
      Routing metal layers            7, 8
      Memory placement                {Face-to-face, face-to-back}
      Clock period                    THEIA = 3.0~4.0ns; nova = 3.2~4.2ns; artificial = 2.0ns
      Max transition                  200~280ps
      Max fanout                      8~10
      Threshold voltage mixes         {LVT}, {LVT, RVT}, {RVT}
      Clock buffer sizes              {X32}, {X32, X24}, {X32, X24, X16}
      NDRs on clock nets              1W1S, 2W2S, 3W3S, 3W2S, 2W3S
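
The VIF check mentioned on this slide can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes each parameter is a column of sampled values and uses the standard identity that VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the inverse correlation matrix. The function names and the sample sweep values are hypothetical.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length, non-constant columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        standardized.append([(v - mu) / sigma for v in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
            for ci in standardized]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("parameters are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif(columns):
    """Return one variance inflation factor per parameter column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical full-factorial sweep of SRAM spacing and PDN stripe pitch (µm).
spacing = [6, 12, 18, 24, 6, 12, 18, 24]
pitch = [7, 7, 7, 7, 40, 40, 40, 40]
# The two sweeps are uncorrelated, so both VIFs come out at about 1.
print(vif([spacing, pitch]))
```

A VIF well above 1 for a parameter would suggest that it is largely explained by the other parameters, so its individual effect on slack is hard to separate.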

  14. List of Parameters
      Each entry lists the parameter, its type, whether it is per-memory, and its description.
      • N1 (Netlist; per-memory: Yes): Max delay across all timing paths at the post-synthesis stage
      • N2 (Netlist; per-memory: Yes): Area of cells in the intersection of startpoint fanout and endpoint fanin cones of max-delay incident path
      • N3 (Netlist; per-memory: Yes): Number of stages in the max-delay incident path
      • N4, N5, N6 (Netlist; per-memory: Yes): Max, min and average product of #transitive fanin and #transitive fanout endpoints
      • N7 (Netlist; per-memory: Yes): Width and height of memory
      • FP1 (Floorplan; per-memory: No): Aspect ratio of floorplan
      • FP2 (Floorplan; per-memory: No): Standard cell utilization
      • FP3, FP4 (Floorplan; per-memory: No): PDN stripe width and pitch
      • FP5 (Floorplan; per-memory: No): Size of buffer screen around memories
      • FP6 (Floorplan; per-memory: No): Area of blockage (%) relative to floorplan area
      • FP7, FP8 (Floorplan; per-memory: Yes): Lower-left placement coordinates of memories
      • FP9, FP10 (Floorplan; per-memory: Yes): Width, height of channels for memories
      • FP11 (Floorplan; per-memory: Yes): #memory pins per channel
      • C1 (Constraint; per-memory: No): Sum of width and spacing of top-three routing layers after applying non-default rules (NDRs)
      • C2 (Constraint; per-memory: No): % cells that are LVT
      • C3, C4 (Constraint; per-memory: No): Max fanout of any instance in data and clock paths
      • C5, C6 (Constraint; per-memory: No): Max transition time of any instance in data and clock paths
      • C7 (Constraint; per-memory: No): Delay of the largest buffer expressed as FO4 delay
      • C8 (Constraint; per-memory: No): Clock period used for P&R expressed as FO4 delay
      • C9 (Constraint; per-memory: No): Ratio of clock periods used during synthesis and P&R

  15. Modeling Techniques and Flow
      • Inputs: parameters from the sequential graph of the netlist; parameters from floorplan context and constraints
      • Ground truth: slack reports from P&R and multiphysics STA
      • Modeling techniques
        • ANN with 1 input, 2 hidden, 1 output layer
        • Boosting with SVM as weak learner
        • LASSO with L1 regularization
        • SVM with RBF kernel
      • Combine using weights
      • Save model and exit

  16. Boosting with SVM
      [Figure: input parameters (netlist, floorplan context, constraints) feed SVM weak learners 1 to k; their outputs are combined with weights W1, W2, ..., Wk into the Boosting-predicted output, which is compared against the P&R and multiphysics slack reports]
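
The weighted combination shown on this slide can be sketched as below. The slides do not state the exact combination rule, so a normalized weighted sum is assumed here, and plain Python functions stand in for the trained SVM weak learners. The name boosted_predict, the feature keys, and all coefficients are hypothetical.

```python
def boosted_predict(weak_learners, weights, x):
    """Combine weak-learner predictions for one sample x using a normalized weighted sum."""
    if len(weak_learners) != len(weights):
        raise ValueError("need exactly one weight per weak learner")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * h(x) for h, w in zip(weak_learners, weights)) / total


# Stand-in weak learners: each maps a feature dict to a slack estimate in ns.
# In the slides these would be trained SVMs; the formulas here are made up.
learners = [
    lambda x: 0.10 - 0.02 * x["sram_spacing_um"],
    lambda x: 0.05 - 0.01 * x["sram_spacing_um"],
]
weights = [2.0, 1.0]  # W1, W2 (hypothetical values)

print(boosted_predict(learners, weights, {"sram_spacing_um": 10.0}))
```

Learners with larger weights dominate the combined prediction, which is why boosting schemes typically assign larger weights to weak learners with lower training error.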

  17. Experimental Setup and Testcases
      • Standard cells: 28nm FDSOI foundry technology
      • SRAMs: 28nm FDSOI foundry SRAMs
      • Synthesis: Design Compiler
      • P&R: IC Compiler
      • STA: PrimeTime SI (PTSI)
      • IR drop analysis: APACHE RedHawk

      Netlist      Clock Period (ns)   #Std Cells   #SRAMs   Logic Area (µm²)   SRAM Area (µm²)
      THEIA v0     3                   147274       40       157416             347252
      THEIA v1     2.7                 146505       5        157068             40027
      THEIA v2     3                   146914       6        157012             48032
      THEIA v3     3                   146243       8        156212             64043
      THEIA v4     3                   146606       10       155991             80054
      nova         2                   66031        5        68970              25117
      artificial   2                   201015       6        213075             14925

  18. A General Tic-tac-toe Floorplan
      • A floorplan is divided into an array of tic-tac-toe blocks
      • Three types of blocks are defined: memory, blockage, and standard cells
      • This formulation
        • enables generality and parameterizability,
        • enables systematic exploration of a discrete design space, and
        • captures how designers tend to floorplan their blocks
      [Figure: tic-tac-toe floorplan with memory, standard cell (STD cells), and blockage blocks]
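
The discrete design space described on this slide can be enumerated with a short sketch. It assumes a 3 x 3 grid (the slides only say "an array of tic-tac-toe blocks") and the three block types named above; the function name and the optional memory-count filter are illustrative, not taken from the presentation.

```python
from itertools import product

BLOCK_TYPES = ("memory", "blockage", "stdcell")


def enumerate_floorplans(rows=3, cols=3, num_memories=None):
    """Yield every rows x cols grid of block types.

    If num_memories is given, keep only grids with exactly that many memory blocks.
    """
    for flat in product(BLOCK_TYPES, repeat=rows * cols):
        if num_memories is not None and flat.count("memory") != num_memories:
            continue
        # Reshape the flat tuple into a list of rows.
        yield [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# Print one candidate floorplan with exactly five memory blocks.
first = next(enumerate_floorplans(num_memories=5))
for row in first:
    print(row)
```

Because every block takes one of three types, the unfiltered space has 3^(rows x cols) candidates, so a practical flow would prune it (for example by fixing the number of memories, as above) before running P&R.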

  19. Example: Memory Placements
      • Implementation examples of tic-tac-toe
      • Implementation of cross / L / T shaped floorplans
      [Figure: example SRAM placements]

  20. Simple-Minded Modeling Yields Large Errors
      • No apparent correlation between post-P&R and post-synthesis slack values
      • Modeling with only netlist parameters
        • Worst-case error = 358ps; average error = 42ps

      Technique              Worst-Case Error (ps)   Average Error (ps)
      LASSO                  565                     87
      SVM (linear)           412                     55
      SVM (w/ RBF kernel)    358                     42

  21. Post-P&R Slack Prediction
      • Errors in data points with negative slack are penalized more to avoid critical timing failures
      • Worst error = 224ps
      • Average error = 4ps
      [Figure: error of slack prediction (ns) versus actual slack (ns)]
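
One way to penalize errors on negative-slack data points more heavily is a weighted error metric, sketched below. The slides do not give the actual weights or loss, so the penalty factor of 2.0, the function name, and the sample slack values are assumptions made only for illustration.

```python
def weighted_abs_error(actual, predicted, negative_penalty=2.0):
    """Weighted mean absolute error over slack values (ns).

    Samples whose actual slack is negative get weight `negative_penalty`;
    all other samples get weight 1.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, p in zip(actual, predicted):
        weight = negative_penalty if a < 0 else 1.0
        weighted_sum += weight * abs(a - p)
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical slack values in ns: two failing endpoints, two passing ones.
actual_slack = [-0.30, -0.05, 0.10, 0.25]
predicted_slack = [-0.25, 0.02, 0.12, 0.20]
print(weighted_abs_error(actual_slack, predicted_slack))
```

The same per-sample weights could instead be passed to a learner during training, so that mispredicting a failing endpoint costs more than mispredicting a passing one.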

  22. Multiphysics Slack Prediction
      • Annotate per-cell IR drop from RedHawk in PTSI
      • Worst error = 253ps
      • Average error = 9ps

  23. Modeling Fidelity
      • False negatives = 3%: pessimistic predictions, where we give guidance to change a floorplan although no change is actually required
      • False positives = 4%: our model incorrectly deems a floorplan to be good
      • Positive slack data points: precision = tp/(tp + fp) = 93.3%; recall = tp/(tp + fn) = 95.0%
      • Negative slack data points: precision = tn/(tn + fn) = 92.5%; recall = tn/(tn + fp) = 90.1%

                        Actual Pass             Actual Fail
      Predicted Pass    584                     42 (false positives)
      Predicted Fail    31 (false negatives)    384
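
The fidelity figures on this slide can be recomputed from the four confusion-matrix counts. The sketch below uses the standard definitions, in which precision for the negative (fail) class is tn/(tn + fn) and recall for the negative class is tn/(tn + fp); with these definitions the counts reproduce the reported percentages. The function and key names are illustrative.

```python
def fidelity_metrics(tp, fp, fn, tn):
    """Precision and recall for both classes, plus error shares of all data points."""
    total = tp + fp + fn + tn
    return {
        "pass_precision": tp / (tp + fp),
        "pass_recall": tp / (tp + fn),
        "fail_precision": tn / (tn + fn),
        "fail_recall": tn / (tn + fp),
        "false_positive_share": fp / total,
        "false_negative_share": fn / total,
    }


# Counts from the slide: predicted pass = (584, 42), predicted fail = (31, 384).
metrics = fidelity_metrics(tp=584, fp=42, fn=31, tn=384)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```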

  24. Conclusions
      • Early-stage timing failure prediction and timing closure with multiphysics analyses are important
      • We present a machine learning-based methodology for the early-stage timing failure prediction problem
        • Worst-case error = 224ps (w/o multiphysics)
        • Worst-case error = 253ps (w/ multiphysics)
      • We present a new implementation of Boosting based on SVMs as weak learners
      • Our ongoing work includes
        • Applying our methodology to product/test engineering data from an SoC company
        • Predicting defectivity in silicon and providing floorplan guidance to avoid such defectivity

  25. Acknowledgments
      • Work supported by Samsung Electronics
      • We thank P. Agrawal (ANSYS) and J.-A. Desroses (ST Microelectronics) for their help with setup and enablement of iterative DVD analysis and signoff timing flow

  26. Thank You!

  27. Backup
