Guardbanding to Mitigate PVT Variations and Aging


The paper introduces a novel approach, Hierarchically Focused Guardbanding, to address the challenges of PVT variations and aging in electronic systems. Developed by Abbas Rahimi, Luca Benini, and Rajesh K. Gupta from UC San Diego and Università di Bologna, this adaptive method offers a proactive strategy to enhance system reliability and performance in the face of changing environmental conditions. The researchers propose a detailed framework that intelligently allocates guardbands at different hierarchies within the system, effectively managing uncertainties caused by process, voltage, and temperature variations and by aging effects. By dynamically adjusting guardband levels based on real-time monitoring and feedback, this approach optimizes system operation while ensuring long-term robustness and resilience.

  • Guardbanding
  • PVT Variations
  • Aging Mitigation
  • Electronic Systems
  • Adaptive Approach

Uploaded on Mar 07, 2025 | 0 Views



Presentation Transcript


  1. Hierarchically Focused Guardbanding: An Adaptive Approach to Mitigate PVT Variations and Aging
     Abbas Rahimi, Luca Benini, Rajesh K. Gupta
     UC San Diego and Università di Bologna

  2. Outline
     • Device Variability: process, voltage, temperature, and aging
     • Resilient Techniques
     • Hierarchically Focused Guardbanding
     • Analysis Flow for Timing Error Rate
     • Parametric Model Fitting
     • Hierarchical Sensors Observability
     • Online Utilization of HFG
     • Throughput improvement
     • Conclusion
     7-Mar-25, Rajesh K. Gupta / UC San Diego

  3. Ever-increasing PVTA Variations
     Variability in transistor characteristics (PVTA) is a major challenge in nanoscale CMOS:
     • Static process variation: effective transistor channel length and threshold voltage
     • Dynamic variations: temperature fluctuations, supply voltage droops, and device aging (NBTI, HCI)
     To handle variations, designers use conservative guardbands, which leads to a loss of operational efficiency.
     [Figure: the clock period split into actual circuit delay and guardband; guardband contributors labeled temperature, VCC droop, aging, and across-wafer frequency]

  4. Resilient Techniques
     I. Sense & Adapt: observation using in situ monitors (Razor, EDS) with cycle-by-cycle corrections (leveraging CMOS knobs or replay).
     II. Predict & Prevent: relying on external or replica monitors, a model-based rule derives an adaptive guardband to prevent errors.
     [Figure: a Sense (detect) / Adapt (correct) loop next to a Sensors / Model / Prevent loop, each annotated with example path delays of 1ns, 3ns, 4ns, and 5ns]

  5. Our Resilient View
     I. Sense & Adapt: we have done cross-layer vulnerability analysis, tracing the manifestation of variability from the instruction level to the task level:
     • Instruction-level Vulnerability (ILV)
     • Sequence-level Vulnerability (SLV)
     • Procedure-level Vulnerability (PLV)
     • Task-level Vulnerability (TLV)
     II. Model & Prevent: in this work, we present Hierarchically Focused Guardbanding (HFG), a model-based rule to derive the guardband adaptively and avoid PVTA-induced timing errors.
     [ILV] A. Rahimi, L. Benini, R. K. Gupta, "Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations," DATE, 2012.
     [SLV] A. Rahimi, L. Benini, R. K. Gupta, "Application-Adaptive Guardbanding to Mitigate Static and Dynamic Variability," IEEE Transactions on Computers, 2013.
     [PLV] A. Rahimi, L. Benini, R. K. Gupta, "Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters," ISLPED, 2012.
     [TLV] A. Rahimi, A. Marongiu, P. Burgio, R. K. Gupta, L. Benini, "Variation-Tolerant OpenMP Tasking on Tightly-Coupled Processor Clusters," DATE, 2013.

  6. Contributions
     1. A new high-level model for the Timing Error Rate of various integer as well as floating-point functional units (FUs) in the presence of PVTA variations.
        • Online: a model-based rule to derive the guardband from the PVTA sensor readings
        • Offline: identifying vulnerable FUs
     2. The notion of Hierarchically Focused Guardbanding (HFG), which is guided by online utilization of the model in view of monitors, observation granularity, and reaction times.
     3. Applying HFG on a GPU at two distinct granularities:
        i. Fine-grained granularity of instruction-by-instruction monitoring and adaptive guardbanding
        ii. Coarse-grained granularity of kernel-level monitoring and adaptive guardbanding

  7. HFG Analysis Flow for TER
     The model takes into account:
     1. PVTA parameter variations
     2. Clock frequency
     3. Physical details of placed-and-routed FUs in 45nm TSMC technology
     Analyzed FUs:
     • 10 32-bit integer
     • 15 single-precision floating-point (fully compatible with the IEEE 754 standard)
     A full permutation of PVTA parameters and clock frequency is applied:

     Parameter        Start point   End point   # of points   Step
     Voltage          0.88 V        1.10 V      23            0.01 V
     Temperature      0 °C          120 °C      13            10 °C
     Process (σWID)   0%            9.6%        4             3.2%
     Aging (ΔVth)     0 mV          100 mV      5             25 mV
     tclk             0.2 ns        5.0 ns      25            0.2 ns

     For each FU_i working with t_clk under a given PVTA variation, we define the Timing Error Rate (TER):
     \[ \mathrm{TER}(FU_i, t_{clk}, V, T, P, A) = \frac{\sum \mathrm{CriticalPaths}(FU_i, t_{clk}, V, T, P, A)}{\sum \mathrm{Paths}(FU_i)} \times 100 \]
     [Figure: analysis flow. FUs in Verilog, with DesignWare libraries and 45nm corner libraries, go through Design Compiler and IC Compiler to produce a netlist and SPEF; PrimeTime SSTA and STA with 45nm process variation-aware libraries sweep the variable parameters (voltage, temperature, process, ΔVth, tclk); the resulting timing error rate analysis feeds a MATLAB linear classifier that produces the parametric model]
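
The size of the characterization sweep on this slide can be cross-checked with a few lines of Python. This is only a sketch: the names `SWEEP`, `n_points`, and `timing_error_rate` are invented for illustration, and the TER helper assumes the slide's definition (failing critical paths over all paths, times 100).

```python
from math import prod

# Sweep ranges as listed on the slide: (start, end, step).
SWEEP = {
    "voltage_V": (0.88, 1.10, 0.01),
    "temperature_C": (0.0, 120.0, 10.0),
    "process_sigma_wid_pct": (0.0, 9.6, 3.2),
    "aging_delta_vth_mV": (0.0, 100.0, 25.0),
    "tclk_ns": (0.2, 5.0, 0.2),
}


def n_points(start, end, step):
    """Number of grid points in [start, end], both ends included."""
    # round() absorbs floating-point noise such as (1.10 - 0.88) / 0.01 = 21.999...
    return round((end - start) / step) + 1


def timing_error_rate(failing_paths, total_paths):
    """TER in percent: failing critical paths over all paths of one FU."""
    if total_paths <= 0:
        raise ValueError("total_paths must be positive")
    return 100.0 * failing_paths / total_paths


counts = {name: n_points(*rng) for name, rng in SWEEP.items()}
total_corners = prod(counts.values())
print(counts, total_corners)
```

With the slide's ranges this yields 23, 13, 4, 5, and 25 points per axis, so a full permutation covers 149,500 combinations.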

  8. Parametric Model Fitting
     Linear discriminant analysis maps the feature vector x = (P, V, T, A, t_clk) to one of K classes of TER:
     \[ \hat{y} = \operatorname*{argmin}_{y = 1, \ldots, K} \sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k) \]
     \[ P(x \mid k) = \frac{1}{(2\pi |\Sigma_{TER}|)^{0.5}} \exp\!\left( -\tfrac{1}{2} (x - M_k)^{T}\, \Sigma_{TER}^{-1}\, (x - M_k) \right) \]
     \[ \hat{P}(k \mid x) = \frac{P(x \mid k)\, P(k)}{P(x)} \]
     \[ \mathrm{cost}(k) = \sum_{i=1}^{K} \hat{P}(i \mid x)\, C(k \mid i) \]

     Class                 TER range
     Class0 (C0)           TER = 0%
     ClassLow (CL)         0% < TER <= 33%
     ClassMedium (CM)      33% < TER <= 66%
     ClassHigh (CH)        66% < TER <= 100%

     We used supervised learning (linear discriminant analysis) to generate a parametric model at the FU level that relates PVTA parameter variations and tclk to classes of TER.
     On average, for all FUs the resubstitution error is 0.036, meaning the models classify nearly all data correctly.
     For extra characterization points, the model makes correct estimates for 97% of out-of-sample data. The remaining 3% is misclassified into the high-error-rate class, CH, and thus still receives a safe guardband.
     [Figure: the HFG ASIC analysis flow for TER feeding the classifier, which outputs the parametric model]
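
The class boundaries and the minimum-expected-cost decision on this slide can be sketched in plain Python. This is an illustration rather than the authors' MATLAB code: `ter_class`, `min_cost_class`, and `ZERO_ONE` are hypothetical names, and the posteriors are assumed to come from an already fitted discriminant model.

```python
CLASSES = ("C0", "CL", "CM", "CH")


def ter_class(ter_percent):
    """Map a TER percentage to the four classes listed on the slide."""
    if not 0.0 <= ter_percent <= 100.0:
        raise ValueError("TER must be a percentage in [0, 100]")
    if ter_percent == 0.0:
        return "C0"
    if ter_percent <= 33.0:
        return "CL"
    if ter_percent <= 66.0:
        return "CM"
    return "CH"


def min_cost_class(posterior, cost):
    """Pick the class y that minimizes sum_k P(k|x) * C(y|k).

    posterior[k] is P(k|x); cost[(y, k)] is the cost of predicting y
    when the true class is k.
    """
    def expected_cost(y):
        return sum(posterior[k] * cost[(y, k)] for k in CLASSES)

    return min(CLASSES, key=expected_cost)


# 0-1 cost: every misclassification costs 1, a correct prediction costs 0.
ZERO_ONE = {(y, k): 0.0 if y == k else 1.0 for y in CLASSES for k in CLASSES}

example_posterior = {"C0": 0.1, "CL": 0.6, "CM": 0.2, "CH": 0.1}
print(ter_class(12.5), min_cost_class(example_posterior, ZERO_ONE))
```

Under a 0-1 cost the rule reduces to choosing the most probable class; a different cost matrix would change only the `cost` argument.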

  9. Delay Variation and TER Characterization
     [Figure: four surface plots of Timing Error Rate (%) versus VDD (0.88V to 1.10V) and temperature (0 °C to 120 °C), one per corner: (P(σWID) = 0%, A(ΔVth) = 0mV), (P(σWID) = 0%, A(ΔVth) = 100mV), (P(σWID) = 9.6%, A(ΔVth) = 0mV), and (P(σWID) = 9.6%, A(ΔVth) = 100mV)]
     [Figure: delay (ns) ranges as more parameters become known, from (P,?,?,?) through (P,A,?,?) and (P,A,T,?) to (P,A,T,V), branching on P(σWID) = 0% or 9.6%, A(ΔVth) = 0mV or 100mV, T = 0 °C or 120 °C, and V = 0.88V or 1.10V]
     During design time the delay of the FP adder has a large uncertainty of [0.73ns, 1.32ns], since the actual values of the PVTA parameters are unknown.
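
A quick worked number shows what this uncertainty can cost. Assume, purely for illustration, that a design-time guardband fixes the clock period at the upper bound of the interval while the actual delay sits at the lower bound; the symbols t_max and t_min are introduced here and do not appear on the slides. The unused fraction of the clock period is then:

```latex
\[
\frac{t_{\max} - t_{\min}}{t_{\max}}
  = \frac{1.32\,\mathrm{ns} - 0.73\,\mathrm{ns}}{1.32\,\mathrm{ns}}
  = \frac{0.59}{1.32}
  \approx 0.447
\]
```

Under that assumption, up to roughly 45% of the clock period is margin rather than useful delay.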

  10. Hierarchical Sensors Observability
     The question is which mix of monitors would be useful.
     The more sensors we provide for an FU, the more of its conservative guardband can be removed.
     Sensor overheads:
     • In-situ PVT sensors impose 1 to 3% area overhead [Bowman 09]
     • Five replica PVT sensors increase area by 0.2% [Lefurgy 11]
     • The banks of 96 NBTI aging sensors occupy less than 0.01% of the core's area [Singh 11]
     The guardband of the FP adder can be reduced by up to 8% (P_sensor), 24% (PA_sensors), 28% (PAT_sensors), and 44% (PATV_sensors).
     [Figure: tclk (ns), on an axis from 0.6 to 2.6, for FP_exp, FP_add, and INT_mac under the P_sensor, PA_sensors, PAT_sensors, and PATV_sensors configurations]

  11. Online Utilization of HFG
     The control system tunes the clock frequency through an online model-based rule.
     To keep the controller's computation fast, the parametric model generates a distinct look-up table (LUT) for every FU.
     We apply HFG to the architecture at two granularities:
     1. Fine-grained: instruction-by-instruction monitoring and adaptation, where the PATV sensor signals come from individual FUs
     2. Coarse-grained: kernel-level monitoring, which uses one representative set of PATV sensors for the entire execution stage of the pipeline
     [Figure: offline, the TER raw data, PATV_config, and target_TER feed the classifier, whose parametric model fills the per-FU LUTs (FUi, FUj, FUk); online, the sensor readings P (2-bit), A (3-bit), T (3-bit), and V (3-bit) together with the current instruction index the LUTs, and the maximum resulting tclk (5-bit) drives the CLK control of the GPU SIMD pipeline]
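
The LUT-based clock selection described on this slide can be sketched as follows. Only the field widths (P 2-bit, A 3-bit, T 3-bit, V 3-bit, tclk 5-bit) and the FU names come from the slides; the bit ordering, the 0.2 ns code step, the LUT contents, and all function names are assumptions made for illustration.

```python
# Bit widths of the sensor fields, as given on the slide.
FIELD_BITS = (("P", 2), ("A", 3), ("T", 3), ("V", 3))
# Assumption: one tclk code unit equals the 0.2 ns characterization step.
TCLK_STEP_NS = 0.2


def pack_patv(p, a, t, v):
    """Pack quantized P, A, T, V readings into one 11-bit LUT index (P in the top bits)."""
    index = 0
    for (name, bits), value in zip(FIELD_BITS, (p, a, t, v)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} reading {value} does not fit in {bits} bits")
        index = (index << bits) | value
    return index


def select_tclk_code(luts, index, active_fus):
    """Return the largest (slowest) 5-bit tclk code required by the given FUs."""
    code = max(luts[fu][index] for fu in active_fus)
    if not 0 <= code < 32:
        raise ValueError("tclk code must fit in 5 bits")
    return code


def tclk_ns(code):
    """Convert a tclk code to nanoseconds under the assumed 0.2 ns step."""
    return code * TCLK_STEP_NS


idx = pack_patv(p=1, a=2, t=5, v=3)
# Hypothetical LUT contents for two FUs at this sensor reading.
luts = {"FP_add": {idx: 6}, "INT_mac": {idx: 4}}

fine = select_tclk_code(luts, idx, ["INT_mac"])     # only the FU used by the current instruction
coarse = select_tclk_code(luts, idx, list(luts))    # every FU in the execution stage
print(idx, fine, coarse, tclk_ns(coarse))
```

In this sketch the fine-grained mode consults only the LUT of the FU that the current instruction uses, while the coarse-grained mode takes the maximum over all FUs, which is one plausible reading of the "max" block in the figure.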

  12. Throughput benefit of HFG
     1. With kernel-level monitoring, the throughput increases by 70% on average when the PE moves from the P_sensor-only scenario to the PATV_sensors scenario. The target TER is set to 0 to accommodate error-intolerant applications.
     2. Instruction-by-instruction monitoring and adaptation improves the throughput by 1.8× to 2.1×, depending on the PATV sensor configuration and the kernel's instructions.
     [Figure: two bar charts of throughput (GIPS) for the P_sensor, PA_sensors, PAT_sensors, and PATV_sensors configurations, one per monitoring granularity]

  13. Conclusion
     We present a model and its usage for online variation-aware resource management, as well as for design-time analysis of vulnerable functional units, through an accurate 45nm TSMC flow.
     The model is used as an adaptive resource management technique that proactively prevents timing errors by applying a focused guardband.
     We demonstrate the effectiveness of HFG on a GPU architecture at two granularities of observation and adaptation: (i) fine-grained instruction-level; and (ii) coarse-grained kernel-level.
     The models are publicly available for download at: http://mesl.ucsd.edu/site/PVTA_MODELS/models.htm

  14. Thank You!
     Acknowledgments: ERC MultiTherman, NSF Variability Expedition
