Optimization Framework for Clock Skew Variation Reduction

Discover a global-local optimization framework for minimizing clock skew variation in multi-mode multi-corner designs in VLSI CAD, addressing the challenges of timing issues across multiple corners in SoCs.

  • Clock Skew
  • Optimization Framework
  • VLSI CAD
  • Multi-Mode Design
  • Multi-Corner Reduction

Presentation Transcript


  1. A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction. Kwangsoo Han, Andrew B. Kahng, Jongpil Lee, Jiajia Li and Siddhartha Nath. VLSI CAD Laboratory, UC San Diego.

  2. Outline: Motivation; Related Work; Our Optimization Framework; Experimental Setup and Results; Conclusions

  3. Motivation
       • Many signoff PVT corners in modern SoCs
       • Clock skew variation across corners causes:
           • a "ping-pong" effect: fixing timing issues at one corner leads to timing violations at others
           • power/area overheads
       • Low voltage: gate delay dominates. High voltage: wire delay dominates. The result can be skew reversal between corners.
       • Our goal: minimize clock skew variation
       [Figure: launch and capture clock paths driving a datapath, with clock latencies annotated at two corners (SS, 0.7V, -25°C and FF, 1.1V, -25°C); skew = -0.1 / +0.2 across the two corners.]

  5. Related Work
       • Skew minimization at multiple corners
           • [Cho05] performs temperature-aware skew reduction based on an improved DME
           • [Lung10] minimizes the worst clock skew across corners with delay correlation factors
       • Skew variation minimization across corners
           • [Restle01] proposes a two-level non-tree structure in which a mesh is applied at the bottom level
           • [Su01] uses a mesh for the top level of the clock network
           • [Rajaram04] inserts crosslinks in a clock tree to minimize skew variation
       • Our work: a systematic optimization framework for minimizing clock skew variation in a clock tree

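  6. Skew Variation Reduction Problem
       • Notation: r is the clock root; i and j are sinks; C and C' are corners.
       • Clock skew between sink pair (i, j) at corner C: $skew_{i,j}^{C}$ is the difference between the delays from r to sinks i and j at corner C.
       • Skew variation between corner pair (C, C'): $v_{i,j}^{C,C'} = |\alpha_{C} \cdot skew_{i,j}^{C} - \alpha_{C'} \cdot skew_{i,j}^{C'}|$, where $\alpha_{C}$ is a per-corner scaling factor.
       • Maximum skew variation for sink pair (i, j): $V_{i,j} = \max_{(C,C')} v_{i,j}^{C,C'}$
       • Skew variation reduction problem: given a routed clock tree, minimize the sum over all sink pairs of the maximum skew variation, i.e., minimize $\sum_{(i,j)} V_{i,j}$.
       [Figure: the same clock tree (root r, sinks i and j) shown at corners C and C', annotated with $skew_{i,j}^{C}$ and $skew_{i,j}^{C'}$.]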
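
The objective on this slide can be evaluated directly from per-corner clock latencies. The following is a minimal sketch, not the authors' implementation: the data layout (`latency[corner][sink]`), the function names, and the per-corner factors in `alpha` are assumptions made for illustration. Setting every factor to 1.0 gives the plain difference of skews.

```python
from itertools import combinations

def skew(latency, corner, i, j):
    """Skew of sink pair (i, j) at one corner: latency(i) - latency(j)."""
    return latency[corner][i] - latency[corner][j]

def max_skew_variation(latency, alpha, i, j):
    """Largest scaled skew difference for (i, j) over all corner pairs."""
    return max(
        abs(alpha[c1] * skew(latency, c1, i, j) - alpha[c2] * skew(latency, c2, i, j))
        for c1, c2 in combinations(list(latency), 2)
    )

def total_skew_variation(latency, alpha):
    """Objective: sum of the maximum skew variation over all sink pairs."""
    sinks = sorted(next(iter(latency.values())))
    return sum(max_skew_variation(latency, alpha, i, j)
               for i, j in combinations(sinks, 2))

# Hypothetical toy data: three sinks, two corners (arbitrary time units).
latency = {
    "C0": {"a": 1.0, "b": 0.9, "c": 1.1},
    "C1": {"a": 2.0, "b": 2.2, "c": 2.1},
}
alpha = {"C0": 1.0, "C1": 0.5}  # assumed scaling factors
total = total_skew_variation(latency, alpha)
```

Because the sum runs over all sink pairs, the cost of this direct evaluation grows quadratically with the number of sinks.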

  8. Our Optimization Framework
       • Incremental optimization of a CTS solution
       • Performs both global and local optimization
       • Global optimization uses an LP to determine delta delays on arcs
       • Local optimization performs iterative local moves
       • Flow: routed clock tree database -> Global Optimization (buffer insertion/removal, routing detour) -> Local Optimization (local moves, e.g., sizing/displacement) -> optimized database
       [Figure: original routed clock tree, the tree after global optimization, and the tree after local optimization; labels: root, target buffer, last-stage buffer, sinks.]

  9. Global Optimization: LP
       • Formulate a linear program to minimize skew variation
       • Determine the delta delay on each arc at each corner
       • Realize the delta delays from LUTs by inserting/removing buffers and detouring wires
           • Buffer delays are discrete
           • ECO feasibility is important
       • Formulation:
           Minimize $\sum_{k,C} |\Delta_{k}^{C}|$ (1), where $\Delta_{k}^{C}$ is the delta delay of arc k at corner C
           Subject to:
             $\sum_{(i,j)} V_{i,j} \le U$ (2), where $V_{i,j}$ is the maximum skew variation
             $skew_{i,j}^{C} \le orig\_skew_{i,j}^{C}$ (3)
             $D_{i}^{C} \le D_{max}$ (4), where $D_{i}^{C}$ is the clock latency to sink i at corner C
             $d_{k}^{C} \ge$ min delay without wire detour (5), where $d_{k}^{C}$ is the arc delay
             ratio of the delays of arc k between corners within the range of delay ratios from LUTs (6)
       • Role of each term:
           (1) Minimize the number of ECO changes
           (2) Sweep U for the solution with minimum skew variation
           (3) Ensure no skew degradation
           (4) Maximum clock latency constraint
           (1, 5, 6) Improve ECO feasibility
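
To make the roles of objective (1) and constraints (2) to (4) concrete, here is a minimal sketch that checks one candidate set of delta delays on a toy tree. It does not solve the LP, and it leaves out the LUT-based constraints (5) and (6) as well as any per-corner scaling of skews. All names and the data layout (`paths`, `arc_delay`, `delta`) are hypothetical.

```python
from itertools import combinations

def latencies(paths, arc_delay):
    """Latency per corner and sink: sum of arc delays on the root-to-sink path."""
    return {c: {s: sum(d[k] for k in arcs) for s, arcs in paths.items()}
            for c, d in arc_delay.items()}

def total_variation(lat):
    """Sum over sink pairs of the largest skew difference between any two corners."""
    sinks = sorted(next(iter(lat.values())))
    total = 0.0
    for i, j in combinations(sinks, 2):
        skews = [lat[c][i] - lat[c][j] for c in lat]
        total += max(skews) - min(skews)  # equals the max over corner pairs
    return total

def check_candidate(paths, arc_delay, delta, U, max_latency):
    """Return (objective (1), feasibility under (2), (3), (4)) for given deltas."""
    new_delay = {c: {k: d + delta.get(c, {}).get(k, 0) for k, d in arcs.items()}
                 for c, arcs in arc_delay.items()}
    old, new = latencies(paths, arc_delay), latencies(paths, new_delay)
    sinks = sorted(paths)
    objective = sum(abs(x) for per_corner in delta.values() for x in per_corner.values())
    feasible = (
        total_variation(new) <= U                                       # (2)
        and all(abs(new[c][i] - new[c][j]) <= abs(old[c][i] - old[c][j])
                for c in new for i, j in combinations(sinks, 2))        # (3)
        and all(new[c][s] <= max_latency for c in new for s in sinks)   # (4)
    )
    return objective, feasible

# Toy tree: root drives sink "a" over arc "ra" and sink "b" over arc "rb" (delays in ps).
paths = {"a": ["ra"], "b": ["rb"]}
arc_delay = {"C0": {"ra": 100, "rb": 90}, "C1": {"ra": 200, "rb": 170}}
delta = {"C0": {"rb": 10}, "C1": {"rb": 20}}  # slow down arc "rb" at both corners
result = check_candidate(paths, arc_delay, delta, U=10, max_latency=200)
```

In this toy case the original variation is 20 ps and the candidate brings it to 10 ps, so tightening U below 10 makes the same candidate infeasible, which is the effect of sweeping U.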

  10. Our Optimization Framework
       • Incremental optimization of a CTS solution
       • Performs both global and local optimization
       • Global optimization uses an LP to determine delta delays on arcs
       • Local optimization performs iterative local moves
       • Flow: routed clock tree database -> Global Optimization (buffer insertion/removal, routing detour) -> Local Optimization (local moves, e.g., sizing/displacement) -> optimized database

  11. Local Optimization: Moves
       • Iterative local moves to minimize skew variation
       • Three types of local moves:
           (1) Displacement {N, S, E, W, NE, NW, SE, SW} by 10 μm × one-step sizing
           (2) Displacement by 10 μm × one-step sizing on a child buffer
           (3) Reassignment to a new driver (i) at the same level and (ii) within a bounding box of 50 μm × 50 μm
       • Each move is expensive (= legalization, ECO routing, RC extraction, STA)
       • Each buffer has ~100 candidate moves. Which move is the best?
       • Our solution: a learning-based model
       [Figure: illustrations of move types (1), (2), and (3), with 10 μm displacement steps.]
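
As an illustration of how the candidate set for the first move type might be enumerated, here is a minimal sketch. It rests on assumptions not stated on the slide: "one-step sizing" is read as {one size down, unchanged, one size up}, a diagonal displacement moves 10 μm along each axis, and buffer sizes are indexed from 0. The function name and tuple layout are hypothetical.

```python
from itertools import product

# Unit displacement vectors for the eight directions named on the slide.
DIRECTIONS = {
    "N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
    "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1),
}
STEP_UM = 10             # displacement step from the slide
SIZE_STEPS = (-1, 0, 1)  # assumed meaning of "one-step sizing"

def type1_moves(x, y, size_index, num_sizes):
    """Enumerate (direction, new_x, new_y, new_size) candidates for one buffer."""
    moves = []
    for (name, (dx, dy)), ds in product(DIRECTIONS.items(), SIZE_STEPS):
        new_size = size_index + ds
        if 0 <= new_size < num_sizes:  # skip sizes outside the library
            moves.append((name, x + dx * STEP_UM, y + dy * STEP_UM, new_size))
    return moves

candidates = type1_moves(x=0, y=0, size_index=1, num_sizes=3)
```

Under these assumptions a buffer with a size available on either side gets 8 × 3 = 24 candidates of this type; the other two move types would add to that count.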

  12. Machine Learning-Based Model
       • Predicts the driver-to-fanout latency change due to local moves
       • Analytical models used to estimate delta delays for a local move:
           • Routing: FLUTE, STST
           • Cell delay: Liberty LUTs
           • Wire delay: Elmore, D2M
       • Flow: local move -> analytical models -> delta delays -> learning-based model -> delta delays
       • Experiment: 114 buffers, 45 candidate moves for each buffer; each attempt is a local move
       • The learning-based model identifies the best moves for more buffers with fewer attempts
       [Figure: % of buffers identified to have the best move (0% to 100%) versus #attempts (0 to 12), for Flute+ED, Flute+D2M, STST+ED, STST+D2M, and Model.]
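
The notion of "attempts" on this slide can be sketched as follows: rank the candidate moves with a cheap predictor, then spend the expensive evaluation only on the top few. This is a minimal sketch under assumptions; `predict` and `evaluate` are hypothetical callables that return a cost where lower is better, and they stand in for the learned model and for the legalization, ECO routing, RC extraction, and STA loop respectively.

```python
def pick_move(candidates, predict, evaluate, max_attempts):
    """Try the max_attempts best-predicted moves and return the best one found."""
    ranked = sorted(candidates, key=predict)  # cheap: model prediction only
    best_move, best_cost = None, float("inf")
    for move in ranked[:max_attempts]:
        cost = evaluate(move)                 # expensive: one attempt
        if cost < best_cost:
            best_move, best_cost = move, cost
    return best_move, best_cost

# Hypothetical toy data: predicted and true costs for three moves.
predicted = {"m1": 4.5, "m2": 3.5, "m3": 3.4}
true_cost = {"m1": 5.0, "m2": 3.0, "m3": 4.0}

one_try = pick_move(predicted, predicted.get, true_cost.get, max_attempts=1)
two_tries = pick_move(predicted, predicted.get, true_cost.get, max_attempts=2)
```

In the toy data the predictor ranks "m3" slightly ahead of the truly best move "m2", so one attempt returns "m3" and two attempts recover "m2". A more accurate predictor needs fewer attempts to reach the best move, which is the quantity the plot on this slide compares.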

  14. Experimental Setup
       • Technology: foundry 28nm LP
       • Initial clock tree from Synopsys IC Compiler
       • Testcases: (a) high-speed application processor, (b) memory controller
       [Figure: layouts of the two testcases with clock ports marked; clock nets/cells and sinks are shown in yellow.]
       • Corners:
           Corner  Process  Voltage  Temperature  BEOL  Applies to testcase
           C0      SS       0.90V    -25°C        Cmax  (a), (b)
           C1      SS       0.75V    -25°C        Cmax  (a), (b)
           C2      FF       1.10V    125°C        Cmin  (b)
           C3      FF       1.32V    125°C        Cmin  (a)

  15. Experimental Results (1)
       • Up to 22% reduction in the sum of skew variation over all sink pairs
       • No skew degradation at any corner
       • Negligible area and power overhead
           Testcase  Flow          Variation (ns)  Skew C0 (ps)  Skew C1 (ps)  Skew C2/C3 (ps)  #Cells  Power (mW)  Area (μm²)
           (a)       Original      512             214           530           226              2515    0.355       3615
           (a)       Global-local  399             175           387           188              2553    0.356       3706
           (b)       Original      972             179           192           282              5568    0.865       8556
           (b)       Global-local  841             176           192           232              5574    0.866       8557
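
The "up to 22%" figure can be checked against the variation values on this slide. The short sketch below assumes the four variation values are read as 512 and 399 for testcase (a) and 972 and 841 for testcase (b), in the order original then global-local; the function name is hypothetical.

```python
def reduction_pct(original, optimized):
    """Percentage reduction from the original value to the optimized value."""
    return 100.0 * (original - optimized) / original

reduction_a = reduction_pct(512, 399)  # testcase (a): about 22.1
reduction_b = reduction_pct(972, 841)  # testcase (b): about 13.5
```

Under that reading, testcase (a) gives the 22% quoted on the slide and testcase (b) gives a smaller reduction, consistent with the "up to" wording.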

  16. Experimental Results (2)
       • The figure compares skew variation before and after optimization on testcase (a)
       • Our optimization significantly reduces the large skew variation between corner pairs
       [Figure: two panels, corner pair = (C0, C3) and corner pair = (C0, C1); axes: original skew variation (ns) and optimized skew variation (ns).]

  18. Conclusion and Future Work
       • First framework to minimize the sum of skew variation over all sink pairs in a clock tree
       • Up to 22% reduction of the sum of skew variation
       • Future work:
           • Study the resulting power and area benefits
           • A model to predict a buffer location for minimum skew over a continuous range of possible locations
       Thank You!

  19. Backup Slides

  20. Experimental Results (3)
       • The figure shows the distribution of skew ratios between C0 and C1
       • Our optimization significantly reduces the variation of skew ratios between corner pairs
       [Figure: two histograms of #sink pairs versus ratio (= skew at C1 / skew at C0). Global-local: μ = 2.26, σ² = 2.26. Original: μ = 1.34, σ² = 3.21.]
