
Innovative Power Distribution Networks for GPUs
Explore the innovative approach of multi-story power distribution networks for GPUs to reduce overhead in power delivery. The proposal suggests techniques like dynamic current compensation and on-chip supercapacitors for stabilizing voltage rails in GPUs. By splitting GPU cores across stacked voltage domains, the 2-story PDNs aim to lower off-chip current demand and power losses significantly.
Presentation Transcript
Multi-Story Power Distribution Networks for GPUs
Qixiang Zhang, Liangzhen Lai, Mark Gottscho, and Puneet Gupta
Electrical Engineering Department
nanocad.ee.ucla.edu, mgottscho@ucla.edu
Problem: GPU Power Delivery is Expensive
GPUs draw large currents because they consume high power at low voltage. This causes power loss and voltage noise in the power distribution network (PDN) and requires many supply and ground pins. Design consequences: high package cost, reduced I/O pin availability, an inefficient PDN, and aging and wearout. Goal: reduce the overhead of power delivery in GPUs.
16-Mar-2016, Mark Gottscho / UCLA
Previous Work: Multi-Story PDNs to the Rescue?
Idea: stack multiple voltage planes for logic, the multi-story PDN concept [Gu ISLPED 05]. Challenge: how should logic be partitioned so that the current demand of each story is matched? At the gate level, at functional units, or at cores? We adapt this idea to GPUs at the core level and improve it further.
Proposal: Multi-Story Approach is Ideal for GPUs
We propose multi-story PDNs for GPUs. GPUs are good for current matching at the core level: architectural homogeneity, regular layout, the SIMT (single-instruction, multiple-thread) model, and minimal communication between threads. These low-cost techniques can help stabilize the voltage rails. Hardware: Dynamic Current Compensation (DCC), an auxiliary regulator, and on-chip supercapacitors. Software: Static SIMT Thread Scheduling (SSTS).
[Figure: NVIDIA Fermi block diagram [NVIDIA].]
[Figure: motivational results from GPGPU-Sim (NVIDIA GTX 480 with 14 cores) + HSPICE for the NQU, STO, LPS, and RAY benchmarks.]
Conventional 1-Story PDN for GPUs
All cores in a voltage domain share common off-chip VDD and GND. The 1-story PDN has high off-chip current demand.
Proposed 2-Story PDNs for GPUs
GPU cores are divided across two stacked voltage domains. The new node between them is the virtual ground for the upper story and the virtual supply for the bottom story. 2-story: off-chip current demand is 1/2X, resistive power losses are 1/4X, and power pins are 1/2X.
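The 1/2X and 1/4X figures follow from basic circuit arithmetic, which the sketch below works through. It assumes ideally balanced stories and an unchanged off-chip PDN resistance; the 200 W, 1.0 V, and 1 mOhm values and the function name are illustrative placeholders, not numbers from the slides.

```python
def pdn_metrics(total_power_w, core_voltage_v, stories, pdn_resistance_ohm):
    """Off-chip current and resistive loss for an ideally balanced N-story PDN.

    Assumption: every story draws the same current, so the stacked domains
    behave like equal loads in series across the off-chip supply.
    """
    supply_voltage_v = core_voltage_v * stories        # series domains add up
    current_a = total_power_w / supply_voltage_v       # same power, higher voltage
    loss_w = current_a ** 2 * pdn_resistance_ohm       # I^2 * R in the delivery path
    return current_a, loss_w


# Illustrative numbers only (not taken from the slides).
one_story = pdn_metrics(200.0, 1.0, 1, 0.001)
two_story = pdn_metrics(200.0, 1.0, 2, 0.001)

current_ratio = two_story[0] / one_story[0]   # expected to be about 1/2
loss_ratio = two_story[1] / one_story[1]      # expected to be about 1/4
print(f"current ratio: {current_ratio:.2f}, loss ratio: {loss_ratio:.2f}")
```

Halving the current is also what allows the power pin count to drop by roughly half, since each pin carries a bounded current.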
Proposed 2-Story, 1-Regulator GPU
Problem: the virtual ground/supply nodes between the stories are floating. They are sensitive to minor current imbalances between cores.
Proposed 2-Story, 2-Regulator GPU
The virtual ground/supply nodes are stabilized by the auxiliary regulator, but this costs extra pins and power.
Results: Conventional 1-Story vs. 2-Stories
[Figure: three bar charts showing the number of required power pins (VDD+GND), average power loss in the PDN (%), and maximum voltage swing (%), for the RAY, LPS, NQU, and STO benchmarks under the 1-story, 1-regulator; 2-story, 1-regulator; and 2-story, 2-regulator designs. Bar labels in the pin chart: 420, 224, 210.]
The 2-story, 1-regulator design is the most efficient and cheapest, but it is unreliable without fixes. Target: 10% maximum voltage swing (MVS).
On-Chip Supercapacitors Stabilize the Virtual Ground/Supply Node for 1-Reg.
Supercaps near the GPU cores can filter transient voltage noise on that node instead of an auxiliary regulator.
Results: 2-Story, 1-Regulator with On-Chip Supercaps
RAY benchmark: 38 uF per core is required for 10% maximum voltage swing (MVS) on the 2-story, 1-regulator design. Assume on-chip supercapacitor density is 23 pF/um^2 [Leung 2015, El-Kady 2013] and that supercaps are not stackable on logic/metal. The estimated supercap area overhead is 1.65 mm^2 per core, or 4.6% of the chip. Supercaps make the 2-story, 1-regulator design more reliable and more efficient with low overhead.
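The per-core area estimate can be reproduced from the two quoted numbers. The short check below divides the required capacitance by the assumed density; the core count of 14 comes from the simulated GTX 480 configuration mentioned earlier in the deck, and the function and variable names are hypothetical.

```python
CAP_PER_CORE_F = 38e-6        # 38 uF per core (from the slide)
DENSITY_F_PER_UM2 = 23e-12    # 23 pF/um^2 (from the slide)
NUM_CORES = 14                # simulated GTX 480 configuration in the deck


def supercap_area_mm2(cap_f, density_f_per_um2):
    """Planar area needed for a capacitance at a given areal density."""
    area_um2 = cap_f / density_f_per_um2
    return area_um2 / 1e6     # 1 mm^2 = 1e6 um^2


per_core_mm2 = supercap_area_mm2(CAP_PER_CORE_F, DENSITY_F_PER_UM2)
total_mm2 = per_core_mm2 * NUM_CORES

print(f"per core: {per_core_mm2:.2f} mm^2, all cores: {total_mm2:.1f} mm^2")
```

The per-core result rounds to the 1.65 mm^2 quoted on the slide. The total across 14 cores is a derived figure, not one stated in the source.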
Dynamic Current Compensation (DCC)
DCC can actively balance current demand among cores when supercaps cannot fix steady-state mismatches. It uses a voltage-controlled current source (VCCS) and a ring oscillator (RO). Control latency is critical to stability. DCC can assist supercaps in stabilizing the virtual ground/supply node.
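To illustrate why a steady-state mismatch defeats a capacitor alone and why control latency matters, here is a minimal discrete-time sketch. It is not the authors' circuit: it assumes a simple on/off (bang-bang) controller, a constant current mismatch, and an ideal capacitor on the shared node, and every name and default value is a placeholder except the 8 uF and 142 mA figures, which appear elsewhere in the deck.

```python
from collections import deque


def peak_node_deviation(mismatch_a, cap_f, i_vccs_a, latency_steps,
                        steps=2000, dt_s=10e-9):
    """Peak |voltage deviation| of the shared node under a constant mismatch.

    Assumed model: C * dv/dt = mismatch + compensation, where the
    compensation is +/- i_vccs_a chosen from a delayed sample of v.
    """
    v = 0.0
    peak = 0.0
    # The controller only sees the deviation from latency_steps + 1 steps ago.
    history = deque([0.0] * (latency_steps + 1), maxlen=latency_steps + 1)
    for _ in range(steps):
        observed = history[0]
        if observed > 0:
            comp = -i_vccs_a      # node too high: sink current
        elif observed < 0:
            comp = i_vccs_a       # node too low: source current
        else:
            comp = 0.0
        v += (mismatch_a + comp) * dt_s / cap_f
        history.append(v)
        peak = max(peak, abs(v))
    return peak


# 0.1 A mismatch is illustrative; 8 uF and 142 mA echo values in the deck.
uncompensated = peak_node_deviation(0.1, 8e-6, 0.0, latency_steps=10)
compensated = peak_node_deviation(0.1, 8e-6, 0.142, latency_steps=10)
slow_loop = peak_node_deviation(0.1, 8e-6, 0.142, latency_steps=100)
print(uncompensated, compensated, slow_loop)
```

In this toy model the uncompensated node drifts linearly for as long as the mismatch persists, while the compensated node settles into a bounded oscillation whose amplitude grows with the loop delay.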
Results: 2-Story, 1-Regulator with Supercaps & DCC
LPS benchmark (Csupercap = 8 uF per core).
[Figure: two panels, maximum voltage swing (%) and average power loss (%), each plotted against VCCS control latency (ns, 10^0 to 10^4) for no VCCS, IVCCS = 142 mA, and IVCCS = 428 mA.]
Result: 10% power loss and 10% MVS with approx. 1% supercap die area overhead and up to 1 us VCCS latency.
Static SIMT Thread Scheduling (SSTS)
Current profiles may be imperfectly matched across cores. We propose a software-based solution: given prior knowledge of workload characteristics, minimize the average difference between top-story and bottom-story current demand via thread placement. We use a greedy thread partitioning algorithm akin to Fiduccia-Mattheyses (FM). SSTS is well suited to compensating static power offsets.
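The slides do not give the partitioning algorithm in detail, so the sketch below substitutes a much simpler largest-first greedy heuristic rather than FM itself. It assumes each unit of work has a known average current estimate and assigns it to whichever story currently has the lower total; the function name and current values are hypothetical.

```python
def greedy_two_story_partition(currents_a):
    """Split work items across two stories to balance total current.

    Largest-first greedy heuristic: a simple stand-in for the FM-like
    algorithm named on the slide, not a reproduction of it.
    Returns (top indices, bottom indices, absolute current imbalance).
    """
    top, bottom = [], []
    top_sum = bottom_sum = 0.0
    order = sorted(range(len(currents_a)), key=lambda i: currents_a[i],
                   reverse=True)
    for idx in order:
        # Place the next-largest item on the story with less current so far.
        if top_sum <= bottom_sum:
            top.append(idx)
            top_sum += currents_a[idx]
        else:
            bottom.append(idx)
            bottom_sum += currents_a[idx]
    return top, bottom, abs(top_sum - bottom_sum)


# Hypothetical per-item average currents in amperes.
currents = [3.0, 1.0, 2.5, 2.0, 1.5, 2.0]
top, bottom, imbalance = greedy_two_story_partition(currents)
print(top, bottom, imbalance)
```

A placement like this only balances average demand, which matches the slide's point that SSTS suits static offsets; it does nothing for time-varying imbalance.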
Results: 2-Story, 1-Regulator with Supercaps & SSTS
[Figure: maximum voltage swing (%) with Csupercap = 10 uF per core for the RAY, LPS, NQU, and STO benchmarks, comparing 2-story, 1-regulator with and without SSTS.]
SSTS can achieve a similar result to DCC without extra hardware, but it cannot manage dynamic variation.
Practical Considerations
Multiple virtual ground planes are required in silicon: use triple-well or moat isolation processes between stories [Pei et al. IEDM 14]. Boot time: the virtual ground/supply node must be controlled to avoid gate oxide breakdown, so slowly ramp the off-chip voltage. Process variations and aging cause power mismatches; the proposed techniques can compensate. Memory/NoC/IO power distribution: use separate domains plus level shifters.
Conclusion: Multi-Story PDNs Promising for GPUs
Benefits: fewer required power pins and more efficient power delivery. Our innovations: application of multi-story PDNs to GPUs, the auxiliary regulator, on-chip supercaps, DCC, and SSTS. Future work: DVFS for multi-story GPUs.