
Power-Gating for On-Chip Routers: Challenges and Solutions
Explore the impact of power-gating on on-chip routers, including the use of Node-Router decoupling to optimize energy savings and minimize performance penalties. Discover the conventional application of power-gating, its limitations, and the challenges faced in NoC routers. Learn about the opportunities and effectiveness of power-gating in enhancing on-chip networks.
Presentation Transcript
NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers
Lizhong Chen and Timothy M. Pinkston
SMART Interconnects Group, University of Southern California
December 4, 2012

NoC Power Consumption
- Chip power has become a main design constraint
- High power consumption in the NoC
- Static power is increasing in on-chip routers
- Various contributors to router static power
[Figure: static power percentage at 65nm, 45nm, and 32nm for supply voltages of 1.2V, 1.1V, and 1.0V]
[Figure: power breakdown of a canonical router at 45nm and 1.0V: Dynamic 62%, Buffer_static 21%, VA_static 7%, Xbar_static 5%, Clock_static 4%, SA_static 2%]

Use of Power-gating
- Applications of power-gating
  - Saves static power by cutting off the power supply to a block
  - Has been applied to cores and execution units
  - Few works apply it to on-chip routers
- Objectives of power-gating
  - Maximize net energy savings
  - Minimize performance penalty
- Proposed Node-Router Decoupling
  - Increases power-gating opportunity and effectiveness in on-chip networks
[Figure: power-gating circuit with Vdd, sleep signal, virtual Vdd, power-gated block, and GND]

Conventional Use of Power-gating Applied to NoC Routers
- Power off the router
  - When the datapath of the router is empty, and
  - After notifying all of its neighbors (PG signal)
- Wake up the router
  - When any neighbor asserts the WU signal
  - Neighbors wait for the PG signal to clear
- Effectiveness is subject to
  - Wakeup latency (~12 cycles for a router)
  - Breakeven time (BET): the minimum number of consecutive gated-off idle cycles needed to offset the power-gating energy overhead (~10 cycles for a router)
[Figure: Routers A through E exchanging PG and WU signals]

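The PG/WU handshake on this slide can be sketched as a small state model. This is a minimal illustration, not the authors' implementation: the class `ConvRouter`, its fields, and the helper `cycles_until_send` are hypothetical names, and the 12-cycle wakeup latency is the approximate figure quoted on the slide.

```python
from dataclasses import dataclass

WAKEUP_LATENCY = 12  # cycles; approximate router wakeup latency quoted on the slide


@dataclass
class ConvRouter:
    """Hypothetical model of one conventionally power-gated router."""
    name: str
    powered_on: bool = True
    wakeup_left: int = 0     # remaining wakeup cycles; 0 means not waking up
    datapath_flits: int = 0  # flits currently held in the router datapath

    @property
    def pg_signal(self) -> bool:
        # PG stays asserted to the neighbors while the router cannot accept flits.
        return not self.powered_on

    def try_power_off(self) -> bool:
        # Gate off only when the datapath is empty; PG then notifies the neighbors.
        if self.powered_on and self.datapath_flits == 0:
            self.powered_on = False
            return True
        return False

    def assert_wakeup(self) -> None:
        # A neighbor holding a flit for this router asserts WU.
        if not self.powered_on and self.wakeup_left == 0:
            self.wakeup_left = WAKEUP_LATENCY

    def tick(self) -> None:
        # Advance the wakeup countdown by one cycle.
        if self.wakeup_left > 0:
            self.wakeup_left -= 1
            if self.wakeup_left == 0:
                self.powered_on = True


def cycles_until_send(target: ConvRouter) -> int:
    """Cycles a neighbor stalls before PG clears on the target router."""
    stalled = 0
    target.assert_wakeup()
    while target.pg_signal:
        target.tick()
        stalled += 1
    return stalled
```

In this model a neighbor that finds the target gated off pays the full wakeup latency before it can forward a flit, which is the per-hop penalty the slide refers to.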
Challenges in Conventional Use of Power-gating to NoC Routers
- BET limitation is intensified
  - Intermittent packet arrivals => fragmented idle intervals
  - Full-system simulation on PARSEC shows that 61% of all idle periods are shorter than the BET
- Cumulative wakeup latency in multi-hop NoCs
  - Worse for larger networks
- Disconnection problem
  - Idle period is upper bounded by the local node's traffic
- Conventional use of power-gating in NoC routers can have limited effectiveness
[Figure: idle-interval timelines labeled "18 cycles" and "9 cycles, 9 cycles"; 4x4 mesh (nodes 0 to 15) with source S and destination D; disconnected network]

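The first two challenges can be made concrete with a short sketch. The BET and wakeup-latency constants are the approximate values from the slides; the function names, the 0/1 trace encoding, and the assumption that gated-off routers on a path wake up one after another (with no early wakeup) are illustrative choices, not taken from the presentation.

```python
from itertools import groupby

BET = 10             # cycles; approximate router breakeven time from the slides
WAKEUP_LATENCY = 12  # cycles; approximate router wakeup latency from the slides


def idle_periods(busy_trace):
    """Lengths of maximal runs of idle cycles (0 = idle, 1 = busy)."""
    return [len(list(run)) for busy, run in groupby(busy_trace) if busy == 0]


def fraction_below_bet(busy_trace, bet=BET):
    """Share of idle periods too short to pay back the power-gating overhead."""
    periods = idle_periods(busy_trace)
    if not periods:
        return 0.0
    return sum(1 for p in periods if p < bet) / len(periods)


def serial_wakeup_delay(gated_off_hops, wakeup_latency=WAKEUP_LATENCY):
    """Added latency if every gated-off router on a path wakes up in turn."""
    return gated_off_hops * wakeup_latency


# One 18-cycle idle interval, and 18 idle cycles split in two by one arrival.
unbroken = [1] + [0] * 18 + [1]
fragmented = [1] + [0] * 9 + [1] + [0] * 9 + [1]
```

With these traces, the unbroken interval exceeds the BET while both 9-cycle fragments fall below it, so a single intervening packet removes the whole gating opportunity.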
Node-Router Decoupling in a Nutshell
- Break node-router dependence through decoupling bypass paths
  - Add two bypass paths to each router
  - On the chip level: form a bypass ring connecting all nodes
  - Bypass Inport => NI ejection, NI injection => Bypass Outport
- Mitigate BET limitation
  - Use bypass paths instead of waking up routers
- Hide wakeup latency
  - Use bypass paths while routers are waking up
- Eliminate disconnection
  - All nodes are always connected by the bypass ring
[Figure: 4x4 mesh (nodes 0 to 15) with source S and destination D; detail of Routers 1, 2, 3, and 6 and the NI of Router 2 attached to Node 2. NI = Network Interface]

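The transcript does not say how the bypass ring is laid out, so the following is only one possible construction: a Hamiltonian cycle over a row-major mesh with an even number of rows, which covers the 4x4 and 8x8 meshes. The function names `bypass_ring` and `bypass_successor` are hypothetical.

```python
def bypass_ring(rows, cols):
    """One possible ring visiting every node of a rows x cols mesh exactly once.

    Nodes are numbered row-major. Assumes an even number of rows; the layout
    actually used by NoRD may differ.
    """
    if rows < 2 or cols < 2 or rows % 2:
        raise ValueError("this sketch needs an even number of rows and cols >= 2")

    def node(r, c):
        return r * cols + c

    # Top row, left to right.
    ring = [node(0, c) for c in range(cols)]
    # Snake through columns 1..cols-1 of the remaining rows.
    for r in range(1, rows):
        columns = range(cols - 1, 0, -1) if r % 2 else range(1, cols)
        ring += [node(r, c) for c in columns]
    # Return to the start along column 0, bottom to top.
    ring += [node(r, 0) for r in range(rows - 1, 0, -1)]
    return ring


def bypass_successor(ring):
    """Map each node to the node reached through its Bypass Outport."""
    return {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
```

Because every node appears exactly once and consecutive nodes are mesh neighbors, each router needs only one Bypass Inport and one Bypass Outport, matching the two bypass paths per router described on the slide.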
Outline
- Introduction, motivation, basic idea
- Node-router decoupling implementation
- Evaluation methodology and results
- Related work
- Summary

On-chip Networks
[Figure: NoC-based architecture, a mesh of routers (R), each attached through a network interface (NI) to a core, cache, or memory controller]
[Figure: canonical router architecture with input units, route computation, VC allocator, switch allocator, output units, and credit signals]

NoRD Bypass Paths
- Add two bypass paths to each router
  - One bypass from the Bypass Inport to the NI ejection
  - One bypass from the NI injection to the Bypass Outport
- State transitions
  - On -> off, when the datapath of the router is empty
  - Off -> on, when a wakeup metric exceeds a threshold
  - Wakeup metric: VC request rate at the local NI
- Low implementation cost of decoupling bypass paths and forwarding logic: 3.1% of router area
[Figure: router with bypass latch, VA & SA, output buffer, and X+/X-/Y+/Y- ports; network interface with ejection queue, injection queue, and FIFOs to and from the processor core]

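The state transitions above can be sketched as a small controller. The slide names the wakeup metric (VC request rate at the local NI) but gives no window length or threshold, so `window` and `threshold` below are hypothetical parameters. The rule that a router stays on while the metric is above the threshold is also an assumption, added here only to keep the sketch from oscillating.

```python
from collections import deque


class WakeupController:
    """Hypothetical on/off controller for one NoRD router."""

    def __init__(self, window=10, threshold=3):
        # VC requests seen at the local NI over the last `window` cycles.
        self.recent = deque(maxlen=window)
        self.threshold = threshold
        self.router_on = True

    def step(self, vc_requests, datapath_empty):
        """Advance one cycle and return whether the router is powered on."""
        self.recent.append(vc_requests)
        demand = sum(self.recent)  # the wakeup metric over the window
        if self.router_on:
            # On -> off: datapath empty (and, by assumption, low demand).
            if datapath_empty and demand <= self.threshold:
                self.router_on = False
        elif demand > self.threshold:
            # Off -> on: the wakeup metric exceeds the threshold.
            self.router_on = True
        return self.router_on
```

While the router is off, traffic keeps flowing over the bypass paths, so a higher threshold trades longer bypass routes for fewer power state transitions.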
NoRD Routing
- Based on Duato's Protocol for fully adaptive routing
  - Minimal path along gated-on routers and gated-off routers
  - Limited misroutes are possible only if all routers along the minimal path are off
  - The bypass ring serves as the escape path
[Figure: 4x4 mesh (nodes 0 to 15) showing routes from source S to destination D]

Increasing NoRD Efficiency
- Differentiate routers
  - Routers have different impact on performance based on their locations in the NoC
- Performance-centric class vs. power-centric class
  - Wake up a few performance-critical routers early to add shortcuts in routing
  - Wake up the rest (the majority) of the routers late to save more static power
  - Use an off-line program to classify the routers
[Figure: 4x4 mesh (nodes 0 to 15) showing the router classes]

Evaluation Methodology
- Simulation platform
  - Platform: Simics + GEMS (Garnet + Orion 2.0)
  - Workloads: PARSEC 2.0 + synthetic traffic
- Key parameters for simulations

  Parameter            Value
  Core model           Sun UltraSPARC III+, 3GHz
  Private I/D L1$      32KB, 2-way, LRU, 1-cycle latency
  Shared L2 per bank   256KB, 16-way, LRU, 6-cycle latency
  Cache block size     64 bytes
  Coherence protocol   MOESI
  Network topology     4x4 and 8x8 mesh
  Router               4-stage, 3GHz
  Virtual channel      4 per protocol class
  Input buffer         5-flit depth
  Link bandwidth       128 bits/cycle
  Memory controllers   4, located one at each corner
  Memory latency       128 cycles

Schemes Under Comparison
- No power-gating (No_PG)
- Conventional power-gating (Conv_PG)
  - Applies the power-gating technique conventionally to routers
- Optimized conventional power-gating (Conv_PG_OPT)
  - Conv_PG + early wakeup (hides some wakeup latency)
- Node-router decoupling (NoRD)
  - Power-gates routers and enables bypass paths when load is low
  - When load becomes high, routers are powered on gradually

Static Energy Comparison
- Static energy saved
  - Conv_PG: 51.2%, Conv_PG_OPT: 47.0%, NoRD: 62.9%
- Relative improvement of NoRD, measured on the remaining static energy: 23.9% over Conv_PG and 29.9% over Conv_PG_OPT
[Figure: static energy normalized to No_PG for No_PG, Conv_PG, Conv_PG_OPT, and NoRD]

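The relative-improvement figures are not derived on the slide. One reading that reproduces them to within rounding compares the static energy each scheme still consumes (100% minus the savings) rather than the savings themselves. The sketch below works through that arithmetic; the function names are illustrative.

```python
# Static energy saved relative to No_PG, in percent, as quoted on the slide.
saved = {"Conv_PG": 51.2, "Conv_PG_OPT": 47.0, "NoRD": 62.9}


def remaining(scheme):
    """Static energy still consumed, as a percentage of No_PG."""
    return 100.0 - saved[scheme]


def relative_improvement(baseline, proposed="NoRD"):
    """Percent reduction in remaining static energy versus a baseline scheme."""
    return 100.0 * (remaining(baseline) - remaining(proposed)) / remaining(baseline)


for baseline in ("Conv_PG", "Conv_PG_OPT"):
    print(f"NoRD vs {baseline}: {relative_improvement(baseline):.1f}%")
```

The rounded inputs give roughly 24.0% and 30.0%; the small gap to the quoted 23.9% and 29.9% is consistent with the slide having been computed from unrounded data.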
Power-gating Overhead Reduction
- NoRD reduces the power-gating overhead and the number of router wakeups by over 80%
[Figure: two panels comparing Conv_PG, Conv_PG_OPT, and NoRD: "Power-gating overhead" (power-gating overhead energy) and "Reduction in # of router wakeups"]

Overall NoC Energy
- Overall NoC energy saved
  - Conv_PG: 9.4%, Conv_PG_OPT: 9.1%, NoRD: 20.6%
- Static energy savings exceed dynamic energy losses
[Figure: breakdown of power (normalized to No_PG) into link static power, link dynamic power, router dynamic power, router static power, and power-gating overhead, for No_PG, Conv_PG, Conv_PG_OPT, and NoRD on blackscholes, bodytrack, canneal, dedup, ferret, fluidanimate, raytrace, swaptions, vips, x264, and the average]

Performance
- Average packet latency penalty
  - Conv_PG: 63.8%, Conv_PG_OPT: 41.5%, NoRD: 15.2%
- Execution time penalty
  - Conv_PG: 11.7%, Conv_PG_OPT: 8.1%, NoRD: 3.9%
[Figure: two panels comparing No_PG, Conv_PG, Conv_PG_OPT, and NoRD: average packet latency (cycles) and execution time (normalized to No_PG)]

Related Work
- Applications of power-gating in CMPs
  - Applied to cores and execution units in CMPs (Z. Hu, et al., 2004; A. Lungu, et al., 2009; N. Madan, et al., 2011; others)
  - Applied conventionally to on-chip routers (H. Matsutani, et al., 2008; S. Jafri, et al., 2010; H. Matsutani, et al., 2010)
  - Effectiveness is limited by the BET requirement, wakeup delay, and disconnection problem
- Other uses of bypass
  - For fault tolerance: works for infrequent on/off transitions (M. Koibuchi, et al., 2008; J. Kim, et al., 2006; others)
  - For express channels: improves performance and dynamic power (W. Dally, 1991; A. Kumar, et al., 2007; B. Grot, et al., 2009; others)
  - For reducing power consumption in links (E. Kim, et al., 2003; V. Soteriou, et al., 2004; B. Zafar, et al., 2010; others)
  - These techniques are either not suitable for run-time router power-gating or have different targets, and are thus orthogonal to this work

Summary
- Node-router dependence severely limits the use of power-gating in on-chip routers
  - BET limitation, wakeup delay, and disconnection problem
- A novel approach, Node-Router Decoupling (NoRD), is proposed based on power-gating bypass paths
  - Significantly reduces the number of power state transitions
  - Increases the length of idle periods
  - Completely hides the wakeup latency from the critical path
  - Eliminates network disconnection problems
- NoRD increases power-gating opportunity while minimizing performance overhead

Thank you!

Power-gating Basics
- Breakeven time (BET)
  - The minimum number of consecutive gated-off idle cycles needed to offset the power-gating energy overhead
  - Around 10 cycles for a router
- Wakeup latency
  - Around 10~15 cycles for a router
[Figure: power-gating circuit with Vdd, sleep signal, virtual Vdd, power-gated block, and GND]
[Figure: energy versus time (t0 to t3) showing the energy overhead, the cumulative energy savings, and the breakeven time]

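The BET definition can be illustrated with a deliberately simplified linear model: a fixed energy overhead per gate-off/wake-up pair, and a constant static energy saving per gated-off cycle. The slide's energy curve is not reproduced here, and the numbers in the example are hypothetical units chosen so that the BET comes out at the roughly 10 cycles quoted for a router.

```python
import math


def breakeven_time(overhead_energy, saved_per_cycle):
    """Smallest number of gated-off cycles whose savings cover the overhead."""
    return math.ceil(overhead_energy / saved_per_cycle)


def net_energy_saved(idle_cycles, overhead_energy, saved_per_cycle):
    """Net saving of one gating event; negative means gating cost energy."""
    return idle_cycles * saved_per_cycle - overhead_energy


# Hypothetical units: overhead of 10 per gating event, 1 saved per gated cycle.
OVERHEAD = 10
SAVED_PER_CYCLE = 1
bet = breakeven_time(OVERHEAD, SAVED_PER_CYCLE)
```

Under this model a 9-cycle idle period loses energy, a 10-cycle period breaks even, and an 18-cycle period yields a net gain, which is why fragmented idle intervals hurt conventional power-gating.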
NoRD Routing
- Based on Duato's Protocol
  - Escape resources consist of the escape VCs of the bypass ring formed by (Bypass Inport, Bypass Outport) pairs
  - Other VCs are adaptive resources
- Packets on adaptive VCs
  - First routed minimally
  - If not possible, detoured by one hop
  - May still be routed on adaptive VCs
  - If the misrouted hops reach a threshold, forced to enter escape VCs
- Packets on escape VCs
  - Confined to the bypass ring until the destination
[Figure: 4x4 mesh (nodes 0 to 15) showing routes from source S to destination D]
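The per-hop decision described above can be sketched as a single function. The threshold value, the names `VCClass` and `route_step`, and the choice to force a packet onto escape VCs as soon as its misroute count reaches the threshold (even if a minimal hop happens to be available) are assumptions made for illustration; the slide does not specify them.

```python
from enum import Enum


class VCClass(Enum):
    ADAPTIVE = "adaptive"
    ESCAPE = "escape"


MISROUTE_THRESHOLD = 2  # hypothetical; the slide only says "threshold"


def route_step(vc, misroutes, minimal_available, threshold=MISROUTE_THRESHOLD):
    """Return (vc_class, action, misroutes) for the packet's next hop."""
    # Packets on escape VCs are confined to the bypass ring until delivery.
    if vc is VCClass.ESCAPE:
        return VCClass.ESCAPE, "bypass_ring", misroutes
    # Too many misrouted hops: forced onto the escape VCs (assumed ordering).
    if misroutes >= threshold:
        return VCClass.ESCAPE, "bypass_ring", misroutes
    # Preferred case: a minimal hop on an adaptive VC.
    if minimal_available:
        return VCClass.ADAPTIVE, "minimal", misroutes
    # Otherwise detour by one hop, still on adaptive VCs.
    return VCClass.ADAPTIVE, "detour", misroutes + 1
```

Bounding the number of detours keeps packets from wandering indefinitely, and the one-way move onto the escape ring follows the slide's statement that escape packets stay on the bypass ring until they reach the destination.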