Identifying Power Inefficiencies in Chips

Slide Note

Most modern systems, such as cell phones and servers, have power-critical components. This study focuses on the sources of inefficiencies in general-purpose chips, comparing Application-Specific Integrated Circuits (ASICs) and Chip Multiprocessors (CMP). By exploring the energy gap between these two types of chips and examining the power breakdown in CMP, the research aims to minimize power consumption while maximizing flexibility and efficiency. The analysis delves into the H.264 encoding process as a case study, highlighting the significant power gap between ASICs and CMP. Through an in-depth investigation of tasks involved in H.264 encoding, the study sheds light on the key steps and energy distribution in CMP systems.

Uploaded on Mar 09, 2025 | 0 Views

Presentation Transcript

Understanding Sources of Inefficiency in General-Purpose Chips. Written by: Hameed et al., ISCA 2010. Presented by: David Schlais.

Motivations: Most modern systems are power critical (cell phones, servers, tablets, etc.). ASICs offer low power but low flexibility. Chip multiprocessors (CMPs) offer high flexibility but higher power. Goal: low power and high flexibility. The paper attempts to identify what causes CMP power inefficiencies.

Application-Specific Integrated Circuits (ASICs): Application specific. Performs a specific task efficiently in both power and area. Low overhead in order to execute tasks.

Chip Multiprocessors: General purpose. High overhead: in the pipeline figure, execution units are shown in red and the rest is overhead. High flexibility: can perform many instructions. Figure source: http://pages.cs.wisc.edu/~david/courses/cs752/Fall2014/handouts/lecture/03_pipeline.pdf

What is the energy gap between these two? And how do we minimize it?

Exploring the CMP vs. ASIC gap: H.264 encoding (MPEG-4 Advanced Video Coding). Compared an ASIC for H.264 encoding vs. a Tensilica CMP. Why H.264 encoding? (1) Large ASIC vs. CMP gap: easier to see inefficiencies in the CMP. (2) ASICs for H.264 are commercially available: commercial products serve as the benchmark. (3) Contains a variety of computational motifs: results are applicable to a larger set of applications. Power gap: 150-500x.

H.264 (MPEG-4 AVC): 99% of tasks are summed up in 5 steps. (1) Integer Motion Estimation (IME): computes the motion vector of an image block. (2) Fractional Motion Estimation (FME): refines the initial match to quarter-pixel resolution. (3) Intra Prediction (IP): makes a prediction based on surrounding image blocks. (4) Transform and Quantization (DCT/Quant): determines the difference between the prediction and the current block. (5) Context Adaptive Binary Arithmetic Coding (CABAC): encodes the coefficients.
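
To make the first step concrete, here is a minimal sketch of integer motion estimation. It assumes a full search that minimizes the sum of absolute differences (SAD) between blocks; the slide does not name the matching cost or the search strategy, and the function names, frame layout (lists of rows), and parameters are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def integer_motion_search(cur, ref, bx, by, n, search_range):
    """Try every integer displacement within +/- search_range and return
    (dx, dy, cost) for the lowest-cost candidate, or None if none fits."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The inner loops repeat the same subtract, absolute-value, and accumulate pattern over many pixels, which is the kind of regular work that wide datapaths and fused instructions target.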

CMP power breakdown: Only ~6% of total energy goes towards computation.

Instruction Decode Logic: Decoding is ~30% of overall power, spent on repetitive decoding of similar instructions. Solution? 16- and 18-way SIMD datapaths. Many instructions can be done in parallel. Solution? 2- and 3-slot VLIW instructions. Impact: 10x performance increase and 10x energy decrease, but still ~30% of power goes to decoding.
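
A back-of-the-envelope model helps explain why wider instructions lower energy per operation without removing the overhead. The sketch below assumes each instruction pays a fixed fetch/decode/control cost that is shared by all the operations it performs; the function name and the energy values are hypothetical units chosen for illustration, not measurements from the paper.

```python
def energy_per_op(overhead_per_instr, compute_per_op, ops_per_instr):
    """Energy per useful operation when one instruction's fixed overhead
    (fetch, decode, control) is shared by ops_per_instr operations."""
    if ops_per_instr < 1:
        raise ValueError("ops_per_instr must be at least 1")
    return overhead_per_instr / ops_per_instr + compute_per_op


# Hypothetical units: 94 of overhead and 6 of computation per instruction.
for ops in (1, 16, 100):
    print(ops, energy_per_op(94.0, 6.0, ops))
```

Under this model the overhead term shrinks as more operations share one instruction, while the per-operation compute cost sets a floor that only more efficient functional units can lower.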

Work done per instruction: application-specific instructions vs. RISC instructions.

Magic Instructions. Desire: a single instruction executes 100s of operations. Reuse/shift pixels to prevent repetitive x-1, x0, x1, x2, x3 loads. Requirements: (1) Hardware support: custom data storage elements; links to provide large amounts of data to these storage units; a path from storage units to functional units. (2) Instruction support: a new instruction to dictate when to do these operations.
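
The pixel-reuse idea can be illustrated in software. The sketch below assumes a 1-D sliding-window filter over a row of pixels and counts pixel loads for two strategies: reloading every tap for each output, and keeping the window in a shift register so each output needs only one new load. The function names are hypothetical, a `deque` stands in for the custom storage element, and the tap values are supplied by the caller.

```python
from collections import deque


def filter_row_naive(row, taps):
    """Reload every tap's pixel for each output position."""
    k = len(taps)
    loads = 0
    out = []
    for i in range(len(row) - k + 1):
        window = row[i:i + k]  # k pixel loads per output
        loads += k
        out.append(sum(t * p for t, p in zip(taps, window)))
    return out, loads


def filter_row_shifted(row, taps):
    """Keep the window in a shift register; load one new pixel per step."""
    k = len(taps)
    window = deque(maxlen=k)  # models the custom storage element
    loads = 0
    out = []
    for pixel in row:
        window.append(pixel)  # one load; the oldest pixel shifts out
        loads += 1
        if len(window) == k:
            out.append(sum(t * p for t, p in zip(taps, window)))
    return out, loads
```

Both functions return the same filtered values; only the load count differs, and the difference grows with the number of taps.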

Results

Takeaways: Fetching instructions is expensive. Current techniques (SIMD, VLIW) provide only partial improvement. Processors have large overheads and spend only a small percentage on computing. The power of the functional units (FUs) alone exceeds that of ASIC designs. This gives a purpose for automatic tools that create modified instructions.

Takeaways (cont.): Application-specific instructions and hardware reduce inefficiencies. Instructions that perform hundreds of operations are ideal. An extensible processor is not enough. Current RISC architectures contain inefficiencies not present in ASIC designs. There are still many areas in which to improve processor power and performance.

Progress since 2010. Neural acceleration: Esmaeilzadeh, Hadi, et al. "Neural acceleration for general-purpose approximate programs." Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2012. Dynamically specialized datapaths: Govindaraju, Venkatraman, Chen-Han Ho, and Karthikeyan Sankaralingam. "Dynamically specialized datapaths for energy efficient computing." High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on. IEEE, 2011. Power-efficient compute-intensive GPGPU: Gilani, Syed Zulqarnain, Nam Sung Kim, and Michael J. Schulte. "Power-efficient computing for compute-intensive GPGPU applications." High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on. IEEE, 2013.

Thank you. Questions?
