
Efficient Task Mapping on Many-Core Processors
Explore quadrisection-based task mapping on many-core processors for energy-efficient on-chip communication. The research focuses on minimizing communication power consumption by mapping communicating application tasks to cores.
Uploaded on | 0 Views
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Quadrisection-Based Task Mapping on Many-Core Processors for Energy-Efficient On-Chip Communication Nithin Michael, Yao Wang, G. Edward Suh and Ao Tang Cornell University N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 1
~Introduction~ Problem Formulation Quadrisection Evaluation Conclusions Task Mapping Problem Task mapping for many-core processor DSP Our focus mapping communicating application tasks to cores to minimize communication power consumption N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 2
Introduction ~Problem Formulation~ Quadrisection Evaluation Conclusions Assumptions Applications of interest consist of multiple tasks communicating via message passing, represented by communication task graph (CTG) CTG for DSP Number of tasks are less than number of cores Assume mesh network and dimension-ordered routing N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 4
Introduction ~Problem Formulation~ Quadrisection Evaluation Conclusions Problem Difficulty A(i, j), communication rate from task i to task j B(i, j), Manhattan-distance between node i and node j of the NoC p(i), p(j), nodes to which task i and task j are mapped Represent the dynamic power consumption Task mapping is exactly the Quadratic Assignment Problem NP-Hard Difficult to approximate N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 4
Introduction ~Problem Formulation~ Quadrisection Evaluation Conclusions State of the Art NMAP Initial greedy map followed by pair wise swaps Computation intensive, slow BMAP Clustering based approach merging communicating tasks Poor quality (High cost value) Bisection Recursively divide the task graph into equal halves to minimize the communication between them [1] [1] K. Srinivasan and K. S. Chatha. A Technique for Low Energy Mapping and Routing in Network-on-Chip Architectures. ISLPED 2005 N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 5
Introduction Problem Formulation ~Quadrisection~ Evaluation Conclusions Can we improve Bisection? Key observation: Mapping is a 2D problem Bisection uses vertical and horizontal min-cut based on Fiduccia- Mattheyses (FM) algorithm FM FM FM FM is applied to 1D, the derived local optimum doesn t consider the 2D mapping N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 6
Introduction Problem Formulation ~Quadrisection~ Evaluation Conclusions Quadrisection Main idea: apply FM algorithm to 2D mapping 1 hop FM FM 2 hops Explore a different part of design space The local optimal considers 2D mapping Further optimize subpanel placement by brute force search N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 6
Introduction Problem Formulation Quadrisection ~Evaluation~ Conclusions Evaluation Methodology Evaluations were performed numerically and using simulation Performance was evaluated for mesh networks for Synthetically generated random task graphs Multimedia benchmarks Simulations were performed using cycle level simulator and power consumption were evaluated using Orion2 N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 8
Introduction Problem Formulation Quadrisection ~Evaluation~ Conclusions Cost: Synthetic Task Graph 100 randomly generated CTGs were mapped to 4-by-4 mesh network Average Cost (Normalized to Quadrisection) 1.3 1.2 1.1 1 0.9 0.8 NMAP BMAP Bisection Quadrisection N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 9
Introduction Problem Formulation Quadrisection ~Evaluation~ Conclusions Power: Multimedia Benchmarks NMAP BMAP Bisection Quadrisection Optimal 1.4 Communication Power 1.3 1.2 1.1 1 0.9 0.8 DSP MWD MPEG4 VOPD Radio A/V Avg N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 10
Introduction Problem Formulation Quadrisection ~Evaluation~ Conclusions Execution Time Quadrisection s runtime is comparable with previous approaches N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 10
Introduction Problem Formulation Quadrisection Evaluation ~Conclusions~ Conclusions Quadrisection outperforms commonly used mapping approaches by exploiting 2D structure Run-time of Quadrisection is comparable to the state-of-art Useful addition to NoC designer s toolkit. N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 11
Introduction Problem Formulation Quadrisection ~Evaluation~ Conclusions Cost: Multimedia Benchmarks NMAP BMAP Bisection Quadrisection Optimal 1.6 1.4 1.2 1 Cost 0.8 0.6 0.4 0.2 0 DSP MWD MPEG4 VOPD Radio A/V Avg N. Michael, Y. Wang, G. E. Suh & A. Tang Quadrisection-Based Task Mapping 10