Grainsize in Programming
Learn what grainsize means in parallel programming and how it affects performance, with rules of thumb for choosing a grainsize that amortizes communication overhead.
Presentation Transcript
Grainsize
Charm++ philosophy: let the programmer decompose their work and data into coarse-grained entities. It is important to understand what I mean by coarse-grained entities. You don't write sequential programs that some system will auto-decompose. You don't write programs in which there is one object for each float. You consciously choose a grainsize, but choose it independently of the number of processors. Or parameterize it, so you can tune it later.
Charm Tutorial
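As an illustration of "parameterize it, so you can tune later", here is a minimal Python sketch (not Charm++ code). The function name `decompose` and the parameter `grain_elements` are hypothetical. The point it shows is that the number of chunks follows from the problem size and the chosen grainsize only; the processor count never appears.

```python
import math


def decompose(num_elements: int, grain_elements: int) -> list[range]:
    """Split num_elements items into chunks of at most grain_elements each.

    grain_elements is the tunable grainsize (hypothetical name). The number
    of processors is deliberately not an input.
    """
    if num_elements < 0 or grain_elements <= 0:
        raise ValueError("need num_elements >= 0 and grain_elements > 0")
    num_chunks = math.ceil(num_elements / grain_elements)
    return [
        range(i * grain_elements, min((i + 1) * grain_elements, num_elements))
        for i in range(num_chunks)
    ]


# Illustrative numbers only: one million elements, 8192 elements per chunk.
chunks = decompose(1_000_000, 8_192)
print(len(chunks), "chunks; last chunk holds", len(chunks[-1]), "elements")
```

Changing `grain_elements` retunes the decomposition without touching any code that depends on how many processors the job runs on.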
Crack Propagation
This is 2D, circa 2002, but it shows overdecomposition for unstructured meshes. Decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right). The middle area contains cohesive elements. Both decompositions were obtained using Metis. Pictures: S. Breitenfeld and P. Geubelle.
Working definition of grainsize: the amount of computation per remote interaction. Choose the grainsize to be just large enough to amortize the overhead.
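To make "amortize the overhead" concrete, the sketch below assumes a deliberately simple cost model that is not stated on the slide: every grain of computation time `grain` pays one fixed per-message overhead `overhead`. Under that assumption the fraction of time lost to overhead is overhead / (grain + overhead). The function name is hypothetical.

```python
def overhead_fraction(grain: float, overhead: float) -> float:
    """Fraction of total time spent on overhead, assuming one message per grain.

    Assumed model (not from the slides): total time per grain = grain + overhead.
    """
    if grain < 0 or overhead < 0 or grain + overhead == 0:
        raise ValueError("grain and overhead must be non-negative, not both zero")
    return overhead / (grain + overhead)


# Express the grainsize as a multiple of the per-message overhead.
unit_overhead = 1.0
for multiple in (1, 10, 100):
    fraction = overhead_fraction(multiple * unit_overhead, unit_overhead)
    print(f"grain = {multiple:>3} x overhead -> {fraction:.1%} of time is overhead")
```

In this model a grain equal to the overhead wastes half the time, while a grain ten times the overhead loses about 9%, which is why the grain only needs to be "just large enough".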
Rules of Thumb for Grainsize
Make it as small as possible, as long as it amortizes the overhead. More specifically, ensure:
- The average grainsize is greater than k·v for some k (say, 10v), where v is the overhead per message.
- No single grain is allowed to be too large: it must be smaller than T/p, where p is the number of processors and T is the sequential execution time. This can be generalized by saying it must be smaller than k·m·v (say, 100v).
Important corollary: you can be close to the optimal grainsize without having to think about p, the number of processors.
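The two rules above can be checked mechanically once per-grain times have been measured. The following is a minimal sketch, not part of Charm++: the names `check_grainsize` and `GrainsizeReport` are hypothetical, and the sequential time T is approximated as the sum of the grain times, which is an assumption. The generalized k·m·v bound is left out because the slide does not define m.

```python
from dataclasses import dataclass


@dataclass
class GrainsizeReport:
    average_ok: bool  # average grain > k * v
    largest_ok: bool  # largest grain < T / p


def check_grainsize(grain_times, overhead, num_processors, k=10.0):
    """Check the two rules of thumb for a list of per-grain compute times.

    Assumption: T (sequential execution time) is approximated by the sum of
    all grain times. All times must use the same unit as `overhead`.
    """
    if not grain_times or num_processors <= 0:
        raise ValueError("need at least one grain and one processor")
    total = sum(grain_times)
    average = total / len(grain_times)
    return GrainsizeReport(
        average_ok=average > k * overhead,
        largest_ok=max(grain_times) < total / num_processors,
    )


# Illustrative numbers only: 1024 equal grains of 1.0 ms, 0.05 ms overhead, 64 PEs.
even = check_grainsize([1.0] * 1024, overhead=0.05, num_processors=64)

# One oversized grain among 64 violates the T/p rule even though the average is fine.
skewed = check_grainsize([1.0] * 63 + [100.0], overhead=0.05, num_processors=64)
print(even, skewed)
```

The second case shows why the upper bound matters: a single grain larger than T/p caps the achievable speedup no matter how the other grains are placed.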
Grainsize in a common setting
[Figure: plot against number of points per chare; annotation: 2 MB/chare, 256 objects per core.]
Grainsize: Weather Forecasting in BRAMS
BRAMS: Brazilian weather code (based on RAMS). AMPI version (Eduardo Rodrigues, with Mendes, J. Panetta, ..). Instead of using 64 work units on 64 cores, they used 1024 work units on 64 cores.
Baseline: 64 Objects
[Figure: Profile of Usage for Processors 0-63. x-axis: PE; y-axis: Usage Percent (%). Time per Step: 46s.]
Overdecomposition: 1024 Objects
[Figure: Profile of Usage for Processors 0-63. x-axis: PE; y-axis: Usage Percent (%). Time per Step: 33s.]
Benefits from communication/computation overlap.
With Load Balancing: 1024 objects
[Figure: Usage Profile for Processors 0-63. x-axis: PE; y-axis: Usage Percent (%). Time per Step: 27s.]

Configuration                          Time per step
No overdecomp (64 threads)             46 sec
+ Overdecomposition (1024 threads)     33 sec
+ Load balancing (1024 threads)        27 sec
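The slides do not say which load-balancing strategy produced the 27 s result. Purely as an illustration of why having many objects per PE gives a balancer room to work, the sketch below compares a static block assignment with a generic greedy heuristic (heaviest object first, onto the least-loaded PE). The function names and the synthetic load gradient are assumptions and are not BRAMS data.

```python
import heapq


def block_assign(loads, num_pes):
    """Static assignment: each PE gets one contiguous block of objects.

    Assumes len(loads) is divisible by num_pes.
    """
    per_pe = len(loads) // num_pes
    return [sum(loads[pe * per_pe:(pe + 1) * per_pe]) for pe in range(num_pes)]


def greedy_balance(loads, num_pes):
    """Greedy heuristic: place the heaviest remaining object on the least-loaded PE."""
    heap = [(0.0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    totals = [0.0] * num_pes
    for load in sorted(loads, reverse=True):
        total, pe = heapq.heappop(heap)  # currently least-loaded PE
        total += load
        totals[pe] = total
        heapq.heappush(heap, (total, pe))
    return totals


# Synthetic, deliberately uneven loads: 1024 objects whose cost rises from 1.0 to about 2.0.
loads = [1.0 + i / 1024 for i in range(1024)]
static = block_assign(loads, 64)
balanced = greedy_balance(loads, 64)
print(f"max PE load, static blocks: {max(static):.2f}")
print(f"max PE load, greedy:        {max(balanced):.2f}")
```

With 16 objects per PE, the greedy placement brings the most-loaded PE close to the average, whereas with one object per PE there would be nothing to move. For reference, the times reported in the table correspond to reductions of about 28% (46 s to 33 s) and about 41% (46 s to 27 s) relative to the baseline.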