Performance Analysis and Scaling on Cori-KNL Supercomputer

Explore the initial performance and scaling of ne1024 and ne512 on the Cori-KNL supercomputer across several elements-per-node configurations. The study demonstrates strong scaling, consistent per-step timings, and potential optimizations for improved performance.

  • Performance Analysis
  • Cori-KNL
  • Scaling
  • Supercomputer
  • Optimization

Presentation Transcript


  1. Initial Performance of ne1024/ne512 on Cori-KNL. Using scream repo, CAM_TARGET=theta-l, 5 min timesteps. Noel Keen, LBL.

  2. With ne1024, need to run with many more elements per node than with ne120. Cori has 9668 KNL nodes.

     Nodes required at a given number of elements per node:

     Resolution         # elements       2048 elem/node   1024 elem/node   512 elem/node
     ne120 (28km)       86400            43               85               169
     ne512 (6.5km)      1572864 (18x)    768              1536             3072
     ne1024 (3.25km)    6291456 (73x)    3072             6144             12288

     Elements per node with ne1024:

     Nodes    Elements per node
     1024     6144
     1536     4096
     2048     3072
     3072     2048
     4096     1536
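The element and node counts on this slide are consistent with a cubed-sphere grid of 6 × ne² elements, with the node count rounded up to a whole node. A minimal Python sketch of that arithmetic follows, assuming this formula is what underlies the table; the function names are illustrative and do not come from the scream repo.

```python
import math

def cubed_sphere_elements(ne: int) -> int:
    """Total elements on a cubed sphere with ne x ne elements per face (6 faces)."""
    return 6 * ne * ne

def nodes_required(ne: int, elements_per_node: int) -> int:
    """Smallest whole number of nodes that holds every element."""
    return math.ceil(cubed_sphere_elements(ne) / elements_per_node)

if __name__ == "__main__":
    for ne in (120, 512, 1024):
        counts = [nodes_required(ne, epn) for epn in (2048, 1024, 512)]
        print(f"ne{ne}: {cubed_sphere_elements(ne)} elements, nodes needed: {counts}")
```

Dividing the ne1024 element count by a node count gives the second table directly, for example 6291456 / 1024 = 6144 elements per node.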

  3. No I/O. Per-step timing is remarkably consistent (outside of the first and last steps) and can be used to make estimates.
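Because the per-step time is nearly constant, a short run can be extrapolated to a full simulated day. The Python sketch below shows one way such an estimate could be formed, assuming throughput is simply steady-state step time multiplied by the number of steps per day, plus an optional fixed daily overhead. The function name, parameters, and the 30-second step time are hypothetical and are not measurements from these runs.

```python
SECONDS_PER_DAY = 86_400

def estimate_sdpd(seconds_per_step: float,
                  timestep_seconds: float = 300.0,
                  overhead_seconds_per_day: float = 0.0) -> float:
    """Estimate simulated days per wall-clock day from a steady-state step time.

    timestep_seconds defaults to the 5 min timestep used in these runs.
    overhead_seconds_per_day is any fixed cost per simulated day (for example
    a restart write); it is an assumed parameter, not a measured value.
    """
    steps_per_day = SECONDS_PER_DAY / timestep_seconds  # 288 steps at 5 min
    wall_seconds_per_sim_day = steps_per_day * seconds_per_step + overhead_seconds_per_day
    return SECONDS_PER_DAY / wall_seconds_per_sim_day

if __name__ == "__main__":
    # Hypothetical step time, chosen only to illustrate the arithmetic.
    print(estimate_sdpd(30.0))
```

Excluding the first and last steps before averaging matters here, since those are the steps the slide flags as inconsistent.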

  4. Estimated SDPD (simulated days per day) on Cori-KNL, based on 12-step runs with 5 min timesteps.

     ne1024 strong scales well:

     Nodes   PPN   PIO_VERSION   With restart at end of day   Without restart
     1024    8     1             4.6                          5.5
     2048    8     1             6.2                          11.5
     3072    8     1             16.1 [single value in source; column not indicated]

     ne512: PIO2 allows about 2x write improvement, and using 16 PPN is about 24% faster than 8 PPN:

     Nodes   PPN   PIO_VERSION   With restart at end of day   Without restart
     1024    8     1             24.3                         34.9
     1024    8     2             29.4                         36.8
     1024    16    2             30.2                         45.5

     All runs with shoc. Some runs included rrtmgp and p3, which had a small effect on performance.
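The claims on this slide can be checked against its own numbers. The Python sketch below computes strong-scaling efficiency, the 16 PPN gain, and an implied daily restart cost. It assumes that the gap between the "with restart" and "without restart" columns is entirely the restart write, which the slide does not state. The helper names are illustrative.

```python
SECONDS_PER_DAY = 86_400

def parallel_efficiency(nodes_base, sdpd_base, nodes_scaled, sdpd_scaled):
    """Speedup divided by the node-count ratio; 1.0 means ideal strong scaling."""
    return (sdpd_scaled / sdpd_base) / (nodes_scaled / nodes_base)

def implied_restart_seconds(sdpd_with, sdpd_without):
    """Extra wall-clock seconds per simulated day attributed to the restart write
    (assumption: the restart is the only difference between the two columns)."""
    return SECONDS_PER_DAY / sdpd_with - SECONDS_PER_DAY / sdpd_without

if __name__ == "__main__":
    # ne1024, without restart: 1024 nodes -> 5.5 SDPD, 2048 nodes -> 11.5 SDPD
    print("ne1024 efficiency, 1024 -> 2048 nodes:",
          round(parallel_efficiency(1024, 5.5, 2048, 11.5), 3))

    # ne512, without restart, PIO2: 8 PPN -> 36.8 SDPD, 16 PPN -> 45.5 SDPD
    print("ne512 gain from 16 PPN:", round(45.5 / 36.8 - 1.0, 3))

    # ne512, 8 PPN: implied restart cost with PIO1 versus PIO2
    pio1 = implied_restart_seconds(24.3, 34.9)
    pio2 = implied_restart_seconds(29.4, 36.8)
    print("ne512 restart cost ratio PIO1/PIO2:", round(pio1 / pio2, 2))
```

Under that assumption, the PIO1 to PIO2 restart cost ratio comes out somewhat under 2, in line with the slide's "about 2x write improvement", and the 16 PPN gain matches the quoted 24%.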

  5. Summary

       • Demonstrated strong scaling with ne1024.
       • Tested PIO_VERSION=2 with ne512. Experimenting with PIO stride and PIO buffer size. Does not yet work with ne1024.
       • Iterating on optimal ways to run with 1024 Cori KNL nodes to take advantage of MPP discount & Q priority.
       • Tested running with shoc, rrtmgp, and p3 with ne1024.
       • Using more MPI ranks per node should improve performance if we can avoid OOM.
       • Writing timing checkpoints per step provides interesting performance analysis with shorter run times.
       • Need to improve initialization time (30-35 min for ne1024).
