High Performance Fortran: Understanding Concurrent Execution Features

In this presentation, delve into the history and key directives of High Performance Fortran (HPF). Explore how HPF enables data parallelism in Fortran 90, focusing on concepts like the PROCESSORS directive, DISTRIBUTE directive, ALIGN directive, and more. Learn about the motivations behind HPF development, its current status, and examples like Jacobi relaxation. Discover how HPF simplifies parallel programming and supports performance optimization on distributed memory machines.

  • High Performance Fortran
  • Concurrent Execution
  • Data Parallelism
  • Fortran 90
  • Directives

Uploaded on Feb 28, 2025



Presentation Transcript


  1. High Performance Fortran (slides based on John Merlin's slides on HPF)

  2. Motivation. In OpenMP, performance issues are not visible to the programmer. In Jacobi relaxation, the ghost layers are brought into a processor's cache through communication. What are the solutions for a distributed-memory machine?

  3. HPF history. Developed from CM Fortran and Fortran 90. Became extinct: it was not powerful enough, and the ideas and compilers were not mature. It died out before the Earth Simulator revived interest. Being worked on again at Rice, and in Japan.

  4. PROCESSORS directive. Declares processor arrangements, e.g. !HPF$ PROCESSORS p(4), q(NUMBER_OF_PROCESSORS()/2, 2) and !HPF$ PROCESSORS s. PROCESSORS declares abstract processors; they may actually be processes, and more than one may be mapped to a single physical processor.

  5. DISTRIBUTE directive. Distributes an array over a processor arrangement.
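The mapping choices can be sketched as follows; this is a minimal illustration (not from the slides), reusing the arrangement p(4) from the PROCESSORS slide:

```fortran
REAL A(100), B(100), C(100,100)
!HPF$ PROCESSORS p(4)
!HPF$ DISTRIBUTE A(BLOCK)   ONTO p   ! contiguous blocks of 25 elements each
!HPF$ DISTRIBUTE B(CYCLIC)  ONTO p   ! elements dealt out round-robin
!HPF$ DISTRIBUTE C(BLOCK,*) ONTO p   ! rows blocked, each column kept whole
```

BLOCK suits stencil codes with nearest-neighbour communication; CYCLIC helps load balance when the work varies across the index range.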

  7. ALIGN directive. Relates elements of one array to elements of another array so that the aligned elements are on the same processor.
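A minimal sketch of the directive (the array names here are illustrative, not from the slides):

```fortran
REAL A(100), B(100), D(99)
!HPF$ DISTRIBUTE A(BLOCK)
!HPF$ ALIGN B(i) WITH A(i)     ! B(i) lands on the same processor as A(i)
!HPF$ ALIGN D(i) WITH A(i+1)   ! a shifted alignment is also permitted
```

Operations such as B = B + A then require no communication, because each pair of aligned elements is co-located.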

  9. Concurrent execution features: features that express the potential for data parallelism and concurrency. These are the Fortran 90 array syntax, plus the concurrent execution features introduced by HPF 1.0: FORALL, PURE procedures, and the INDEPENDENT directive.

  10. Data parallelism in Fortran 90. Fortran 90 has no explicit parallelism, but its array features have implicit data parallelism.
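For example, in the following sketch every elementwise assignment is independent, so the compiler is free to execute them concurrently across processors:

```fortran
REAL A(100), B(100), C(100)
C = A + B                     ! elementwise add over all 100 elements
A(2:99) = B(1:98)             ! shifted array-section assignment
WHERE (B > 0.0) C = SQRT(B)   ! masked elementwise operation
```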

  11. Example: Jacobi relaxation
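The slide's own code is not preserved in this transcript; a typical HPF formulation of the example looks roughly like this (the array names and the parameter N are assumptions):

```fortran
INTEGER, PARAMETER :: N = 100
REAL u(0:N+1, 0:N+1), unew(N, N)
!HPF$ DISTRIBUTE u(BLOCK, BLOCK)
!HPF$ ALIGN unew(i, j) WITH u(i, j)
! Each interior point is replaced by the average of its four neighbours;
! only the ghost boundary sections require communication.
unew = 0.25 * (u(0:N-1, 1:N) + u(2:N+1, 1:N) &
             + u(1:N, 0:N-1) + u(1:N, 2:N+1))
u(1:N, 1:N) = unew
```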

  12. FORALL statement. FORALL is a data-parallel loop. E.g., FORALL (i = 2:9) A(i) = 0.5*(A(i-1) + A(i+1)) is equivalent to A(2:9) = 0.5 * (A(1:8) + A(3:10)).

  13. PURE functions and subroutines. They have no side effects, so they can be executed concurrently.
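A minimal sketch of a PURE function (the names are illustrative): it performs no I/O and modifies no global state, so each invocation inside a FORALL can proceed concurrently:

```fortran
PURE REAL FUNCTION avg3(x, y, z)
  REAL, INTENT(IN) :: x, y, z
  avg3 = (x + y + z) / 3.0
END FUNCTION avg3

! Usage, e.g.:  FORALL (i = 2:n-1) B(i) = avg3(A(i-1), A(i), A(i+1))
```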

  14. INDEPENDENT directive. Asserts that the iterations of a DO loop or FORALL are data-independent and can be executed concurrently. INDEPENDENT is an assertion about the behavior of the program; if it is false, the program may fail.
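A standard illustration (the index array ix is an assumption): with indirect addressing the compiler cannot prove independence on its own, so the programmer asserts it:

```fortran
!HPF$ INDEPENDENT
DO i = 1, n
   A(ix(i)) = B(i)   ! valid only if ix(1:n) contains no repeated values
END DO
```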

  15. HPF and MPI. To support generality, we can make MPI calls through extrinsic (non-HPF) procedures: EXTRINSIC (HPF_SERIAL) to call Fortran 95 on one processor, and EXTRINSIC (HPF_LOCAL) to call Fortran 95 concurrently on all processors.
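A sketch of how such a procedure is declared to the HPF compiler (the routine name is hypothetical):

```fortran
INTERFACE
   EXTRINSIC (HPF_LOCAL) SUBROUTINE exchange(A)
      REAL, INTENT(INOUT) :: A(:)
      !HPF$ DISTRIBUTE A(BLOCK)
   END SUBROUTINE exchange
END INTERFACE
! Inside exchange, each processor sees only its local section of A
! and may call MPI routines directly on it.
```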

  16. Co-Array Fortran Laxmikant V. Kale

  17. Co-array Fortran. Idea: provide the familiar Fortran 95 programming model with minimal extensions to refer to data on remote nodes and to synchronize (wait for other processors to arrive at some point in their program). Syntax extension: normal Fortran arrays use (i) to indicate an index; CAF uses, in addition, a [p] to indicate data from processor p. More info at: www.co-array.org and http://www.hipersoft.rice.edu/caf. Esp. see the first 8-10 slides of the Co-array Fortran presentation at: http://www.hipersoft.rice.edu/caf/publications/index.html

  18. REAL, DIMENSION(N)[*] :: X, Y
      X(:) = Y(:)[Q]
      X = Y[PE]         ! get from Y[PE]
      Y[PE] = X         ! put into Y[PE]
      Y[:] = X          ! broadcast X
      Y[LIST] = X       ! broadcast X over subset of PEs in array LIST
      Z(:) = Y[:]       ! collect all Y
      S = MINVAL(Y[:])  ! min (reduce) all Y
      B(1:M)[1:N] = S   ! S scalar, promoted to array of shape (1:M,1:N)

  19. One-sided communication with co-arrays:
      integer a(10,20)[*]
      if (this_image() > 1) a(1:10,1:2) = a(1:10,19:20)[this_image()-1]
      [Figure: images 1..N each hold a(10,20); image i copies the last two columns of image i-1 into its first two columns]

  20. Synchronization and misc.
      SYNC_ALL()             ! barrier
      SYNC_ALL(LIST)         ! everyone waits for their own list
      SYNC_TEAM(TEAM)        ! only the team is involved in the sync
      SYNC_TEAM(TEAM, LIST)  ! combination of the above
      NUM_IMAGES(), THIS_IMAGE()
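Putting the intrinsics together, a neighbour exchange might be sketched as follows (using the original CAF spelling from the slides; Fortran 2008 later replaced CALL SYNC_ALL() with the SYNC ALL statement):

```fortran
REAL :: edge(10)[*]
INTEGER :: me
me = THIS_IMAGE()
CALL SYNC_ALL()                        ! ensure every image has written edge
IF (me > 1) edge(:) = edge(:)[me - 1]  ! one-sided get from the left neighbour
CALL SYNC_ALL()                        ! no image reuses edge until all have read
```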

  21. CAF programming model features. Synchronization intrinsic functions: sync_all (a barrier and a memory fence); sync_mem (a memory fence); sync_team([notify], [wait]), where notify is a vector of process IDs to signal and wait is a vector of process IDs to wait for, a subset of notify. Also: pointers and (perhaps asymmetric) dynamic allocation, and parallel I/O.

  22. Shmem, ARMCI, and Global Arrays. SHMEM (shared memory) on the Cray T3D. Basic idea: non-cache-coherent shared memory across nodes. Primitives: Get and Put copy data between the calling processor and a remote processor, synchronously. SGI's version: http://www.shmem.org/ OpenSHMEM: http://sc08.supercomputing.org/scyourway/conference/view/bof140.html

  23. Rice implementation. See John Mellor-Crummey's presentation. History of SHMEM: on the Cray T3D (Alpha 21064), put and get primitives (shmem_get, ...) over non-cache-coherent shared memory. That's the idea that stayed with us, in hardware and in software.

  24. Other Partitioned Global Address Space systems. Global Arrays library (no compiler magic): declare global arrays and partition them across nodes; get and put primitives for tiles (sub-sections) of a global array; NWChem is a major application developed using it. UPC and UPC++ (Unified Parallel C): take the idea of a pointer with pointer arithmetic to the distributed-memory world; Berkeley implementation and GWU implementation; an important system still in use.

  25. Other Parallel Languages

  26. PGAS: Partitioned Global Address Space. PGAS languages and libraries support a global view of data. Global Arrays library (no compiler magic): declare global arrays and partition them across nodes; get and put primitives for tiles (sub-sections) of a global array; NWChem is a major application developed using it. UPC and UPC++ (Unified Parallel C): take the idea of a pointer with pointer arithmetic to the distributed-memory world; Berkeley implementation and GWU implementation. CAF (Co-array Fortran) is now incorporated in the Fortran standard.

  27. Other programming models. Task-based parallel programming models: ParSEC, Legion. Higher-level compiled languages: Chapel, Regent, ... C++-based languages: STAPL, DARMA, FleCSI.
