Unlocking the Potential of NSF's Terascale Computing Systems
Explore the significance of the NSF's Terascale Computing System and Teragrid in supporting scientific research, and how to make the best of them. Discover how these systems address what users actually want: getting their work done as efficiently as possible, on ever-increasing problem sizes, and what good engineering at scale has to do with it.
Presentation Transcript
What the Users Want, or The NSF's Terascale Computing System and Teragrid: Support for Scientific Research, or Making the Best of It. Mike Levine, Scientific Director, Pittsburgh Supercomputing Center (PSC). SOS7, Durango, 5 March 2003.
Choice of a Title I. The first title, "What Do the Users Want?", was proposed by Neil. Thank you, Neil. It is a lovely title and a very important question, but I don't really understand anything here that most of you do not already understand: the users want to be able to get their work done! As efficiently as possible, in both their own time and machine time, and on ever-increasing problem sizes. (Usually, they are not picky about the method.)
Choice of a Title II. The second title, "The NSF's Terascale Computing System and Teragrid: Support for Scientific Research", was my choice. Dan Reed might do a better job on this subject but went for something more relevant to this meeting. Perhaps not being as wise as he, I will say a few words on this subject. Then I will abuse Neil's hospitality and move to another topic.
NSF's Terascale Computing Systems. I have already introduced the TCS in the panel "Machines Already Operational". TCS was meant to very substantially increase the size of machine open to US scientists, and this it has done. It will soon be joined by the DTF (Distributed Terascale Facility), described yesterday by Dan Reed: a very large IBM/IA64 Linux cluster, distributed between NCSA and SDSC, with additional capabilities at ANL and Caltech, interconnected by a multi-lambda network running 30 Gb/s to each site.
Teragrid. Join TCS to DTF: upgrade the network to be routed and extensible, extend it to PSC and into TCS, and begin, shortly, to incorporate additional sites and resources. This is a basis for a national cyberinfrastructure to revolutionize our efforts in scientific research, incorporating computation, data-intensive work, visualization, instruments, and diverse facilities. A great deal of software effort goes into providing a uniform, distributed environment for users. One of the new resources is EV7/Marvel.
EV7 (CPU) / Marvel (system): "the greatest scientific processor, ever" (Bill Camp, yesterday). Pre-production systems have been at PSC and CEA for several months; Jean Gonnord mentioned CEA's work on EV7 yesterday, and he has a substantial body of benchmark information, summarized below. Production systems are now shipping (two at PSC). PSC is building up towards two systems of ~250 processors and ~1/2 TB of memory per system. #1, for NSF: large memory, high bandwidth, SMP; an ETF resource. #2, for NIH: all of the above, plus specific data-intensive applications. We believe that Marvel will supply a good deal of what the users want. In addition to science output, we hope to learn more about the application value of high-bandwidth systems and to encourage vendors to match it or do better.
NIH Marvel: partner with four world leaders in three diverse fields: Eric Lander and the Whitehead group (genomics); Michael Klein et al. from the University of Pennsylvania (structural biology); Klaus Schulten et al. from the University of Illinois (structural biology); and Terrence Sejnowski et al., UCSD and PSC (neuroscience). They present compelling examples of data-, memory-, and compute-intensive problems that can only realistically be attacked with the proposed architecture.
Alpha EV7/Marvel.
Alpha EV7 = Alpha EV68 core (1 GHz) + on-chip 2D-torus SMP interconnect + huge I/O and memory bandwidth per CPU: 12.8 GB/s (= 6 B/flop! recall Buddy's chart; the ES45 gives 2 GB/s) + low memory latency: 80 ns local (ES45: 140 ns).
Marvel = 2- to 128-processor SMPs [HP has yet to promise systems >64p], with low intra-SMP memory latency (~350 ns to the furthest node) and large aggregate memory (global, up to 8 GB/processor).
An 8P early test system has been under test at PSC: multi-week tests by local and remote users (including ORNL), multiple applications (and OSs), and excellent McCalpin Streams performance.
Two 16P production systems are now at PSC.
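For concreteness, the McCalpin Streams figure cited above is measured with simple vector kernels; the fragment below is a minimal triad-style bandwidth sketch, not the official STREAM benchmark code, and the array length and scaling constant are arbitrary choices.

```c
/* Minimal triad-style memory-bandwidth sketch (NOT the official McCalpin
 * STREAM code).  The array length is an arbitrary choice; it just needs to
 * be large enough to overflow the caches of the machine being measured. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N 20000000L                  /* three arrays of 160 MB each */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = now();
    for (long i = 0; i < N; i++)     /* triad: a = b + s*c */
        a[i] = b[i] + 3.0 * c[i];
    double sec = now() - t0;

    /* The triad moves three doubles (two loads, one store) per iteration. */
    double gb = 3.0 * N * sizeof(double) / 1e9;
    printf("triad: %.2f GB in %.4f s = %.2f GB/s\n", gb, sec, gb / sec);
    return a[N - 1] > 0.0 ? 0 : 1;   /* keep the loop from being elided */
}
```

Loops like this run at memory speed, not peak flops, which is why a per-CPU figure like 12.8 GB/s matters so much to real applications.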
EV7: The System is the Silicon. Building a system: EV7 + I/O + memory = SYSTEM! [Slide diagram: EV7 chips with directly attached memory and I/O assembled into a system.]
Direct processor-processor interconnects (16P torus). [Slide diagram: 16 processors connected by direct torus links.]
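As a small illustration of how an application sees such a torus (a generic MPI sketch, not PSC code; MPI makes no promise that ranks land on physically adjacent EV7s unless the implementation is told how), one can request a 4x4 periodic Cartesian topology and query each rank's neighbours:

```c
/* Sketch: map 16 MPI ranks onto a 4x4 periodic grid (a torus), so that
 * nearest-neighbour exchanges follow the kind of direct links shown on
 * this slide.  This only requests the logical topology; physical placement
 * is up to the MPI implementation. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);
    if (wsize != 16) {
        if (wrank == 0) fprintf(stderr, "run with exactly 16 ranks\n");
        MPI_Finalize();
        return 1;
    }

    int dims[2] = {4, 4};            /* 4x4 grid of processes         */
    int periods[2] = {1, 1};         /* periodic in both dims = torus */
    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, /*reorder=*/1, &torus);

    int rank, coords[2], left, right, down, up;
    MPI_Comm_rank(torus, &rank);
    MPI_Cart_coords(torus, rank, 2, coords);
    MPI_Cart_shift(torus, 0, 1, &left, &right);   /* neighbours along dim 0 */
    MPI_Cart_shift(torus, 1, 1, &down, &up);      /* neighbours along dim 1 */

    printf("rank %2d at (%d,%d): -x=%d +x=%d -y=%d +y=%d\n",
           rank, coords[0], coords[1], left, right, down, up);

    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}
```

Run with 16 ranks, each process prints its grid coordinates and the ranks it would exchange halo data with.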
128P partitionable system using 8P building-block drawers: up to 128 processors, up to 4 TB of memory, and loads of I/O (PCI-X and AGP).
EV7 tests (November 2001), 1.1 GHz CPU, 1 GHz memory (Jean Gonnord, CEA/DAM).
TERA: 175^3: 640 Mflops (29% of peak); 100^3: 660 Mflops.
PUMA: 272 s (EV68@833: 405 s), 1.48 times better than the ES45 (the clock ratio is 1.32).
PUMA with MPI, 16 processors (800 MHz): ES45: 51.384 s; EV7: 33 s (ratio: 1.56).
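To spell out the arithmetic behind "better than the clock ratio", using only the figures quoted above (the slide's 1.48 presumably comes from unrounded timings):

```latex
\frac{t_{\mathrm{EV68@833}}}{t_{\mathrm{EV7@1100}}} = \frac{405\,\mathrm{s}}{272\,\mathrm{s}} \approx 1.49
\qquad \mathrm{vs.} \qquad
\frac{f_{\mathrm{EV7}}}{f_{\mathrm{EV68}}} = \frac{1100\,\mathrm{MHz}}{833\,\mathrm{MHz}} \approx 1.32
```

The gap between the two ratios is what clock rate alone cannot explain, presumably the contribution of the EV7's memory system rather than its core.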
EV7 tests, continued (Jean Gonnord, CEA/DAM). The two tables on the slide, with the columns as labelled there:

  Case           EV7@1100   EV68@833   Ratio
  A  500x192       45.11      77.633    1.72
  A  1000x384     198.57     442.1      2.23

  Case           EV7@1100   EV68@833   Ratio
  D  700          759        272        2.79
  P  700          677        266        2.55
  D  1000         755        236        3.20
  P  1000         668        230        2.90
Choice of a Title III: Let's Make the Best of It. An important topic drifted in and out of several talks yesterday but was not given direct attention. Bill Camp mentioned it as "reliability"; Dan Reed mentioned it as "carefully engineered clusters"; Dieter said we had heard enough of "fault tolerance" (he, I think, was wrong); Dan Katz mentioned it under "software management and configuration". It is directly supportive of what the users want.
Let's Make the Best of It. The issue is system reliability and availability, hardware and software. "It", in the title, refers to our lovely, expensive systems. More specifically, I refer to issues that I would characterize as good engineering, and not to theoretical issues (which are, however, more fascinating). In contrast to some of the fault-tolerance discussion, I suggest there is a fair amount of low-hanging fruit, requiring only small amounts of effort and having little or no impact on performance. Many of the comments in the "Machines Already Operational" panel implied a clear lack of sufficient attention to these kinds of issues.
(At this point I wish I had) the cartoon from The New Yorker: a picture of a man leaving church, pausing at the door to say to the minister, "Thank you, Reverend, for not mentioning me by name in your sermon." (But I was preparing this during Thomas's talk yesterday evening.)
Whose Baby Is This? We save bundles by buying commodity components, either raw or from large vendors. At the component level, we still benefit from the substantial engineering that went into their design. At the system level, however, multiple forms of danger lurk. When no one else claims ownership of dealing with these dangers, it is our baby. (Applause for the efforts, described today, to strongly influence vendors.)
What Forms of Danger? Issues of scale: we are integrating these components into systems that in many cases go well beyond anything imagined by the original designers. Issues of style of usage: we are often using these systems in modes different from those intended by the original designers and not understood by them (e.g., very large-scale, highly synchronized applications may be peculiar to HPTC). In addition to needing some things that they do not provide, we also often do not need things that they do provide and for which other system compromises have been made. These issues are terribly atypical of the vast majority of their customer base (as has been mentioned frequently). Not dealing with them limits our system scalability.
What Don't We Need? We do not run a life-support system; we can get along pretty well with the temporary failure of a fair amount of hardware. With increasing scale, attempts to totally prevent failure are insufficient; work to prevent "splatter" then matters more than slightly reducing the frequency of failure. We do not need rapid response to most failures: once a node goes, the immediate job is toast. Here, too, containment is very important.
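The talk does not prescribe a mechanism, but the usual user-level answer to "the immediate job is toast" is periodic application-level checkpointing, so that a restarted job loses only the work done since the last checkpoint. The sketch below is a generic illustration under that assumption; the file name, checkpoint interval, and "state" are invented for the example, and nothing here is specific to TCS or Marvel.

```c
/* Generic sketch of application-level checkpointing: if a node dies, the
 * running job is lost, but a restart loses only the work done since the
 * last checkpoint.  File name and "state" are made up for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSTEPS     1000000
#define CKPT_EVERY 10000
#define CKPT_FILE  "state.ckpt"        /* hypothetical checkpoint file */

static int load_checkpoint(double *state, long n, long *step)
{
    FILE *f = fopen(CKPT_FILE, "rb");
    if (!f) return 0;                              /* no checkpoint: fresh start */
    int ok = fread(step, sizeof *step, 1, f) == 1 &&
             fread(state, sizeof *state, n, f) == (size_t)n;
    fclose(f);
    return ok;
}

static void save_checkpoint(const double *state, long n, long step)
{
    FILE *f = fopen("state.ckpt.tmp", "wb");       /* write, then rename, so a   */
    if (!f) return;                                /* crash mid-write cannot     */
    fwrite(&step, sizeof step, 1, f);              /* corrupt the old checkpoint */
    fwrite(state, sizeof *state, n, f);
    fclose(f);
    rename("state.ckpt.tmp", CKPT_FILE);
}

int main(void)
{
    long n = 1000, step = 0;
    double *state = calloc(n, sizeof *state);
    if (!state) return 1;

    if (!load_checkpoint(state, n, &step)) {       /* resume if possible */
        memset(state, 0, n * sizeof *state);
        step = 0;
    }

    for (; step < NSTEPS; step++) {
        for (long i = 0; i < n; i++)               /* stand-in for real work */
            state[i] += 1e-6;
        if ((step + 1) % CKPT_EVERY == 0)
            save_checkpoint(state, n, step + 1);
    }
    printf("done at step %ld\n", step);
    free(state);
    return 0;
}
```

Writing to a temporary file and renaming it keeps a crash mid-checkpoint from destroying the previous good checkpoint, which is exactly the kind of containment argued for above.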
Examples of Dealing With Such Issues. If you have not, you should read the RAS requirements in the Red Storm solicitation (Jim Tompkins mentioned them briefly). From those requirements, you can readily infer the kinds of problems that Camp, Tompkins, et al. are working to avoid. I, at least, was impressed by the level of effort SNL is prepared to expend in this domain. At PSC, working with HP, we have implemented continuous monitoring of soft-fault errors; we are now doing true preventative maintenance. Contrary to historical practice, the analysis need not be done on the nodes optimized for computation. Dan Reed mentioned this sort of thing yesterday. It is particularly applicable to disks, memory, and the network.
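PSC's actual tooling is not described here, but the idea is simple enough to sketch: watch the corrected ("soft") error counts on each component, and pull a node for service when its counts start climbing, before a hard failure takes a job with it. The fragment below is purely illustrative; the log format, file name, and threshold are invented.

```c
/* Illustrative sketch only: scan a hypothetical log of corrected ("soft")
 * error counts per node and flag nodes over a threshold, so they can be
 * drained and serviced before a hard failure.  The input format
 * ("node-name count" per line), file name, and threshold are invented. */
#include <stdio.h>

#define THRESHOLD 50            /* corrected errors per day, arbitrary */

int main(void)
{
    FILE *f = fopen("soft_errors.log", "r");   /* hypothetical counter dump */
    if (!f) { perror("soft_errors.log"); return 1; }

    char node[64];
    long count;
    int flagged = 0;
    while (fscanf(f, "%63s %ld", node, &count) == 2) {
        if (count > THRESHOLD) {
            printf("flag %s for preventative maintenance (%ld corrected errors)\n",
                   node, count);
            flagged++;
        }
    }
    fclose(f);
    printf("%d node(s) flagged\n", flagged);
    return 0;
}
```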
Solutions. I am not here to propose a solution; my immediate goal is to call attention to the problem. Just as it has become clear that HPC makes special demands on Linux, it also makes special demands on system configuration and operation. Both might benefit from more coordinated attention. Perhaps at the next conference, Neil might consider giving this issue some further attention.