Unleashing the Potential of Extreme-Scale Computing Systems

Dive into the realm of extreme-scale computing systems, where supercomputers like Tianhe-2 lead the pack at 33.86 petaflops, propelling towards exa-scale computing by 2020. Explore the challenges and advancements in high-performance computing, cloud computing, networking infrastructure, and interconnection networks. Discover how HPC clusters are evolving, pushing boundaries for more efficient and reliable solutions. Stay ahead with insights on network technology, system interconnects, and the future of extreme-scale systems.

Uploaded by farzanmo on May 01, 2025

Presentation Transcript

Extreme-scale computing systems: high-performance computing (HPC) systems. The current No. 1 supercomputer, Tianhe-2, delivers 33.86 petaflops. The push is toward exa-scale computing by 2020, roughly 30 times faster than Tianhe-2, which means performance must nearly double every year. Open issues range from applications to systems, including power, resilience, and networking.
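The growth figures on this slide can be checked with a little arithmetic. The sketch below assumes that exa-scale means 1 exaflops, that is, 1,000 petaflops; the constant and function names are illustrative only.

```python
import math

TIANHE2_PFLOPS = 33.86   # figure quoted on the slide
EXA_PFLOPS = 1000.0      # assumption: exa-scale = 1 exaflops = 1000 petaflops


def growth_needed(current_pflops, target_pflops):
    """Return (speedup factor, number of doublings) needed to reach the target."""
    factor = target_pflops / current_pflops
    doublings = math.log2(factor)
    return factor, doublings


factor, doublings = growth_needed(TIANHE2_PFLOPS, EXA_PFLOPS)
print(f"speedup needed: {factor:.1f}x, doublings: {doublings:.2f}")
```

Under that assumption the required speedup is a little under 30x, or just under five doublings, which is why performance has to nearly double every year over roughly five years.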

Extreme-scale computing systems: cloud computing data centers such as Amazon EC2. There is a huge push to move computing and storage into cloud computing infrastructure. Extreme scale is needed to achieve economies of scale. Applications are more diverse. The networking infrastructure needs significant improvement, and security is a concern.

Extreme-scale computing systems: Hadoop clusters with huge I/O bandwidth. These go beyond traditional HPC and may fit into cloud computing infrastructure.

Interconnection networks: the networking system that connects all nodes in these extreme-scale systems. It can easily be the main performance-limiting factor in this type of system. Nodes are getting bigger (64 cores, 128 cores) and total core counts are increasing, so network capacity needs to increase at least proportionally. Network complexity grows super-linearly with the total port count. We are reaching a stage where drastic changes are needed.
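One simple way to see how complexity can grow super-linearly with port count is a single full crossbar switch, where every input can be connected to every output, so the number of crosspoints grows with the square of the port count. The sketch below illustrates only that one case; it is not a model of any system named in the slides, and the function name is hypothetical.

```python
def crossbar_crosspoints(ports):
    """Crosspoints in a full ports-by-ports crossbar (one per input/output pair)."""
    return ports * ports


for ports in (64, 128, 256):
    total = crossbar_crosspoints(ports)
    per_port = total // ports
    print(f"{ports:4d} ports -> {total:6d} crosspoints ({per_port} per port)")
```

Doubling the port count quadruples the crosspoints in this case, so the cost per port rises as the switch grows instead of staying constant.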

Extreme-scale systems are getting bigger every year. HPC clusters are pushing toward exa-scale computing (from the 10-peta-scale range). There is a lot of pressure to build more efficient, more reliable, and more power-efficient interconnects. Many new proposals are appearing at this stage.

Interconnects: extreme-scale PDSs are an Internet-in-a-building. The traditional networking issues apply: topology, routing, flow control, and congestion control. Recent topology and routing proposals target extreme-scale systems. The goal is to meet performance requirements within budget constraints.
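As a concrete instance of a topology and routing pair, the sketch below implements dimension-order (XY) routing on a 2D mesh: a packet first travels along the X dimension until its column matches the destination, then along Y. This is a textbook scheme chosen for illustration; the slides do not name a specific topology or routing algorithm, and the function name is hypothetical.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst on a 2D mesh,
    correcting the X coordinate first and then the Y coordinate."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)            # nodes visited, source and destination included
print(len(route) - 1)   # hop count, equal to the Manhattan distance
```

Because every packet between the same pair of nodes follows the same path, the scheme is simple to implement, but it cannot route around congested links, which is one reason routing and congestion control are listed as separate issues.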

Network technology. Open standards: 1/10/100 Gigabit Ethernet; InfiniBand for low-latency communication; OpenFlow and software-defined networking. Proprietary technology: IBM Blue Gene and Cray Aries.

System software, communication subsystems, and applications: parallel I/O systems; topology-aware job allocation and node mapping; communication protocols; one-sided vs. two-sided communication; collective communication algorithms. All of these affect the traffic in the network and must be considered in the interconnect design.
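To make the link between collective algorithms and network traffic concrete, the sketch below simulates one well-known allreduce algorithm, recursive doubling, in plain Python. It assumes the number of ranks is a power of two and models only who exchanges data with whom in each step; it is not taken from the slides and does not use any real communication library.

```python
def recursive_doubling_allreduce(values):
    """Simulate an allreduce (sum) over len(values) ranks.

    Assumes the number of ranks is a power of two. In each step, every
    rank exchanges its partial sum with the rank whose ID differs in one
    bit. Returns (final per-rank values, number of communication steps).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("number of ranks must be a power of two")
    current = list(values)
    steps = 0
    distance = 1
    while distance < p:
        # Each rank r pairs with rank r XOR distance and adds the partner's value.
        current = [current[r] + current[r ^ distance] for r in range(p)]
        distance *= 2
        steps += 1
    return current, steps


result, steps = recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
print(result, steps)
```

The partner distance doubles at every step, so the later steps pair ranks that are far apart in rank order. Whether those ranks are also far apart in the network depends on how ranks are mapped to nodes, which is where topology-aware mapping comes in.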

Performance models and evaluation methods: performance modeling techniques for networks, systems, and applications; workload characterization; application tracing. Simulating and modeling large-scale systems with realistic workloads is very challenging.
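A common starting point for modeling point-to-point communication is the latency-bandwidth model, in which sending a message costs a fixed startup latency plus the message size divided by the bandwidth. The sketch below assumes that model; the parameter values are made-up placeholders, not measurements of any system mentioned here.

```python
def message_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: startup latency plus transfer time."""
    return latency_s + nbytes / bandwidth_bytes_per_s


LATENCY_S = 1e-6    # hypothetical 1 microsecond startup latency
BANDWIDTH = 10e9    # hypothetical 10 GB/s link bandwidth

for nbytes in (8, 1024, 1024 * 1024):
    t = message_time(nbytes, LATENCY_S, BANDWIDTH)
    print(f"{nbytes:8d} bytes -> {t * 1e6:.3f} microseconds")
```

With these placeholder values, small messages are dominated by latency and large messages by bandwidth. Real systems add contention and topology effects that this model ignores, which is part of why realistic simulation is hard.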

Resilience and power-awareness: system and application resilience techniques and analysis; fault tolerance techniques in hardware and software; resource management for system resilience and availability; energy-efficient HPC; energy-efficient data centers. Trade-offs among performance, power, and resilience are the key to future interconnect design, but the tools for investigating these trade-offs are insufficient.

This course targets students who are interested in research on interconnection networks. It goes through a large number of recent papers to bring students up to date on research in this area, provides practice in network simulation and modeling, and introduces the techniques, algorithms, and math background needed to do research in this area.
