Cloud Computing for an AI-First Future Panel Discussion
Artificial Intelligence (AI) is driving a revolution across industries and requires massive computing resources such as clouds and supercomputers. This panel explores the infrastructure needed for AI First clouds, examining the driving applications and their implications for suitable hardware.
Presentation Transcript
Discussion: Cloud Computing for an AI First Future
13th Cloud Control Workshop, June 13-15, 2018, Skåvsjöholm in the Stockholm Archipelago
Geoffrey Fox, June 14, 2018
Department of Intelligent Systems Engineering
gcf@indiana.edu, http://www.dsc.soic.indiana.edu/, http://spidal.org/
Work with Judy Qiu, Supun Kamburugamuva, Shantenu Jha, Kannan Govindarajan, Pulasthi Wickramasinghe, Gurhan Gunduz, Ahmet Uyar
Cloud Computing for an AI First Future
Artificial Intelligence is a dominant disruptive technology affecting all our activities, including business, education, research, and society. Further, several companies have proposed AI First strategies. The AI disruption is typically associated with big data coming from the edge, from repositories, or from sophisticated scientific instruments such as telescopes, light sources, and gene sequencers. AI First requires mammoth computing resources such as clouds, supercomputers, hyperscale systems, and their distributed integration. AI First clouds are related to High Performance Computing (HPC) -- Cloud or Big Data integration/convergence. This panel could examine the driving applications and their implications for hardware and software infrastructure, and could look at the type of hardware suitable for AI First clouds.
Summit Supercomputer (Oak Ridge)
Architecture: 4,608 compute servers, each containing two 22-core IBM Power9 processors and six NVIDIA Tesla V100 GPU accelerators. Power: 15 MW. Storage: 250 PB. Interconnects: InfiniBand and NVLink.
Note: "AI First" is the smartest adjective.
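The per-node counts above imply the machine-wide totals below; this is a quick back-of-the-envelope check using only the figures quoted on the slide, with no additional assumptions:

```python
# Aggregate capability of Summit implied by the slide's numbers.
nodes = 4608          # compute servers
cpus_per_node = 2     # IBM Power9 processors per server
cores_per_cpu = 22
gpus_per_node = 6     # NVIDIA Tesla V100 accelerators per server

cpu_cores = nodes * cpus_per_node * cores_per_cpu
gpus = nodes * gpus_per_node

print(f"CPU cores: {cpu_cores:,}")  # 202,752 Power9 cores
print(f"GPUs:      {gpus:,}")       # 27,648 V100 accelerators
```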
Big Data and Simulation Difficulty in Parallelism
[Figure: workloads arranged along just two problem characteristics -- size of synchronization constraints (from Loosely Coupled, served by commodity clouds, to Tightly Coupled, served by HPC clouds/supercomputers with accelerators and high performance interconnects, up to exascale supercomputers) and size of disk I/O. Memory access is also critical.
- Pleasingly Parallel: often independent events
- MapReduce as in scalable databases: the current major Big Data category
- Global Machine Learning, e.g. parallel clustering, Deep Learning, LDA: linear algebra at core (often not sparse)
- Graph Analytics, e.g. subgraph mining: unstructured adaptive sparse
- Parameter sweep simulations and the largest scale simulations: structured adaptive sparse
There is also the data/compute distribution seen in grid/edge computing.]
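The figure's two-characteristic classification can be sketched as a lookup; the placements below are my own reading of the figure, not a table stated on the slide:

```python
# Illustrative only: classify the slide's workloads by its two problem
# characteristics. Placements are my interpretation of the figure.
workloads = {
    # name: (synchronization, disk I/O)
    "Pleasingly parallel events": ("loose", "low"),
    "MapReduce / scalable databases": ("loose", "high"),
    "Deep learning / LDA": ("tight", "low"),
    "Graph analytics (subgraph mining)": ("tight", "low"),
    "Largest scale simulations": ("tight", "low"),
}

def platform(sync: str) -> str:
    """Map the synchronization axis to the platform the figure associates with it."""
    return "HPC cloud / supercomputer" if sync == "tight" else "commodity cloud"

for name, (sync, _io) in workloads.items():
    print(f"{name:34s} -> {platform(sync)}")
```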
Discussion at Workshop I: Situation Now, Moving to an AI First Future
The discussion was animated, with agreement that:
- The current big data focus is to the left of the previous slide, with, for example, Flink and Spark having workflow and SQL as special thrusts.
- Over the next five years, we can expect a greater focus of the big data community on the right side, with deep learning, LDA, and graph analytics having features quite similar to classic HPC problems that require high performance communication.
- An AI First cloud is a plausible vision.
However, there was significant disagreement as to how to realize an AI First cloud and whether HPC was relevant. It was felt that clouds would always choose a sweet spot too far from HPC choices for the HPC community to be able to use them.
[The slide repeats the Big Data and Simulation figure from the previous slide, annotated with a time arrow.]
Discussion at Workshop II
The slow rate of change in the HPC community, with many legacy applications using technologies like Fortran, was noted. Would Cray survive in a converged world?
The storage difference between HDFS-style storage in clouds and Lustre in HPC was noted. However, S3 (AWS) is nearer the Lustre architecture (with storage separate from compute) than HDFS.
The interconnect difference between Ethernet (clouds) and InfiniBand (HPC) was noted. The HPC community could surely adapt to whatever public cloud communication technology is chosen, as this does not affect the programming model, just details of the implementation.
If one wants to run graph analytics on public clouds, one will need excellent communication, as the irregular structure and low compute/communication ratio of big data graphs probably make them harder to implement efficiently than big simulation problems.
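The low compute/communication point can be made concrete with a back-of-the-envelope comparison; the numbers below are my own illustrative choices, not figures from the panel:

```python
# Illustrative comparison (my own numbers, not from the panel) of why graph
# analytics has a lower compute/communication ratio than a 3D stencil
# simulation, and so suffers more on a slow interconnect.

def stencil_ratio(n: int) -> float:
    """A process owning an n x n x n block of a 3D stencil computes on n**3
    cells per step but exchanges only its six n x n ghost faces."""
    return n**3 / (6 * n**2)

def graph_ratio(local_vertices: int, avg_degree: int, cut_fraction: float) -> float:
    """A process owning an irregular graph partition does work proportional
    to its local edges, but cut_fraction of those edges cross the partition
    boundary and require communication; the ratio is just 1/cut_fraction."""
    local_edges = local_vertices * avg_degree
    remote_edges = local_edges * cut_fraction
    return local_edges / remote_edges

# 128^3 stencil block: ~21 units of compute per unit communicated.
print(f"stencil compute/comm: {stencil_ratio(128):.1f}")
# Power-law graph with, say, half its edges cut: only 2 units per unit communicated.
print(f"graph   compute/comm: {graph_ratio(1_000_000, 16, 0.5):.1f}")
```

The stencil ratio grows with the block size n (surface-to-volume effect), while the graph ratio is pinned by the cut fraction, which partitioners cannot drive down far on power-law graphs.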
Discussion at Workshop III
Another point of view was that the movement to the right of slide 4 offers a natural chance for convergence of HPC and public cloud hardware and software. The expected importance of accelerators in both big data and HPC was stressed. Those accelerators could be different (TPU vs. GPU), but the same accelerators could drive HPC and public clouds.
Probably one would always use specialized machines (supercomputers) for the really large jobs, with say 100,000 cores or more, as it is not easy to get co-location at that level on public clouds.
Note the NASA evaluation of public clouds for workloads on NASA supercomputers: https://www.nas.nasa.gov/assets/pdf/papers/NAS_Technical_Report_NAS-2018-01.pdf
Note the BDEC meeting on the following slide; ask Geoffrey Fox for more information.
Big Data and Extreme-scale Computing (BDEC): http://www.exascale.org/bdec/
BDEC Pathways to Convergence Report: http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/bdec2017pathways.pdf
Next meeting: October 3-5, 2018, Bloomington, Indiana, USA. October 3 is an evening reception; the meeting focus is "Defining application requirements for a data intensive computing continuum".
Later meetings: February 19-21, Kobe, Japan (national infrastructure visions); Q2 2019, Europe (exploring alternative platform architectures); Q4 2019, USA (vendor/provider perspectives); Q2 2020, Europe (focus to be determined); Q3-4 2020, final meeting, Asia (write report).