Challenges and Opportunities in Scalable Data Science Software


Explore the current challenges and opportunities in big data research and development, including the importance of deep learning, infrastructure requirements, use of cloud technology vs. high performance computing, and integration of distributed instruments. Learn about the status of NSF projects focusing on middleware for linking HPC and ABDS, as well as the development of scalable data analytics libraries. Gain insights into the evolving landscape of big data applications and the convergence of HPC, simulations, and big data technologies.

  • Scalable Data Science
  • Big Data Research
  • High Performance Computing
  • Data Analytics
  • Middleware


Presentation Transcript


  1. NSF 1443054: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science Software: MIDAS HPC-ABDS. Geoffrey Fox. Panel Talk: February 15, 2017. Spidal.org

  2. Current challenges and opportunities for big data research and development

  3. Some Questions
     • Discover what data analytics is being used, or could be used, in different research areas.
       • e.g. How important is deep learning in a general science field?
       • e.g. What data analytics is needed for Precision Medicine, SKA, and the explosion of light source image data?
       • This is non-trivial: training is needed, as there are not many experts in both the application fields and modern big data approaches.
     • Get a better consensus on infrastructure requirements.
       • What fraction of data analysis can use modern cloud technology?
       • What fraction requires high performance computing?
       • How important is streaming data?
       • How important is interactive use?
     • Integrate the many distributed instruments (microscopes, sequencers).
     • Implications of exascale technologies and the end of Moore's Law.
     • Research the systematic replacement of O(N^2) algorithms with O(N log N) ones.

  4. Status of NSF 1443054 Project
     • Big Data Application Analysis identifies features of data-intensive applications that need to be supported in software and represented in benchmarks. This analysis was started for the proposal and has been extended to support HPC-Simulations-Big Data convergence.
     • The project is a collaboration between computer and domain scientists in application areas in Biomolecular Simulations, Network Science, Epidemiology, Computer Vision, Spatial Geographical Information Systems, Remote Sensing for Polar Science and Pathology Informatics.
     • HPC-ABDS, as Cloud-HPC interoperable software with the performance of HPC (High Performance Computing) and the rich functionality of the commodity Apache Big Data Stack, was an idea developed for the proposal. We have successfully delivered and extended this approach, which is one of the ideas described in the Exascale Big Data report.

  5. Status of NSF 1443054 Project
     • MIDAS, the integrating middleware that links HPC and ABDS, now has several components, including an architecture for Big Data analytics and an integration of HPC in communication and scheduling on ABDS; it also has rules to get high performance Java scientific code.
     • SPIDAL (Scalable Parallel Interoperable Data Analytics Library) now has 20 members with domain specific (general) and core algorithms.
     • Benchmarks: need to develop benchmarks covering these issues, as most big data benchmarks have a commercial focus.
     • Language: SPIDAL Java runs as fast as C++.
     • Designed and proposed HPCCloud as hardware-software infrastructure supporting Big Data - Big Simulation Convergence:
       • Big Data Management via Apache Stack ABDS
       • Big Data Analytics using SPIDAL and other libraries
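
Slide 3 calls for systematically replacing O(N^2) algorithms with O(N log N) ones. The sketch below is an illustration of that idea only; it is not taken from the talk or from SPIDAL. It assumes a toy problem (counting pairs of 1D points that lie within a distance r of each other), and the function names are hypothetical. The quadratic version tests every pair; the second version sorts once and uses a binary search per point.

```python
from bisect import bisect_right
from itertools import combinations


def count_close_pairs_quadratic(xs, r):
    """O(N^2): test every unordered pair of points."""
    return sum(1 for a, b in combinations(xs, 2) if abs(a - b) <= r)


def count_close_pairs_sorted(xs, r):
    """O(N log N): sort once, then binary-search the right edge per point.

    Assumes r >= 0. With floating-point inputs, x + r may round
    differently from abs(a - b), so results can differ at exact ties.
    """
    s = sorted(xs)
    total = 0
    for i, x in enumerate(s):
        # Index one past the last point whose value is <= x + r.
        j = bisect_right(s, x + r)
        # Count only points after position i, so each pair is counted once.
        total += j - i - 1
    return total


if __name__ == "__main__":
    points = [5, 1, 9, 3, 3, 8]
    print(count_close_pairs_quadratic(points, 2))
    print(count_close_pairs_sorted(points, 2))
```

Both functions return the same count for integer inputs; the sorted version trades the pairwise loop for one sort plus N binary searches, which is the kind of replacement the slide refers to. Real analytics kernels (for example, tree-based or fast multipole methods) are considerably more involved.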
