
Accelerating Scientific Data Access and Insight through NSF NCAR Collaboration
Unify scientific data for seamless access and insight with the collaboration between NSF NCAR and Data Commons. Enhance data infrastructure, broaden community access capabilities, and support climate research. Access data, run workflows, and train researchers effectively. Overcome challenges in data services for remote users. Realize the potential of NSF NCAR's data ecosystem for accelerated scientific insights.
Presentation Transcript
Integrating NSF NCAR's data infrastructure with the OSDF (HTC25). Harsha Hampapura, Associate Scientist II, NSF National Center for Atmospheric Research
Outline
- Introduction
  - National Discovery Cloud (NDC) for Climate
  - NSF NCAR and Data Commons
- NCAR-OSDF collaboration
  - Data access and workflows
- Compute challenges
  - Dask on the OSPool
NDC: Integrating NCAR's data infrastructure with OSDF
- Broaden community access capabilities to NCAR's data:
  - Model-generated outputs (future projections and historical reanalysis)
  - Observations
  - Support climate/extreme weather impacts research
- Train researchers on how their research workflows can leverage the capabilities of the NDC:
  - Run workflows on a variety of computational resources
  - Access data from both NCAR and other "data origins"
About NSF National Center for Atmospheric Research
- 8 labs and programs
- Research: a broad range of topics, such as the Sun's effect on Earth's atmosphere and the oceans' role in weather and climate prediction
- Educating the next generation of scientists
Search the NSF NCAR Data Catalog: https://data.ucar.edu
Current Data Services Challenges
- Remote users -> legacy "download and analyze" workflow model from multiple repositories (image credit: Ryan Abernathey)
  - Time-consuming and inefficient
  - User experience and data transfer service options differ by repository
- NCAR HPC users -> direct access to selected datasets
  - Limited to users with NCAR HPC accounts
  - Only a subset of NCAR datasets is accessible
  - No good search capability beyond asking where a dataset is located on storage
- NSF NCAR NG-GDEX Project Plan Responsibilities
Realizing the Potential of NSF NCAR's Research Data Ecosystem: Unifying Scientific Data for Seamless Access and Insight
- Vision: Data Commons
  - Accessible to all: simple, consistent access for researchers from novice to expert
  - Integrated for insight: seamless tools for exploration, analysis, and AI/ML workflows
  - Built for community leadership: enable open, collaborative, next-gen cyberinfrastructure
- A unified data infrastructure enables seamless access to scientific data, accelerating insight through both traditional and next-generation workflows.
- Reference: Grossman, R.L. Ten lessons for data sharing with a data commons. Sci Data 10, 120 (2023). https://doi.org/10.1038/s41597-023-02029-x
- Pivoting to NSF NCAR's Integrated Research Data Commons: forget about the data format/location, do the science
Completed: RDA Integrated with the Open Science Data Federation
- The Open Science Data Federation (OSDF) is a federated platform for delivering datasets from repositories to compute in an effective, scalable manner.
- Democratize access to data: performant distributed access
NCAR's data + OSDF: Current Status
- Progress in the past year:
  - 2024: NCAR origins and cache deployed
  - March 2025: Publicly available data (> 1.67 PB) from NCAR's Research Data Archive (RDA) are accessible via the OSDF. https://rda.ucar.edu/datasets/d010092/dataaccess/#
- Several Intake-ESM catalogs published on the Research Data Archive
- Access metrics reported to NCAR
NSF NCAR Research Data Archive: Web File Access
- HTTPS access powered by OSDF: https://rda.ucar.edu/datasets/d010092/dataaccess/#
- Example file URL: https://data-osdf.rda.ucar.edu/ncar/rda/d633000/e5.oper.an.pl/194012/e5.oper.an.pl.128_060_pv.ll025sc.1940120100_1940120123.nc
NSF NCAR Research Data Archive: Catalog-based access. Analysis-ready access example: https://rda.ucar.edu/datasets/d010092/dataaccess/#
NCAR Data Commons: OSDF/Direct read. [Diagram labels: NCAR user, Intake-ESM (POSIX), direct reads; user, Intake-ESM (OSDF), PelicanFS, streaming data; NSF NCAR Data Commons]
PelicanFS: fsspec for the Pelican Platform. Questions about PelicanFS? Talk to
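
A minimal sketch of what streaming a file through PelicanFS might look like. It assumes the `fsspec` and `pelicanfs` packages are installed, and it assumes that the path portion of the HTTPS URL on the Web File Access slide is also the object's path in the OSDF namespace; the slides do not confirm that mapping. The helper names `osdf_url` and `read_first_bytes` are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse

# HTTPS URL taken from the "Web File Access" slide.
HTTPS_URL = (
    "https://data-osdf.rda.ucar.edu/ncar/rda/d633000/e5.oper.an.pl/"
    "194012/e5.oper.an.pl.128_060_pv.ll025sc.1940120100_1940120123.nc"
)


def osdf_url(https_url: str) -> str:
    """Map an HTTPS URL to an osdf:// URL.

    Assumption: the HTTPS path is also the object's OSDF namespace path.
    """
    return "osdf://" + urlparse(https_url).path


def read_first_bytes(url: str, n: int = 8) -> bytes:
    """Stream the first n bytes of a remote object through fsspec.

    Needs network access plus the fsspec and pelicanfs packages;
    pelicanfs supplies the filesystem behind the "osdf" protocol.
    """
    import fsspec  # imported lazily so osdf_url() works without it

    with fsspec.open(url, mode="rb") as f:
        return f.read(n)


if __name__ == "__main__":
    print(osdf_url(HTTPS_URL))
    # With network access, uncomment to stream a few bytes:
    # print(read_first_bytes(osdf_url(HTTPS_URL)))
```

Because PelicanFS plugs into fsspec, libraries that accept fsspec URLs can in principle be pointed at `osdf://` paths instead of local files, which is what makes the "streaming" branch of the Data Commons diagram possible without a separate download step.
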
NCAR's data + OSDF: Workflows
- 16 workflows spanning 12 datasets published, available at https://github.com/NCAR/osdf_examples
- For example: access data from multiple data origins for a Global Mean Surface Temperature Anomaly (GMSTA) comparison
  - CMIP6 models (AWS Open Data origin) vs.
  - Observations (NCAR origin)
- The example Jupyter notebook can be found at: https://github.com/NCAR/osdf_examples/blob/main/notebooks/cmip6_temps_zarr.ipynb
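
For readers unfamiliar with the quantity being compared, here is a rough sketch of one common way to compute a global mean surface temperature anomaly: take an area-weighted (cosine of latitude) global mean at each time step, then subtract the mean over a baseline period. The notebook linked above may do this differently, for example with xarray on Zarr stores. The grid, temperatures, and baseline period below are made up for illustration.

```python
import math


def global_mean(field, lats_deg):
    """Area-weighted mean of field[lat][lon] on a regular lat-lon grid.

    Each latitude row is weighted by cos(latitude), which is proportional
    to the area of its grid cells.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    row_means = [sum(row) / len(row) for row in field]
    return sum(w * m for w, m in zip(weights, row_means)) / sum(weights)


def anomalies(series, baseline_slice):
    """Subtract the mean over a baseline period from every value."""
    base = series[baseline_slice]
    base_mean = sum(base) / len(base)
    return [value - base_mean for value in series]


# Made-up toy data: 3 time steps on a 3 x 2 (lat x lon) grid, in kelvin.
lats = [-60.0, 0.0, 60.0]
fields = [
    [[270.0, 270.0], [300.0, 300.0], [270.0, 270.0]],
    [[271.0, 271.0], [301.0, 301.0], [271.0, 271.0]],
    [[272.0, 272.0], [302.0, 302.0], [272.0, 272.0]],
]

gmst = [global_mean(f, lats) for f in fields]
gmsta = anomalies(gmst, slice(0, 2))  # baseline: first two time steps
print(gmsta)
```

Applying the same two steps to model output from one origin and observations from another yields two anomaly series that can be compared directly.
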
GMSTA example: User's perspective. [Diagram labels: AWS Open Data; NCAR Data Commons; OSDF Director; OSDF origin (aws-open-data/us-west-2); OSDF origin (ncar); OSDF Cache]
Geoscience on the OSPool: What does it take to run the same workflow on the OSPool? Checklist
Geoscience on the OSPool: Dask Challenges
- Dask is an open-source Python library for parallel computing
- Dask creates a Directed Acyclic Graph (DAG) of tasks
- Dask-jobqueue launches Dask workers as batch jobs, submitting them to the user's queue
- The task graph is then executed by the Dask workers
- Problem: two-way communication is needed between workers and scheduler
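
To make the task-graph idea concrete, here is a small sketch that uses only the Python standard library. It is not Dask's API: a `graphlib.TopologicalSorter` stands in for the scheduler, a thread pool stands in for the workers, and the task names and the `run_graph` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical task graph: name -> (function, names of tasks it depends on).
tasks = {
    "load_a": (lambda: [1, 2, 3], ()),
    "load_b": (lambda: [4, 5, 6], ()),
    "sum_a": (sum, ("load_a",)),
    "sum_b": (sum, ("load_b",)),
    "total": (lambda a, b: a + b, ("sum_a", "sum_b")),
}


def run_graph(tasks, max_workers=2):
    """Execute the DAG, running tasks whose inputs are ready in parallel."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # tasks whose dependencies are all done
            futures = {}
            for name in ready:
                func, deps = tasks[name]
                # "Scheduler" sends the task and its inputs to a worker.
                futures[name] = pool.submit(func, *(results[d] for d in deps))
            for name, future in futures.items():
                results[name] = future.result()  # worker reports its result back
                sorter.done(name)  # unlocks tasks that depend on this one
    return results


results = run_graph(tasks)
print(results["total"])
```

Even in this toy version the scheduler both sends tasks out and receives results back. That is the two-way communication between workers and scheduler that the slide identifies as the problem when the workers run as batch jobs.
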
Solution: Expanding the reach of OSPool. TaskVine + Floability backpack. Check out Douglas Thain's talk: "Wrangling Complex Notebook Workflows with Floability", Jun 4, 2025, 11:40 AM
Thank you! Questions? Let's connect: harshah@ucar.edu