DKRZ Data Center Services and Infrastructure Overview
DKRZ Data Center provides a comprehensive range of data life cycle services, including long-term archival, quality assurance, and support for CMIP6 projects. The center is equipped with advanced infrastructure for data processing, replication, and integration, ensuring reliable data management and accessibility. Explore the detailed updates and operational activities at DKRZ to understand the evolving data environment and services offered.
Uploaded on Mar 17, 2025
Presentation Transcript
DKRZ data center requirements and services. Stephan Kindermann, Michael Lautenschlager, Katharina Berger, Tobias Weigel, Hans Dieter Hollweg, Deutsches Klimarechenzentrum (DKRZ). S. Kindermann (DKRZ)
Overview. Update: the new data infrastructure hosting environment at DKRZ; ESGF: DKRZ data life cycle services; LTA / WDCC ESGF integration; quality assurance; data-near processing; towards PID-based services; CMIP6 at DKRZ. S. Kindermann (DKRZ), 17.03.2025
DKRZ data center update. Migration to a new integrated HPC / data system; separate DTNs (starting 2016); establishment of a national MIP data analysis cache; data cloud to support the data ingest process. [Diagram: ESGF infrastructure and CERA LTA infrastructure in three phases: until end 2015, from mid 2015 (pre-shutdown), and from 2016. Labels: HH: 2x10 GB; DFN: 2x3..5 GB; 4 data nodes; all behind firewall; OpenStack cloud; index node; GPFS; Mistral; 2 DTNs; no separate DTNs; 2 data nodes; Lustre; 1 CERA/ESGF data node; VMs (XEN); HPC + interactive nodes + visualization nodes; ro NFS; CERA / ESGF portal; LTA data node; CERA portal; LTA (Oracle); Oracle DB cluster; HPSS; national MIP data cache; management etc. tbd.]
DKRZ long-term archival and data citation. Major use case: replication; support for data evaluation; quality assurance; long-term archival; DOI assignment; exposure as ESGF data node. [Diagram labels: ESGF shutdown; CERA portal / DDC; ESGF; WPS; COG portal; data node; data-near processing; replication; versioning; container server; CERA (Oracle); QA; DOI; national climate data node (MIP cache); ESGF process; LTA (HPSS); container cache; ingest.]
WDCC / CERA / HPSS ESGF integration. Operational for CMIP5: CERA metadata (Oracle) feeding the ESGF index; THREDDS server with ESGF security filter plus HPSS data container server acting as ESGF data node. Improved system for CMIP6: FUSE-based mounting of the DKRZ HPSS/cache legacy system; extraction of CERA metadata for the ESGF mapfile; standard ESGF publication in an offline mode. Future: COG portal visibility of (non-CMIP) WDCC LTA project data. [Diagram labels: LTA ESGF data node; Postgres / THREDDS; ESGF index node; ESGF Solr index; COG portal; ESGF publisher; mapfile generation; FUSE; container server; CERA (Oracle); LTA (HPSS); container cache.]
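The mapfile generation step can be illustrated with a minimal sketch. It assumes the commonly used pipe-separated mapfile layout (dataset ID with version, file path, size, then key=value fields); the slides do not specify the layout, and the function name and arguments below are hypothetical, not part of the DKRZ tooling.

```python
import hashlib
from pathlib import Path


def mapfile_line(dataset_id, version, path):
    """Build one mapfile-style line for a single file.

    The field layout is an assumption for illustration, not taken
    from the slides.
    """
    p = Path(path)
    # Reading the whole file is fine for a sketch; large files
    # should be hashed in chunks instead.
    checksum = hashlib.sha256(p.read_bytes()).hexdigest()
    stat = p.stat()
    fields = [
        f"{dataset_id}#{version}",
        str(p),
        str(stat.st_size),
        f"mod_time={stat.st_mtime}",
        f"checksum={checksum}",
        "checksum_type=SHA256",
    ]
    return " | ".join(fields)
```

In an offline publication mode, lines like this would be written once per file, with the dataset ID and version taken from the extracted CERA metadata rather than passed in by hand.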
(CMIP data) Quality Assurance Software. Completely restructured and modularized: flexible configuration; used heavily for CORDEX, will support CMIP6; separate cf-checker module. [Diagram labels: NetCDF file; main; file; NC-API; M-D store; annotations; user-modified directives; CF Conventions tables; CF Conventions checks; consistency between sub-temporal files; QA; CF Conventions check versions: 1.4 - 1.6; project rules; data; DRS; CV; 8-9 chapters of rules; variable requirements (CMOR); time; table-based config (area-type, cf-standard-name, stand-region-name, ..); project configuration & tables.] Source code: https://github.com/h-dh/QA-DKRZ. Pre-packaged versions: conda based, docker based. Documentation: http://qa-dkrz.readthedocs.org/en/latest/qa-user-manual.html
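One of the checks named on the slide, consistency between sub-temporal files, can be sketched as follows. This is not the QA-DKRZ implementation; it only assumes that file names end in a `_YYYYMM-YYYYMM.nc` range (a common convention for monthly data) and reports gaps or overlaps between consecutive files. All names are hypothetical.

```python
import re

# Assumed file name suffix: _YYYYMM-YYYYMM.nc
RANGE = re.compile(r"_(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$")


def month_index(year, month):
    """Map a (year, month) pair to a running month count."""
    return year * 12 + (month - 1)


def check_continuity(filenames):
    """Return (previous, next, problem) tuples for gaps or overlaps."""
    spans = []
    for name in filenames:
        m = RANGE.search(name)
        if m is None:
            raise ValueError(f"no YYYYMM-YYYYMM range in {name!r}")
        y1, m1, y2, m2 = map(int, m.groups())
        spans.append((month_index(y1, m1), month_index(y2, m2), name))
    spans.sort()  # order files by start month
    problems = []
    for (_, end1, n1), (start2, _, n2) in zip(spans, spans[1:]):
        if start2 > end1 + 1:
            problems.append((n1, n2, "gap"))
        elif start2 <= end1:
            problems.append((n1, n2, "overlap"))
    return problems
```

An empty result means the files form one contiguous monthly time series; a real checker would read the time axis from the files rather than trust the file names.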
National MIP data analysis cache / node. Ad hoc approach: data needed; help desk; data manager; read-only (RO) mounted on HPC data analysis nodes. Support for data analysis VM deployment. Support for tool dependency management (install recipes, conda, docker). WPS framework to support web service deployments: Birdhouse (https://github.com/bird-house); conda/docker support; support for home institution (test) deployments; transparent solution. [Diagram labels: WPS; data-near processing; replication; versioning; national climate data node (MIP cache); ESGF; ingest.]
Stable file/collection management!? [Diagram labels: ESGF; WPS; COG portal; CERA portal / ..; data node; data-near processing; replication; versioning; container server; CERA (Oracle); QA; DOI; national climate data node (MIP cache); ESGF process; LTA (HPSS); container cache; ingest.]
Towards PID-based services. Motivation: stable ESGF data space based on PID infrastructure. Collaborations: ePIC: DKRZ partner, prefix registration; EUDAT: DKRZ leads PID task, API; RDA: DKRZ co-chairs PIT and collections WGs; Envri+: PIDs in environmental sciences. Next ESGF steps: test environment (PID system + publisher); scalable, stable PID assignment: CMOR integration, CDNOT involvement; PID API / ESGF publisher integration; highly available message queuing system integration.
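The idea of assigning PIDs to files and to the collections that contain them can be sketched in a few lines. This is an illustration only: the prefix is a placeholder, the `hdl:` form and the UUID-based suffix are assumptions, and a real system would register the identifiers with a Handle service instead of computing them locally.

```python
import uuid

# Placeholder prefix for illustration only; not a registered prefix.
PLACEHOLDER_PREFIX = "99.99999"


def mint_pid(prefix, name, version):
    """Derive a reproducible handle-style identifier from a name and version."""
    suffix = uuid.uuid5(uuid.NAMESPACE_URL, f"{name}.v{version}")
    return f"hdl:{prefix}/{suffix}"


def collection_record(prefix, dataset_id, version, file_names):
    """Build a collection record that lists the PIDs of its member files."""
    return {
        "pid": mint_pid(prefix, dataset_id, version),
        "dataset_id": dataset_id,
        "version": version,
        "members": [
            mint_pid(prefix, f"{dataset_id}/{name}", version)
            for name in file_names
        ],
    }
```

Deriving the suffix from the name and version keeps the example deterministic; a new dataset version yields a new PID, which is the property a stable, versioned data space relies on.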
Summary. Long-term archival use case: ESGF integration; quality assurance. PID assignment early in the data life cycle: early citation and DOI assignment; future PID-based data management services; future PID-based end user services; future PID-based provenance support.
Thank you.
DKRZ services: new developments. New integrated HPC/data system installed in 2015, ~50 PByte Lustre storage; cloud (OpenStack); community data analysis cache and platform. ESGF: WDCC/HPSS/ESGF data node; WPS compute platform (birdhouse); data ingest; towards PID / early citation services.
(Early) Data Citation (DM + ESGF). Impact on CMIP6 data management (DM) and ESGF governance (ESGF). Request from modelling groups for a data citation reference just after ESGF data publication. CMIP6 data publication workflow. CMIP6 citation granularities are collection levels: simulation; model. S. Kindermann (DKRZ), iCAS2015