
Data Fabric Analysis and Use Cases in Science and Research
Explore the analysis of data fabric implementation and essential components in various scientific disciplines like environmental science, life science, and more. Discover use cases and institutions involved in the data federation and observation platforms.
Presentation Transcript
Data Fabric IG Use Case Analysis
Data Fabric Analysis: How do we arrive at the essential components and services? Analyze data practices.
Data Practices II: EUDAT federation. Community Centers; Common Data Centers; projects to push limits and raise awareness.
Data Practices II: split of functions
- physical layer operations are trivial: we know how to do them
- logical layer (LL) operations are complex due to relations, etc.
- all LL information needs to be aggregated, and we need a secure access layer around it
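The split between a simple physical layer and an aggregated, access-controlled logical layer can be pictured with a small sketch. Everything here is an assumption made for illustration: the class names, the fields and the in-memory dictionaries are hypothetical and do not describe the EUDAT implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalRecord:
    """Logical-layer information about one object (illustrative fields only)."""
    pid: str
    metadata: dict
    relations: list = field(default_factory=list)     # PIDs of related objects
    allowed_users: set = field(default_factory=set)   # who may read this record


class LogicalLayer:
    """Aggregates all logical information behind a single access check."""

    def __init__(self):
        self._records = {}

    def aggregate(self, record):
        # Collect logical information from any center into one place.
        self._records[record.pid] = record

    def lookup(self, pid, user):
        # Secure access layer: every read goes through this check.
        record = self._records[pid]
        if user not in record.allowed_users:
            raise PermissionError(f"{user} may not read {pid}")
        return record


# Physical layer: plain byte storage keyed by PID, deliberately simple.
physical_store = {"demo/0001": b"raw bytes"}

layer = LogicalLayer()
layer.aggregate(
    LogicalRecord(
        pid="demo/0001",
        metadata={"title": "example object"},
        relations=["demo/0002"],
        allowed_users={"alice"},
    )
)
record = layer.lookup("demo/0001", "alice")
payload = physical_store[record.pid]
```

The physical store needs nothing beyond a key and bytes, while relations, metadata and rights all live in the logical layer, which is where the slide locates the complexity.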
Data Fabric Analysis: How do we arrive at the essential components and services? Analyze use cases.
10 (+5) Use Cases so far (2 in development, others mature)
Disciplines covered: environmental science; natural science; life science; humanities and social sciences; IT; various.
All indicated nodes are centers of national, regional and even worldwide federations.
10 (+5) Use Cases so far (2 in development, others mature)
No. | Name | Institute | State
1 | Language Archive | Max Planck Institute, NL | in operation
2 | Geodata Sharing Platform | Academy of China | in operation
3 | Datanet Federation Consortium | RENCI, US | in operation
4 | ADCIRC Storm Forecasting | RENCI, US | in operation
5 | EPOS Plate Observation | INGV/CINECA, Italy | in operation
6 | ENVRI Environment Observation | U Helsinki, Finland | in design
7 | Nanoscopy Repository Cell structures | KIT, Germany | in design
8 | Human Brain Neuroinformatics | EPFL, Switzerland | in testing
9 | ENES Climate Modeling | DKRZ, Germany | in operation
10 | LIGO Gravitation Physics | NCSA, US | in operation
11 | ECRIN Medical Trial Interoperation | U Düsseldorf, Germany | in testing
12 | VPH Physiology Simulation | U London, UK | in operation
13 | Species Archive | Nature Museum, Germany | in operation
14 | International NeuroI Facility | INCF, Sweden | in operation
15 | Molecular Genetics | MPI, Germany | in operation
All indicated nodes are centers of national, regional and even worldwide federations.
Issues of Relevance (diagram labels)
- management, analytics, conversion
- provenance, reproducibility
- workflows, policies, deployment
- virtual collection builder: new collection, new metadata, temp store
- AAI/FIM
- highly distributed in federations
- FS, Cloud, DB
- Repository System
- PID, Metadata, Rights
- Syntax, Types, Semantics, Relations
- sensors, simulations, crowd, etc.
How do WGs/IGs fit? (diagram labels: REPRO, PROV, PP, BDA, BROK, CERT, FIM, REP, DMP, DOM, CERT, CITDD)
Components I
- domain of registered digital objects (DO), incl. basic organization principles (data, code, knowledge) -> worldwide PID system (Handles/DOI)
- domain of registered actors -> worldwide ID system (ORCID)
- domain of trusted repositories for DOs -> worldwide Rep Registry
- proper DFT/DSA/WDS compliant repository systems
- accepted policy commons (proper organization support, self-documenting, tested/certified, etc.) -> policy component registry
- policy/services -> service registry
- authentication system -> various in place (ORCID just number)
- authorization system -> authorization registry
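How the registries named on this slide might cooperate can be sketched as a single resolution step: a PID is resolved, the actor and the repository are looked up, and the authorization registry is consulted before a location is returned. This is a minimal sketch under stated assumptions. All registry contents, identifiers and the `resolve` function are hypothetical; real Handle, DOI or ORCID services are not called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PidEntry:
    """What a PID record might point to (illustrative fields only)."""
    repository_id: str
    location: str


# Hypothetical in-memory stand-ins for the registries on the slide.
PID_REGISTRY = {
    "demo/0001": PidEntry("repo-a", "https://repo-a.example.org/objects/0001"),
    "demo/0002": PidEntry("repo-b", "https://repo-b.example.org/objects/0002"),
}
ACTOR_REGISTRY = {"actor-0001": "Example Researcher"}
REPOSITORY_REGISTRY = {"repo-a": {"certified": True}, "repo-b": {"certified": False}}
AUTHORIZATION_REGISTRY = {
    ("actor-0001", "demo/0001"): {"read"},
    ("actor-0001", "demo/0002"): {"read"},
}


def resolve(pid, actor_id, action="read"):
    """Return the object's location if actor, repository and rights all check out."""
    if actor_id not in ACTOR_REGISTRY:
        raise LookupError(f"unknown actor: {actor_id}")
    entry = PID_REGISTRY.get(pid)
    if entry is None:
        raise LookupError(f"unknown PID: {pid}")
    repository = REPOSITORY_REGISTRY.get(entry.repository_id)
    if repository is None or not repository["certified"]:
        raise PermissionError(f"repository not trusted: {entry.repository_id}")
    if action not in AUTHORIZATION_REGISTRY.get((actor_id, pid), set()):
        raise PermissionError(f"{actor_id} may not {action} {pid}")
    return entry.location


location = resolve("demo/0001", "actor-0001")
```

Keeping each registry separate mirrors the slide: identity, trust and rights are answered by different components, and the resolver only combines their answers.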
Components II
- MD components/schemas -> metadata schema registry
- data types/schemas/formats -> data type registry
- semantic categories -> category registry
- vocabularies -> vocabulary registry
- what about complex ontologies (thesauri, ontologies, etc.)?
- what about mapping relations?
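One way to read these registries is as lookup tables that a repository consults when checking a metadata record. The sketch below assumes hypothetical registry contents, identifiers and a `validate` helper; it is not any existing registry's API.

```python
# Hypothetical in-memory stand-ins for three of the registries on the slide.
METADATA_SCHEMA_REGISTRY = {
    "schema:demo-v1": {"required": ["title", "data_type", "keyword"]},
}
DATA_TYPE_REGISTRY = {
    "type:demo-csv": {"format": "text/csv"},
}
VOCABULARY_REGISTRY = {
    "vocab:demo-topics": {"climate", "language", "genomics"},
}


def validate(record, schema_id, vocabulary_id):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    schema = METADATA_SCHEMA_REGISTRY[schema_id]
    for name in schema["required"]:
        if name not in record:
            problems.append(f"missing field: {name}")
    if "data_type" in record and record["data_type"] not in DATA_TYPE_REGISTRY:
        problems.append(f"unregistered data type: {record['data_type']}")
    vocabulary = VOCABULARY_REGISTRY[vocabulary_id]
    if "keyword" in record and record["keyword"] not in vocabulary:
        problems.append(f"keyword not in vocabulary: {record['keyword']}")
    return problems


good = {"title": "Storm surge run", "data_type": "type:demo-csv", "keyword": "climate"}
bad = {"title": "Untyped upload", "keyword": "astrology"}

good_problems = validate(good, "schema:demo-v1", "vocab:demo-topics")
bad_problems = validate(bad, "schema:demo-v1", "vocab:demo-topics")
```

The open questions on the slide (complex ontologies, mapping relations) are exactly what this flat lookup does not cover: it can check membership in a vocabulary, but not relate terms across vocabularies.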
What to do today
- 4 use cases (max 10 min) with the following goals:
  - understand whether we get what we want to get (common components/services)
  - discuss whether we need to adapt the template
- Zhu, Dieter, Sean, Giuseppe, Ed
- discuss how to move on with use cases & analysis:
  - discuss my first look on C/S (?)
  - update of existing and appearance on wiki (deadline)
  - deadline for first round (when, whom to motivate, ?)
  - virtual meeting for a discussion on analysis (when?)
  - at P6 (September) a first document with analysis
Did we forget something?
Data Practices I: Survey
- ~120 interviews/interactions
- 2 workshops with leading scientists (EU, US)
- too much is done manually or via ad hoc scripts
- too much is in legacy formats (no PID & MD)
- there are lighthouse projects etc., but ...
- DM and DP are not efficient and too expensive (a biologist spends 75% of his time as data manager)
- federating data incl. logical information is much too expensive
- hardly any use of automated workflows, and a lack of reproducibility