Data Fabric Interest Group Plenary Core Session

This session covers topics such as election of new Co-Chair, review of activities, global digital object cloud update, discussions on gaps/opportunities/next steps, and current activities of the Data Fabric Interest Group. It also explores types of data fabrics, the nature of a data fabric, and defining core components for data fabric infrastructure.

  • Data fabric
  • Plenary session
  • Core components
  • Digital object cloud
  • Data workflows

Uploaded on Mar 10, 2025


Presentation Transcript


  1. Data Fabric Interest Group Plenary 9 Core Session, Barcelona

  2. Agenda
     • Welcome, Introduction (5 minutes)
     • Election of new Co-Chair (5 minutes)
     • Review of Activities (30 minutes)
     • Global Digital Object Cloud Update (15 minutes)
     • Discussion/Questions (20 minutes)
     • Gaps/Opportunities/Next Steps (15 minutes)

  3. Co-Chair Election

  4. DFIG Current Activities
     • Ecosystems and Core Components
         • Recommendations (https://hdl.handle.net/11304/a3d012ca-4e23-425e-9e2a-1e6a195b966f)
         • Aggregate slide deck (DF IG Documents RDA wiki page)
     • Common Governance and Operating Procedures
         • GEDE group
     • Metadata and PIDs
         • PID Kernel Subgroup -> PID Kernel WG proposal
     • Brokering Services
         • DFIG/Brokering Workflows
     • Training and Education
         • DFIG/ETHRD workshop planning session

  5. Types of Data Fabrics
     We can differentiate between:
     • user data fabrics that support discovery of and access to published data
     • collaboration data fabrics that support processing of shared collections
     • repository data fabrics that focus on preserving data
     Supported virtualized entities in these DFs are:
     • data collections that include the context of DOs
     • workflows encapsulating analyses
     • data flows managing data transport
     Essential capabilities are interoperability, federation, and interaction control.
     * Source: Reagan Moore

  6. Nature of a Data Fabric
     • Data Fabrics in the above sense are blueprints to create generic infrastructures that support virtualisation of collections, workflows and data flows.
     • Instantiations of Data Fabrics will offer a set of services, some of which are core and others optional.
     • Data Fabrics are NOT instantiations of a specific collection, workflow or data flow.

  7. Defining Core Components
     Task to solve:
     • Identify and specify Common Components (CoCo)
     • Recommend CoCo
     • Put CoCo in place
     Not ONE architecture: identify CoCos that could cooperate in specific configurations to solve a function (infra, VRE, etc.).
     [Diagram labels: configuration A, configuration B, Common Components & Services, Specific Components & Services]

  8. Identifying Core Components
     • Core Data Type Definitions, Metadata Standards and Vocabularies
     • Trustworthy Data Repositories
     • Trustworthy, Machine-Actionable Registries of Repositories, Data Types, Metadata Standards, Vocabularies, Authorization Records, Licenses
     • PID Services
     • Collection Services
     • Brokering Services
     • Common Governance and Operating Procedures
     • Training and Education

  9. From Core Components to Data Fabrics
     • Configurations must be driven by workflows and use cases.
     • Increasing scale requires moving away from Human Controlled Processing to Type-Triggered Automatic Processing.
     • Component configurations should enable an ecosystem of tools and services.

  10. Human Controlled Processing (HCP)
     [Diagram labels: Observations, Experiments, Simulations, etc.]
     • The cycle can be controlled manually or semi-automatically via pre-set pipelines.
     • Even with semi-automatic pipelines, humans are close-in "designers".

  11. Type-Triggered Automatic Processing (T-TAP)
     • New feature: cycles run highly autonomously; the precise steps depend on the types of data entering the workflow.
     • Researchers are not in direct control.
     [Diagram labels: Data Events exposing new DOs; Structured Data Markets adding new data; some kind of profile matching; Data Type Registry; Data Federation Agents; Processing services; Brokering & Mediation services; result scripts]
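The idea that "precise steps depend on the types of data entering the workflow" can be illustrated with a small dispatch table. This is only a minimal sketch, not the DFIG design: the `TypeRegistry` class, the `on_data_event` function, the type name `text/transcript` and the PID value are all hypothetical stand-ins for a real Data Type Registry and real processing services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DigitalObject:
    pid: str        # persistent identifier (hypothetical value below)
    data_type: str  # type name as it might be registered in a data type registry
    payload: str


Step = Callable[[DigitalObject], DigitalObject]


class TypeRegistry:
    """Hypothetical stand-in for a Data Type Registry: type name -> steps."""

    def __init__(self) -> None:
        self._steps: Dict[str, List[Step]] = {}

    def register(self, data_type: str, step: Step) -> None:
        self._steps.setdefault(data_type, []).append(step)

    def steps_for(self, data_type: str) -> List[Step]:
        return list(self._steps.get(data_type, []))


def on_data_event(registry: TypeRegistry, obj: DigitalObject) -> DigitalObject:
    """Run every step registered for the object's type, in registration order."""
    for step in registry.steps_for(obj.data_type):
        obj = step(obj)
    return obj


def normalise(obj: DigitalObject) -> DigitalObject:
    return DigitalObject(obj.pid, obj.data_type, obj.payload.strip().lower())


def tag_checked(obj: DigitalObject) -> DigitalObject:
    return DigitalObject(obj.pid, obj.data_type, obj.payload + " [checked]")


registry = TypeRegistry()
registry.register("text/transcript", normalise)
registry.register("text/transcript", tag_checked)

new_object = DigitalObject("pid:example/1", "text/transcript", "  Hello World ")
result = on_data_event(registry, new_object)
print(result.payload)  # hello world [checked]
```

No person chooses the steps here: registering a new step for a type changes what happens to every later object of that type, which is the sense in which researchers are "not in direct control". An object of an unregistered type passes through unchanged in this sketch.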

  12. Use Cases
     • A neurologist wants to research the causal relation between Alzheimer phenomena and specific genes, proteins, neural activity, etc., using machine learning algorithms on confidential data from a federation of hospitals and labs.
     • A linguist researches theories about economy of languages, finding objective patterns that make languages more or less easy to process and learn, by applying machine learning algorithms on open data from a variety of sources filtered by languages and feature.
     • The data manager of a large data centre must continuously and asynchronously check the quality of new data of specific types, transform it according to certain rules, and create n replications in a federation.
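The data manager use case (check quality, transform, create n replications) can be sketched as one function. Everything named here is an assumption made for illustration: the `curate` function, the `sensor/reading` type, the site names, and the in-memory dictionary that stands in for a federation of repositories. A real deployment would run this asynchronously against actual storage services.

```python
from typing import Callable, Dict, List, Set


def curate(
    data: str,
    data_type: str,
    watched_types: Set[str],
    is_valid: Callable[[str], bool],
    transform: Callable[[str], str],
    sites: Dict[str, List[str]],
    n: int,
) -> List[str]:
    """Check, transform and replicate one new item; return the sites written to."""
    if data_type not in watched_types:
        return []  # not a type this policy covers
    if not is_valid(data):
        return []  # failed the quality check, so nothing is stored
    if n > len(sites):
        raise ValueError("federation has fewer sites than requested replicas")
    converted = transform(data)
    chosen = sorted(sites)[:n]  # deterministic site choice, for the sketch only
    for name in chosen:
        sites[name].append(converted)
    return chosen


# Hypothetical federation of three sites, each modelled as a list of stored items.
federation: Dict[str, List[str]] = {"site-a": [], "site-b": [], "site-c": []}

written = curate(
    data="42 ",
    data_type="sensor/reading",
    watched_types={"sensor/reading"},
    is_valid=lambda s: s.strip().isdigit(),
    transform=str.strip,
    sites=federation,
    n=2,
)
print(written)  # ['site-a', 'site-b']
```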

  13. Recommendations Update
     • PID Focus Area work is progressing.
     • GEDE Europe (https://www.rd-alliance.org/groups/gede-group-european-data-experts-rda) was highly active with f2f and virtual meetings.
     • Result is a new report: Grouped List of Assertions (also uploaded to DFIG pages)
         • consultation of in total 25 reports and papers suggested by participants
         • extraction of <60 assertions from all documents
         • then classification of these assertions into sections (1. nature of PIDs and PID systems, 2. their relevance, 3. assigning PIDs, 4. using PIDs, 5. Handles and DOIs, 6. others)
         • much agreement in core assertions
         • some variety in way of assigning and using PIDs

  14. Areas of discussion
     • PID in binding role: which type of attribute to add to PID record or to landing page
     • type of attributes need to be machine readable and specified
     • how to indicate versions
     • time of assignment of PIDs
     • granularity of PID assignment
     • role of repositories (trustworthy) in assigning
     • use of fragment indicators
     • how to add life cycle statements (deletion, splitting, merging, etc.)
     • when Handles and when DOIs

  15. Next Steps
     • broad commenting on summary assertions by RDA/DFIG and GEDE people within April 17 via web pages and P9 sessions
     • virtual meeting in May (DFIG and GEDE groups)
     • f2f meeting in June/July to finish the main summary assertions
     • afterwards a final report on agreements and identifying areas of disagreements
     • start interacting about next topic area
     • primary areas of interest could be repositories (tasks, interfaces, data organisation, etc.) and data processing (workflows, type triggered, etc.)

  16. PIDs remain central
     [Diagram labels: PID, PID Record, CKSM, paths, Metadata, Rights, Relations, Provenance]
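One possible reading of the diagram labels is a PID record that carries a checksum and access paths and points to further PIDs for metadata, rights, relations and provenance. The sketch below assumes that reading. The field names, the example PIDs and URL, and the choice of SHA-256 for "CKSM" are all assumptions, not the PID Kernel profile.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PIDRecord:
    pid: str
    checksum: str  # "CKSM" in the diagram; SHA-256 is an assumption
    paths: List[str] = field(default_factory=list)          # where the bits can be fetched
    related: Dict[str, str] = field(default_factory=dict)   # role -> PID of related object


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(record: PIDRecord, data: bytes) -> bool:
    """True if the retrieved bytes match the checksum stored in the record."""
    return sha256_hex(data) == record.checksum


content = b"example bit sequence"
record = PIDRecord(
    pid="pid:example/data-1",                       # hypothetical identifier
    checksum=sha256_hex(content),
    paths=["https://repo.example.org/objects/data-1"],
    related={
        "metadata": "pid:example/md-1",
        "rights": "pid:example/rights-1",
        "relations": "pid:example/rel-1",
        "provenance": "pid:example/prov-1",
    },
)

print(verify(record, content))      # True
print(verify(record, b"tampered"))  # False
```

Keeping the checksum in the record lets a client verify an object fetched from any of the listed paths without trusting the repository that served it.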

  17. PID Kernel Update
     • Work started in Denver at P8
     • Working groups met over the last 6 months
     • Draft profile created
     • PID Kernel Working Group Case Statement submitted
     • Work completes at P11

  18. Global Digital Object Cloud (GDOC)
     [Diagram: identified digital objects (object:publication, object:dataset, object:collection), each reached through Identifier Services and Repo/Registry services that sit on top of stored bit sequences]
     End users, developers, and automated processes deal with persistently identified, virtually aggregated digital objects, including collections, which are overlays on multiple network services, which in turn are overlays on existing or future information storage systems.

  19. GDOC: Is it Real?
     • Storage
         • not our problem, but latency is an issue
         • changing interfaces can be a problem
     • Services
         • Identifier: common resolution systems; PID Kernel, Profiles
         • Repo/Registry: common APIs
         • Confusion: Repository is not equal to Storage
         • Confusion: Registry is a Repository of metadata objects
     • Object Level
         • Common Object Interface must be provided by Repo/Registry
         • Collections ARE Objects
     • Clients
         • Good News / Bad News: web browser remains universal client
     Corporation for National Research Initiatives
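"Collections ARE Objects" can be read as: a collection has its own PID and refers to its members by PID, so collections can nest and be handled through the same object interface. The following is a minimal sketch under that assumption. The class names, the `leaf_pids` helper, the example PIDs and the in-memory `registry` dictionary are hypothetical and do not represent a Repo/Registry API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    pid: str
    kind: str = "dataset"


@dataclass
class Collection(DigitalObject):
    kind: str = "collection"
    members: List[str] = field(default_factory=list)  # member PIDs, not copies


def leaf_pids(pid: str, registry: Dict[str, DigitalObject]) -> List[str]:
    """Resolve a PID and return the PIDs of all non-collection objects beneath it.

    The sketch assumes collections do not contain themselves (no cycle check).
    """
    obj = registry[pid]
    if isinstance(obj, Collection):
        leaves: List[str] = []
        for member in obj.members:
            leaves.extend(leaf_pids(member, registry))
        return leaves
    return [obj.pid]


objects = [
    DigitalObject("pid:example/ds-1"),
    DigitalObject("pid:example/pub-1", kind="publication"),
    Collection("pid:example/col-inner", members=["pid:example/ds-1"]),
    Collection(
        "pid:example/col-outer",
        members=["pid:example/col-inner", "pid:example/pub-1"],
    ),
]
# Hypothetical stand-in for an identifier service plus Repo/Registry lookup.
registry: Dict[str, DigitalObject] = {o.pid: o for o in objects}

print(leaf_pids("pid:example/col-outer", registry))
# ['pid:example/ds-1', 'pid:example/pub-1']
```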

  20. GDOC: Is it Real?
     CONCLUSION: Evolution needed & inevitable; RDA can help drive it
     DFIG, Brokering, PID Kernel, Collections, DTR, .

  21. Gaps/Opportunities
     • Further progress on Machine-Actionable Registries
         • DFT for vocabulary: needs population and use
         • Have DTR for data types: needs testing and iteration
         • R3Data for Repositories: need a machine-actionable equivalent
         • Metadata Catalog: machine-actionable catalog is a pending RDA WG
         • Not sure if anyone is working on Authorization and License registries
     • Governance and Operating Procedures
         • Need for this will become critical as soon as test beds and functional ecosystems are available
     • PIDs
         • Linked Open Data community needs
         • Recommendations for workflow vs publication
