
Advancing DDI-CDI for Enhanced Data Integration
Explore the role of DDI-CDI in EOSC and its potential applications. Understand the functional description and implications for data sharing. Delve into EOSC data-sharing requirements and the need for FAIR data practices to meet evolving research demands.
Presentation Transcript
The Role of DDI-CDI in EOSC: Possible Uses and Applications
11 June 2021
Arofan Gregory, Simon Hodson, Joachim Wackerow
CODATA, DDI Alliance
Background/Goals of the Meeting
- EOSC Co-Creation project: CODATA project lead, with support from the DDI Alliance
- Aligns with the goals of both organizations:
  - The CODATA Decadal Programme for Cross-Domain Research
  - DDI interest in complex data integration and data sharing
- Aligns with the objectives of EOSC
- Discuss the findings of the report, based on a series of detailed meetings with interested parties (https://us02web.zoom.us/j/6334311413)
- This meeting is to identify concrete next steps:
  - In line with further developments in FAIR, EOSC, and globally
  - In practical terms for standards developers and systems implementers
  - Projects involving real-world implementations to explore real solutions
Outline
- DDI-CDI Functional Description
- EOSC Requirements
- Use Cases/Examples
  - Granular Metadata for Cross-Domain Integration
  - Dataverse Repository Metadata Capture
  - European Social Survey Multi-Level Application
  - ALPHA Network Process Metadata Capture
- DDI-CDI and the FAIR Ecosystem
- DDI-CDI and EOSC: The Interoperability Framework
- Recommendations
DDI-CDI Functional Description
- Developed to meet the needs of Social, Behavioural and Economic research projects using data from other domains, but more widely applicable
- DDI Cross-Domain Integration (DDI-CDI):
  - Describes individual datums and all associated foundational metadata (concepts, variables, classifications, coding, etc.)
  - Describes the roles played by datums in various data structures (wide data, long data/event data, multi-dimensional/cube data, key-value/big data)
  - Describes the processing of data (implements PROV, SDTL, and other standards/descriptions to describe the processing chain)
- DDI-CDI is a model-driven standard, expressed in UML
- DDI-CDI is domain-independent
  - Designed to complement domain standards and ontologies
  - Can be implemented in different syntaxes (XML, RDF, etc.)
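The slide says DDI-CDI describes individual datums together with their foundational metadata (concepts, variables, classifications, coding). As a rough illustration only, the Python sketch below links a single coded value to a variable, its concept, and a code list, so the value can be decoded without outside knowledge. The class names `Concept`, `Variable`, and `Datum` and the example code list are hypothetical simplifications, not the classes or serialization defined by the DDI-CDI model.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical, simplified stand-in for a concept (not the DDI-CDI class)."""
    name: str
    definition: str


@dataclass
class Variable:
    """Illustrative variable: ties a concept to a value domain (here a code list)."""
    name: str
    concept: Concept
    codes: dict[str, str] = field(default_factory=dict)  # code -> category label

    def decode(self, value: str) -> str:
        # Coded values are translated via the code list; uncoded values pass through.
        return self.codes.get(value, value)


@dataclass
class Datum:
    """A single value, linked to the variable that gives it meaning."""
    variable: Variable
    value: str

    def label(self) -> str:
        return self.variable.decode(self.value)


heating = Concept("heating type", "Main source of domestic heating")
heat_var = Variable("HEAT_TYPE", heating, {"1": "Gas", "2": "Electric", "3": "Other"})

datum = Datum(heat_var, "2")
print(datum.variable.concept.name, "=", datum.label())  # heating type = Electric
```

The point of the design is that the datum carries a reference to its metadata rather than a bare value, so any consumer can recover its meaning.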
EOSC Data-Sharing Requirements
- EOSC spans many different domain clusters and will support the sharing of data between them
  - Reflects the current needs of large-scale research
  - Supports work on grand challenges such as COVID-19 and climate change
- Presents a challenge in terms of scale
  - The amount of data is growing exponentially
  - Availability of FAIR data will lead to increased demand for FAIR data
  - Data-hungry methods (e.g., machine learning) are becoming common
- Demands automation for the production and use of metadata
  - Existing manual approaches are insufficient, even today
  - Large-scale sharing of data will exacerbate the problem
  - The cost of un-FAIR data is huge (up to 80% of resources go to data preparation)
SERL: Granular Metadata for Cross-Domain Sharing
- Smart Energy Research Laboratory (SERL)
  - Several partners; UKDA is managing and processing the data
- Currently an example of pre-integrated data coming from different domains:
  - Temperature readings from Copernicus
  - Meter readings from energy suppliers
  - Survey responses from a customer sample
- We looked at how DDI-CDI could be used to support such an integration in an automated fashion (e.g., a run-time discovery and use scenario)
  - DDI-CDI descriptions of the inputs could provide the basis for automated creation of integration/linking information
[Diagram: Temperature Data (by region grid cell); Utility Meter Data (by customer ID); Linking Data (grid cells mapped to customer IDs); Customer Questionnaire Data (by customer ID)]
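The SERL slide suggests DDI-CDI input descriptions could drive automated creation of linking information. A minimal sketch, assuming the linking table already exists, shows the join that such information enables: meter and survey data keyed by customer ID are combined with temperature data keyed by grid cell. All identifiers and values are invented for illustration; in the scenario described, knowledge of which field is the key in each source would come from DDI-CDI descriptions rather than being hard-coded as it is here.

```python
# Hypothetical toy records; they do not reflect real SERL data or identifiers.
temperature_by_cell = {"cell_17": 4.5, "cell_42": 6.0}  # grid cell -> mean temperature (deg C)
meter_by_customer = {"C001": 12.3, "C002": 9.8, "C003": 15.1}  # customer ID -> daily kWh
survey_by_customer = {"C001": "Gas", "C002": "Electric", "C003": "Gas"}  # heating type
cell_for_customer = {"C001": "cell_17", "C002": "cell_42", "C003": "cell_17"}  # linking data


def integrate(customers):
    """Join the sources on customer ID, using the linking table to reach grid cells."""
    rows = []
    for cid in customers:
        cell = cell_for_customer.get(cid)
        if cell is None or cid not in meter_by_customer:
            continue  # cannot be linked; skip rather than guess
        rows.append({
            "customer_id": cid,
            "kwh": meter_by_customer[cid],
            "heating": survey_by_customer.get(cid),
            "temperature_c": temperature_by_cell.get(cell),
        })
    return rows


for row in integrate(sorted(meter_by_customer)):
    print(row)
```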
Pattern: Granular Reuse of Data from Disparate Sources
[Diagram: Key-Value Captured Data (clinical, social media, etc.), Tall Sensor Data, and Wide Survey Data, each described by DDI-CDI; on use, the user selects a set of variables based on a shared value across several data sets (case ID, geo region, time, etc.)]
- DDI-CDI can provide both structural and processing descriptions
- Data could be in any form: long, wide, key-value, etc.
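One way to picture this pattern is to flatten each source, whatever its shape, into individual datums of the form (case ID, variable, value) and then select variables across sources by the shared case ID. The sketch below hard-codes the structural knowledge (which field is the identifier, which are measures) that, in the pattern described, DDI-CDI structural descriptions would supply. The sources, variable names, and the `measure@time` naming convention are hypothetical.

```python
from collections import defaultdict

# Three hypothetical sources in different shapes, all sharing a case ID.
key_value = [  # key-value captured data: (case_id, key, value)
    ("A1", "diagnosis", "J45"),
    ("A2", "diagnosis", "E11"),
]
tall_sensor = [  # tall/long data: one row per case, time point, and measure
    {"case_id": "A1", "time": "2021-06-01T10:00", "measure": "pm25", "value": 12.0},
    {"case_id": "A1", "time": "2021-06-01T11:00", "measure": "pm25", "value": 14.0},
    {"case_id": "A2", "time": "2021-06-01T10:00", "measure": "pm25", "value": 30.0},
]
wide_survey = [  # wide data: one row per case, one column per variable
    {"case_id": "A1", "age": 34, "smoker": "no"},
    {"case_id": "A2", "age": 58, "smoker": "yes"},
]


def to_datums():
    """Flatten every source to (case_id, variable, value) triples."""
    for case_id, key, value in key_value:
        yield case_id, key, value
    for row in tall_sensor:
        # Qualify the measure with its time point so repeated readings stay distinct.
        yield row["case_id"], f'{row["measure"]}@{row["time"]}', row["value"]
    for row in wide_survey:
        for column, value in row.items():
            if column != "case_id":
                yield row["case_id"], column, value


def select(variables):
    """Build one record per case holding only the requested variables."""
    records = defaultdict(dict)
    for case_id, variable, value in to_datums():
        if variable in variables:
            records[case_id][variable] = value
    return dict(records)


print(select({"diagnosis", "age"}))
# {'A1': {'diagnosis': 'J45', 'age': 34}, 'A2': {'diagnosis': 'E11', 'age': 58}}
```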
Dataverse Repository Metadata Capture
- Dataverse is a generic data repository platform
  - Open source/community-driven
  - Supports some common metadata standards (including DDI Codebook)
- Automated metadata capture on ingest of statistical package data files has been explored
  - This feature could be extended to DDI-CDI to support integration-ready availability of data held in the repository
  - Would require a minimum of manual effort
  - An example of a scalable approach to metadata capture
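As an illustration of metadata capture on ingest, the sketch below infers simple variable-level metadata (representation type, missing count, number of distinct values) from a CSV file using only the Python standard library. It is a hypothetical heuristic, not Dataverse's actual ingest code, and the output dictionaries are placeholders rather than a DDI-CDI or DDI Codebook serialization.

```python
import csv
import io


def infer_type(values):
    """Guess a simple representation type for a column (illustrative heuristic)."""
    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    non_empty = [v for v in values if v != ""]
    if non_empty and all(is_number(v) for v in non_empty):
        return "numeric"
    return "text"


def capture_metadata(csv_text):
    """Return a variable-level description inferred from a CSV file's contents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    description = []
    for name in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat those as empty.
        column = [row[name] or "" for row in rows]
        description.append({
            "variable": name,
            "type": infer_type(column),
            "missing": column.count(""),
            "distinct": len(set(column) - {""}),
        })
    return description
```

For example, `capture_metadata("id,age,region\n1,34,north\n2,,south\n3,58,north\n")` reports `age` as numeric with one missing value and two distinct values. A production version would also need to handle dates, code lists, and the richer metadata that statistical package files already carry.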
European Social Survey Multi-Level Application
- Developed and maintained by the Norwegian Data Archive (NSD)
- Pan-European survey with data at multiple levels of geography (NUTS levels)
- Integrated with contextual data from many different sources
  - Data are revised and published on different schedules
- The ESS Multilevel Application currently involves monolithic production runs
  - External sources for contextual data have separate schedules
- Use of DDI-CDI for two purposes:
  - Assist integration of external contextual data
  - Support more timely, focused revisions to specific data in the disseminated product
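Focused revisions depend on knowing which outputs derive from which inputs. Assuming that lineage has been recorded (the kind of processing description DDI-CDI is meant to carry), a breadth-first walk over the dependency graph finds exactly the items that must be regenerated when one external source changes, instead of rerunning everything. The item names below are hypothetical and do not describe the actual ESS Multilevel production chain.

```python
from collections import deque

# Hypothetical lineage: each derived item lists the inputs it is computed from.
derived_from = {
    "regional_unemployment": ["unemployment_source"],
    "regional_gdp": ["gdp_source"],
    "context_index": ["regional_unemployment", "regional_gdp"],
    "ess_respondent_file": ["ess_core"],
    "multilevel_file": ["ess_respondent_file", "context_index"],
}


def affected_by(updated_source):
    """Return every derived item that directly or transitively depends on a source."""
    # Invert the lineage so we can walk from inputs to the outputs that use them.
    users = {}
    for output, inputs in derived_from.items():
        for item in inputs:
            users.setdefault(item, []).append(output)

    affected, queue = set(), deque([updated_source])
    while queue:
        for output in users.get(queue.popleft(), []):
            if output not in affected:
                affected.add(output)
                queue.append(output)
    return affected


print(sorted(affected_by("gdp_source")))
# ['context_index', 'multilevel_file', 'regional_gdp']
```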
ALPHA Network Process Capture
- Several African HDSS (Health and Demographic Surveillance System) sites collecting HIV data in the field
  - Integrated, analysis-ready data made available through LSHTM in the UK
- DDI-CDI used to capture standard process metadata (for display in an integrated documentation system)
  - Users can see what processes were applied to data at each stage
  - Users can see what data were used in each process
- INSPIRE extends this to an African data-sharing hub
  - Integrates clinical data (using the OMOP CDM)
  - Being extended to cover COVID data (including some genome sequencing results on variants)
[Diagram: SITE A and SITE B, each with Site Source Data and a Site Center in a Box (PENTAHO); LSHTM Data Aggregation producing Integrated Analysis Data; DDI-CDI Capture (Process and Structure)]
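The process capture described for ALPHA answers two questions: which processes were applied at each stage, and which data each process used. The sketch below records each step with PROV-style `used` and `generated` links and walks them backwards to list the activities behind a dataset. It is a plain-Python illustration under assumed step and dataset names, not the DDI-CDI process model or the ALPHA/LSHTM implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One processing activity: what it used and what it generated (PROV-style roles)."""
    activity: str
    used: tuple[str, ...]
    generated: tuple[str, ...]


# Hypothetical pipeline from site source data to integrated analysis data.
pipeline = [
    Step("extract_site_a", used=("site_a_raw",), generated=("site_a_staged",)),
    Step("extract_site_b", used=("site_b_raw",), generated=("site_b_staged",)),
    Step("harmonise", used=("site_a_staged", "site_b_staged"), generated=("pooled",)),
    Step("derive_analysis_vars", used=("pooled",), generated=("analysis_file",)),
]


def lineage(dataset):
    """List the activities (upstream first) that contributed to a dataset."""
    producer = {out: step for step in pipeline for out in step.generated}
    ordered, seen = [], set()

    def visit(name):
        step = producer.get(name)
        if step is None or step.activity in seen:
            return  # raw input, or activity already listed
        seen.add(step.activity)
        for source in step.used:
            visit(source)
        ordered.append(step.activity)

    visit(dataset)
    return ordered


print(lineage("analysis_file"))
# ['extract_site_a', 'extract_site_b', 'harmonise', 'derive_analysis_vars']
```

The second question, which data each process used, is a direct lookup on the same records, e.g. `{s.activity: s.used for s in pipeline}`.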
Issues Raised by the ALPHA/INSPIRE/PEACH Prototype
[Diagram: CLINICAL PARADIGM: Clinical Systems, Pool of OMOP Observations, OMOP Cohort Definition; POPULATION RESEARCH PARADIGM: Questionnaire, Raw Data Set, Data Processing; also shown: Analysis Data, DDI-CDI Process/Structure Description]
DDI-CDI and the FAIR Ecosystem
- FAIR is an evolving picture
  - Lots of discussion and ideas
  - Not sufficiently specified to drive implementation
  - Some (possibly unrealistic) assumptions about technology implementations
  - Requires a coordinated set of standards used in a recognized framework/architecture
- DDI-CDI fits into the picture by describing the FAIR Digital Object:
  - Describing the structures of data and connecting concepts to their roles, at a granular level
  - Describing processing (e.g., data provenance, data re-use)
  - Describing metadata resources (especially to support data integration and reuse)
[Diagram: a Data User in Domain A (1) discovers the FAIR Data Point (FDP) through a Register of FDPs/Data Portals/Data Catalogues (by domain), (2) queries/retrieves the FAIR Digital Object (FDO), and (3) retrieves the needed metadata resources; FDPs 1 to 6 span Domains A and B, several with associated Metadata Resources; Provision of FIPs (FAIR Implementation Profiles)]
[Diagram: Registry of Catalogues; FAIR Data Point; PID (by domain); FIPs (by domain); FAIR Digital (Data) Object with DATA, STRUCTURAL METADATA, PROVENANCE/PROCESS METADATA, (META)METADATA RESOURCES, and SEMANTICS/CLASSIFICATIONS]
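The numbered discovery flow on these slides ((1) discover the FDP, (2) query/retrieve the FDO, (3) retrieve the needed metadata resources) can be sketched with in-memory dictionaries standing in for the register, the FAIR Data Points, and the metadata resources. Everything here (identifiers, record fields, the `fetch` function) is hypothetical; real FAIR Data Points publish metadata over HTTP, which this sketch does not model.

```python
# Hypothetical in-memory stand-ins for a register of FAIR Data Points (FDPs),
# the FAIR Digital Objects (FDOs) they expose, and shared metadata resources.
REGISTER = {  # domain -> FDP identifiers
    "energy": ["fdp-1", "fdp-2"],
    "climate": ["fdp-3"],
}
FDPS = {  # FDP -> {FDO id -> FDO record}
    "fdp-1": {"obj-meter": {"data": "meter.csv", "structure": "wide",
                            "metadata_refs": ["vocab:energy-units"]}},
    "fdp-2": {},
    "fdp-3": {"obj-temp": {"data": "temp.nc", "structure": "multi-dimensional",
                           "metadata_refs": ["vocab:temperature-vars", "vocab:grid"]}},
}
METADATA_RESOURCES = {
    "vocab:energy-units": "units code list",
    "vocab:temperature-vars": "variable naming vocabulary",
    "vocab:grid": "grid cell classification",
}


def fetch(domain, object_id):
    """Return (fdp, fdo, resources) for the first FDP in the domain holding the object."""
    # (1) Discover candidate FDPs for the domain from the register.
    for fdp in REGISTER.get(domain, []):
        # (2) Query the FDP and retrieve the FDO if it holds it.
        fdo = FDPS[fdp].get(object_id)
        if fdo is not None:
            # (3) Retrieve the metadata resources the FDO points to.
            resources = {ref: METADATA_RESOURCES[ref] for ref in fdo["metadata_refs"]}
            return fdp, fdo, resources
    return None


print(fetch("climate", "obj-temp")[0])  # fdp-3
```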
DDI-CDI and the EOSC Interoperability Framework/Integrated Metadata Catalogues
- The EOSC IF provides a high-level view of how different types of metadata relate within the overall EOSC ecosystem
  - It recognizes several levels of standards, including DDI-CDI (as well as other domain-level DDI specifications)
- The challenge is how to support actual implementation
  - Requires agreement on many levels (e.g., syntaxes, specific profiles of standards, coordinated use of standards)
  - Requires detailed specification of the entire architecture/system
    - Including mechanisms for applying these to specific domains
    - With a lingua franca for needed exchanges between domains
EOSC Interoperability Framework (1)
[Figure from the EOSC Interoperability Framework, https://doi.org/10.2777/620649]
EOSC Interoperability Framework (2)
[Figure from the EOSC Interoperability Framework, https://doi.org/10.2777/620649, annotated with DCAT, DDI-CDI, and DDI domain standards]
How do these different types of metadata connect?
Recommendations
1. Enhance Support for Data Integration
2. Automate Metadata Capture
3. Develop Crosswalks to Domain Standards
4. Link DDI-CDI to Other Metadata Standards within the EOSC Metadata Infrastructure
5. Establish Guidelines for Metadata Provision
6. Support Technology-Neutral Solutions
7. Align with International Metadata Initiatives
8. Align with Relevant Implementation Technologies and Platforms
Some Questions to Answer
- How does DDI-CDI fit in with other standards to promote data interoperability and reusability in EOSC?
- Are there specific use cases that can be pursued to explore and demonstrate this?
- What are the next steps?
- How does this work go forward and remain coordinated?