
Data Integration Best Practices and NCEMS Vision
Learn about the best practices for data integration, presented by Tyson Swetnam and Justin Petucci. Explore the NCEMS vision focused on empowering working groups with seamless access to high-quality data for cutting-edge research and interdisciplinary collaboration.
Presentation Transcript
Data Integration Best Practices
Presented by: Tyson Swetnam & Justin Petucci
Strategic Implementation Plan Feedback Session Outline

SESSION TIME     DURATION (MIN)   ACTIVITY           DESCRIPTION
00:00 - 00:12    12               Introduction       Session leads provide background and summary of NCEMS goals in topic area
00:12 - 00:24    12               Group Discussion   Groups of 6 form and consider 2-3 questions
00:24 - 00:37    13               Report Back        Groups of 6 report back to entire session; one group member uses Qualtrics form to record responses
What is Data Integration?
Data integration is the process of combining data from various sources, formats, and domains to provide a unified, cohesive view that enables deeper analysis and new insights.
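To make the definition concrete, here is a minimal Python sketch of combining two sources that differ in format and field naming into one unified view. The sample records, the field names (`protein_id`, `accession`, `length`, `organism`), and the helper functions are hypothetical illustrations, not NCEMS data or tooling.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export that identifies records by "protein_id".
CSV_SOURCE = """protein_id,length
P12345,310
Q67890,128
"""

# Hypothetical source 2: a JSON export that identifies records by "accession".
JSON_SOURCE = (
    '[{"accession": "P12345", "organism": "E. coli"},'
    ' {"accession": "A00001", "organism": "S. cerevisiae"}]'
)


def load_csv(text):
    """Parse the CSV source into {record_id: {field: value}}."""
    rows = csv.DictReader(io.StringIO(text))
    return {row["protein_id"]: {"length": int(row["length"])} for row in rows}


def load_json(text):
    """Parse the JSON source into {record_id: {field: value}}."""
    return {
        item["accession"]: {"organism": item["organism"]}
        for item in json.loads(text)
    }


def integrate(*sources):
    """Merge per-source records into one view keyed by the shared identifier.

    Fields that a source does not supply for a record are left as None,
    so gaps stay visible instead of being silently dropped.
    """
    fields = sorted({f for src in sources for rec in src.values() for f in rec})
    unified = {}
    for src in sources:
        for key, rec in src.items():
            unified.setdefault(key, dict.fromkeys(fields)).update(rec)
    return unified


unified = integrate(load_csv(CSV_SOURCE), load_json(JSON_SOURCE))
for key in sorted(unified):
    print(key, unified[key])
```

Keeping missing values explicit is one possible design choice; it lets a later quality-assurance step count and report gaps per source.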
NCEMS Data Integration Vision
Empower NCEMS working groups by providing seamless access to harmonized, high-quality, and reproducible data in the MCB field, facilitating cutting-edge research, fostering interdisciplinary collaboration, and educating the next generation of scientists.

Key elements: Data Integration; Data-Quality Assurance; FAIR Principles; Reproducibility & Transparency; Metadata Standards; Adherence to established standards; Open Science

Definitions:
Working Groups (WGs) - The primary research drivers at NCEMS. WGs are diverse teams of scientists that integrate ideas and theories across fields with publicly available data to get deeper, broader insights.
Staff scientists - NCEMS employees who support WGs with expert assistance in data wrangling, analyses, statistics, and data science methodologies.
Interfaces Between NCEMS Staff and Working Groups

Staff Scientists:
- Collaborate with the WGs to create Data Management Plans (DMPs)
- Carry out required data integration efforts
- Coordinate the computational needs on CyVerse as well as ACCESS-CI for HPC, HTC, and Cloud resources
- Provide WGs with detailed documentation, training, and support in use of integrated data products
- Maintain data quality standards and ensure intermediate data products are FAIR
- Conduct regular reviews of processes in conjunction with WGs to ensure they are fit for purpose

Working Groups (WGs):
- Define datasets and underlying synthesis questions, and inform staff scientist(s) of applicable existing standards
Data Integration Discussion Questions
(1) How should NCEMS staff scientists interface with working groups before, during, and after data integration processes? (Identify key information and resources that should be exchanged to ensure success.)
(2) What general quality assurance steps and checks should be standardized across all NCEMS data products?
(3) (Optional) What technical challenges do you foresee that could hinder the effective integration of data in the molecular and cellular sciences?