Advancing Research Data Services for Improved Data Commons
The challenge lies in coordinating the growth of research data services across universities to support the increasing demand for data-centric research. The Harvard Data Commons MVP aims to streamline research data management processes, enhancing data integrity, provenance, and reproducibility to meet sponsor requirements. By engaging researchers and following best practices, the project focuses on facilitating data sharing, archiving, and collaboration with peer institutions, ultimately contributing to a more efficient and effective research ecosystem.
Uploaded on Mar 01, 2025
Presentation Transcript
Towards a Data Commons MVP
Scott Yockel, University Research Computing Officer
Mercè Crosas, University Research Data Management Officer
The RDM Challenge

A recent increase of research data and computing services:
- Along with an increase in data-centric and data science research
- To support funder and journal requirements
- Uncoordinated growth, resulting in services distributed across units and schools, often disconnected

Review of Research Data Services in U.S. Universities

Ithaka S+R report (Radecki & Springer, 2020, https://doi.org/10.18665/sr.314397)
- Reviewed research data services from 120 U.S. universities and colleges
- A growing number of research data services distributed across various university units:
  - Within Libraries and IT (main providers):
    - Consulting (~65%)
    - Training events (~35%)
    - Backend (data engineering, metadata design)
    - Frontend (web dev, data vis)
  - Outside Libraries and IT:
    - Statistics
    - Bioinformatics
    - Geospatial
    - Social Science
    - Visualizations
    - Clinical data
    - Business
A Harvard Data Commons MVP will ...

improve the researcher experience by automating the flow of research data from research computing environments to management, publication, and preservation environments. As a result, data integrity, provenance, and reproducibility of research data will be improved in order to meet sponsor requirements.

Objectives
1. Integrate RC environments and repositories to facilitate publishing data and/or metadata throughout the research project lifecycle
2. Support advanced research workflows and provide packaging options that can be deposited in a repository to enable reproducibility and reuse
3. Integrate Harvard repositories:
   - Dataverse with the DRS for long-term preservation
   - Dataverse with DASH to connect datasets with open access publications

Guiding Principles
1. Facilitate the workflow from managing active research data to data sharing and archiving, following best practices
2. Use existing (open source) systems when possible and build on top in an iterative fashion
3. Engage researchers to drive the use case and be user testers of the MVP tools built from this project
4. Collaborate with peer institutions and build tools and guidelines that can be reused and updated by others

Key Performance Indicators
- The connector tools with well-documented code in GitHub
- Standards for packaging multiple common workflow artifacts with rich metadata
- Publish active datasets, large (TB-sized) datasets, and only metadata about data
- Preserve selected datasets from Dataverse through DRS
- Automatically link datasets deposited in Dataverse to publications in DASH
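
One way to picture "packaging workflow artifacts with rich metadata" while protecting data integrity is a checksum manifest that is written before data leaves the research computing environment and verified again after deposit. The Python sketch below is illustrative only: it is not the project's packaging standard or connector code, and the function names (`build_manifest`, `verify_manifest`) and the manifest layout are assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, metadata: dict) -> dict:
    """Describe every file under root with its relative path, size, and checksum.

    The layout (a "metadata" dict plus a "files" list) is hypothetical.
    """
    files = [
        {
            "path": p.relative_to(root).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return {"metadata": metadata, "files": files}


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return the paths that are missing or whose content no longer matches."""
    return [
        entry["path"]
        for entry in manifest["files"]
        if not (root / entry["path"]).is_file()
        or sha256_of(root / entry["path"]) != entry["sha256"]
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "results.csv").write_text("sample,value\na,1\n")
        (root / "run.sh").write_text("#!/bin/sh\necho analysis\n")

        # Descriptive metadata is free-form here; a real package would follow
        # whatever schema the target repository requires.
        manifest = build_manifest(root, {"title": "Example dataset"})
        print(json.dumps(manifest, indent=2))
        print("changed files:", verify_manifest(root, manifest))
```

A manifest like this could travel alongside the data, so that a repository or preservation service can recheck the checksums on arrival. An established packaging format, if one is adopted, would take the place of this ad hoc layout.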