
Importance of Documentation for Effective Data Management
Learn about the critical role documentation plays in data management, from capturing essential details to providing context, with examples of different documentation types such as README files and data dictionaries.
Presentation Transcript
Documentation: Writing Down the Science
Documentation is the process of writing down all aspects of a dataset that you or someone in your field would need to know to make use of the data. This covers obvious details you will forget (units of measurement or software used) as well as less obvious details (versions of code packages used). Without this, data loses context. Without context, we cannot understand data.
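One of the less obvious details, the versions of code packages used, can be captured automatically instead of from memory. The following is a minimal sketch, assuming a Python-based analysis; the function name `describe_environment` and the choice of output fields are illustrative, not part of the presentation.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict recording the Python version, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly rather than silently skipping it.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only a placeholder; list the packages your analysis imports.
    print(json.dumps(describe_environment(["pip"]), indent=2))
```

Saving this output next to the data each time an analysis runs keeps the software context attached to the files it describes.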
Documentation Types
README, Data Dictionary, Codebook, Lab Notebook (electronic or paper), Standard Operating Procedures (SOP), Field Notes, Protocols.
Andreia Fernandes, Vahid Hosseini, Viola Vogel, Robert Lovchik 2022. Engineering solutions for biological studies of flow-exposed endothelial cells on orbital shakers. protocols.io. https://dx.doi.org/10.17504/protocols.io.b2bwqape
README Files
README files are simple text or Markdown documents that sit alongside a research project folder. They explain the file organization, describe how the files relate to one another, and provide general information about the project. A project may need multiple READMEs at different levels of the file directory.
Templates: README template for sharing along with research data; README template for internal team documentation.
README Example
EEOC Litigation Dataset (2025). Washington University in St. Louis. https://doi.org/10.7936/6rxs-108252
This is a dataset on Equal Employment Opportunity Commission (EEOC) federal court litigation from 1997-2006. Take a look at the README file: readme_doi1079366rxs108252.txt. It is a great example of the type of information that should be included in a README: Title, Author Information, Date Coverage, Context for Data, Citation for Related Publications, Methodology, and Data & File Overview.
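The section names listed for this README can be turned into a blank outline to start from. This is a rough sketch only: the section list is taken from the slide, while the function name `readme_skeleton`, the upper-case headings, and the `[fill in]` placeholder are assumptions, not the layout of the linked templates.

```python
# Section names as listed on the slide for the example README.
README_SECTIONS = [
    "Title",
    "Author Information",
    "Date Coverage",
    "Context for Data",
    "Citation for Related Publications",
    "Methodology",
    "Data & File Overview",
]


def readme_skeleton(sections=README_SECTIONS):
    """Build a plain-text README outline with one empty block per section."""
    lines = []
    for section in sections:
        lines.append(section.upper())  # heading style is an arbitrary choice
        lines.append("[fill in]")      # placeholder for the author to replace
        lines.append("")               # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    print(readme_skeleton())
```

The printed outline can be saved as a plain-text README in the project folder and filled in by hand.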
Data Dictionary Files
Data dictionaries explain what the variable names and values in your spreadsheet or dataset mean. They add context to the data, making it understandable and valuable.
Template: Data Dictionary Blank Template
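A data dictionary still has to be written by a person, but the list of variables can be pulled from the data itself so that none are missed. The sketch below assumes the data is a CSV with a header row; the function name `data_dictionary_skeleton` and the columns `description`, `units`, and `allowed_values` are hypothetical and may differ from the blank template mentioned above.

```python
import csv
import io


def data_dictionary_skeleton(csv_text, max_examples=3):
    """Return one row per column: its name, a few example values, and blanks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    examples = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            value = row[name]
            seen = examples[name]
            # Keep a few distinct, non-empty values as examples.
            if value and value not in seen and len(seen) < max_examples:
                seen.append(value)
    return [
        {
            "variable": name,
            "example_values": "; ".join(examples[name]),
            "description": "",     # to be written by a person
            "units": "",           # to be written by a person
            "allowed_values": "",  # to be written by a person
        }
        for name in columns
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = "site_id,temp_c,species\nA1,21.5,oak\nA2,19.0,maple\n"
    for entry in data_dictionary_skeleton(sample):
        print(entry)
```

The example values are there only to jog the author's memory; the meaning of each variable still has to come from someone who knows the data.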
Data Dictionary Example
Morrow Plots Data Curation Working Group (2024): Morrow Plots Treatment and Yield Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7865141_V2
The Morrow Plots are the longest-running continuous experimental plots in the Americas. In continuous operation since 1876, the plots were established to explore the impact of crop rotation and soil treatment on corn crop yields. Download the file morrow-plots_v02_codebook.pdf to see the data dictionary within the codebook on pages 5-6.
Codebook
Codebooks explain the contents, structure, and layout of a data collection. They are similar to data dictionaries but are used for survey data. A codebook allows the reader to follow the structured format and skip logic of the survey.
MEPS HC-239F: 2022 Outpatient Visit Files. Codebook. https://meps.ahrq.gov/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-239F
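Skip logic means that the answer to one question decides which question comes next. As a rough illustration, the sketch below stores a tiny codebook as a Python dictionary and traces which questions one respondent would see. The questions, codes, and the names `CODEBOOK` and `questions_asked` are all invented for this example and do not come from the MEPS codebook.

```python
# Hypothetical three-question survey; labels and codes are illustrative only.
CODEBOOK = {
    "Q1": {
        "label": "Employed in the past 12 months?",
        "codes": {1: "Yes", 2: "No"},
        "skip": {2: "Q3"},  # answering 2 (No) skips ahead to Q3
        "next": "Q2",
    },
    "Q2": {
        "label": "Usual hours worked per week",
        "codes": {},        # open numeric response
        "skip": {},
        "next": "Q3",
    },
    "Q3": {
        "label": "Age group",
        "codes": {1: "18-34", 2: "35 or older"},
        "skip": {},
        "next": None,       # end of survey
    },
}


def questions_asked(codebook, answers, start="Q1"):
    """Follow the skip logic and return the questions a respondent saw."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        entry = codebook[current]
        answer = answers.get(current)
        # Use the skip rule for this answer if one exists, else the default.
        current = entry["skip"].get(answer, entry["next"])
    return path


if __name__ == "__main__":
    print(questions_asked(CODEBOOK, {"Q1": 2}))
    print(questions_asked(CODEBOOK, {"Q1": 1, "Q2": 40}))
```

A blank value for Q2 means something different for a respondent who was skipped past it than for one who declined to answer, which is why a codebook needs to record these rules.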
Codebook Example
Capital Punishment in the United States, 1973-2008 (ICPSR 27982). https://www.icpsr.umich.edu/web/NACJD/studies/27982/datadocumentation#
The survey is part of the National Prisoner Statistics Program (NPS-8), which collects capital punishment information annually. It is distributed by the National Archive of Criminal Justice Data (NACJD), which is hosted and run by ICPSR. Even for a relatively simple survey, you can see how important codebooks are in representing the amount of information that needs to be documented.
Let's Talk!
Lauren Phegley, Research Data Engineer, Penn Libraries. lphegley@upenn.edu
Request a Consultation
Data Management Resource Guide: https://guides.library.upenn.edu/datamgmt