Managing and Manipulating Data Using R: Tidy Data Overview
Creating analysis datasets requires reshaping data to achieve tidiness, ensuring consistency and uniformity. Learn about the key advantages and rules of tidy data, along with examples and techniques for tidying wide and long data formats, handling missing values, and more in the context of data manipulation using R.
EDUC 263: Managing and Manipulating Data Using R
Lecture 8: Tidy data
NOV 15
1. Overview
Creating analysis datasets often requires changing the organizational structure of the data. Reshaping rows and columns to turn untidy data into tidy data is called tidying.
Why tidy? Two main advantages:
- There is one consistent way of storing data, which gives an underlying uniformity.
- Tidy data fits R's vectorised nature and the tidyverse (dplyr, ggplot2, etc.).
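As a minimal sketch of the second advantage, the snippet below assumes the dplyr and tidyr packages are installed and uses tidyr's built-in example tibble table1 (the same table referred to later in this lecture). The column name rate and the per-10,000 scaling are illustrative choices, not part of the original slides.

```r
library(dplyr)
library(tidyr)  # provides the example tibble table1

# Because each variable is its own column, a new variable is a single
# vectorised expression evaluated over whole columns at once.
rates <- table1 %>%
  mutate(rate = cases / population * 10000)  # 'rate' is an illustrative name

rates
```

With untidy layouts the same calculation would first need the cases and population values lined up in the same row.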
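2. Data structure vs. data concepts
Data structure (the physical layout): rows, columns, cells.
Data concepts (what the layout should represent): observations, variables, values.
If the data are tidy, structure and concepts coincide: rows = observations, columns = variables, cells = values (Wickham).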
3. Tidy vs. untidy data
Rules of tidy data:
- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.
3. Tidy vs. untidy data: example
table1 is tidy. It's the only representation of the example data where each column is a variable.
Question: how do we reshape untidy data into tidy data?
More untidy examples: Digest of Education Statistics, https://nces.ed.gov/programs/digest/current_tables.asp
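To make the contrast concrete, here is a short sketch that assumes the tidyr package is installed and uses its built-in example tibbles table1 and table2, which hold the same data in two layouts. The row-count check is only an illustration of the "one observation per row" rule, not a general test for tidiness.

```r
library(tidyr)

table1  # tidy: one row per country-year, one column per variable
table2  # untidy: each country-year observation is spread over two rows

# In table1 every country-year pair appears exactly once ...
n_obs    <- nrow(unique(table1[c("country", "year")]))
n_table1 <- nrow(table1)

# ... while table2 needs twice as many rows for the same observations.
n_table2 <- nrow(table2)

c(observations = n_obs, table1_rows = n_table1, table2_rows = n_table2)
```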
4. Tidying data: wide to long and long to wide

Function {tidyr}   Description                   Effect on shape             Main arguments
pivot_longer()     wide to long ("gathering")    more rows, fewer columns    names_to, values_to
pivot_wider()      long to wide ("spreading")    fewer rows, more columns    names_from, values_from

To get long data, use pivot_longer(); to get wide data, use pivot_wider():

    table1 %>% pivot_longer(cols = c('cases', 'population'), names_to = 'type', values_to = 'count')
    table2 %>% pivot_wider(names_from = type, values_from = count)
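The effect of the two functions on the number of rows and columns can be sketched with a small round trip. The tibble below is hypothetical (the school names, years, and enrollment figures are made up for illustration), and the snippet assumes the tidyr, dplyr, and tibble packages are installed.

```r
library(tidyr)
library(dplyr)   # loaded for the %>% pipe
library(tibble)

# Hypothetical wide table: one column per year (values are made up)
wide <- tribble(
  ~school, ~`2016`, ~`2017`, ~`2018`,
  "A",     100,     110,     120,
  "B",     200,     190,     180
)

# Wide to long: the year columns become name/value pairs
long <- wide %>%
  pivot_longer(cols = c(`2016`, `2017`, `2018`),
               names_to = "year",
               values_to = "enrollment")

# Long to wide: undo the reshape
back <- long %>%
  pivot_wider(names_from = year, values_from = enrollment)

dim(wide)  # rows and columns before reshaping
dim(long)  # more rows, fewer columns
dim(back)  # same shape as 'wide' again
```

Note that names_to and values_to take quoted strings because they name columns that do not exist yet, while names_from and values_from refer to existing columns and can be unquoted.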
5. Missing values
Two types of missing values:
- Explicit missing values: the variable has the value NA for a particular row.
- Implicit missing values: the row is simply not present in the data.
complete() {tidyr} turns implicit missing values into explicit missing values.
The slide's example data contain one explicit missing value and one implicit missing value.
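The difference between the two kinds of missingness can be sketched as follows. The tibble is hypothetical (variable names and values are invented for illustration) and is built to contain one explicit and one implicit missing value; the snippet assumes the tidyr, dplyr, and tibble packages are installed.

```r
library(tidyr)
library(dplyr)   # loaded for the %>% pipe
library(tibble)

# Hypothetical data: the 2015 Q2 value is explicitly NA,
# and the row for 2016 Q1 is absent altogether (implicitly missing).
scores <- tibble(
  year  = c(2015, 2015, 2016),
  qtr   = c(1,    2,    2),
  score = c(0.8,  NA,   0.5)
)

# complete() adds a row for every combination of year and qtr,
# filling the newly created rows with NA.
completed <- scores %>%
  complete(year, qtr)

sum(is.na(scores$score))     # explicit NAs before completing
sum(is.na(completed$score))  # explicit NAs after completing
completed
```

After complete(), the previously absent 2016 Q1 row shows up with score equal to NA, so both missing values are visible in the data.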