Tidy Data Global Health 811 April 9th 2018


This presentation, "Tidy Data," was given in Global Health 811 on April 9th, 2018. It introduces the concept of tidy data, walks through common problems with messy datasets (with examples from Pew survey, Billboard, WHO TB, and climate data), and closes with an R tidyr demo of gather(), separate(), and spread().


Uploaded on Feb 15, 2025 | 0 Views



Presentation Transcript


  1. Tidy Data: Global Health 811, April 9th, 2018

  2. "Happy families are all alike; every unhappy family is unhappy in its own way." (Leo Tolstoy)

  3. Concept of Tidy Data: Data is often messy! We need a precise way to talk about tidy data. Goal: represent one fact in one place. If one fact is stored in multiple places, there is a chance of recording different values for it.

  4. Data Semantics: The dataset contains 18 values representing three variables and six observations. The information remains the same in the tidy dataset, but the values, variables, and observations are clearer.
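As a rough illustration of the counting on this slide, here is a minimal base R sketch. The table itself is not reproduced in the transcript, so the names (person, treatment, result) and all numbers below are made up for illustration.

```r
# Hypothetical toy data (not from the slides): 3 people x 2 treatments.
tidy <- data.frame(
  person    = rep(c("A", "B", "C"), times = 2),
  treatment = rep(c("a", "b"), each = 3),
  result    = c(4, 16, 3, 2, 11, 1),
  stringsAsFactors = FALSE
)

n_observations <- nrow(tidy)  # one row per person-treatment combination
n_variables    <- ncol(tidy)  # person, treatment, result
n_values       <- n_observations * n_variables

print(tidy)
cat(n_observations, "observations x", n_variables, "variables =",
    n_values, "values\n")
```

Each row is one observation and each column is one variable, so six rows times three columns gives the 18 values.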

  5. Common problems with messy data: Column headers are values, not variable names. Multiple variables are stored in one column. Variables are stored in both rows and columns. Multiple types of observational units are stored in the same table. A single observational unit is stored in multiple tables.

  6. Columns are values, not variables: Cases in which you may come across data of this nature: tabular data designed for presentation; data sometimes used to record regularly spaced observations over time.

  7. Example 1: Pew Survey Data. What are the variables and observations in this dataset? What would the tidy version look like?

  8. The Tidy Version: The first ten rows of the tidied survey dataset on income and religion. This version is tidy because each column represents a variable and each row represents an observation. Here, each observation is a demographic unit corresponding to a combination of religion and income.
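The reshaping described on this slide could be sketched with tidyr's gather(), as below. The religion labels, income bracket names, and counts are placeholders, not the actual Pew figures, and the output column names income and freq are assumptions.

```r
library(tidyr)

# Illustrative counts only; these are NOT the actual Pew survey figures.
messy <- data.frame(
  religion  = c("Group A", "Group B"),
  `<$10k`   = c(10, 20),
  `$10-20k` = c(15, 25),
  check.names = FALSE,       # keep the income brackets as column names
  stringsAsFactors = FALSE
)

# The income brackets are values hiding in the column headers.
# gather() melts every column except religion into key/value pairs.
tidy <- gather(messy, key = "income", value = "freq", -religion)
print(tidy)
```

After melting, each row is one religion-income combination. In current tidyr releases gather() is superseded by pivot_longer(), but it still works.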

  9. Example 2: Billboard Data

  10. The Tidy Version

  11. Your reward. Thank me later! https://youtu.be/F7lfNXddV6A

  12. Melting Data

  13. Multiple variables stored in one column: After melting (reshaping) data, the column variable often becomes a combination of multiple underlying variable names.

  14. Example: WHO TB Dataset

  15. After melting, the data still need tidying
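A minimal sketch of this two-step tidying: melt with gather(), then split the combined column with separate(). It assumes column names that fuse sex and age group (such as m014). The country codes and counts are made up, and the transcript does not show the real WHO TB column layout.

```r
library(tidyr)

# Made-up counts; column names mimic a combined sex + age-group encoding.
messy <- data.frame(
  country = c("AA", "BB"),
  year    = c(2000, 2000),
  m014    = c(2, 5),
  f014    = c(3, 7),
  m1524   = c(4, 6),
  stringsAsFactors = FALSE
)

# Step 1: melt the demographic columns into key/value pairs.
molten <- gather(messy, key = "column", value = "cases", m014, f014, m1524)

# Step 2: the key still mixes two variables, so split it.
# sep = 1 splits after the first character: "m014" -> "m" and "014".
tidy <- separate(molten, column, into = c("sex", "age"), sep = 1)
print(tidy)
```

Splitting by position only works here because the sex code is always one character long. A different encoding would need a different sep.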

  16. Variables stored in both rows & columns: The most complicated form of messy data occurs when variables are stored in both rows and columns.

  17. Example: Climate Database - Data are drawn from the Global Historical Climatology Network - One weather station (MX17004) in Mexico - Five month period in 2010

  18. Example: Climate Database

  19. Example: Climate Database. In the tidy dataset, each row represents the meteorological measurements for a single day. There are two measured variables, minimum and maximum temperature; all other variables are fixed.
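The last step of this example, turning a column of variable names into real columns, could look like the sketch below using spread(). Only the station ID comes from the slides. The dates, temperatures, and the column names element and value are assumptions for illustration.

```r
library(tidyr)

# Made-up temperatures; only the station ID appears in the slides.
molten <- data.frame(
  id      = "MX17004",
  date    = rep(c("2010-01-30", "2010-02-02"), each = 2),
  element = rep(c("tmax", "tmin"), times = 2),
  value   = c(25.0, 12.0, 26.5, 13.5),
  stringsAsFactors = FALSE
)

# The element column holds variable names, not values,
# so spread() turns each distinct element into its own column.
tidy <- spread(molten, key = element, value = value)
print(tidy)
```

The result has one row per day, with tmax and tmin as separate columns. In current tidyr releases spread() is superseded by pivot_wider(), but it still works.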

  20. For more on tidy data see the link on the GH 811 site

  21. R Tidyr Interactive Demo - gather() - separate() - spread()

  22. Installing packages: Install the whole tidyverse (warning: this takes a while): install.packages("tidyverse") OR just install tidyr: install.packages("tidyr")

  23. Upcoming deadlines: Sunday, November 4th at 5pm; Tuesday, November 6th at 2pm. Data dictionary; Journal 2; Table shells; Methods section; Team charter review.
