BUSQOM 1080 Data Analysis for Business - Lecture 6 Summary

Dive into Lecture 6 of the BUSQOM 1080 Data Analysis for Business course focusing on topics like handling missing values, using factors to understand categorical data, utilizing the aggregate() function for data aggregation, and plotting numeric data in R. Explore practical examples and datasets to enhance your data analysis skills in a business context.

  • Data Analysis
  • Business
  • Lecture Summary
  • Data Visualization
  • R Programming

Uploaded on Feb 25, 2025


Presentation Transcript


  1. BUSQOM 1080: Data Analysis for Business, Fall 2020. Lecture 6 (9/8). Professor: Michael Hamilton

  2. Lecture Summary (Lecture 6 - More on Plotting). Outline for today:
       1. Topic: is.na, Factors [5 Mins]
       2. Topic: aggregate() [5 Mins]
       3. Topic: Side by side plots [5 Mins]
       4. Topic: Layered plots [5 Mins]
       5. Topic: Legends [5 Mins]
       6. Misc. Course Updates

  3. Topic: is.na() and factors. First, load the cba_admissions_1999.txt dataset, which contains admissions profiles for CBA students from 1999. It can be found in the file folder for Assignment 4 or on the Assignment 4 Canvas page. Try running the lines shown on the slide. NA is a special value in R that represents a missing value. NAs can be very annoying: they will mess up your functions if your data has them, so check for them with is.na().
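The lines from the slide are not part of this transcript, so the following is only a minimal sketch of how NA values behave. The vector hs_rank and its values are made up for illustration, and the commented read.table() call is an assumption about how the file might be loaded (the real file layout may differ).

```r
# Assumed way to load the real file; left commented out because the
# delimiter and header layout of the file are not known here.
# cba <- read.table("cba_admissions_1999.txt", header = TRUE)

# Hypothetical stand-in for a column with missing values
hs_rank <- c(12, NA, 45, 3, NA, 27)

is.na(hs_rank)                       # TRUE wherever a value is missing
n_missing <- sum(is.na(hs_rank))     # count the missing values

mean(hs_rank)                        # NA: missing values propagate
mean_no_na <- mean(hs_rank, na.rm = TRUE)  # drop NAs before averaging

complete <- hs_rank[!is.na(hs_rank)] # keep only the observed values
```

Many summary functions accept na.rm = TRUE, which is often the simplest way to keep NAs from spoiling a result.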

  4. Topic: is.na() and factors. Factors are how R understands categorical data. They are distinct from character or numeric data. We use the command as.factor() to create factored data. Example: plot(data = cba, HS_rank ~ Race). Note: this requires Race to be a factor.
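As a rough illustration of the conversion step, the sketch below builds a tiny hypothetical version of the cba data frame. Only the column names HS_rank and Race come from the slide; the values are invented.

```r
# Hypothetical miniature of the cba data; the values are made up
cba <- data.frame(
  HS_rank = c(10, 25, 5, 40, 15, 30),
  Race    = c("A", "B", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

class(cba$Race)                   # character data, not yet categorical
cba$Race <- as.factor(cba$Race)   # convert to a factor
levels(cba$Race)                  # the distinct categories

# With a factor on the right-hand side, plot() draws one boxplot per level
plot(HS_rank ~ Race, data = cba)
```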

  5. Data for this lecture. Download the file warming_cities_1990.csv, located in the Lecture 6 folder. It contains the average monthly temperature of six cities ("Bangalore", "Cape Town", "London", "Los Angeles", "New York", and "Tokyo") from 1900-2013. To see the columns, run the command shown on the slide.
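The exact command on the slide is not in this transcript. One common way to inspect columns is sketched below, using a hypothetical stand-in data frame named warming; the column names year, month, and New_York are assumptions and may not match the real file.

```r
# With the real file, something like this would be the assumed first step:
# warming <- read.csv("warming_cities_1990.csv")

# Hypothetical stand-in so the example runs on its own
warming <- data.frame(
  year     = rep(c(1900, 1901), each = 2),
  month    = rep(1:2, times = 2),
  New_York = c(-1.2, 0.3, -0.8, 0.9)
)

names(warming)   # column names only
str(warming)     # column types plus a preview of the values
head(warming)    # first few rows
```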

  6. Topic: The function aggregate(). Often, we would like to aggregate data in specific ways. In R this is easily accomplished via the aggregate() function. The syntax is: aggregate(data_to_aggregate, by = list(name = labels), FUN = func_to_apply). Example: suppose we want the average temperature in NYC for each year (in the data, each row represents a month!). The first argument is the data being collapsed by year, the by argument is the vector of group labels to collapse by, and the mean() function is run on each group.
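The syntax above can be sketched with a small made-up data set. The data frame warming and its columns year and New_York are hypothetical stand-ins for the lecture data, with three "months" per year to keep the arithmetic easy to check.

```r
# Made-up monthly temperatures for two years (hypothetical column names)
warming <- data.frame(
  year     = rep(c(1900, 1901), each = 3),
  New_York = c(0, 10, 20, 2, 12, 22)
)

# Collapse the monthly rows to one row per year, averaging within each year
yearly <- aggregate(warming$New_York,
                    by  = list(year = warming$year),
                    FUN = mean)
yearly
```

The result is a data frame with one row per group. The grouping column takes the name given inside list(), and the aggregated values land in a column named x when a plain vector is passed in.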

  7. Topic: Plotting numeric data. We plot a vector of y data against a vector of x data using the syntax plot(y ~ x) or plot(x, y). Note that the two forms take their arguments in opposite order.
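A minimal sketch of the two equivalent calls follows. The vectors year and temp are invented for illustration and are not the lecture data.

```r
# Made-up yearly temperatures
year <- 1900:1909
temp <- c(10.1, 10.4, 9.8, 10.0, 10.6, 10.3, 10.9, 10.7, 11.0, 11.2)

# Formula form: y on the left of ~, x on the right
plot(temp ~ year, main = "Formula form", xlab = "Year", ylab = "Temperature")

# Positional form: x first, then y
plot(year, temp, main = "Positional form", xlab = "Year", ylab = "Temperature")
```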

  8. Topic: Plotting numeric data side by side. We can make multiple plots in a grid using the par() function. Syntax: par(mfrow = c(num_rows, num_cols)). This tells R to lay out a grid for plots; each subsequent plot() call then fills in one cell at a time.
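The grid layout might look like the sketch below, which uses made-up series named nyc and london. Saving and restoring the old settings with old_par is an addition of this sketch, not something stated on the slide.

```r
# Made-up yearly temperatures for two cities
year   <- 1900:1909
nyc    <- c(11.2, 11.8, 10.9, 11.5, 12.1, 11.7, 12.4, 12.0, 12.6, 12.9)
london <- c(9.4, 9.9, 9.1, 9.6, 10.2, 9.8, 10.4, 10.1, 10.6, 10.8)

# One row, two columns; par() returns the previous settings
old_par <- par(mfrow = c(1, 2))

plot(nyc ~ year, main = "New York", xlab = "Year", ylab = "Temperature")
plot(london ~ year, main = "London", xlab = "Year", ylab = "Temperature")

# Restore the single-plot layout for later plots
par(old_par)
```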

  9. Topic: Plotting numeric data on one graph. We can layer plots using the par() function as well. Syntax: call par(new = TRUE) between plot() calls. This tells R to draw the next plot on top of the current one. In the example, note the ylim parameter, which tells R what the y limits should be. This is important since both series will be on the same plot. Note also that the second command omits the main argument, since no new title is needed.
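The layering described above could be sketched as follows, again with invented nyc and london series. Suppressing the axes and axis labels on the second plot is a choice made in this sketch to avoid overprinting; the slide's own example is not available here.

```r
# Made-up yearly temperatures for two cities
year   <- 1900:1909
nyc    <- c(11.2, 11.8, 10.9, 11.5, 12.1, 11.7, 12.4, 12.0, 12.6, 12.9)
london <- c(9.4, 9.9, 9.1, 9.6, 10.2, 9.8, 10.4, 10.1, 10.6, 10.8)

# Shared y limits so both series are drawn on the same scale
y_limits <- c(8, 14)

plot(nyc ~ year, type = "l", col = "blue", ylim = y_limits,
     main = "Yearly temperature", xlab = "Year", ylab = "Temperature")

par(new = TRUE)  # draw the next plot on top of the current one

# No main here; axes and labels are suppressed so they are not drawn twice
plot(london ~ year, type = "l", col = "red", ylim = y_limits,
     xlab = "", ylab = "", axes = FALSE)
```

Both calls use the same year vector, so the x ranges line up automatically. With different x ranges, a shared xlim would be needed for the same reason as ylim.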

  10. Topic: Plotting numeric data on one graph. We can also add a legend afterwards using the legend() function. Syntax (after the plots): legend(x_loc, y_loc, legend = vec_names, col = vec_cols, cex = size_scaler, pch = vec_markers). The legend is placed at the given x and y coordinates, and the names, colors, and markers are passed as vectors!
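A small sketch of the legend() call after two layered plots is given below. The data, colors, marker codes, and legend position are all illustrative assumptions.

```r
# Made-up yearly temperatures for two cities
year   <- 1900:1909
nyc    <- c(11.2, 11.8, 10.9, 11.5, 12.1, 11.7, 12.4, 12.0, 12.6, 12.9)
london <- c(9.4, 9.9, 9.1, 9.6, 10.2, 9.8, 10.4, 10.1, 10.6, 10.8)

# One entry per series, in matching order
city_names <- c("New York", "London")
city_cols  <- c("blue", "red")
city_pch   <- c(16, 17)
y_limits   <- c(8, 14)

plot(nyc ~ year, col = city_cols[1], pch = city_pch[1], ylim = y_limits,
     main = "Yearly temperature", xlab = "Year", ylab = "Temperature")
par(new = TRUE)
plot(london ~ year, col = city_cols[2], pch = city_pch[2], ylim = y_limits,
     xlab = "", ylab = "", axes = FALSE)

# Top-left corner of the legend box sits at x = 1900, y = 14
legend(1900, 14, legend = city_names, col = city_cols,
       pch = city_pch, cex = 0.8)
```

The three vectors must line up entry by entry, since legend() pairs the i-th name with the i-th color and marker.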

  11. For Next Time. Assignment 4 is posted! It's about plotting and is due 9/18. Assignment 3 is due on Friday (9/11). Thanks! See you on Tuesday!
