R Short Course Part 1 Session 2: Install, Use R Packages, Data Management, Conditional Execution, Loops, Functions


In this session of the R Short Course, learn how to install and use R packages, read in and write to external files, manage data, execute conditionally with if statements, work with loops, and write your own functions. Explore basic info on R packages, installing and loading packages, reading external files, and accessing RMR datasets. Dive into practical examples and hands-on tasks to enhance your R programming skills.

  • R Short Course
  • Data Management
  • R Packages
  • Loops
  • Functions

Uploaded on Mar 10, 2025 | 0 Views



Presentation Transcript


  1. R Short Course Part 1 Session 2
     Daniel Zhao, PhD; Sixia Chen, PhD
     Department of Biostatistics and Epidemiology, College of Public Health, OUHSC
     3/2/2020

  2. Outline
     1. Install and use R packages
     2. Read in and write to external files
     3. Data management
     4. Conditional execution using if statements
     5. Loops: for loop, repeat loop, and while loop
     6. Write your own R functions

  3. INSTALL AND USE R PACKAGES

  4. Basic Info on R Packages
     1. R is made up of built-in and user-written packages
     2. All R functions and datasets are stored in packages
     3. Standard R packages come with the R installation
     4. Other R packages need to be installed once for each R version
     5. Most packages need to be loaded in each R session

  5. Install and Load R Packages (1)
     1. To list the R packages installed on your computer:
        > library()
     2. To install a new R package, say randomForest:
        > install.packages("randomForest")
        Then select a CRAN mirror site, say USA (TX). This only needs to be done once per R version per computer.
     3. To load an R package, say randomForest:
        > library("randomForest")
        Do this once per R session.
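The install-once, load-every-session pattern on this slide can be wrapped in a small helper. The sketch below is only an illustration: `load_or_install` is a hypothetical name that is not part of the course material, and the `repos` URL is an assumed CRAN mirror. It is demonstrated on `stats`, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper (not from the slides): install a package only if it is
# missing, then load it for the current session.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)   # assumed mirror; needs network access
  }
  # character.only = TRUE lets library() accept the name stored in a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" comes with every R installation, so this call never installs anything
load_or_install("stats")

# For the package on the slide, the call would be (downloads on first use):
# load_or_install("randomForest")
```

Passing `repos` explicitly avoids the interactive mirror prompt, which is convenient in scripts but is a design choice rather than a requirement.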

  6. Install and Load R Packages (2)
     1. To list the functions in a package:
        > help(package="randomForest")
     2. To list all the data sets in a package:
        > data(package="randomForest")
     3. To view the data:
        > data(imports85)
        > edit(imports85)
     4. To get more description of the data:
        > help(imports85)

  7. READ IN AND WRITE TO EXTERNAL FILES

  8. RMR Data Sets
     1. Two RMR (Resting Metabolic Rate) data sets: RMR.txt, RMR.csv
        Source: Owen et al. The American Journal of Clinical Nutrition. (1986) 44:1-19.
     2. Data were collected on 44 healthy women
        Aged 18-65 years
        Weighing 43-143 kg
        8 of the subjects were athletes (the first 8 rows)
     3. Five variables: weight, RMR, athlete (1 for athletes and 0 otherwise), age, height

  9. Read Text or .csv Files into R
     Read text or .csv files with R built-in functions
     1. Set the working directory first:
        > setwd("C:/Users/dzhao1/Desktop")
     2. Use read.table() to read in a text file. The first row contains the variable names:
        > RMR = read.table("RMR.txt", header=TRUE)
     3. Use read.csv() to read in a .csv file. The first row contains the variable names:
        > RMR2 = read.csv("RMR.csv", header=TRUE)
     4. To view the data frames:
        > RMR
        > edit(RMR)

  10. Write to External .csv Files
      Write to .csv files with R built-in functions
      1. Use write.csv() to write to a .csv file:
         > write.csv(RMR2, "RMRout.csv", row.names=FALSE)
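Since RMR.txt and RMR.csv are not included here, the following sketch shows the write-then-read round trip on a small stand-in data frame. `RMR_toy` and its values are invented for illustration (they are not the Owen et al. data), and the file is written to a temporary directory rather than the working directory.

```r
# Made-up stand-in for the RMR data (same column names, invented values)
RMR_toy <- data.frame(
  weight  = c(60.2, 71.5, 55.0, 98.3, 82.1, 64.7),
  rmr     = c(1450, 1602, 1388, 1710, 1525, 1340),
  athlete = c(1, 1, 0, 0, 0, 0),
  age     = c(24, 31, 45, 52, 28, 37),
  height  = c(165, 172, 158, 170, 168, 161)
)

# Write to a temporary file so the example leaves no files behind
out_file <- file.path(tempdir(), "RMRout_toy.csv")
write.csv(RMR_toy, out_file, row.names = FALSE)

# Read it back; header = TRUE treats the first row as variable names
RMR_back <- read.csv(out_file, header = TRUE)

dim(RMR_back)     # number of rows and columns read back
names(RMR_back)   # variable names taken from the header row
```

Setting `row.names = FALSE` keeps R from writing the row names as an extra first column, which would otherwise come back as an additional variable when the file is read again.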

  11. DATA MANAGEMENT

  12. Basic Operations on Data Frames
      1. Dimensions (numbers of rows and columns):
         > dim(RMR)
         results: 44, 5
      2. Variable names:
         > names(RMR)
         results: "weight" "rmr" "athlete" "age" "height"
      3. First or last few observations:
         > head(RMR,5)
         Can you come up with another way?
         > tail(RMR,4)
      4. View or edit a data frame:
         > edit(RMR)
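One possible answer to "another way" is plain row indexing. The sketch below uses an invented stand-in data frame (`RMR_toy`, not the real RMR data) and also shows `nrow()`, `ncol()`, and `str()` as alternatives to `dim()` and `names()`.

```r
# Made-up stand-in for the RMR data (invented values)
RMR_toy <- data.frame(
  weight  = c(60.2, 71.5, 55.0, 98.3, 82.1, 64.7),
  rmr     = c(1450, 1602, 1388, 1710, 1525, 1340),
  athlete = c(1, 1, 0, 0, 0, 0),
  age     = c(24, 31, 45, 52, 28, 37),
  height  = c(165, 172, 158, 170, 168, 161)
)

# nrow() and ncol() return the two components of dim() separately
n_rows <- nrow(RMR_toy)
n_cols <- ncol(RMR_toy)

# str() prints each variable's name, type, and first few values
str(RMR_toy)

# Row indexing gives the same rows as head() and tail()
first_rows <- RMR_toy[1:3, ]                   # like head(RMR_toy, 3)
last_rows  <- RMR_toy[(n_rows - 1):n_rows, ]   # like tail(RMR_toy, 2)
```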

  13. Subset Variables (Columns)
      For example, keep the 2nd and 4th variables (columns) of the data frame RMR
      1. Use column indices:
         > RMR.1 = RMR[, c(2,4)]
         Any other suggestions?
      2. Use the subset() function and variable names:
         > RMR.2 = subset(RMR, select=c(rmr, age))
         Note: there are no quotes around the variable names.
         What if I want to drop rmr and age?
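The two questions on this slide have several reasonable answers. The sketch below shows a few of them on an invented stand-in data frame (`RMR_toy`, not the real RMR data): selecting columns by quoted name, and dropping columns with a minus sign inside `select` or with a logical test on `names()`.

```r
# Made-up stand-in for the RMR data (invented values)
RMR_toy <- data.frame(
  weight  = c(60.2, 71.5, 55.0, 98.3, 82.1, 64.7),
  rmr     = c(1450, 1602, 1388, 1710, 1525, 1340),
  athlete = c(1, 1, 0, 0, 0, 0),
  age     = c(24, 31, 45, 52, 28, 37),
  height  = c(165, 172, 158, 170, 168, 161)
)

# Three ways to keep rmr and age
by_index  <- RMR_toy[, c(2, 4)]                      # column positions
by_name   <- RMR_toy[, c("rmr", "age")]              # quoted column names
by_subset <- subset(RMR_toy, select = c(rmr, age))   # unquoted names

# Two ways to drop rmr and age instead
dropped_a <- subset(RMR_toy, select = -c(rmr, age))
dropped_b <- RMR_toy[, !(names(RMR_toy) %in% c("rmr", "age"))]

names(by_subset)   # columns that were kept
names(dropped_a)   # columns that remain after dropping
```

Selecting by name is usually safer than by position, because the code keeps working if the column order changes.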

  14. Subset Observations (Rows)
      For example, subset all athletes (the first 8 rows)
      1. Use row indices (integers):
         > RMR.3 = RMR[1:8,]
      2. Use row indices (logical vector):
         > RMR.4 = RMR[RMR$athlete==1,]
      3. More naturally, use the subset() function:
         > RMR.5 = subset(RMR, athlete==1)

  15. More Examples on Subsetting
      1. Subset rows with multiple conditions:
         > subset(RMR, athlete==1 & age<30)
      2. Subset rows and columns simultaneously:
         > subset(RMR, athlete==1 & age<30, select=c(rmr, athlete, age))
      3. Notes on the logical operators & and &&:
         & works element-wise on logical vectors of any length
         && works on single logical values (logical vectors of length 1)
         The word "and" cannot be used as a logical operator
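The row-subsetting approaches and the difference between `&` and `&&` can be tried on a small example. The data frame below is an invented stand-in (`RMR_toy`, not the real RMR data) in which the first two rows are athletes.

```r
# Made-up stand-in for the RMR data (invented values; rows 1-2 are athletes)
RMR_toy <- data.frame(
  weight  = c(60.2, 71.5, 55.0, 98.3, 82.1, 64.7),
  rmr     = c(1450, 1602, 1388, 1710, 1525, 1340),
  athlete = c(1, 1, 0, 0, 0, 0),
  age     = c(24, 31, 45, 52, 28, 37),
  height  = c(165, 172, 158, 170, 168, 161)
)

# Three ways to keep the athletes
by_rows    <- RMR_toy[1:2, ]                      # integer row indices
by_logical <- RMR_toy[RMR_toy$athlete == 1, ]     # logical vector
by_subset  <- subset(RMR_toy, athlete == 1)       # subset() with a condition

# Rows and columns at once: athletes younger than 30
young_athletes <- subset(RMR_toy, athlete == 1 & age < 30,
                         select = c(rmr, athlete, age))

# & compares two logical vectors element by element
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)

# && combines two single logical values, as needed inside if ()
x <- 5
in_range <- if (x > 0 && x < 10) "inside" else "outside"
```

Inside `subset()` and row indexing, `&` is the operator to use because each condition is a vector with one value per row.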

  16. Merge Data Frames with Common Variables
      1. For preparation, add an ID column:
         > RMR=data.frame(ID=1:44, RMR)
      2. Subset two data frames with a common ID:
         > RMR.8=subset(RMR,select=c(ID, athlete))
         > RMR.9=subset(RMR,select=c(ID, age))
      3. Merge the two data frames:
         > RMR.10=merge(RMR.8, RMR.9, by="ID")
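In the slide's example both data frames contain every ID, so nothing is lost in the merge. The sketch below uses two invented data frames whose IDs only partly overlap, to show what `merge()` does by default and how `all = TRUE` changes it. The data frame names and values are hypothetical.

```r
# Two made-up data frames that share only some IDs
athlete_info <- data.frame(ID = c(1, 2, 3, 4), athlete = c(1, 1, 0, 0))
age_info     <- data.frame(ID = c(2, 3, 4, 5), age = c(31, 45, 52, 28))

# Default merge keeps only IDs present in both data frames
inner <- merge(athlete_info, age_info, by = "ID")

# all = TRUE keeps every ID and fills the missing values with NA
outer <- merge(athlete_info, age_info, by = "ID", all = TRUE)

inner$ID   # IDs found in both inputs
outer      # all IDs, with NA where one input had no matching row
```

Checking `nrow()` before and after a merge is a simple way to notice rows that were dropped because their IDs did not match.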

  17. CONDITIONAL EXECUTION

  18. if else Statement
      1. Syntax: if (condition) {commands} else {commands}
         Note: else {commands} is optional
      2. e.g., create an indicator (u) and a square root (v):
         > x=2
         > if (x>=0) { u=1; v=sqrt(x) } else {u=0; v=NA}
      3. The ifelse() function can be used when there is only one command in each pair of curly braces:
         > u = ifelse(x >= 0, 1, 0)
         > v = ifelse(x >= 0, sqrt(x), NA)
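A practical difference between the two forms is that `if` tests a single condition, while `ifelse()` is applied to every element of a vector. The sketch below repeats the slide's indicator and square-root example on a short vector with invented values. Using `sqrt(abs(x))` is a choice made here to avoid the warning that `sqrt()` gives for negative inputs, since `ifelse()` evaluates both branches on the whole vector.

```r
# A vector with one negative value (invented for illustration)
x <- c(4, -1, 9)

# Indicator: 1 where x is non-negative, 0 otherwise
u <- ifelse(x >= 0, 1, 0)

# Square root where defined, NA otherwise; abs() keeps sqrt() from warning
# about the negative element, whose result is discarded anyway
v <- ifelse(x >= 0, sqrt(abs(x)), NA)

u   # 1 0 1
v   # 2 NA 3

# For a single value, a plain if/else does the same job
x1 <- 2
if (x1 >= 0) {
  u1 <- 1
  v1 <- sqrt(x1)
} else {
  u1 <- 0
  v1 <- NA
}
```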

  19. LOOPS: FOR LOOP, REPEAT LOOP, AND WHILE LOOP

  20. for Loop
      1. Syntax: for (name in expression) {commands}
      2. e.g., compute element-wise squares:
         > x=1:3
         > y=rep(NA,3)
         > for (i in 1:3) {y[i]=x[i]^2}
      3. How to achieve the above without using loops?
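One way to answer the question on this slide is to rely on the fact that arithmetic operators in R work element by element on vectors. The sketch below compares the loop with the vectorized form and with `sapply()`; the variable names are chosen here for illustration.

```r
x <- 1:3

# Loop version from the slide
y_loop <- rep(NA, 3)
for (i in 1:3) {
  y_loop[i] <- x[i]^2
}

# Vectorized version: ^ is applied to every element of x at once
y_vec <- x^2

# sapply() applies a function to each element and collects the results
y_sapply <- sapply(x, function(value) value^2)

y_loop     # 1 4 9
y_vec      # 1 4 9
y_sapply   # 1 4 9
```

The vectorized form is shorter and is generally the idiomatic choice in R when the operation already works on whole vectors.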

  21. repeat Loop
      1. Syntax: repeat { commands; if (condition) break; commands }
      2. e.g., compute element-wise squares:
         > x=1:3
         > y=rep(NA,3)
         > i=1
         > repeat {
         +   y[i] = x[i]^2
         +   if (i==3) {break} else {i=i+1}
         + }

  22. while Loop
      1. Syntax: while (condition) {commands}
      2. e.g., compute element-wise squares:
         > x=1:3
         > y=rep(NA,3)
         > i=1
         > while (i<=3) {
         +   y[i] = x[i]^2
         +   i = i+1
         + }
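The squares example has a known number of iterations, which is where `for` fits best. `while` and `repeat` are more natural when the number of iterations is not known in advance. The sketch below uses a hypothetical task that is not from the slides: halve a value until it drops below 1 and count the steps, once with `while` and once with `repeat`.

```r
# while: the condition is checked before each pass through the loop body
value <- 100
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

# repeat: loops until break is reached, so the exit test is written explicitly
value2 <- 100
steps2 <- 0
repeat {
  if (value2 < 1) break
  value2 <- value2 / 2
  steps2 <- steps2 + 1
}

c(steps, steps2)   # 7 7
value              # 0.78125
```

With `repeat`, forgetting the `break` produces an infinite loop, so it helps to place the exit test where it is easy to see.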

  23. WRITE YOUR OWN FUNCTIONS

  24. Remarks on R Functions
      1. R ~= Data Structures + Functions
      2. R functions ~= SAS procs + SAS macros
      3. When should we consider writing a function? If you need to do something in R multiple times, write a function. Stop copying and pasting.
      4. Try to find R built-in functions first. Then search for user-written R packages. Finally, write your own functions.
      5. A general syntax for defining an R function:
         > name = function (arg1, arg2, ...) {commands}
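As a minimal instance of the general syntax, the sketch below defines a small function with one required argument and one argument that has a default value. The function `power.sum` is a made-up example, not part of the course material.

```r
# Hypothetical example: sum of the elements of x raised to a power.
# 'power' has a default, so it can be omitted when calling the function.
power.sum = function(x, power = 2) {
  total = sum(x^power)
  total   # the value of the last expression is what the function returns
}

power.sum(1:3)              # uses the default power = 2, giving 14
power.sum(1:3, power = 3)   # overrides the default, giving 36
```

Default values keep the most common call short while still letting the caller change the behavior when needed.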

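  25. Two-Sample T-Test
      1. We observe two independent samples $y_1$ and $y_2$ with sample sizes $n_1$ and $n_2$.
      2. Let $\bar{y}_1$ and $\bar{y}_2$ be the sample means.
      3. Let $s_1^2$ and $s_2^2$ be the sample variances.
      4. Let $s^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ be the pooled variance.
      5. The two-sample t statistic is
         $$T = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{s^2\,(1/n_1 + 1/n_2)}} \sim t_{n_1+n_2-2}$$
      6. P-value: $2\Pr(t_{n_1+n_2-2} > |T|)$
      7. A $100(1-\alpha)\%$ CI is
         $$(\bar{y}_1 - \bar{y}_2) \pm t_{n_1+n_2-2;\,1-\alpha/2}\,\sqrt{s^2\,(1/n_1 + 1/n_2)}$$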

  26. An R Function on Two-Sample T-Test
      two.sample.t = function(y1, y2, alpha=.05) {
        n1 = length(y1); n2 = length(y2)      # sample sizes
        yb1 = mean(y1); yb2 = mean(y2)        # sample means
        s1 = var(y1); s2 = var(y2)            # sample variances
        df = n1+n2-2                          # degrees of freedom
        s = ((n1-1)*s1 + (n2-1)*s2)/df        # pooled variance
        T = (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2)) # t statistic
        p = 2*pt(abs(T), df, lower.tail=FALSE) # two-sided p-value
        LCI = (yb1-yb2) - qt(1-alpha/2, df)*sqrt(s*(1/n1+1/n2))
        UCI = (yb1-yb2) + qt(1-alpha/2, df)*sqrt(s*(1/n1+1/n2))
        CI = c(LCI, UCI)
        list(T=T, p=p, CI=CI, df=df, yb1=yb1, yb2=yb2, y1=y1, y2=y2)
      }
      # to apply the above function
      x1 = RMR$rmr[1:8]; x2 = RMR$rmr[9:44]
      z = two.sample.t(x1, x2)
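Following the earlier advice to look for built-in functions first, a hand-written test like this can be checked against R's own `t.test()` with `var.equal = TRUE`, which also uses the pooled variance. The sketch below is a condensed re-implementation of the slide's function (it returns fewer list elements), and the two samples `g1` and `g2` are invented numbers, not the RMR data.

```r
# Condensed version of the slide's pooled two-sample t-test
two.sample.t <- function(y1, y2, alpha = 0.05) {
  n1 <- length(y1); n2 <- length(y2)
  df <- n1 + n2 - 2
  s  <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / df   # pooled variance
  se <- sqrt(s * (1 / n1 + 1 / n2))                      # standard error
  T  <- (mean(y1) - mean(y2)) / se                       # t statistic
  p  <- 2 * pt(abs(T), df, lower.tail = FALSE)           # two-sided p-value
  CI <- (mean(y1) - mean(y2)) + c(-1, 1) * qt(1 - alpha / 2, df) * se
  list(T = T, p = p, CI = CI, df = df)
}

# Made-up samples for illustration only
g1 <- c(1502, 1430, 1610, 1388, 1545)
g2 <- c(1290, 1355, 1210, 1402, 1180, 1325)

z   <- two.sample.t(g1, g2)
ref <- t.test(g1, g2, var.equal = TRUE, conf.level = 0.95)

# Each comparison returns TRUE when the hand-written result agrees
all.equal(z$T,  unname(ref$statistic))
all.equal(z$p,  ref$p.value)
all.equal(z$CI, as.numeric(ref$conf.int))
```

Note that `t.test()` defaults to `var.equal = FALSE` (the Welch test), so the argument has to be set explicitly for the results to match the pooled formula.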
