Inference Methods: Bootstrapping for Numerical Data Analysis

Explore bootstrapping as a way to make inferences about medians when CLT-based methods do not apply. Understand how bootstrapping samples with replacement from the observed sample and how bootstrap percentile intervals are formed. Learn about Central Limit Theorem (CLT) conditions and hypothesis testing for population parameters.

  • Inference
  • Bootstrapping
  • Numerical Data
  • CLT
  • Hypothesis Testing

Presentation Transcript


  1. Unit 4: Inference for numerical data (2. Bootstrapping). Sta 101, Fall 2019, Duke University, Department of Statistical Science. Dr. Ellison. Slides posted at https://www2.stat.duke.edu/courses/Fall19/sta101.001/

  2. Outline
     1. Housekeeping
     2. Main ideas
        Problem: We can't make CLT confidence intervals or hypothesis tests for MEDIANS.
        Solution: Use bootstrapping (a simulation method).
        1. Bootstrapping = sampling with replacement from the observed sample
        2. Bootstrap percentile intervals: middle XX% of the bootstrap distribution
        3. Bootstrap SE intervals: point estimate ± ME
        4. Bootstrap testing for a single numerical variable requires shifting the bootstrap distribution to be centered at the null value
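Point 4 of the outline (shifting the bootstrap distribution so it is centered at the null value) can be sketched in Python. This is a minimal sketch, not the course's own tooling: it reuses the 20 horror-movie audience scores from the data slide, a hypothetical null value of 50 for the population median, and one common two-sided p-value definition (the share of shifted bootstrap medians at least as far from the null value as the observed median). The function name is hypothetical.

```python
import random
import statistics

# Audience scores of the 20 sampled horror movies (from the data slide).
SCORES = [52, 43, 34, 12, 41, 42, 65, 46, 44, 34,
          57, 48, 60, 28, 27, 73, 56, 23, 29, 65]


def shifted_bootstrap_p_value(data, null_value, reps=10_000, seed=1):
    """Two-sided bootstrap p-value for H0: population median = null_value."""
    rng = random.Random(seed)
    n = len(data)
    observed = statistics.median(data)
    # Bootstrap distribution of medians: resample n values with replacement.
    boot = [statistics.median(rng.choices(data, k=n)) for _ in range(reps)]
    # Shift every bootstrap median so the distribution is centered
    # (approximately) at the null value instead of at the observed median.
    shift = null_value - observed
    shifted = [m + shift for m in boot]
    # Count shifted medians at least as far from the null as the observed one.
    distance = abs(observed - null_value)
    extreme = sum(1 for m in shifted if abs(m - null_value) >= distance)
    return extreme / reps


# Hypothetical null value, chosen only for illustration.
p = shifted_bootstrap_p_value(SCORES, null_value=50)
print(f"observed median = {statistics.median(SCORES)}, p-value ≈ {p:.3f}")
```

Shifting the bootstrap statistics by (null value − observed median) is one way to impose the null hypothesis; another is to shift the data themselves before resampling.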

  3. Coming up
     • Group Evaluation 1 is due tonight at 11:55 pm.
     • Lab Assignment 6 is due Thursday, just before your lab section time.
     • Read over the Project Stage 1 and Stage 2 statements before Thursday's lab (see Sakai Resources).
     • Project Stage 1 is due Thursday 10/24.

  5. Recap of inference methods that use the CLT
     • When certain conditions are met, $\bar{x} \sim N\left(\text{mean} = \mu,\ \text{standard dev./error} = \frac{\sigma}{\sqrt{n}}\right)$, and we can make CLT confidence intervals and p-values for $\mu$.
     • When other certain conditions are met, $\bar{x}_{\text{diff}} \sim N\left(\text{mean} = \mu_{\text{diff}},\ \text{standard dev./error} = \frac{\sigma_{\text{diff}}}{\sqrt{n_{\text{diff}}}}\right)$, and we can make CLT confidence intervals and p-values for $\mu_{\text{diff}}$.
     • When other certain conditions are met, $\bar{x}_1 - \bar{x}_2 \sim N\left(\text{mean} = \mu_1 - \mu_2,\ \text{standard dev./error} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$, and we can make CLT confidence intervals and p-values for $\mu_1 - \mu_2$.
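The standard errors in this recap can be computed directly. A minimal sketch, assuming sample standard deviations stand in for the unknown $\sigma$ values (the usual plug-in step); the function names are hypothetical. For paired data, `se_mean` is applied to the standard deviation and count of the differences.

```python
import math


def se_mean(sd, n):
    """SE of a sample mean (also used for paired differences): sd / sqrt(n)."""
    return sd / math.sqrt(n)


def se_diff_means(sd1, n1, sd2, n2):
    """SE of x̄1 - x̄2 for two independent groups."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


print(se_mean(10, 25))              # 2.0
print(se_diff_means(3, 9, 4, 16))   # sqrt(1 + 1), about 1.414
```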

  6. Central Limit Theorem confidence interval for a population parameter
     When CLT conditions are met:
     $(\text{point estimate}) \pm (\text{crit. value}) \times SE$
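The interval formula above is a one-liner in code. This sketch uses hypothetical numbers (point estimate 50, SE 3) and a standard normal critical value from Python's `statistics.NormalDist`; for a small-sample mean the course uses a t critical value with n − 1 degrees of freedom instead, which the standard library does not provide.

```python
from statistics import NormalDist


def clt_confidence_interval(point_estimate, crit_value, se):
    """Return (lower, upper) = point estimate -/+ crit. value * SE."""
    margin_of_error = crit_value * se
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Critical value for a 95% interval from the standard normal distribution.
z_star = NormalDist().inv_cdf(0.975)   # about 1.96
print(clt_confidence_interval(50, z_star, 3))
```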

  7. Central Limit Theorem hypothesis testing for a population parameter
     When CLT conditions are met:
     $H_0: \text{pop. param} = \text{null value}$
     $H_A: \text{pop. param}\ (\neq,\ >,\ \text{or}\ <)\ \text{null value}$
     $\text{test stat} = \dfrac{\text{point estimate} - \text{null value}}{SE}$
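The test statistic and its p-value can be sketched as follows. This assumes a standard normal reference distribution (a t distribution would be used for small-sample means) and hypothetical inputs: point estimate 52, null value 50, SE 1. The function names are illustrative.

```python
from statistics import NormalDist


def compute_test_stat(point_estimate, null_value, se):
    """(point estimate - null value) / SE."""
    return (point_estimate - null_value) / se


def p_value(stat, alternative="two-sided"):
    """p-value from a standard normal reference distribution."""
    cdf = NormalDist().cdf
    if alternative == "greater":        # H_A: pop. param > null value
        return 1 - cdf(stat)
    if alternative == "less":           # H_A: pop. param < null value
        return cdf(stat)
    if alternative == "two-sided":      # H_A: pop. param != null value
        return 2 * (1 - cdf(abs(stat)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z = compute_test_stat(point_estimate=52, null_value=50, se=1)
print(z, round(p_value(z), 4))
```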

  9. What about CLT inference for a population median?
     Can we do it? In other words, is this true?
     When other certain conditions are met, $\text{sample median} \sim N\left(\text{mean} = \text{Median},\ \text{standard dev./error} = \underline{\qquad\qquad}\right)$, and we can make CLT confidence intervals and p-values for the median.
     No! We don't have a Central Limit Theorem for a median that tells us what the sampling distribution of sample medians looks like. But we can simulate a sampling distribution of sample medians!
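If the population were known, the sampling distribution of sample medians could be simulated directly by drawing many samples from it. The sketch below assumes a hypothetical right-skewed population (exponential with mean 40); in practice the population is unknown, which is the gap bootstrapping fills.

```python
import random
import statistics


def simulate_sample_medians(draw, n, reps, seed=0):
    """Medians of `reps` independent samples of size n from a known population.

    `draw(rng)` returns one observation from the (hypothetical) population.
    """
    rng = random.Random(seed)
    return [statistics.median([draw(rng) for _ in range(n)]) for _ in range(reps)]


def exponential_mean_40(rng):
    """Hypothetical population: exponential with mean 40 (right-skewed)."""
    return rng.expovariate(1 / 40)


medians = simulate_sample_medians(exponential_mean_40, n=20, reps=1000)
print(f"mean of sample medians ≈ {statistics.mean(medians):.1f}")
print(f"SD of sample medians   ≈ {statistics.stdev(medians):.1f}")
```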

  10. Bootstrapping: An alternative approach to constructing confidence intervals and p-values is bootstrapping. This term comes from the phrase "pulling oneself up by one's bootstraps," which is a metaphor for accomplishing an impossible task without any outside help. In this case the impossible task is estimating a population parameter, and we'll accomplish it using data from only the given sample.

  14. Outline: What we can do after today

  15. Outline: What are some reasons to use bootstrapping methods to create a confidence interval or conduct a hypothesis test instead of CLT-based methods?
      a) The population parameter of interest is the median.
      b) The population parameter of interest is the mean, but n < 30 AND the population distribution is not nearly normal (i.e., CLT conditions are not met).
      In general, you can create a confidence interval or conduct a hypothesis test on population parameters like ?,?????,? (others exist) under less strict conditions than the CLT requires.
      But are there some CLT conditions that should still be met in order to calculate a p-value or confidence interval using bootstrapping?

  16. Outline (from videos): Some CLT conditions do still apply, however: the sample should be random.

  17. Outline: How do we make a confidence interval using bootstrapping? Step 1: Create a bootstrap distribution.

  23. Bootstrapping, Step 1: Create the bootstrap distribution.
      [Figure labels: "aka: bootstrap sample", "aka: bootstrap statistic", "For comparison"]
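Step 1 can be written as a short generic function. A minimal sketch with a hypothetical name and a toy sample: each bootstrap sample has the same size as the original and is drawn from it with replacement, and the chosen statistic (here the median) is recorded for each one.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, reps=1000, seed=None):
    """Bootstrap statistics from `reps` bootstrap samples.

    Each bootstrap sample has the same size as the original sample and is
    drawn from it with replacement; `statistic` is applied to each one.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(reps)]


toy_sample = [1, 2, 3, 4, 5]          # hypothetical small sample
dist = bootstrap_distribution(toy_sample, statistics.median, reps=10, seed=42)
print(dist)
```

Passing the statistic as a function keeps the same code usable for a mean, a median, or a standard deviation.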

  24. Outline: How do we make a confidence interval using bootstrapping? Step 1: Create a bootstrap distribution. Step 2: Use the bootstrap distribution to create your confidence interval.

  27. Bootstrapping, Step 2: Use the bootstrap distribution to create a confidence interval.
      Percentile method: XX% bootstrap confidence interval = the cutoff values for the middle XX% of the bootstrap distribution.
      Standard error method: XX% bootstrap confidence interval = $\text{point estimate} \pm t^*_{n-1} \times SE_{boot}$, where $SE_{boot}$ is the standard deviation of all the bootstrap statistics in the bootstrap distribution.
      When the population parameter is: ??????,?,?????,?
        degrees of freedom = n − 1 (n = size of the original sample = size of each bootstrap sample)
        area under the t-distribution curve in the two tails combined = 1 − XX/100 (e.g., 0.05 for a 95% interval)
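Both Step 2 methods can be sketched on the horror-movie scores. Assumptions to note: the percentile cutoffs use linear interpolation (one of several quantile conventions), the t critical value for 95% confidence with df = 19 is hard-coded as 2.093 rather than computed, and the seed and function names are hypothetical.

```python
import math
import random
import statistics

SCORES = [52, 43, 34, 12, 41, 42, 65, 46, 44, 34,
          57, 48, 60, 28, 27, 73, 56, 23, 29, 65]


def percentile(sorted_values, q):
    """q-th quantile (0 <= q <= 1) of sorted values, by linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def percentile_interval(boot_stats, level=0.95):
    """Cutoffs for the middle `level` share of the bootstrap distribution."""
    s = sorted(boot_stats)
    tail = (1 - level) / 2
    return percentile(s, tail), percentile(s, 1 - tail)


def se_interval(point_estimate, boot_stats, t_star):
    """point estimate -/+ t* * SE_boot, where SE_boot = SD of bootstrap stats."""
    se_boot = statistics.stdev(boot_stats)
    return point_estimate - t_star * se_boot, point_estimate + t_star * se_boot


rng = random.Random(2019)
boot_medians = [statistics.median(rng.choices(SCORES, k=len(SCORES)))
                for _ in range(10_000)]

# Assumed critical value: t* for 95% confidence with df = n - 1 = 19.
T_STAR_95_DF19 = 2.093

print("percentile:", percentile_interval(boot_medians))
print("SE method: ", se_interval(statistics.median(SCORES), boot_medians, T_STAR_95_DF19))
```

The percentile interval need not be symmetric around the sample median, while the SE interval always is; with a lumpy bootstrap distribution of medians the two can differ noticeably.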

  28. Outline: Let's make a bootstrap confidence interval for the population median audience score! Step 1: Create a bootstrap distribution.

  29. Rotten horrors: Rotten Tomatoes is a movie aggregator where the audience is also able to review and score the movies. We want to estimate the median audience score of horror movies on RottenTomatoes.com. We start with a random sample of 20 horror movies.

  30. Data

          title                                 audience_score
       1  Patrick                               52
       2  Demon Seed                            43
       3  Tormented                             34
       4  Under the Bed                         12
       5  Phantasm IV: Oblivion                 41
       6  Fright Night Part 2                   42
       7  House of 1000 Corpses                 65
       8  Creepshow 2                           46
       9  The Forsaken                          44
      10  All the Boys Love Mandy Lane          34
      11  Jason Lives: Friday the 13th Part VI  57
      12  Vampire's Kiss                        48
      13  The Witches of Eastwick               60
      14  Yellowbrickroad                       28
      15  Dying Breed                           27
      16  Carrie                                73
      17  Whoever Slew Auntie Roo?              56
      18  The Mangler                           23
      19  Primal                                29
      20  The Twilight Saga: New Moon           65
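The point estimate for the population median is the median of this sample. A quick check in Python (the variable names are illustrative):

```python
import statistics

# (title, audience_score) for the random sample of 20 horror movies.
horror_sample = [
    ("Patrick", 52), ("Demon Seed", 43), ("Tormented", 34),
    ("Under the Bed", 12), ("Phantasm IV: Oblivion", 41),
    ("Fright Night Part 2", 42), ("House of 1000 Corpses", 65),
    ("Creepshow 2", 46), ("The Forsaken", 44),
    ("All the Boys Love Mandy Lane", 34),
    ("Jason Lives: Friday the 13th Part VI", 57), ("Vampire's Kiss", 48),
    ("The Witches of Eastwick", 60), ("Yellowbrickroad", 28),
    ("Dying Breed", 27), ("Carrie", 73),
    ("Whoever Slew Auntie Roo?", 56), ("The Mangler", 23),
    ("Primal", 29), ("The Twilight Saga: New Moon", 65),
]

scores = [score for _, score in horror_sample]
sample_median = statistics.median(scores)
print(sample_median)  # 43.5: average of the 10th and 11th sorted scores (43 and 44)
```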

  35. Bootstrap sample 1
      (1) Take a bootstrap sample:

          title                                 audience_score
       1  Vampire's Kiss                        48
       2  Phantasm IV: Oblivion                 41
       3  House of 1000 Corpses                 65
       4  Dying Breed                           27
       5  Whoever Slew Auntie Roo?              56
       6  The Forsaken                          44
       7  The Twilight Saga: New Moon           65
       8  The Twilight Saga: New Moon           65
       9  Whoever Slew Auntie Roo?              56
      10  The Twilight Saga: New Moon           65
      11  The Mangler                           23
      12  Dying Breed                           27
      13  Creepshow 2                           46
      14  House of 1000 Corpses                 65
      15  Whoever Slew Auntie Roo?              56
      16  Tormented                             34
      17  Jason Lives: Friday the 13th Part VI  57
      18  Vampire's Kiss                        48
      19  Primal                                29
      20  The Witches of Eastwick               60

      Notice that we have some repeated movies in our bootstrap sample of size n = 20. This will be common because we are sampling with replacement from a sample that is also of size n = 20.

      (2) Calculate the median of the bootstrap sample:
          23, 27, 27, 29, 34, 41, 44, 46, 48, 48, 56, 56, 56, 57, 60, 65, 65, 65, 65, 65
          median = (48 + 56) / 2 = 52
      (3) Plot this value: a bootstrap statistic (median).
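Steps (1) and (2) for a single bootstrap sample can be sketched as below. Because the draw is random (seeded here for repeatability), it will not reproduce the specific resample shown on the slide; the function name is hypothetical.

```python
import random
import statistics
from collections import Counter

horror_sample = [
    ("Patrick", 52), ("Demon Seed", 43), ("Tormented", 34),
    ("Under the Bed", 12), ("Phantasm IV: Oblivion", 41),
    ("Fright Night Part 2", 42), ("House of 1000 Corpses", 65),
    ("Creepshow 2", 46), ("The Forsaken", 44),
    ("All the Boys Love Mandy Lane", 34),
    ("Jason Lives: Friday the 13th Part VI", 57), ("Vampire's Kiss", 48),
    ("The Witches of Eastwick", 60), ("Yellowbrickroad", 28),
    ("Dying Breed", 27), ("Carrie", 73),
    ("Whoever Slew Auntie Roo?", 56), ("The Mangler", 23),
    ("Primal", 29), ("The Twilight Saga: New Moon", 65),
]


def one_bootstrap_median(sample, rng):
    """Resample n movies with replacement; return (resample, its median score)."""
    resample = rng.choices(sample, k=len(sample))
    median = statistics.median(score for _, score in resample)
    return resample, median


rng = random.Random(1)
resample, boot_median = one_bootstrap_median(horror_sample, rng)
# Movies that were drawn more than once in this bootstrap sample.
repeats = {t: c for t, c in Counter(t for t, _ in resample).items() if c > 1}
print("bootstrap median:", boot_median)
print("movies drawn more than once:", repeats)
```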

  40. Bootstrap sample 2
      (1) Take another bootstrap sample:

          title                                 audience_score
       1  Fright Night Part 2                   42
       2  Carrie                                73
       3  The Forsaken                          44
       4  The Mangler                           23
       5  Primal                                29
       6  Patrick                               52
       7  Jason Lives: Friday the 13th Part VI  57
       8  The Mangler                           23
       9  Vampire's Kiss                        48
      10  All the Boys Love Mandy Lane          34
      11  The Twilight Saga: New Moon           65
      12  All the Boys Love Mandy Lane          34
      13  Yellowbrickroad                       28
      14  Vampire's Kiss                        48
      15  Tormented                             34
      16  The Mangler                           23
      17  Phantasm IV: Oblivion                 41
      18  Patrick                               52
      19  House of 1000 Corpses                 65
      20  The Twilight Saga: New Moon           65

      (2) Calculate the median of the bootstrap sample:
          23, 23, 23, 28, 29, 34, 34, 34, 41, 42, 44, 48, 48, 52, 52, 57, 65, 65, 65, 73
          median = (42 + 44) / 2 = 43
      (3) Plot this value: another bootstrap statistic (median).

  45. Bootstrap sample 3
      (1) Take another bootstrap sample:

          title                                 audience_score
       1  Tormented                             34
       2  The Witches of Eastwick               60
       3  The Witches of Eastwick               60
       4  The Witches of Eastwick               60
       5  The Mangler                           23
       6  The Witches of Eastwick               60
       7  Patrick                               52
       8  Phantasm IV: Oblivion                 41
       9  Yellowbrickroad                       28
      10  Jason Lives: Friday the 13th Part VI  57
      11  Yellowbrickroad                       28
      12  Jason Lives: Friday the 13th Part VI  57
      13  Fright Night Part 2                   42
      14  Primal                                29
      15  Fright Night Part 2                   42
      16  Whoever Slew Auntie Roo?              56
      17  Fright Night Part 2                   42
      18  Fright Night Part 2                   42
      19  Under the Bed                         12
      20  Phantasm IV: Oblivion                 41

      (2) Calculate the median of the bootstrap sample:
          12, 23, 28, 28, 29, 34, 41, 41, 42, 42, 42, 42, 52, 56, 57, 57, 60, 60, 60, 60
          median = (42 + 42) / 2 = 42
      (3) Plot this value: another bootstrap statistic (median).
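The hand-computed medians on the three bootstrap-sample slides can be double-checked in a few lines. This sketch simply copies the scores listed for each bootstrap sample and confirms that every value comes from the original sample.

```python
import statistics

# Original sample scores and the three bootstrap samples shown on the slides.
original = [52, 43, 34, 12, 41, 42, 65, 46, 44, 34,
            57, 48, 60, 28, 27, 73, 56, 23, 29, 65]
bootstrap_samples = {
    1: [48, 41, 65, 27, 56, 44, 65, 65, 56, 65, 23, 27, 46, 65, 56, 34, 57, 48, 29, 60],
    2: [42, 73, 44, 23, 29, 52, 57, 23, 48, 34, 65, 34, 28, 48, 34, 23, 41, 52, 65, 65],
    3: [34, 60, 60, 60, 23, 60, 52, 41, 28, 57, 28, 57, 42, 29, 42, 56, 42, 42, 12, 41],
}

for label, sample in bootstrap_samples.items():
    # Every bootstrap value must come from the original sample.
    assert set(sample) <= set(original)
    print(f"bootstrap sample {label}: n = {len(sample)}, median = {statistics.median(sample)}")
```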

  46. Many more bootstrap samples... repeat.
      [Figure: Bootstrap Distribution]
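Repeating the resampling many times builds the bootstrap distribution. This minimal, text-only sketch draws 100 bootstrap medians from the horror-movie scores and prints them as a sideways dot plot (one row per distinct median, one `o` per bootstrap sample). The seed and function names are hypothetical, so the result will not match the slide's figure exactly.

```python
import random
import statistics
from collections import Counter

SCORES = [52, 43, 34, 12, 41, 42, 65, 46, 44, 34,
          57, 48, 60, 28, 27, 73, 56, 23, 29, 65]


def bootstrap_medians(data, reps, seed):
    """Medians of `reps` bootstrap samples (same size, drawn with replacement)."""
    rng = random.Random(seed)
    return [statistics.median(rng.choices(data, k=len(data))) for _ in range(reps)]


def text_dot_plot(values):
    """One line per distinct value, one 'o' per occurrence (a sideways dot plot)."""
    counts = Counter(values)
    return [f"{v:5.1f} | {'o' * counts[v]}" for v in sorted(counts)]


medians = bootstrap_medians(SCORES, reps=100, seed=7)
print("\n".join(text_dot_plot(medians)))
```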

  48. Clicker question: The dot plot below is the bootstrap distribution of medians constructed using 100 simulations. What does each dot on the dot plot represent?
      [Figure: Bootstrap Distribution]
      (a) Score of a horror movie in the original sample (sample dist)
      (b) Score of a horror movie in the population (population dist)
      (c) Median from one bootstrap sample from the original sample (bootstrap dist)
      (d) Median from one sample from the population (sampling dist)
