Hypothesis Testing for Difference in Cuckoo Bird Egg Sizes

Conducting hypothesis tests to determine whether cuckoo eggs found in the nests of different bird species differ in size, based on the lengths of sampled cuckoo eggs. Learn how to test for a difference in means across multiple categories using real statistical data.

Presentation Transcript


  1. STAT 101, Dr. Kari Lock Morgan. ANOVA, Section 8.1: Testing for a difference in means across multiple categories. Statistics: Unlocking the Power of Data (Lock5)

  2. Review: Chi-Square Tests. The χ² goodness-of-fit test tests whether one categorical variable differs from a null distribution. The χ² test for association tests for an association between two categorical variables. For both, you compute the expected counts in each cell (assuming H0) and the χ² statistic, $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, then find the proportion above the χ² statistic in a randomization distribution or a χ²-distribution (if all expected counts > 5).
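As a small illustration of the χ² formula on the review slide, here is a minimal Python sketch. The function name `chi_square_statistic` and the die-roll counts are hypothetical, chosen only to show the arithmetic; the second step (finding the proportion above the statistic) is not included.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical goodness-of-fit example: 60 rolls of a die.
# Under H0 every face is equally likely, so each expected count is 60 / 6 = 10.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
print(chi_square_statistic(observed, expected))
```

Here every expected count is 10, so the "all expected counts > 5" condition for the χ²-distribution is met.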

  3. Multiple Categories. So far, we've learned how to do inference for a difference in means only if the categorical variable has two categories. Today, we'll learn how to do hypothesis tests for a difference in means across multiple categories.

  4. Hypothesis Testing. (1) State hypotheses. (2) Calculate a statistic based on your sample data (the test statistic). (3) Create a distribution of this statistic as it would be observed if the null hypothesis were true. (4) Measure how extreme your test statistic from (2) is, compared to the distribution generated in (3).

  5. Cuckoo Birds. Cuckoo birds lay their eggs in the nests of other birds. When the cuckoo baby hatches, it kicks out all the original eggs/babies. If the cuckoo is lucky, the mother will raise the cuckoo as if it were her own. Do cuckoo birds found in nests of different species differ in size? http://opinionator.blogs.nytimes.com/2010/06/01/cuckoo-cuckoo/

  6. Length of Cuckoo Eggs

  7. Notation. $k$ = number of groups; $n_j$ = number of units in group $j$; $n$ = overall number of units $= n_1 + n_2 + \cdots + n_k$

  8. Cuckoo Eggs
     Bird           Sample Mean   Sample SD   Sample Size
     Pied Wagtail   22.90         1.07        15
     Pipit          22.50         0.97        60
     Robin          22.58         0.68        16
     Sparrow        23.12         1.07        14
     Wren           21.13         0.74        15
     Overall        22.46         1.07        120
     k = 5; n1 = 15, n2 = 60, n3 = 16, n4 = 14, n5 = 15; n = 120
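A quick check of the table's bookkeeping, assuming the rounded summary statistics above: n is the sum of the group sizes, and the overall mean is the size-weighted average of the group means. The dictionary layout is just one convenient choice for this sketch.

```python
# Summary statistics from the cuckoo egg table: (sample mean, sample SD, sample size)
groups = {
    "Pied Wagtail": (22.90, 1.07, 15),
    "Pipit": (22.50, 0.97, 60),
    "Robin": (22.58, 0.68, 16),
    "Sparrow": (23.12, 1.07, 14),
    "Wren": (21.13, 0.74, 15),
}

k = len(groups)                                   # number of groups
n = sum(size for _, _, size in groups.values())   # n = n1 + n2 + ... + nk

# The overall mean is the size-weighted average of the group means.
overall_mean = sum(mean * size for mean, _, size in groups.values()) / n

print(k, n, round(overall_mean, 2))  # 5 120 22.46
```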

  9. Hypotheses. To test for a difference in means across k groups:
     H0: μ1 = μ2 = ... = μk
     Ha: At least one μi ≠ μj

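  10. Test Statistic. Why can't we use the familiar formula $\frac{\text{sample statistic} - \text{null value}}{SE}$ to get the test statistic? With more than two groups there is no single sample statistic (such as one difference in means) that captures all the differences at once, so we need something a bit more complicated.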

  11. Difference in Means. Whether or not two means are significantly different depends on how far apart the means are and how much variability there is within each group.

  12. Difference in Means. [Figure: three panels of dot plots comparing group1 and group2, each annotated with the sample means X̄1, X̄2 and sample standard deviations s1, s2.]

  13. Analysis of Variance. Analysis of Variance (ANOVA) compares the variability between groups to the variability within groups: Total Variability = Variability Between Groups + Variability Within Groups.

  14. Analysis of Variance. If the groups are actually different, then: (a) the variability between groups should be higher than the variability within groups; (b) the variability within groups should be higher than the variability between groups.

  15. Discoveries for Today. How to measure variability between groups? How to measure variability within groups? How to compare the two measures? How to determine significance?

  16. Discoveries for Today. How to measure variability between groups? How to measure variability within groups? How to compare the two measures? How to determine significance?

  17. Sums of Squares. We will measure variability as sums of squared deviations (aka sums of squares). Familiar?

  18. Sums of Squares.
     $$\underbrace{\sum_{i=1}^{n}(X_i-\bar{X})^2}_{\text{Total Variability}} = \underbrace{\sum_{j=1}^{k} n_j(\bar{X}_j-\bar{X})^2}_{\text{Variability Between Groups}} + \underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{i,j}-\bar{X}_j)^2}_{\text{Variability Within Groups}}$$
     Here $X_i$ is the ith data value, $\bar{X}$ is the overall mean, $\bar{X}_j$ is the mean in group j, and $X_{i,j}$ is the ith data value in group j. The total and within-group sums run over all data values; the between-group sum runs over all groups.
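To see the decomposition numerically, the sketch below computes the three sums of squares directly from grouped data and confirms that Total = Between + Within. The function `sums_of_squares` and the small data set are hypothetical (the raw cuckoo measurements are not reproduced in this transcript).

```python
def sums_of_squares(data):
    """Return (SST, SSG, SSE) for a dict mapping group name -> list of values."""
    all_values = [x for values in data.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    # Total: squared deviations of every value from the overall mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    ssg = 0.0
    sse = 0.0
    for values in data.values():
        group_mean = sum(values) / len(values)
        # Between groups: n_j times the squared deviation of the group mean from the overall mean
        ssg += len(values) * (group_mean - grand_mean) ** 2
        # Within groups: squared deviations of each value from its own group mean
        sse += sum((x - group_mean) ** 2 for x in values)
    return sst, ssg, sse


# Hypothetical toy data, NOT the cuckoo measurements
toy = {"A": [4.0, 5.0, 6.0], "B": [8.0, 9.0, 10.0], "C": [5.0, 7.0, 9.0]}
sst, ssg, sse = sums_of_squares(toy)
print(sst, ssg, sse)  # 36.0 24.0 12.0
```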

  19. Deviations. [Figure: data values in Group 1 and Group 2 with the Group 1 mean and the overall mean marked, showing three deviations: Total $X_i - \bar{X}$, Between $\bar{X}_1 - \bar{X}$, Within $X_{i,1} - \bar{X}_1$.]

  20. Sums of Squares.
     $$\underbrace{\sum_{i=1}^{n}(X_i-\bar{X})^2}_{SST\ \text{(total sum of squares)}} = \underbrace{\sum_{j=1}^{k} n_j(\bar{X}_j-\bar{X})^2}_{SSG\ \text{(sum of squares due to groups)}} + \underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{i,j}-\bar{X}_j)^2}_{SSE\ \text{(error sum of squares)}}$$
     Total Variability = Variability Between Groups + Variability Within Groups, that is, SST = SSG + SSE.

  21. Cuckoo Birds. $SST = \sum_{i=1}^{n}(X_i - \bar{X})^2 = 137.19$, $SSG = \sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2 = 35.90$, $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{i,j} - \bar{X}_j)^2 = 101.29$
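The raw egg lengths are not reproduced here, but approximate values of SSG and SSE can be recovered from the slide 8 summary table, using the fact that each group's within-group sum of squares equals $(n_j - 1)s_j^2$. Because the means and SDs in that table are rounded, this sketch gives roughly 35.9 and 101.0, close to but not exactly the values above.

```python
# Summary statistics (mean, SD, size) as rounded on the slide
groups = [(22.90, 1.07, 15), (22.50, 0.97, 60), (22.58, 0.68, 16),
          (23.12, 1.07, 14), (21.13, 0.74, 15)]

n = sum(size for _, _, size in groups)
grand_mean = sum(mean * size for mean, _, size in groups) / n

# Between groups: n_j * (group mean - overall mean)^2, summed over groups
ssg = sum(size * (mean - grand_mean) ** 2 for mean, _, size in groups)
# Within groups: each group's sum of squared deviations equals (n_j - 1) * s_j^2
sse = sum((size - 1) * sd ** 2 for _, sd, size in groups)

print(f"SSG ~ {ssg:.2f}, SSE ~ {sse:.2f}, SSG + SSE ~ {ssg + sse:.2f}")
```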

  22. ANOVA Table. The mean square is the sum of squares divided by the degrees of freedom.
     Source   df    Sum of Squares   Mean Square
     Groups   k-1   SSG              MSG = SSG/(k-1)   (average between-group variability)
     Error    n-k   SSE              MSE = SSE/(n-k)   (average within-group variability)
     Total    n-1   SST

  23. ANOVA Table. Fill in the beginnings of the ANOVA table based on the cuckoo birds data (summary statistics as on slide 8), using SSG = 35.90 and SSE = 101.29.
     Source   df    Sum of Squares   Mean Square
     Groups   k-1   SSG              MSG = SSG/(k-1)
     Error    n-k   SSE              MSE = SSE/(n-k)
     Total    n-1   SST

  24. ANOVA Table. Fill in the beginnings of the ANOVA table based on the cuckoo birds data.
     Source   df   Sum of Squares   Mean Square
     Groups
     Error
     Total

  25. Discoveries for Today. How to measure variability between groups? How to measure variability within groups? How to compare the two measures? How to determine significance?

  26. F-Statistic. The F-statistic is a ratio of the average variability between groups to the average variability within groups: $F = \frac{\text{average between-group variability}}{\text{average within-group variability}} = \frac{MSG}{MSE}$

  27. ANOVA Table
     Source   df    Sum of Squares   Mean Square       F Statistic
     Groups   k-1   SSG              MSG = SSG/(k-1)   MSG/MSE
     Error    n-k   SSE              MSE = SSE/(n-k)
     Total    n-1   SST

  28. Cuckoo Eggs
     Source   df    Sum of Squares   Mean Square           F Statistic
     Groups   4     35.90            35.90/4 = 8.97        8.97/0.88 = 10.19
     Error    115   101.29           101.29/115 = 0.88
     Total    119   137.19
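The arithmetic in this table can be reproduced in a few lines of Python, taking the slide's sums of squares with k = 5 and n = 120 as inputs. This is only a sketch of the calculation; it carries full precision instead of rounding the mean squares first, so intermediate values differ slightly from the rounded ones shown.

```python
ssg, sse, k, n = 35.90, 101.29, 5, 120   # cuckoo egg values from the ANOVA table

df_groups = k - 1      # 4
df_error = n - k       # 115; note df_groups + df_error = n - 1 = 119
msg = ssg / df_groups  # mean square for groups
mse = sse / df_error   # mean square for error
f_stat = msg / mse     # F = MSG / MSE

print(f"F = {f_stat:.2f}")  # F = 10.19
```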

  29. F-statistic. If there really is a difference between the groups, we would expect the F-statistic to be: (a) higher than we would observe by random chance; (b) lower than we would observe by random chance.

  30. Discoveries for Today. How to measure variability between groups? How to measure variability within groups? How to compare the two measures? How to determine significance?

  31. How to determine significance? We have a test statistic. What else do we need to perform the hypothesis test? A distribution of the test statistic assuming H0 is true. How do we get this? Two options: (1) simulation, (2) distributional theory.

  32. Simulation (www.lock5stat.com/statkey). Because a difference in means would make the F-statistic higher, calculate the proportion in the upper tail. An F-statistic this large would be very unlikely to happen just by random chance if the means were all equal, so we have strong evidence that the mean lengths of cuckoo eggs in nests of different species are not all equal.
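The slide runs the simulation in StatKey. Here is a minimal Python sketch of the same randomization idea: shuffle the group labels (harmless if H0 is true), recompute F each time, and report the proportion of shuffled F-statistics at least as large as the observed one. The function names, the 5000 repetitions, the seed, and the small data set are all hypothetical; the cuckoo measurements themselves are not in this transcript.

```python
import random


def f_statistic(groups):
    """F = MSG / MSE for a list of groups (each a list of numbers)."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ssg / (k - 1)) / (sse / (n - k))


def randomization_p_value(groups, reps=5000, seed=1):
    """Upper-tail proportion of F-statistics after randomly reassigning group labels."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # under H0 the group labels are arbitrary
        shuffled, start = [], 0
        for size in sizes:   # cut the shuffled values back into groups of the original sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            count += 1
    return observed, count / reps


# Hypothetical data, NOT the cuckoo measurements
groups = [[21.0, 21.5, 22.0, 21.8], [22.5, 23.0, 22.8, 23.2], [22.0, 22.4, 21.9, 22.6]]
f_obs, p = randomization_p_value(groups)
print(f"F = {f_obs:.2f}, randomization p-value ~ {p:.3f}")
```

Only the upper tail is counted because, as the slide notes, a real difference in means pushes F up, not down.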

  33. F-distribution. [Figure: histogram of the randomization distribution of the F-statistic (x-axis: F-statistic, y-axis: Frequency), shown alone and with the F-distribution overlaid.]

  34. F-Distribution. If the following conditions hold: (1) sample sizes in each group are large (each $n_j \ge 30$) OR the data are relatively normally distributed, (2) variability is similar in all groups, and (3) the null hypothesis is true, then the F-statistic follows an F-distribution. The F-distribution has two degrees of freedom, one for the numerator of the ratio ($k - 1$) and one for the denominator ($n - k$).

  35. Equal Variance. The F-distribution assumes equal within-group variability for each group. As a rough rule of thumb, this assumption is violated if the standard deviation of one group is more than double the standard deviation of another group.
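The two rough checks from these slides can be written as a small helper. `f_conditions` is a hypothetical name; it only looks at the sample sizes and the largest-to-smallest SD ratio, so whether smaller groups are "relatively normal" still has to be judged from plots of the data.

```python
def f_conditions(sds, sizes, min_size=30):
    """Rough checks: are all groups large, and is max SD at most double min SD?"""
    all_large = all(size >= min_size for size in sizes)
    sd_ratio = max(sds) / min(sds)
    similar_spread = sd_ratio <= 2  # rule of thumb: violated if one SD is more than double another
    return all_large, round(sd_ratio, 2), similar_spread


# Cuckoo eggs (SDs and sizes from the summary table)
print(f_conditions([1.07, 0.97, 0.68, 1.07, 0.74], [15, 60, 16, 14, 15]))  # (False, 1.57, True)
# Study hours by class year (slide 41)
print(f_conditions([10.33, 9.29, 14.74], [72, 74, 52]))  # (True, 1.59, True)
```

For the cuckoo eggs the SDs pass the rule of thumb, but only the Pipit group has $n_j \ge 30$, so the normality condition carries the weight there.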

  36. F-distribution. Can we use the F-distribution to calculate the p-value for the cuckoo bird eggs (summary statistics as on slide 8)? (a) Yes (b) No (c) Need more information

  37. Length of Cuckoo Eggs

  38. ANOVA Table
     Source   df    Sum of Squares   Mean Square       F Statistic   p-value
     Groups   k-1   SSG              MSG = SSG/(k-1)   MSG/MSE       Use $F_{k-1,\,n-k}$
     Error    n-k   SSE              MSE = SSE/(n-k)
     Total    n-1   SST
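Statistical software gives this upper-tail area directly. To stay self-contained, the sketch below computes it with only Python's standard library, using the identity $P(F \ge f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$, where $I_x(a, b)$ is the regularized incomplete beta function, evaluated with the standard continued-fraction method (modified Lentz). The function names are hypothetical; for the cuckoo table (F = 10.19 with 4 and 115 df) the result should land close to the 4.3 × 10^-7 reported on the slides.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly, else the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F >= f) for an F-distribution with (df1, df2) degrees of freedom."""
    if f <= 0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Cuckoo eggs: F = 10.19 with k - 1 = 4 and n - k = 115 degrees of freedom
p = f_upper_tail(10.19, 4, 115)
print(f"p-value = {p:.1e}")
```

If SciPy is available, `scipy.stats.f.sf(10.19, 4, 115)` computes the same tail area.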

  39. Cuckoo Eggs

  40. ANOVA Table
     Source   df    Sum of Squares   Mean Square   F Statistic   p-value
     Groups   4     35.90            8.97          10.19         4.3 × 10^-7
     Error    115   101.29           0.88
     Total    119   137.19
     Conditions: equal variability; normal(ish) data. We have very strong evidence that the average length of cuckoo eggs differs for nests of different species.

  41. Study Hours by Class Year. Can we use the F-distribution to calculate the p-value for whether there is a difference in average hours spent studying per week by class year at Duke? (a) Yes (b) No (c) Need more information
     Year         Sample Mean   Sample SD   Sample Size
     First Year   16.06         10.33       72
     Sophomore    17.51         9.29        74
     Upperclass   19.31         14.74       52

  42. Study Hours by Class Year. Is there a difference in the average hours spent studying per week by class year at Duke? Use the summary statistics from slide 41, with SSG = 318 and SSE = 24984. (a) Yes (b) No (c) Cannot tell from this data (d) I didn't finish

  43. ANOVA Table
     Source   df   Sum of Squares   Mean Square   F-Statistic   p-value
     Groups
     Error
     Total
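One way to check the study-hours table, assuming the sums of squares given on slide 42 (SSG = 318, SSE = 24984) and the sample sizes from slide 41. With k = 3 groups the numerator df is 2, and in that special case the F upper tail has the closed form $P(F \ge f) = \left(\frac{d_2}{d_2 + 2f}\right)^{d_2/2}$; the sketch relies on that shortcut, which would not be valid for other numbers of groups.

```python
ssg, sse = 318, 24984          # sums of squares given on the slide
sizes = [72, 74, 52]           # first year, sophomore, upperclass
k, n = len(sizes), sum(sizes)  # k = 3 groups, n = 198 students

df_groups, df_error = k - 1, n - k
msg, mse = ssg / df_groups, sse / df_error
f_stat = msg / mse

# With numerator df = 2 the F upper tail has a closed form:
# P(F >= f) = (df2 / (df2 + 2 f)) ** (df2 / 2); valid ONLY when df1 == 2.
assert df_groups == 2
p_value = (df_error / (df_error + 2 * f_stat)) ** (df_error / 2)

print(f"Groups: df={df_groups}, SS={ssg}, MS={msg:.1f}")
print(f"Error:  df={df_error}, SS={sse}, MS={mse:.1f}")
print(f"Total:  df={n - 1}, SS={ssg + sse}")
print(f"F = {f_stat:.2f}, p-value = {p_value:.2f}")
```

With these inputs F comes out around 1.24 and the p-value around 0.29, far larger than the cuckoo example's p-value.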

  44. Summary. Analysis of variance is used to test for a difference in means between groups by comparing the variability between groups to the variability within groups. Sums of squares are used to measure variability. The F-statistic is the ratio of average variability between groups to average variability within groups. The F-statistic follows an F-distribution if sample sizes are large (or the data are normal), variability is equal across groups, and the null hypothesis is true.

  45. To Do. Read Section 8.1 (we are skipping 8.2). Do Homework 6 (due Monday, 3/24).
