Relationships and Gender Proportions in Statistical Data Analysis

Explore the analysis of two categorical variables - relationship status and gender - to determine proportions of students in a sample in a relationship, females in relationships, and differences in proportions between males and females. Discover how statistical data can unveil insights into these relationships.

  • Relationships
  • Gender
  • Statistical Analysis
  • Proportions
  • Data Interpretation

Presentation Transcript


  1. STAT 101, Dr. Kari Lock Morgan
     Describing Data: Two Variables
     SECTIONS 2.1, 2.4, 2.5
       • Two categorical (2.1)
       • Quantitative and categorical (2.4)
       • Two quantitative (2.5)
     Statistics: Unlocking the Power of Data Lock5

  2. The Big Picture
     [Diagram labels: Population, Sampling, Sample, Statistical Inference, Descriptive Statistics]

  3. Two Categorical Variables
     Look at the relationship between two categorical variables:
       1. Relationship status
       2. Gender

  4. Two-Way Table

                In a Relationship   It's Complicated   Single   Total
       Female                  32                 12       63     107
       Male                    10                  7       45      62
       Total                   42                 19      108     169

     It doesn't matter which variable is displayed in the rows and which in the columns.
     Data from Duke students.
     R: table(relationship, gender)

  5. Two-Way Table (same table as on slide 4)
     What proportion of students in this sample are in a relationship?
       a) 42/169 ≈ 25%
       b) 32/107 ≈ 30%
       c) 10/62 ≈ 16%
       d) 32/42 ≈ 76%

  6. Two-Way Table (same table as on slide 4)
     What proportion of females in this sample are in a relationship?
       a) 42/169 ≈ 25%
       b) 32/107 ≈ 30%
       c) 10/62 ≈ 16%
       d) 32/42 ≈ 76%

  7. Male and Female Proportions
       • 30% of females in the sample say they are in a relationship
       • 16% of males in the sample say they are in a relationship
     Why the difference?

  8. Difference in Proportions
     A difference in proportions is a difference in proportions for one categorical variable, calculated for different levels of the other categorical variable.
     Example: proportion of females in a relationship minus proportion of males in a relationship
     $\hat{p}_F - \hat{p}_M = 0.30 - 0.16 = 0.14$

  9. Two-Way Table (same table as on slide 4)
     What proportion of people in a relationship in this sample are female?
       a) 42/169 ≈ 25%
       b) 32/107 ≈ 30%
       c) 10/62 ≈ 16%
       d) 32/42 ≈ 76%

  10. Two-Way Table
      CAUTION: The proportion of females in a relationship is NOT THE SAME AS the proportion of people in a relationship who are female!
      30% ≠ 76%!
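
The proportions discussed on slides 5 to 10 can be reproduced from the counts alone. The following is a minimal R sketch, assuming the counts are typed in by hand; the object name `counts` and the other variable names are illustrative and do not come from the slides, which build the table from raw data with table(relationship, gender). It uses base R's prop.table() to condition on rows or on columns, which is exactly the distinction the caution on slide 10 is about.

```r
# Counts from the two-way table on slide 4 (rows: gender, columns: status).
# `counts` is a hypothetical name; the slides use table(relationship, gender).
counts <- matrix(
  c(32, 12, 63,
    10,  7, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Female", "Male"),
    status = c("In a Relationship", "It's Complicated", "Single")
  )
)

# Add row and column totals, as on the slide.
addmargins(counts)

# Overall proportion in a relationship: 42 / 169.
p_overall <- sum(counts[, "In a Relationship"]) / sum(counts)

# Row proportions: within each gender, the share in each status.
by_gender <- prop.table(counts, margin = 1)
p_female <- by_gender["Female", "In a Relationship"]  # 32 / 107
p_male   <- by_gender["Male", "In a Relationship"]    # 10 / 62

# Difference in proportions (females minus males).
diff_prop <- p_female - p_male

# Column proportions: among those in each status, the share of each gender.
by_status <- prop.table(counts, margin = 2)
p_female_given_rel <- by_status["Female", "In a Relationship"]  # 32 / 42

round(c(p_overall, p_female, p_male, diff_prop, p_female_given_rel), 2)
```

Switching `margin` between 1 and 2 is what changes the denominator from "all females" to "all people in a relationship".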

  11. Side-by-Side Bar Chart
      The height of each bar is the count in the corresponding cell of the two-way table.
      R: barplot(relationship~gender, beside=TRUE)

  12. Segmented Bar Chart
      A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side-by-side.
      R: barplot(relationship~gender)
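
Both chart types can also be drawn directly from a matrix of counts. The sketch below is one way to do it in base R, assuming the counts from slide 4 are entered by hand (`counts` and `mids` are illustrative names); it is not necessarily how the slides' own barplot() calls were set up.

```r
# Same counts as the two-way table on slide 4; `counts` is a hypothetical name.
counts <- matrix(
  c(32, 12, 63,
    10,  7, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Female", "Male"),
    status = c("In a Relationship", "It's Complicated", "Single")
  )
)

# Side-by-side bars: one group per status, one bar per gender.
mids <- barplot(counts, beside = TRUE, legend.text = TRUE,
                ylab = "Number of students")

# Segmented (stacked) bars: the default when beside is not set.
barplot(counts, legend.text = TRUE, ylab = "Number of students")
```

With a matrix input, each column becomes a group of bars and each row becomes a bar (or a segment) within that group.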

  13. Vitamin D Injections
      Many kidney dialysis patients get vitamin D injections to correct for a lack of calcium. Two forms of vitamin D injections are used: calcitriol and paricalcitol.
      The records of 67,000 dialysis patients were examined; half received one drug and the other half received the other drug.
      After three years, 58.7% of those getting paricalcitol had survived, while only 51.5% of those getting calcitriol had survived.
      Construct an approximate two-way table of the data (because the percentages are rounded, we can't recover the exact counts; round to whole numbers).
      Source: Teng, M., et al., "Survival of patients undergoing hemodialysis with paricalcitol or calcitriol therapy," New England Journal of Medicine, July 31, 2003; 349(5): 446-456.

  14. Vitamin D Injections

                       Survived     Died    Total
        Calcitriol       17,252   16,248   33,500
        Paricalcitol     19,665   13,835   33,500
        Total            36,917   30,083   67,000
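
As a quick consistency check, the approximate table can be turned back into survival percentages. This R sketch assumes the counts shown on slide 14 (`vitd` and `surv` are illustrative names) and confirms that they reproduce the 51.5% and 58.7% quoted on slide 13.

```r
# Counts from the approximate two-way table on slide 14.
vitd <- matrix(
  c(17252, 16248,
    19665, 13835),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    drug    = c("Calcitriol", "Paricalcitol"),
    outcome = c("Survived", "Died")
  )
)

# Row and column totals: 33,500 per drug and 67,000 overall.
addmargins(vitd)

# Survival proportion within each drug (row proportions).
surv <- prop.table(vitd, margin = 1)[, "Survived"]

# Percentages to compare with the figures quoted on slide 13.
round(100 * surv, 1)
```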

  15. Kidney Stones

                      Success   Failure
        Treatment A       273        77
        Treatment B       289        61

      Which treatment is better at removing kidney stones?
        a) Treatment A
        b) Treatment B
      R. Charig, D. R. Webb, S. R. Payne, O. E. Wickham (1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed) 292 (6524): 879-882

  16. Kidney Stones: SMALL STONES

                      Success   Failure
        Treatment A        81         6
        Treatment B       234        36

      Which treatment is better at removing small kidney stones?
        a) Treatment A
        b) Treatment B

  17. Kidney Stones: LARGE STONES

                      Success   Failure
        Treatment A       192        71
        Treatment B        55        25

      Which treatment is better at removing large kidney stones?
        a) Treatment A
        b) Treatment B

  18. Kidney Stones
      Treatment A is more effective for both small and large kidney stones, but the data show Treatment B to be more effective overall!
      How is this possible?

  19. Kidney Stones: Simpson's Paradox

        ALL STONES      Success   Failure   Success Rate
        Treatment A         273        77            78%
        Treatment B         289        61            83%

        SMALL STONES    Success   Failure   Success Rate
        Treatment A          81         6            93%
        Treatment B         234        36            87%

        LARGE STONES    Success   Failure   Success Rate
        Treatment A         192        71            73%
        Treatment B          55        25            69%

  20. Kidney Stones
      Treatment A is used more often on large stones, which are harder to treat.
      This is an example of Simpson's Paradox: an observed relationship between two variables can change (or even reverse!) when a third variable is considered.
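
The reversal can be checked numerically. The following R sketch uses the counts from slides 16 and 17 (all object names are illustrative) to compute success rates within each stone size and overall, plus the share of each treatment's cases that involved large stones, which is the imbalance slide 20 points to.

```r
# Successes and failures by treatment and stone size (slides 16 and 17).
labels <- list(treatment = c("A", "B"), size = c("Small", "Large"))
success <- matrix(c(81, 234, 192, 55), nrow = 2, dimnames = labels)
failure <- matrix(c(6, 36, 71, 25), nrow = 2, dimnames = labels)
total <- success + failure

# Success rate within each stone size: A is higher in both columns.
rate_by_size <- success / total
round(rate_by_size, 2)

# Success rate ignoring stone size: B is higher.
rate_overall <- rowSums(success) / rowSums(total)
round(rate_overall, 2)

# Share of each treatment's cases that were large stones.
share_large <- total[, "Large"] / rowSums(total)
round(share_large, 2)
```

Treatment A's overall rate is pulled toward its large-stone rate because most of its cases were large stones, while Treatment B's overall rate is dominated by small stones.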

  21. Kidney Stones

  23. Small Stones
      Slope = # successful / # unsuccessful = odds

                       Treatment A   Treatment B
        Successful        81 (93%)     234 (87%)
        Unsuccessful             6            36

  24. Large Stones
      Slope = # successful / # unsuccessful = odds

                       Treatment A   Treatment B
        Successful       192 (73%)      55 (69%)
        Unsuccessful            71            25

  25. Combined

                       Treatment A   Treatment B
        Successful      81+192=273           289
        Unsuccessful       6+71=77            61

  26. Combined

                       Treatment A   Treatment B
        Successful       273 (78%)     289 (83%)
        Unsuccessful            77            61
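
The odds defined on slides 23 and 24 (# successful / # unsuccessful) can be tabulated the same way. This is a small R sketch using the slide counts; the helper `odds()` and the table names are hypothetical and are introduced only for illustration.

```r
# Counts from slides 23 to 26 (rows: outcome, columns: treatment).
small <- rbind(Successful = c(A = 81, B = 234), Unsuccessful = c(A = 6, B = 36))
large <- rbind(Successful = c(A = 192, B = 55), Unsuccessful = c(A = 71, B = 25))
combined <- small + large

# Odds of success = # successful / # unsuccessful, for each treatment.
odds <- function(tab) tab["Successful", ] / tab["Unsuccessful", ]

odds_table <- rbind(
  small    = odds(small),
  large    = odds(large),
  combined = odds(combined)
)
round(odds_table, 2)
```

Treatment A has the higher odds in the `small` and `large` rows, but the lower odds in the `combined` row.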

  29. Summary: Two Categorical Variables
      Summary Statistics
        • Two-way table
        • Difference in proportions
      Visualization
        • Side-by-side bar chart
        • Segmented bar chart

  30. Quantitative and Categorical Relationships
      Interested in a quantitative variable broken down by categorical groups.

  31. Tea and the Immune System
      Participants were randomized to drink five or six cups of either tea or coffee every day for two weeks (both drinks have caffeine, but only tea has L-theanine).
      After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (immune system response) was measured.
      Explanatory variable: tea or coffee
      Response variable: measure of interferon gamma
      Mednick, Cai, Kanady, and Drummond (2008). "Comparing the benefits of caffeine, naps and placebo on verbal, motor and perceptual memory," Behavioral Brain Research, 193, 79-86.

  32. Tea and the Immune System
      If the tea drinkers have significantly higher levels of interferon gamma, can we conclude that drinking tea rather than coffee caused an increase in this aspect of the immune response?
        a) Yes (randomized experiment: possible to make conclusions about causality)
        b) No

  33. Side-by-Side Boxplots
      R: boxplot(InterferonGamma~Drink)

  34. Quantitative Statistics by a Categorical Variable
      Any of the statistics we use for a quantitative variable can be looked at separately for each level of a categorical variable.
      Tea: $\bar{x}_T = 34.82$
      Coffee: $\bar{x}_C = 17.70$

  35. Difference in Means
      Often, when comparing a quantitative variable across two categories, we compute the difference in means:
      $\bar{x}_T - \bar{x}_C = 34.82 - 17.70 = 17.12$
      R: compareMean(InterferonGamma~Drink)
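
compareMean() is not a base R function and is presumably a helper supplied for the course, so the sketch below shows one way to get group means and their difference with base R's tapply(). The data values are made up for illustration and are NOT the study's measurements; only the column names `InterferonGamma` and `Drink` are taken from the slides' formula.

```r
# Made-up illustration data, NOT the study's measurements.
# Column names follow the slides' formula InterferonGamma ~ Drink.
immune <- data.frame(
  Drink = c("Tea", "Tea", "Tea", "Tea", "Coffee", "Coffee", "Coffee", "Coffee"),
  InterferonGamma = c(20, 35, 48, 52, 3, 11, 21, 38)
)

# Mean of the quantitative variable within each level of the categorical one.
group_means <- tapply(immune$InterferonGamma, immune$Drink, mean)
group_means

# Difference in means (tea minus coffee).
diff_means <- group_means[["Tea"]] - group_means[["Coffee"]]
diff_means

# Side-by-side boxplots, as on slide 33.
boxplot(InterferonGamma ~ Drink, data = immune)
```

Any other summary statistic (median, standard deviation, and so on) can be substituted for `mean` in the tapply() call.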

  36. Summary: One Quantitative and One Categorical
      Summary Statistics
        • Any summary statistics for quantitative variables, broken down by groups
        • Difference in means
      Visualization
        • Side-by-side boxplots

  37. Two Quantitative Variables
      Summary Statistics: correlation
      Visualization: scatterplot

  38. Scatterplot
      A scatterplot is the graph of the relationship between two quantitative variables.
      R: plot(study_hours, gpa)

  39. Direction of Association
        • A positive association means that values of one variable tend to be higher when values of the other variable are higher.
        • A negative association means that values of one variable tend to be lower when values of the other variable are higher.
        • Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable.

  40. Cars Data Handout
      Quantitative Variables:
        • Weight (pounds)
        • City MPG
        • Fuel capacity (gallons)
        • Page number (in Consumer Reports)
        • Time to go 1/4 mile (in seconds)
        • Acceleration time from 0 to 60 mph
      Relationships:
        • Weight vs. CityMPG
        • Weight vs. FuelCapacity
        • PageNum vs. FuelCapacity
        • Weight vs. QtrMile
        • Acc060 vs. QtrMile
        • CityMPG vs. QtrMile

  41. Car Associations

  42. Correlation
      The correlation is a measure of the strength and direction of linear association between two quantitative variables.
      Sample correlation: r
      Population correlation: ρ ("rho")
      R: cor(x,y)

  43. Car Correlations
      [Scatterplots labeled with correlations: -0.91, 0.89, -0.45, 0.51, 0.99, -0.08]
      What are the properties of correlation?

  44. Correlation
        1. -1 ≤ r ≤ 1
        2. The sign indicates the direction of association
             • positive association: r > 0
             • negative association: r < 0
             • no linear association: r ≈ 0
        3. The closer r is to ±1, the stronger the linear association
        4. r has no units and does not depend on the units of measurement
        5. The correlation between X and Y is the same as the correlation between Y and X
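
Several of these properties are easy to verify with cor(). The sketch below uses made-up car weights and city MPG values (hypothetical numbers, not the handout data) to check the range, sign, unit invariance, and symmetry of r.

```r
# Made-up illustration data (hypothetical weights in pounds and city MPG).
weight_lb <- c(2500, 2800, 3100, 3400, 3900, 4300)
city_mpg  <- c(31, 28, 24, 22, 18, 16)

r <- cor(weight_lb, city_mpg)

# Property 1: r always lies between -1 and 1.
stopifnot(r >= -1, r <= 1)

# Property 2: heavier cars have lower MPG in this data, so r is negative.
r < 0

# Property 4: changing units (pounds to kilograms, approximately) leaves r unchanged.
weight_kg <- weight_lb * 0.4536
all.equal(cor(weight_kg, city_mpg), r)

# Property 5: correlation is symmetric in its arguments.
all.equal(cor(city_mpg, weight_lb), r)
```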

  45. Correlation Guessing Game
      http://istics.net/gett/gcstart.php?group_id=duke
      Highest scorer in the class by the first exam gets one extra credit point!

  46. Correlation: NFL Teams
      [Scatterplot: z-score for Penalty Yards vs. Malevolence Rating of Uniform; r = 0.43]

  47. Correlation
      [Scatterplot: z-score for Penalty Yards vs. Malevolence Rating of Uniform; r = 0.08]
      Same plot, but with Dolphins and Raiders (outliers) removed.

  48. Human Cannonball
      [Plot of Y vs. X]
      What is the correlation between X and Y?
        a) r > 0
        b) r < 0
        c) r = 0
      Are X and Y associated?
        a) Yes
        b) No

  49. Correlation Cautions
        1. Correlation can be heavily affected by outliers. Always plot your data!
        2. r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data!
        3. Correlation does not imply causation!
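
The first two cautions can be illustrated with a few lines of R. All values below are made up for the demonstration; they are not the NFL or cannonball data from the slides.

```r
# Caution 1: a single outlier can manufacture a strong correlation.
u <- c(1, 2, 3, 4, 5, 6)
v <- c(3, 1, 4, 2, 5, 2)                  # made-up values with little linear trend
r_without <- cor(u, v)                    # weak correlation
r_with    <- cor(c(u, 20), c(v, 20))      # same data plus one extreme point
c(without_outlier = r_without, with_outlier = r_with)

# Caution 2: a perfect but non-linear relationship can have r = 0.
x <- -3:3
y <- x^2                                  # y is completely determined by x
r_curve <- cor(x, y)                      # zero, up to floating-point error
r_curve

# Plotting makes both situations obvious at a glance.
plot(x, y)
```

In both cases the number alone is misleading, which is why the slide repeats "Always plot your data!".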

  50. Summary: Two Quantitative Variables
      Summary Statistics: correlation
      Visualization: scatterplot
