Understanding Correlation in Statistics: Methods and Interpretation


Explore the concept of correlation in statistics through various examples and visualizations. Learn about different types of correlations, parametric vs non-parametric measures, Pearson's correlation coefficient, and the distinction between regression and correlation. Enhance your understanding of relationships between variables and their implications in data analysis.

  • Correlation
  • Statistics
  • Regression
  • Parametric
  • Non-parametric


Presentation Transcript


  1. BIOL 4605/7220, CH 20.1: Correlation. GPT Lectures. Cailin Xu, November 9, 2011

  2. GLM: correlation. Only one dependent variable (GLM): Regression, ANOVA, ANCOVA. Multiple dependent variables: Multivariate analysis (Correlation).

  3. Correlation: Are two variables associated with each other? No causal ordering (i.e., NEITHER is a function of the other). $Y_1$: total length of aphid stem mothers. $Y_2$: mean thorax length of their parthenogenetic offspring. Data from Box 15.4, Sokal and Rohlf 2012.

  4. Correlation. [Figure: scatter plot of $Y_2$ vs. $Y_1$]

  5. Correlation. [Figure: scatter plot of $Y_1$ vs. $Y_2$]

  6. Correlation Rotate

  7. Regression vs. Correlation
     Regression: Does Y depend on X? (describe functional relationship / predict). Usually, X is manipulated & Y is a random variable. Causal ordering: Y = f(X).
     Correlation: Are Y1 and Y2 related? Both Y1 & Y2 are random variables. No causal ordering.

  8. Correlation: parametric vs. non-parametric. Parametric measures: Pearson's correlation. Nonparametric measures: Spearman's Rho, Kendall's Tau.

     Type of data | Measures of correlation
     Measurements (from Normal/Gaussian population) | Parametric: Pearson's correlation
     Ranks, scores, or data that do not meet assumptions for sampling distribution ($t$, $F$, $\chi^2$) | Nonparametric: Spearman's Rho, Kendall's Tau

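  9. Pearson's Correlation Coefficient ($\rho$)
     - Strength of relation between two variables
     - Geometric interpretation: $\rho_{Y_1 \& Y_2} = \cos(\theta)$
       Perfect positive association: $\theta = 0^\circ$, $\rho = 1$
       No association: $\theta = 90^\circ$, $\rho = 0$
       Perfect negative association: $\theta = 180^\circ$, $\rho = -1$
       $-1 \le \rho \le 1$, true relation
     [Figure: axes $Y_1$ and $Y_2$, showing the regression of $Y_2$ on $Y_1$ and the regression of $Y_1$ on $Y_2$]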

  10. Pearson's Correlation Coefficient ($\rho$)
      - Strength of relation between two variables
      - Geometric interpretation
      - Definition:
        $$\rho_{Y_1, Y_2} = \frac{\operatorname{cov}(Y_1, Y_2)}{\sigma_{Y_1}\sigma_{Y_2}} = \frac{E\left[(Y_1 - \mu_{Y_1})(Y_2 - \mu_{Y_2})\right]}{\sigma_{Y_1}\sigma_{Y_2}}$$
        Covariance of the two variables divided by the product of their standard deviations

  11. Pearson's Correlation Coefficient ($\rho$)
      - Strength of relation between two variables
      - Geometric interpretation
      - Definition
      - Estimate from a sample ($r = \hat{\rho}_{Y_1 \& Y_2}$)

      Parameter name | Parameter symbol | Estimate
      Mean of $Y_1$ | $\mu_{Y_1}$ | $\bar{Y}_1$
      Mean of $Y_2$ | $\mu_{Y_2}$ | $\bar{Y}_2$
      Variance of $Y_1$ | $\sigma^2_{Y_1}$ | $s^2_{Y_1}$
      Variance of $Y_2$ | $\sigma^2_{Y_2}$ | $s^2_{Y_2}$

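  12. Pearson's Correlation Coefficient ($\rho$)
      - Strength of relation between two variables
      - Geometric interpretation
      - Definition
      - Estimate from a sample ($r = \hat{\rho}_{Y_1 \& Y_2}$)

      Replace each parameter in the definition
      $$\rho_{Y_1, Y_2} = \frac{\operatorname{cov}(Y_1, Y_2)}{\sigma_{Y_1}\sigma_{Y_2}} = \frac{E\left[(Y_1 - \mu_{Y_1})(Y_2 - \mu_{Y_2})\right]}{\sigma_{Y_1}\sigma_{Y_2}}$$
      with its estimate ($\mu_{Y_1} \to \bar{Y}_1$, $\mu_{Y_2} \to \bar{Y}_2$, $\sigma^2_{Y_1} \to s^2_{Y_1}$, $\sigma^2_{Y_2} \to s^2_{Y_2}$):
      $$r = \frac{\frac{1}{n-1}\sum_i (Y_{1i} - \bar{Y}_1)(Y_{2i} - \bar{Y}_2)}{s_{Y_1} s_{Y_2}} = \frac{\sum_i (Y_{1i} - \bar{Y}_1)(Y_{2i} - \bar{Y}_2)}{\sqrt{\sum_i (Y_{1i} - \bar{Y}_1)^2 \sum_i (Y_{2i} - \bar{Y}_2)^2}}$$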
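
The sample estimate above can be checked numerically. The following is a minimal sketch in plain Python, using the aphid data listed in the data slide of this transcript; the function name `pearson_r` is chosen here for illustration and is not from the lecture.

```python
import math

# Aphid data as listed in the data slide of this transcript (n = 15).
y1 = [8.7, 8.5, 9.4, 10.0, 6.3, 7.8, 11.9, 6.5, 6.6, 10.6, 10.2, 7.2, 8.6, 11.1, 11.6]
y2 = [5.95, 5.65, 6.00, 5.70, 4.70, 5.53, 6.40, 4.18, 6.15, 5.93, 5.70, 5.68, 6.13, 6.30, 6.03]

def pearson_r(a, b):
    """Sum of cross-products divided by the square root of the product of the sums of squares."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sum_products = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return sum_products / math.sqrt(ss_a * ss_b)

r = pearson_r(y1, y2)
print(round(r, 2))  # 0.65
```

The $(n - 1)$ factors cancel between numerator and denominator, which is why the sketch works directly with sums of squares and cross-products.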

  13. Pearson's Correlation: Significance Test
      - Determine whether a sample correlation coefficient could have come from a population with a parametric correlation coefficient of ZERO
      - Determine whether a sample correlation coefficient could have come from a population with a parametric correlation coefficient of a CERTAIN VALUE $\neq 0$
      - Generic recipe for Hypothesis Testing

  14. Hypothesis Testing --- Generic Recipe
      - State population
      - State model/measure of pattern (statistic)
      - State null hypothesis
      - State alternative hypothesis
      - State tolerance for Type I error
      - State frequency distribution
      - Calculate statistic
      - Calculate p-value
      - Declare decision
      - Report statistic with decision

  15. Hypothesis Testing --- Generic Recipe. State population: all measurements on total length of aphid stem mothers & mean thorax length of their parthenogenetic offspring made by the same experimental protocol. 1) Randomly sampled. 2) Same environmental conditions.

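  16. Hypothesis Testing --- Generic Recipe. State model/measure of pattern (statistic): correlation of the two variables, $\rho$.

      In the case $H_0: \rho = 0$:
      $$t = \frac{r}{s_r}, \qquad s_r = \sqrt{\frac{1 - r^2}{n - 2}}, \qquad t \sim \begin{cases} N(0, 1), & \text{if } n \text{ is LARGE} \\ t \text{ distribution},\ df = n - 2, & \text{otherwise} \end{cases}$$

      In the case $H_0: \rho = \rho_1\ (\neq 0)$:
      $$t = \frac{z - \zeta}{\sqrt{1/(n - 3)}}, \quad \text{where } z = \frac{1}{2}\ln\frac{1 + r}{1 - r}, \quad \zeta = E(z) = \frac{1}{2}\ln\frac{1 + \rho_1}{1 - \rho_1}, \quad \operatorname{var}(z) = \frac{1}{n - 3}$$
      $z$: Normal/tends to normal rapidly as $n$ increases for $\rho \neq 0$. t-statistic: $N(0, 1)$ or $t$ ($df = \infty$).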
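
The second case (a nonzero hypothesized value) is not worked through in the lecture, so here is a hedged sketch of how it might be computed. The hypothesized value `rho_1 = 0.30` is made up purely for illustration; `r = 0.65` and `n = 15` are the values reported for the aphid example, and the normal approximation is used for the p-value.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test_nonzero(r, n, rho_1):
    """Test H0: rho = rho_1 (rho_1 != 0) using the transformed correlation."""
    z = fisher_z(r)
    zeta = fisher_z(rho_1)                 # expected value of z under H0
    t_s = (z - zeta) * math.sqrt(n - 3)    # same as (z - zeta) / sqrt(1 / (n - 3))
    p_two = 2 * (1 - NormalDist().cdf(abs(t_s)))  # two-tailed, N(0, 1) reference
    return t_s, p_two

# rho_1 = 0.30 is a hypothetical null value, not one used in the lecture.
t_s, p_two = z_test_nonzero(r=0.65, n=15, rho_1=0.30)
print(round(t_s, 2), round(p_two, 2))
```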

  17. Hypothesis Testing --- Generic Recipe. State model/measure of pattern (statistic): correlation of the two variables, $\rho$. In the case $H_0: \rho = 0$:
      $$t = \frac{r}{s_r}, \qquad s_r = \sqrt{\frac{1 - r^2}{n - 2}}, \qquad t \sim \begin{cases} N(0, 1), & \text{if } n \text{ is LARGE} \\ t \text{ distribution},\ df = n - 2, & \text{otherwise} \end{cases}$$

  18. Hypothesis Testing --- Generic Recipe. State null hypothesis: $H_0: \rho = 0$

  19. Hypothesis Testing --- Generic Recipe. State alternative hypothesis: $H_A: \rho \neq 0$

  20. Hypothesis Testing --- Generic Recipe. State tolerance for Type I error: $\alpha = 5\%$ (conventional level)

  21. Hypothesis Testing --- Generic Recipe. State frequency distribution: t-distribution

  22. Hypothesis Testing --- Generic Recipe. Calculate statistic (t-statistic): correlation coefficient estimate, r = 0.65; t = (0.65 - 0)/0.21076 = 3.084
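
The arithmetic on this slide can be reproduced with a few lines. This is a small sketch assuming the standard error $s_r = \sqrt{(1 - r^2)/(n - 2)}$ given earlier for the case $H_0: \rho = 0$; the function name is illustrative.

```python
import math

def t_statistic(r, n):
    """t = r / s_r for H0: rho = 0, with s_r = sqrt((1 - r^2) / (n - 2))."""
    s_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / s_r, s_r

# r = 0.65 and n = 15 are the values reported for the aphid example.
t, s_r = t_statistic(0.65, 15)
print(round(t, 3))  # 3.084
```

With the rounded r = 0.65 the standard error comes out very close to the 0.21076 shown on the slide; any difference in the last digit comes from rounding.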

  23. Hypothesis Testing --- Generic Recipe. Calculate p-value: t = 3.084, df = 13, p = 0.0044 (one-tail) & 0.0088 (two-tail)
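
The lecture does not say how the p-value was obtained (a statistics package or a table would be typical). As one possible check that needs only the Python standard library, the sketch below integrates the Student's t density numerically with Simpson's rule; the function names and the number of integration steps are choices made here, not part of the lecture.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the integral of the density over [0, t].

    The integral uses composite Simpson's rule; steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_one = t_upper_tail(3.084, 13)   # one-tailed
p_two = 2 * p_one                 # two-tailed
print(round(p_one, 4), round(p_two, 4))
```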

  24. Hypothesis Testing --- Generic Recipe. Declare decision: p = 0.0088 < $\alpha$ = 0.05, so reject $H_0: \rho = 0$

  25. Hypothesis Testing --- Generic Recipe. Report statistic with decision: r = 0.65, n = 15, p = 0.0088. Total length & offspring thorax length are related.

  26. Pearson's Correlation: Assumptions
      - Normal & independent errors
      - Homogeneous around straight line
      [Figure: biplot of observations 1-15 on Comp.1 vs. Comp.2, with the variables Lmother and Lthor]
      What if assumptions for Pearson test are not met? Here are the observations relative to the correlation line (comp 1). Not homogeneous, due to outliers (observations 8 & 9).

  27. Pearson's Correlation: Randomization test
      - Significance test with no distributional assumptions
      - Hold one variable, permute the other one many times
      - A new r from each new permutation
      - Construct empirical frequency distribution
      - Compare the empirical distribution with the observed r
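
The procedure above can be sketched as follows. This is an illustrative implementation, not the code used in the lecture: the seed, the function names, and the use of `>=`/`<=` when counting extreme values are assumptions made here, and the exact proportions will vary with the random permutations drawn.

```python
import math
import random

# Aphid data as listed in the data slide of this transcript (n = 15).
y1 = [8.7, 8.5, 9.4, 10.0, 6.3, 7.8, 11.9, 6.5, 6.6, 10.6, 10.2, 7.2, 8.6, 11.1, 11.6]
y2 = [5.95, 5.65, 6.00, 5.70, 4.70, 5.53, 6.40, 4.18, 6.15, 5.93, 5.70, 5.68, 6.13, 6.30, 6.03]

def pearson_r(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sp = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return sp / math.sqrt(ss_a * ss_b)

def randomization_test(a, b, n_perm=8000, seed=1):
    """Hold `a` fixed, shuffle `b` n_perm times, and count r values as extreme as observed."""
    rng = random.Random(seed)          # fixed seed (arbitrary) for repeatability
    r_obs = pearson_r(a, b)
    shuffled = list(b)
    upper = lower = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        r_perm = pearson_r(a, shuffled)
        if r_perm >= abs(r_obs):
            upper += 1
        elif r_perm <= -abs(r_obs):
            lower += 1
    return r_obs, upper / n_perm, lower / n_perm

r_obs, p1, p2 = randomization_test(y1, y2)
p = p1 + p2   # two-tailed empirical p-value
print(round(r_obs, 2), p)
```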

  28. Pearson's Correlation: Randomization test (8000 times)
      p1 = p(r > 0.65) = 0.001875
      p2 = p(r < -0.65) = 0.003875
      p = p1 + p2 = 0.00575 < $\alpha$ = 0.05, so reject the null
      Consistent with the testing result from the theoretical t-distribution, for these data
      [Figure: empirical frequency distribution of r, with -0.65 and 0.65 marked]

  29. Pearson's Correlation Coefficient: Confidence Limit. 95% confidence limit (tolerance of Type I error @ 5%). t-distribution (df = n - 2)? NO:
      a) $H_0: \rho = 0$ was rejected
      b) Distribution of $r$ is negatively skewed
      c) Fisher's transformation:
      $$z = \frac{1}{2}\ln\frac{1 + r}{1 - r}; \qquad \frac{z - \zeta}{\sqrt{1/(n - 3)}} \sim N(0, 1) \text{ or } t_{[\infty]}; \qquad \zeta = \frac{1}{2}\ln\frac{1 + \rho}{1 - \rho}$$

  30. Pearson's Correlation Coefficient: Confidence Limit

      C.I. for $\zeta$:
      $$z_l = z - z_{(1 - \alpha/2)}\sqrt{1/(n - 3)}, \qquad z_u = z + z_{(1 - \alpha/2)}\sqrt{1/(n - 3)}$$
      where $z_{(1 - \alpha/2)}$ is the critical value from $N(0, 1)$ at $p = 1 - \alpha/2$.

      C.I. for $\rho$:
      $$r_l = \tanh(z_l) = \frac{\exp(2 z_l) - 1}{\exp(2 z_l) + 1}, \qquad r_u = \tanh(z_u) = \frac{\exp(2 z_u) - 1}{\exp(2 z_u) + 1}$$

      For our example, the 95 percent confidence interval is $r_l = 0.207$, $r_u = 0.872$.
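
As a check on the interval above, here is a minimal sketch using only the Python standard library. It plugs in the rounded values r = 0.65 and n = 15 from the example, so the limits may differ from the slide in the third decimal place; the function name is illustrative.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, alpha=0.05):
    """Confidence limits for rho via Fisher's z transformation."""
    z = math.atanh(r)                            # same as 0.5 * ln((1 + r) / (1 - r))
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value from N(0, 1)
    half_width = crit * math.sqrt(1 / (n - 3))
    z_l, z_u = z - half_width, z + half_width
    return math.tanh(z_l), math.tanh(z_u)        # back-transform to the r scale

r_l, r_u = fisher_ci(0.65, 15)
print(round(r_l, 2), round(r_u, 2))  # 0.21 0.87
```

The interval is not symmetric around r = 0.65, which reflects the skewed sampling distribution of r mentioned on the previous slide.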

  31. Nonparametric: Spearman's Rho. Measure of monotone association, used when the distribution of the data makes Pearson's correlation coefficient undesirable or misleading. Spearman's correlation coefficient (Rho) is defined as the Pearson's correlation coefficient between the ranked variables:
      $$\text{Rho} = \frac{\sum_i (y_{1i} - \bar{y}_1)(y_{2i} - \bar{y}_2)}{\sqrt{\sum_i (y_{1i} - \bar{y}_1)^2 \sum_i (y_{2i} - \bar{y}_2)^2}}, \quad \text{where } y_{1i}, y_{2i} \text{ are the ranks of } Y_{1i}, Y_{2i}$$
      If no ties:
      $$\text{Rho} = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}, \quad \text{where } d_i = y_{1i} - y_{2i}$$
      Randomization test for significance (option)
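
The definition above (Pearson's coefficient computed on ranks) can be sketched as follows. Tied values receive the average of the ranks they span, which is an assumption made here but matches the 6.5 ranks shown in the rank table of this transcript; the function names are illustrative.

```python
import math

# Aphid data as listed in the data slide of this transcript (n = 15).
y1 = [8.7, 8.5, 9.4, 10.0, 6.3, 7.8, 11.9, 6.5, 6.6, 10.6, 10.2, 7.2, 8.6, 11.1, 11.6]
y2 = [5.95, 5.65, 6.00, 5.70, 4.70, 5.53, 6.40, 4.18, 6.15, 5.93, 5.70, 5.68, 6.13, 6.30, 6.03]

def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sp = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return sp / math.sqrt(ss_a * ss_b)

def spearman_rho(a, b):
    """Pearson's correlation between the ranked variables."""
    return pearson_r(average_ranks(a), average_ranks(b))

rho = spearman_rho(y1, y2)
print(round(rho, 2))  # 0.65
```

Because the thorax lengths contain one tie (two values of 5.70), the shortcut formula with $d_i$ is only approximate for these data; the rank-based Pearson form handles the tie directly.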

  32. Nonparametric: Kendall's Tau. For a pair of observations $(Y_{1i}, Y_{2i})$ and $(Y_{1j}, Y_{2j})$:
      - Concordant pair: if $Y_{1i} > Y_{1j}$ and $Y_{2i} > Y_{2j}$, or if $Y_{1i} < Y_{1j}$ and $Y_{2i} < Y_{2j}$ (if the ranks for both elements agree)
      - Discordant pair: if $Y_{1i} > Y_{1j}$ and $Y_{2i} < Y_{2j}$, or if $Y_{1i} < Y_{1j}$ and $Y_{2i} > Y_{2j}$ (if the ranks for both elements disagree)
      - Neither concordant nor discordant: if $Y_{1i} = Y_{1j}$ or $Y_{2i} = Y_{2j}$

  33. Nonparametric: Kendall's Tau
      $$\tau = \frac{n_c - n_d}{\frac{1}{2}n(n - 1)} \quad \text{(no ties)}$$
      where $n_c$ = number of concordant pairs and $n_d$ = number of discordant pairs. The denominator is the total number of pairs.
      $$\tau = \frac{n_c - n_d}{n_c + n_d} \quad \text{(in the case of ties)}$$
      This is the Gamma coefficient or Goodman correlation coefficient.
      Properties:
      - $-1 \le \tau \le 1$
      - $\tau = 1$, for perfect ranking agreement
      - $\tau = -1$, for perfect ranking disagreement
      - $\tau \approx 0$, if the two variables are independent
      - For large samples, the sampling distribution of tau is approximately normal
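
Counting concordant and discordant pairs is easy to sketch directly. The code below is an illustration written for this transcript, not the lecture's implementation; it uses a simple O(n²) loop over all pairs, and the function names are made up here.

```python
# Aphid data as listed in the data slide of this transcript (n = 15).
y1 = [8.7, 8.5, 9.4, 10.0, 6.3, 7.8, 11.9, 6.5, 6.6, 10.6, 10.2, 7.2, 8.6, 11.1, 11.6]
y2 = [5.95, 5.65, 6.00, 5.70, 4.70, 5.53, 6.40, 4.18, 6.15, 5.93, 5.70, 5.68, 6.13, 6.30, 6.03]

def pair_counts(a, b):
    """Return (n_c, n_d): numbers of concordant and discordant pairs."""
    n_c = n_d = 0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (a[i] - a[j]) * (b[i] - b[j])
            if sign > 0:
                n_c += 1      # both variables order the pair the same way
            elif sign < 0:
                n_d += 1      # the variables order the pair in opposite ways
            # sign == 0: a tie in either variable, neither concordant nor discordant
    return n_c, n_d

def kendall_tau(a, b):
    """Tau with the total number of pairs, n(n - 1)/2, in the denominator (no-ties form)."""
    n = len(a)
    n_c, n_d = pair_counts(a, b)
    return (n_c - n_d) / (n * (n - 1) / 2)

def gamma_coefficient(a, b):
    """Ties form: (n_c - n_d) / (n_c + n_d)."""
    n_c, n_d = pair_counts(a, b)
    return (n_c - n_d) / (n_c + n_d)

n_c, n_d = pair_counts(y1, y2)
print(n_c, n_d, round(kendall_tau(y1, y2), 3), round(gamma_coefficient(y1, y2), 3))
```

For the aphid data one pair is tied in thorax length, so $n_c + n_d$ is 104 rather than the 105 total pairs, and the two forms give slightly different values.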

  34. Nonparametric. For more information on nonparametric tests of correlation (e.g., significance tests), see the references:
      - Conover, W.J. (1999) Practical Nonparametric Statistics, 3rd ed. Wiley & Sons
      - Kendall, M. (1948) Rank Correlation Methods. Charles Griffin & Company Limited
      - Caruso, J.C. & N. Cliff (1997) "Empirical Size, Coverage, and Power of Confidence Intervals for Spearman's Rho", Ed. and Psy. Meas., 57, pp. 637-654
      - Corder, G.W. & D.I. Foreman (2009) Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach. Wiley

  35. Data: total length of aphid stem mothers ($Y_1$) vs. mean thorax length of their parthenogenetic offspring ($Y_2$)

       #     Y1      Y2
       1     8.7    5.95
       2     8.5    5.65
       3     9.4    6.00
       4    10.0    5.70
       5     6.3    4.70
       6     7.8    5.53
       7    11.9    6.40
       8     6.5    4.18
       9     6.6    6.15
      10    10.6    5.93
      11    10.2    5.70
      12     7.2    5.68
      13     8.6    6.13
      14    11.1    6.30
      15    11.6    6.03

  36. Total length of mothers vs. mean thorax length of offspring: RAW values ($Y_1$, $Y_2$) and RANKS ($y_1$, $y_2$)

       #     Y1      Y2     y1     y2
       1     8.7    5.95     8      9
       2     8.5    5.65     6      4
       3     9.4    6.00     9     10
       4    10.0    5.70    10      6.5
       5     6.3    4.70     1      2
       6     7.8    5.53     5      3
       7    11.9    6.40    15     15
       8     6.5    4.18     2      1
       9     6.6    6.15     3     13
      10    10.6    5.93    12      8
      11    10.2    5.70    11      6.5
      12     7.2    5.68     4      5
      13     8.6    6.13     7     12
      14    11.1    6.30    13     14
      15    11.6    6.03    14     11

  37. Group Activity

  38. Activity Instructions. Question: REGRESSION or CORRELATION? Justification guideline: Regression: y depends on X. Correlation: Y1 and Y2 are related, with X1, . . . , Xn unknown.

  39. Activity Instructions
      - Form small groups of 2-3 people.
      - Each group is assigned a number.
      - Group members work together on each example for 5 minutes and come up with an answer & justifications.
      - A number will be randomly generated from the group numbers.
      - The corresponding group will have to present their answer & justifications.
      - Go on to the next example . . .

  40. Activity Instructions There is NO RIGHT/WRONG ANSWER (for these examples), as long as your justifications are LOGICAL

  41. Example 1. Height and ratings of physical attractiveness vary across individuals. Would you analyze this as regression or correlation?

      Subject   Height   Phy
       1         69       7
       2         61       8
       3         68       6
       4         66       5
       5         66       8
       ...       ...     ...
      48         71      10

  42. Example 2. Airborne particles such as dust and smoke are an important part of air pollution. Measurements of airborne particles were made every six days in the center of a small city and at a rural location 10 miles southwest of the city (Moore & McCabe, 1999, Introduction to the Practice of Statistics). Would you analyze this relation as regression or correlation?

  43. Example 3 A study conducted in the Egyptian village of Kalama examined the relation between birth weights of 40 infants and family monthly income (El-Kholy et al. 1986, Journal of the Egyptian Public Health Association, 61: 349). Would you analyze this relation as regression or correlation?
