
Pearson's Correlation Coefficient and Its Applications
Explore positive and negative correlations, the Fisher r to z transformation, range restriction and range enhancement, and the impact of reliability on the correlation coefficient. Discover why data properly analyzed by ANOVA may not support causal inference, and why the sampling distribution of the correlation coefficient matters.
Presentation Transcript
Correlation: A Bit About Pearson's r
Questions
- What does it mean when a correlation is positive? Negative?
- What is the purpose of the Fisher r to z transformation?
- What is range restriction? Range enhancement? What do they do to r?
- Give an example in which data properly analyzed by ANOVA cannot be used to infer causality.
- Why do we care about the sampling distribution of the correlation coefficient?
- What is the effect of reliability on r?
Basic Ideas
- Nominal vs. continuous IV
- Degree (direction) & closeness (magnitude) of linear relations
- Sign (+ or -) for direction; absolute value for magnitude
- Pearson product-moment correlation coefficient: r = \frac{\sum z_X z_Y}{N}
Illustrations
[Figure: three scatterplots. Weight by Height shows a positive correlation; Errors by Study Time shows a negative correlation; SAT-V by Toe Size shows a zero correlation.]
Simple Formulas
r = \frac{\sum xy}{N S_X S_Y}, where x = X - \bar{X}, y = Y - \bar{Y}, and S_X = \sqrt{\sum x^2 / N}.
Equivalently, since \mathrm{Cov}(X, Y) = \sum xy / N:
r = \frac{\mathrm{Cov}(X, Y)}{S_X S_Y} = \frac{\sum z_X z_Y}{N}
Use either N throughout or else use N-1 throughout (SD and denominator); the result is the same as long as you are consistent.
Pearson's r is the average cross-product of z scores: the product of (standardized) moments from the means.
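As a check on the formulas, here is a minimal R sketch (the simulated data are my own, not the deck's height/weight example) showing that the average cross-product of z scores reproduces cor():

set.seed(123)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
N <- length(x)
sdN <- function(v) sqrt(sum((v - mean(v))^2) / N)  # SD with N in the denominator
zx <- (x - mean(x)) / sdN(x)
zy <- (y - mean(y)) / sdN(y)
sum(zx * zy) / N  # equals cor(x, y); the N vs. N-1 choice cancels out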
Graphic Representation
[Figure: scatterplot of Weight by Height in raw units (mean weight = 150.7 lbs., mean height = 66.8 inches) and the same plot in z scores, with the means dividing the plane into four quadrants.]
1. Conversion from raw scores to z scores.
2. Points & quadrants: positive & negative cross-products.
3. Correlation is the average of the cross-products; the sign & magnitude of r depend on where the points fall.
4. The product is at its maximum (average = 1) when the points fall on the line where z_X = z_Y.
Descriptive Statistics
                    N   Minimum  Maximum  Mean      Std. Deviation
Ht                  10  60.00    78.00    69.0000   6.05530
Wt                  10  110.00   200.00   155.0000  30.27650
Valid N (listwise)  10
r = 1.0
r = 1. Leave X as is and add error to Y: r drops to .99.
Add more error: r drops to .91.
Review What does it mean when a correlation is positive? Negative?
Sampling Distribution of r
The statistic is r; the parameter is ρ (rho). In general, r is slightly biased.
[Figure: sampling distributions of r (relative frequency vs. observed r) for ρ = 0, .5, and -.5.]
The sampling variance is approximately \sigma_r^2 = \frac{(1 - \rho^2)^2}{N - 1}, so it depends both on N and on ρ.
Empirical Sampling Distributions of the Correlation Coefficient
[Figure: side-by-side boxplots of empirical sampling distributions of r for ρ = .5 and ρ = .7, each with N = 50 and N = 100 (panels labeled .5_N100, .5_N50, .7_N100, .7_N50); spread is larger for the smaller N.]
Fisher's r to z Transformation
z = .5 \ln\left(\frac{1 + r}{1 - r}\right)

r:  .10  .20  .30  .40  .50  .60  .70  .80  .90
z:  .10  .20  .31  .42  .55  .69  .87  1.10  1.47

[Figure: z (output) plotted against r (sample value input); the curve tracks the identity line for small r and rises steeply as r approaches 1.]
The sampling distribution of z becomes normal as N increases; the transformation pulls out the short tail to give a better (more nearly normal) distribution. The sampling variance of z, 1/(N - 3), does not depend on ρ. The r to z function is the inverse hyperbolic tangent (atanh).
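In base R the transformation is atanh(); a quick sketch reproducing the table's values:

r <- c(.10, .20, .30, .40, .50, .60, .70, .80, .90)
round(atanh(r), 2)  # .10 .20 .31 .42 .55 .69 .87 1.10 1.47, matching the table
1 / (200 - 3)       # sampling variance of z for N = 200; note rho plays no role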
Hypothesis test 1: H_0: \rho = 0
t = \frac{r \sqrt{N - 2}}{\sqrt{1 - r^2}}
The result is compared to t with (N - 2) df for significance.
Say r = .25, N = 100:
t = \frac{.25 \sqrt{98}}{\sqrt{1 - .25^2}} = \frac{.25 (9.899)}{.968} = 2.56
t(.05, 98) = 1.984, so p < .05.
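A sketch of the same test in R, reproducing the worked example (cor.test() runs the identical test from raw data):

r <- .25; N <- 100
t_stat <- r * sqrt(N - 2) / sqrt(1 - r^2)
t_stat                # 2.56
qt(.975, df = N - 2)  # 1.984, so p < .05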
Hypothesis test 2: H_0: \rho = value
One-sample z test, where r is the sample value and \rho_0 is the hypothesized population value:
z = \frac{.5 \ln\frac{1 + r}{1 - r} - .5 \ln\frac{1 + \rho_0}{1 - \rho_0}}{\sqrt{1 / (N - 3)}}
Say N = 200, r = .54, and \rho_0 = .30:
z = \frac{.60 - .31}{.07} = 4.13
Compare to the unit normal: 4.13 > 1.96, so it is significant. Our sample was not drawn from a population in which ρ is .30.
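The same one-sample z test in R, using the example's numbers:

r <- .54; rho0 <- .30; N <- 200
z <- (atanh(r) - atanh(rho0)) / sqrt(1 / (N - 3))
z  # 4.13 > 1.96, so reject H0: rho = .30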
Hypothesis test 3: H_0: \rho_1 = \rho_2
Testing the equality of correlations from 2 INDEPENDENT samples:
z = \frac{.5 \ln\frac{1 + r_1}{1 - r_1} - .5 \ln\frac{1 + r_2}{1 - r_2}}{\sqrt{1 / (N_1 - 3) + 1 / (N_2 - 3)}}
Say N_1 = 150, r_1 = .63, N_2 = 175, r_2 = .70:
z = \frac{.74 - .87}{.11} = -1.18, n.s. (about -1.12 with unrounded values)
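The same two-sample z test in R; note it returns the unrounded value:

r1 <- .63; n1 <- 150
r2 <- .70; n2 <- 175
z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
z  # about -1.12; |z| < 1.96, so n.s.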
Hypothesis test 4: H_0: \rho_1 = \rho_2 = ... = \rho_k
Testing the equality of any number of independent correlations:
Q = \sum_{i=1}^{k} (n_i - 3)(z_i - \bar{z})^2, where \bar{z} = \frac{\sum (n_i - 3) z_i}{\sum (n_i - 3)}
Compare Q to chi-square with k - 1 df.

Study  r   n    z    (n-3)z   zbar  (z-zbar)^2  (n-3)(z-zbar)^2
1      .2  200  .20  39.94    .41   .0441       8.69
2      .5  150  .55  80.75    .41   .0196       2.88
3      .6  75   .69  49.91    .41   .0784       5.64
sum        425       170.60                     17.21 = Q

Chi-square at .05 with 2 df = 5.99, and Q = 17.21 > 5.99, so not all ρ are equal.
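The Q statistic in R, reproducing the table (small differences come from the table's rounded z values):

r <- c(.2, .5, .6)
n <- c(200, 150, 75)
z <- atanh(r)
w <- n - 3
zbar <- sum(w * z) / sum(w)
Q <- sum(w * (z - zbar)^2)
Q                                # about 17.1 (the table's rounding gives 17.21)
qchisq(.95, df = length(r) - 1)  # 5.99; Q exceeds it, so not all rho are equal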
Hypothesis test 5 (dependent r): H_0: \rho_{12} = \rho_{13}
Hotelling-Williams test:
t(N - 3) = (r_{12} - r_{13}) \sqrt{\frac{(N - 1)(1 + r_{23})}{2 \frac{N - 1}{N - 3} |R| + \bar{r}^2 (1 - r_{23})^3}}
where \bar{r} = (r_{12} + r_{13}) / 2 and |R| = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23}.
Say N = 101, r_{12} = .4, r_{13} = .6, r_{23} = .3:
\bar{r} = (.4 + .6) / 2 = .5
|R| = 1 - .4^2 - .6^2 - .3^2 + 2(.4)(.6)(.3) = .534
t(98) = (.4 - .6) \sqrt{\frac{(100)(1 + .3)}{2(100/98)(.534) + .5^2 (1 - .3)^3}} = -2.1
|t| exceeds t(.05, 98) = 1.98, so the two correlations differ significantly.
For H_0: \rho_{12} = \rho_{34} (no variable in common), see my notes.
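A sketch of the Hotelling-Williams computation in R (the function name hotelling_williams is mine, not from the deck):

hotelling_williams <- function(r12, r13, r23, N) {
  rbar <- (r12 + r13) / 2
  detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
  t <- (r12 - r13) *
    sqrt(((N - 1) * (1 + r23)) /
           (2 * ((N - 1) / (N - 3)) * detR + rbar^2 * (1 - r23)^3))
  c(t = t, df = N - 3)
}
hotelling_williams(.4, .6, .3, 101)  # t = -2.10 on 98 df; |t| > 1.98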
Review
- What is the purpose of the Fisher r to z transformation?
- Test the hypothesis that \rho_1 = \rho_2, given that r_1 = .50, N_1 = 103, r_2 = .60, N_2 = 128, and the samples are independent.
- Why do we care about the sampling distribution of the correlation coefficient?
Reliability
Reliability sets the ceiling for validity; measurement error attenuates correlations:
r_{XY} = \rho_{T_X T_Y} \sqrt{r_{XX'} r_{YY'}}
If the correlation between true scores is .7 and the reliabilities of X and Y are both .8, the observed correlation is .7 × sqrt(.8 × .8) = .7 × .8 = .56.
Disattenuated correlation:
\rho_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'} r_{YY'}}}
If our observed correlation is .56 and the reliabilities of both X and Y are .8, our estimate of the correlation between true scores is .56 / .8 = .70.
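The two directions of the correction in R (the helper names are mine):

attenuate    <- function(rho_true, rxx, ryy) rho_true * sqrt(rxx * ryy)
disattenuate <- function(r_obs, rxx, ryy) r_obs / sqrt(rxx * ryy)
attenuate(.70, .8, .8)     # .56: observed r when true-score r is .70
disattenuate(.56, .8, .8)  # .70: estimated true-score r from observed .56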
Add Error to Y Only
The correlation decreases. The distribution of X does not change; the distribution of Y becomes wider (increased variance). The slope of Y on X remains constant (the effect of SD_Y on b and on r cancels out). This is not true for error in X. A simulation sketch appears below.
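A simulation sketch (my own numbers, chosen only for illustration):

set.seed(1)
x  <- rnorm(10000)
y  <- 0.8 * x + rnorm(10000, sd = 0.6)   # Y without extra error
ye <- y + rnorm(10000)                   # add error to Y only
cor(x, y);  unname(coef(lm(y ~ x))[2])   # larger r, slope near 0.8
cor(x, ye); unname(coef(lm(ye ~ x))[2])  # smaller r, slope still near 0.8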
Review What is range restriction? Range enhancement? What do they do to r? What is the effect of reliability on r?
SAS Power Estimation

proc power;
  onecorr dist=fisherz
    corr = 0.35
    nullcorr = 0.2
    sides = 1
    ntotal = 100
    power = .;
run;

Computed Power: actual alpha = .05, Power = .486.

proc power;
  onecorr
    corr = 0.35
    nullcorr = 0
    sides = 2
    ntotal = .
    power = .8;
run;

Computed N Total: alpha = .05, actual power = .801, N Total = 61.
Power for Correlations
Sample sizes required for powerful conventional significance tests for typical values of the correlation coefficient in psychology (power = .8, two tails, alpha = .05, null: ρ = 0):

rho:         .10  .15  .20  .25  .30  .35
N required:  782  346  193  123  84   61
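These N's can be approximated in R via the Fisher z method; this normal-approximation sketch is mine and lands within a point or two of the exact values above:

n_for_r <- function(rho, power = .80, alpha = .05) {
  ceiling(((qnorm(1 - alpha / 2) + qnorm(power)) / atanh(rho))^2 + 3)
}
sapply(c(.10, .15, .20, .25, .30, .35), n_for_r)
# 783 347 194 124 85 62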
Programs
- Review the corrs Excel program from the website; download the Excel file and work through its examples of tests for correlations.
- Review the R program for computing correlations.
Exercises
Download Spector's data and compute univariate statistics and the correlation matrix for 5 variables: Age, Autonomy, Work Hours, Interpersonal Conflict, and Job Satisfaction.
Problems:
- Which pairs are significant? (Use the per-comparison, i.e., nominal, alpha.)
- Is the absolute value of the correlation between conflict and job satisfaction significantly different from .5?
- Is the correlation between age and conflict different from the correlation between age and job satisfaction?