
Understanding Correlation Coefficients in Data Analysis
Learn about the linear correlation coefficient: how to compute it, its possible values, and a practical example coded in R. Understand how to interpret correlation values and how to attach a standard error and confidence interval to them.
Presentation Transcript
LINEAR CORRELATION The linear correlation coefficient measures the tendency of two numerical variables (call them X and Y) to co-vary, that is, to change together along a line.
CORRELATION COEFFICIENT The correlation coefficient measures the strength and direction of the association between two numerical variables. X and Y must have the same length, because each value of X is paired with a value of Y. In the plotted example, each pair of values corresponds to the body and brain masses of a particular animal. Note the log-log scale.
COMPUTING CORRELATION COEFFICIENT To compute an estimate of the correlation coefficient, we must first compute the averages of both X and Y, written $\bar{X}$ and $\bar{Y}$. Then our estimate of the correlation coefficient is:
$$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2}\,\sqrt{\sum_i (Y_i - \bar{Y})^2}}$$
The numerator is the sum of products, measuring how X and Y vary together. The denominator is the product of the square roots of the sums of squares for X and Y separately (similar to a standard deviation). When doing exercises for this section, it is important to use the definition rather than the built-in function. It is suggested that you try them using both Excel and R.
POSSIBLE VALUES The possible values of r range from -1 to +1. The strongest correlations are -1 and +1. These values indicate that our data are exactly linear, and they are very unlikely to occur naturally. Correlations close to zero indicate little or no linear association. This means either that our data are scattered with no clear trend, or that the relationship between X and Y is nonlinear.
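The extremes described above can be checked with a small sketch in R. The vector `x` and the three relationships below are hypothetical toy data chosen only for illustration. They are not taken from the presentation.

```r
# Hypothetical toy data, chosen only to illustrate the extremes of r.
x <- -3:3

r_up     <- cor(x,  2 * x + 1)  # exactly linear, increasing
r_down   <- cor(x, -2 * x + 1)  # exactly linear, decreasing
r_curved <- cor(x, x^2)         # perfect but nonlinear (parabolic) relationship

print(c(up = r_up, down = r_down, curved = r_curved))
```

The two straight-line cases give r of +1 and -1. The parabola gives r equal to zero (up to floating-point error) even though Y is completely determined by X, which is why a near-zero r should not be read as "no relationship".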
R CODE WITH AN EXAMPLE This data set pairs the number of visits by non-parent seabirds with the birds' future likelihood to engage in aggressive behavior.
Enter the data into two vectors in R: numvisits and futureaggro (these vectors must end up being the same length!)
Compute the average of each vector: avgvisits and avgaggro
Compute the sums, then r:
    sumproduct = sum( (numvisits-avgvisits)*(futureaggro-avgaggro) )
    ssvisits = sum( (numvisits-avgvisits)^2 )
    ssaggro = sum( (futureaggro-avgaggro)^2 )
    r = sumproduct / ( sqrt(ssvisits)*sqrt(ssaggro) )
    cor( numvisits, futureaggro )  # this is the built-in
With both commands, we should end up with r = 0.534
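Since the seabird measurements themselves are not reproduced in this transcript, here is a minimal runnable sketch of the same steps wrapped in a function. The function name `pearson_by_definition` and the vectors `x` and `y` are assumptions made up for illustration, so the result will not be 0.534.

```r
# Correlation coefficient computed from the definition (not the built-in).
# pearson_by_definition is a hypothetical helper name used for this sketch.
pearson_by_definition <- function(x, y) {
  stopifnot(length(x) == length(y))  # values must be paired

  dx <- x - mean(x)  # deviations of X from its average
  dy <- y - mean(y)  # deviations of Y from its average

  sumproduct <- sum(dx * dy)  # numerator: sum of products
  sumproduct / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

# Hypothetical illustrative values; NOT the seabird data from the slide.
x <- c(1, 3, 4, 6, 8, 9, 11, 14)
y <- c(-0.8, -0.9, 0.1, -0.2, 0.3, 0.2, 0.5, 0.4)

r_manual  <- pearson_by_definition(x, y)
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))
```

The two printed values should agree to within floating-point error, which is a quick way to check a by-hand calculation against `cor()`.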
STANDARD ERROR As with all sample statistics, there is a standard error for our estimate of the correlation coefficient:
$$SE_r = \sqrt{\frac{1 - r^2}{n - 2}}$$
Remember that n is the number of pairs that we're working with. To build a confidence interval, we convert r to another scale on which the sampling distribution is approximately normal, compute the confidence interval there, and then convert back.
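As a quick sketch, the standard error formula can be evaluated in R for the seabird example. This assumes r = 0.534 from the R example and n = 24 pairs, the sample size that appears in the presentation's Fisher's z example.

```r
r <- 0.534  # sample correlation from the seabird example
n <- 24     # number of pairs (assumed from the presentation's z example)

# Standard error of r: sqrt((1 - r^2) / (n - 2))
se_r <- sqrt((1 - r^2) / (n - 2))
print(se_r)
```

With these inputs the standard error comes out at roughly 0.18.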
FISHER'S Z-TRANSFORMATION Our first step is to convert our r statistic via Fisher's z-transformation:
$$z = 0.5 \ln\left(\frac{1 + r}{1 - r}\right)$$
Not to be confused with the Z-score, Fisher's z is lower-case. We'll also need the standard error for the sampling distribution of z:
$$SE_z = \sqrt{\frac{1}{n - 3}}$$
FISHER'S Z-TRANSFORMATION EXAMPLE For our example (r = 0.534, n = 24), Fisher's z-transformation gives:
$$z = 0.5 \ln\left(\frac{1 + 0.534}{1 - 0.534}\right) \approx 0.595$$
The standard error for the sampling distribution of z is:
$$SE_z = \sqrt{\frac{1}{24 - 3}} \approx 0.218$$
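The same two quantities can be sketched in R, using r = 0.534 and n = 24 from the example. Fisher's z is mathematically the inverse hyperbolic tangent of r, so base R's `atanh()` gives the same result as the logarithm formula.

```r
r <- 0.534  # sample correlation from the seabird example
n <- 24     # number of pairs

z       <- 0.5 * log((1 + r) / (1 - r))  # Fisher's z from the formula
z_atanh <- atanh(r)                      # same transformation via base R
se_z    <- sqrt(1 / (n - 3))             # standard error of z

print(c(z = z, z_atanh = z_atanh, se_z = se_z))
```

Evaluated from the rounded r = 0.534, z comes out near 0.596, so its last digit may differ slightly from the 0.595 shown on the slide. The standard error matches the slide's 0.218.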
CONFIDENCE INTERVAL FOR Z Because z is approximately normally distributed (and r is not), we can use basic facts about the normal distribution to build a 95% confidence interval:
$$z - 1.96\,SE_z < \zeta < z + 1.96\,SE_z$$
Here $\zeta$ denotes the population value that z estimates. We then convert each limit back into a correlation coefficient:
$$r = \frac{e^{2z} - 1}{e^{2z} + 1}$$
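CONFIDENCE INTERVAL FOR Z EXAMPLE For our example, here is what we get, writing $\zeta$ for the population value that z estimates:
$$0.595 - 1.96(0.218) < \zeta < 0.595 + 1.96(0.218)$$
$$0.168 < \zeta < 1.023$$
And converting our results back into a correlation coefficient, with $\rho$ denoting the population correlation:
$$\frac{e^{2(0.168)} - 1}{e^{2(0.168)} + 1} < \rho < \frac{e^{2(1.023)} - 1}{e^{2(1.023)} + 1}$$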
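The whole procedure (transform, build the interval, convert back) can be sketched as one small R function. The name `fisher_ci` and its `level` argument are assumptions introduced for this sketch. The back-conversion uses `tanh()`, which is algebraically the same as $(e^{2z} - 1)/(e^{2z} + 1)$.

```r
# Approximate confidence interval for a correlation via Fisher's z.
# fisher_ci is a hypothetical helper name used for this sketch.
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                      # Fisher's z-transformation
  se_z <- sqrt(1 / (n - 3))             # standard error of z
  crit <- qnorm(1 - (1 - level) / 2)    # about 1.96 for a 95% interval
  z_limits <- c(lower = z - crit * se_z, upper = z + crit * se_z)
  tanh(z_limits)                        # convert back to the r scale
}

ci <- fisher_ci(r = 0.534, n = 24)
print(round(ci, 3))
```

For the seabird example this gives an interval of roughly 0.17 to 0.77 for the population correlation. The interval is not symmetric around r = 0.534, which is expected because the back-conversion is nonlinear.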