
Understanding Scatter Plots and Paired Data in Statistics
Learn how to analyze paired data sets through scatter plots to identify relationships between variables. Explore examples like foot size vs height and GPA vs study time, interpreting patterns and trends for better statistical insights.
MAT 1372 Statistics with Probability Ezra Halleck NYC College of Technology Sections 2.5 & 3.7 Paired data, scatter plots and correlation
2.5 Sets of paired data and scatter plots Given a data set of paired (ordered) values, is there a relation between the 2 variables? First step: create a scatter plot. Next step: study the plot. Are the data points clustered in any way? Is there a discernible pattern, or do the points seem to be randomly placed? Patterns come in many forms, but our focus is on how closely the data fit a line.
Exercise: Drawing a regression (trend) line Plot the points or print out the scatter plot for a data set. Draw the line using a straightedge; roughly half the data points should lie above the line and half below. You may have Excel draw the line and compare. With a little practice, you will soon get close to what software can do. In the next session, we study the trend line more closely.
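To check a hand-drawn line against software, the following minimal sketch computes a least-squares line, which is the usual method behind a spreadsheet's linear trendline, and then counts how many points fall above and below it. The data values and the function names `least_squares_line` and `count_above_below` are made up for illustration and do not come from the course data sets.

```python
# Hypothetical (made-up) paired data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def count_above_below(xs, ys, slope, intercept):
    """Count the points strictly above and strictly below the line."""
    above = sum(1 for x, y in zip(xs, ys) if y > slope * x + intercept)
    below = sum(1 for x, y in zip(xs, ys) if y < slope * x + intercept)
    return above, below


slope, intercept = least_squares_line(xs, ys)
above, below = count_above_below(xs, ys, slope, intercept)
print(f"trend line: y = {slope:.3f} x + {intercept:.3f}")
print(f"points above: {above}, points below: {below}")
```

For these made-up points the split is even, which matches the rule of thumb in the exercise; with real data the split is usually only roughly even.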
[Figure, shown on three slides: scatter plot of foot size (cm, vertical axis, 25 to 32) vs height (inches, horizontal axis, 65 to 85), college-age males. On the first slide one point is labeled as an outlier; on the third slide the horizontal axis runs from 65 to 81.]
2.5.7 GPA vs study time A random group of 12 high school juniors was asked to estimate the number of hours per week they study. The grade point averages of these students were then determined. Using graph paper, plot the points and draw what you think the trend line is. Compare it to what Excel gives. [Figure: scatter plot of GPA (vertical axis, 2.5 to 3.9) vs weekly study time (horizontal axis, 0 to 25).]
2.5.5 Attention span vs IQ score IQ score vs attention span in minutes for 18 preschool-age children. The line is a poor fit; we need another approach. Is there a pattern? What is the explanatory variable and what is the response variable? [Figure: scatter plot of IQ score (vertical axis, 80 to 150) vs attention span in minutes (horizontal axis, 0 to 8).]
[Figure, shown on two slides: scatter plot of attention span (vertical axis, 2 to 8) vs IQ score (horizontal axis, 80 to 150).]
Attention span vs IQ score: speculation A teacher is more likely to gear instruction to the middle of the class. Students who are frustrated (low IQ) will stop paying attention. Students who are bored (high IQ) will stop paying attention.
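A small sketch can show why a straight line fails for a pattern that rises and then falls. The numbers below are made up to mimic that shape and are not the 18 children's data; the sketch also assumes IQ is the explanatory variable and attention span the response. The least-squares slope comes out flat, yet the line misses the points by a wide margin.

```python
# Made-up values shaped like a rise-then-fall pattern; NOT the study's data.
# Assumption: IQ is the explanatory variable, attention span the response.
iq = [80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
attention_minutes = [2.0, 4.0, 6.0, 7.0, 7.0, 6.0, 4.0, 2.0]

n = len(iq)
iq_bar = sum(iq) / n
att_bar = sum(attention_minutes) / n

# Least-squares slope and intercept of attention span on IQ.
sxy = sum((x - iq_bar) * (y - att_bar) for x, y in zip(iq, attention_minutes))
sxx = sum((x - iq_bar) ** 2 for x in iq)
slope = sxy / sxx
intercept = att_bar - slope * iq_bar

# Largest vertical distance between a point and the fitted line.
largest_miss = max(
    abs(y - (slope * x + intercept)) for x, y in zip(iq, attention_minutes)
)

print(f"slope of best line: {slope:.4f}")
print(f"largest miss (minutes): {largest_miss:.2f}")
```

A flat line here does not mean there is no relationship, only that the relationship is not linear, which is consistent with the speculation about students at both ends of the class.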
3.7.4 Sentence vs time served [Figure: scatter plot of sentence vs time served (both in months); vertical axis 0 to 120, horizontal axis 0 to 250.]
3.7 Covariance and Correlation Coefficient Given a data set of paired values (ordered), covariance is a measure of how much two variables change together: $\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$. Note the deviations from the mean, $x_i - \bar{x}$ and $y_i - \bar{y}$, encountered earlier. In words, cov(x, y) is the average of all the products of the 2 individual deviations from the mean.
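As a minimal sketch of this definition, the function below averages the products of the paired deviations. It assumes the plain average (dividing by n), as the wording above suggests; some texts divide by n - 1 instead. The function name `covariance` and the data are made up for illustration.

```python
def covariance(xs, ys):
    """Average of the products of paired deviations from the means.

    Assumption: the divisor is n (a plain average); some texts use n - 1.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Hypothetical paired data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print("cov(x, y) =", covariance(xs, ys))
print("cov(x, x) =", covariance(xs, xs))  # covariance of x with itself is var(x)
```

A positive result means that above-average x values tend to pair with above-average y values; the size of the number depends on the units of both variables.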
Perfectly correlated data and covariance If data lie on a straight line going up, then $\operatorname{cov}(x, y) = \sqrt{\operatorname{var}(x)\operatorname{var}(y)} = s_x s_y$. If data lie on a straight line going down, then $\operatorname{cov}(x, y) = -\sqrt{\operatorname{var}(x)\operatorname{var}(y)} = -s_x s_y$. In all other situations, cov(x, y) will be somewhere in between: $-s_x s_y \le \operatorname{cov}(x, y) \le s_x s_y$.
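These bounds can be checked numerically. The sketch below uses made-up points on a rising line, on a falling line, and off any line, and it assumes covariance and variance both divide by n so that the two sides are directly comparable. The helper names `cov` and `std` are illustrative.

```python
import math


def cov(xs, ys):
    """Average product of deviations (divisor n, an assumption)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def std(values):
    """Standard deviation with the same divisor: sqrt(cov(v, v))."""
    return math.sqrt(cov(values, values))


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
up = [2 * x + 1 for x in xs]      # points exactly on a rising line
down = [-3 * x + 10 for x in xs]  # points exactly on a falling line
scattered = [2.0, 1.0, 4.0, 3.0]  # points not on a single line

for label, ys in [("up", up), ("down", down), ("scattered", scattered)]:
    bound = std(xs) * std(ys)
    print(f"{label}: cov = {cov(xs, ys):.4f}, s_x * s_y = {bound:.4f}")
```

The rising and falling lines hit the upper and lower bounds, while the scattered points land strictly between them.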
Covariance & Pearson correlation coefficient Recall from the previous slide the bounds on cov(x, y): $-s_x s_y \le \operatorname{cov}(x, y) \le s_x s_y$. The units for these expressions are the product of the original x and y units. We normalize by dividing each expression by $s_x s_y$: $-1 \le \frac{\operatorname{cov}(x, y)}{s_x s_y} \le 1$. The middle expression is the Pearson correlation coefficient: $\rho = \frac{\operatorname{cov}(x, y)}{s_x s_y}$. Note: rho is a Greek letter (the Roman letter r may also be used).
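The normalization can be sketched in a few lines. The function `pearson_r` and its `divisor_offset` parameter are hypothetical names introduced here; the parameter switches between dividing by n and by n - 1 to show that the choice cancels out of r, provided the covariance and both standard deviations use the same divisor.

```python
import math


def pearson_r(xs, ys, divisor_offset=0):
    """Pearson correlation: cov(x, y) / (s_x * s_y).

    divisor_offset=0 divides by n; divisor_offset=1 divides by n - 1.
    The same divisor is used for the covariance and both variances,
    so it cancels and r is the same either way.
    """
    n = len(xs)
    d = n - divisor_offset
    if n != len(ys) or d <= 0:
        raise ValueError("need equal-length inputs with enough points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / d
    var_x = sum((x - x_bar) ** 2 for x in xs) / d
    var_y = sum((y - y_bar) ** 2 for y in ys) / d
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov_xy / math.sqrt(var_x * var_y)


# Hypothetical paired data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r_n = pearson_r(xs, ys)                          # divisor n
r_n_minus_1 = pearson_r(xs, ys, divisor_offset=1)  # divisor n - 1
print(f"r with divisor n:     {r_n:.6f}")
print(f"r with divisor n - 1: {r_n_minus_1:.6f}")
```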
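Equivalent expression for Pearson coefficient If we substitute the definition of covariance, we get: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n \, s_x s_y}$. Let's do a dimension analysis: Suppose the x units are tons and the y units are mph. Each product of deviations in the numerator then has units of tons × mph. The standard deviations have the respective units as well, so the denominator also has units of tons × mph, and r is unitless.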
Properties of correlation coefficient r r measures how closely the data follow a line: A value close to 1 or -1 indicates that the data are close to a line. For the mpg vs weight graph on the last slide, r = .9. A value close to zero indicates no linear connection between the 2 variables; the data may appear randomly scattered. r is invariant under a change of units; e.g., if metric tons were used instead of tons, the r value of .9 would not change. Likewise, if km/liter were used instead of mpg, the r value would not change.
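The invariance under a change of units can be checked with a short sketch. The weights and mileages below are made up and are not the data behind the mpg vs weight graph, so the value of r itself is not meaningful here; only the fact that it stays the same after conversion. The conversion factors assume US short tons and US gallons.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from sums of deviations (the 1/n or 1/(n-1) divisor cancels)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration; NOT the data behind the slide's graph.
weight_tons = [1.0, 1.5, 2.0, 2.5, 3.0]
mileage_mpg = [38.0, 32.0, 27.0, 22.0, 18.0]

# Assumed conversions: US short tons -> metric tons, US mpg -> km per liter.
TONS_TO_METRIC_TONS = 0.90718474
MPG_TO_KM_PER_LITER = 1.609344 / 3.785411784

weight_metric = [w * TONS_TO_METRIC_TONS for w in weight_tons]
mileage_km_per_liter = [m * MPG_TO_KM_PER_LITER for m in mileage_mpg]

r_original = pearson_r(weight_tons, mileage_mpg)
r_converted = pearson_r(weight_metric, mileage_km_per_liter)
print(f"r in tons and mpg:             {r_original:.6f}")
print(f"r in metric tons and km/liter: {r_converted:.6f}")
```

Multiplying a variable by a positive constant scales its deviations and its standard deviation by the same factor, which is why the factor drops out of r.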
Properties of $r^2$ Often, $r^2$ is calculated and displayed rather than r. It provides an upper bound (max) on the % of the variation in the response variable that can be ascribed to the explanatory variable; e.g., for mpg vs weight, $r^2 = .81$, so up to 81% of the variation in a car's mileage may be due to its weight.
3.7.4 sentence vs actual time served (revisited) The following is a sampling of 10 recently (at the time of the study) released first-time federal prisoners. The data give their crime, their sentence, and the actual time that they served. Compute r and r2. What does this say about the relationship between the length of a sentence and the time actually served?
3.7.8 milk vs soda consumption among a sampling of 1st world countries The following gives yearly per capita soft drink consumption (in litres) and the yearly per capita milk consumption (in kg) for a variety of countries. Find r and r2 and discuss the correlation.