Least Squares Regression in Data Analysis

Learn about least squares regression, a statistical method used to model the relationship between variables. Discover how to calculate the regression line, interpret coefficients, and make predictions. Explore examples and applications to enhance your data analysis skills.

  • Regression Analysis
  • Data Modeling
  • Predictive Analytics

Presentation Transcript


  1. 3.2: LEAST-SQUARES REGRESSION

  2. REGRESSION LINE

     Regression Line: A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. The line is often used to predict values of y for given values of x. Regression, unlike correlation, requires an explanatory/response relationship: when x and y are reversed, the regression line changes. Recall that correlation is the same no matter which variable is x and which is y.

     Least-Squares Regression Line: The least-squares regression line (LSRL) is the line that makes the sum of the squares of the vertical distances from the data points to the line as small as possible.

     FORM: $\hat{y} = a + bx$ ($\hat{y}$ is read as "y hat")

     DEFINE variables: x = explanatory variable, $\hat{y}$ = predicted value of the response variable
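Written as a formula, the least-squares criterion described above looks like the following. This is a sketch in standard notation; the index i and the count n (number of data points) are symbols added here, not taken from the slide.

```latex
\text{Choose } a \text{ and } b \text{ to minimize } \quad
\sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^2
  = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
```

Each term is the squared vertical distance from a data point $(x_i, y_i)$ to the line.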

  3. EQUATION OF THE LEAST-SQUARES REGRESSION LINE

     To find the equation of the regression line in the form $\hat{y} = a + bx$, where a is the y-intercept and b is the slope, use the following equations:

     $b = r \dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$

     where b = slope of the regression line, a = intercept of the regression line, r = correlation, $s_y$ = standard deviation of y, $s_x$ = standard deviation of x, $\bar{y}$ = mean of y, and $\bar{x}$ = mean of x.
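The slope formula can also be written directly in terms of the data, which is the form a calculator or program typically evaluates. The short derivation below assumes the usual sample definitions of r and $s_x$ (with n - 1 in the denominator); those definitions are not stated on the slide.

```latex
\begin{aligned}
r   &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y},
\qquad
s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \\[4pt]
b   &= r\,\frac{s_y}{s_x}
     = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x^2}
     = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \\[4pt]
\hat{y}\,\big|_{x = \bar{x}} &= a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}
\end{aligned}
```

The last line shows a consequence of the intercept formula: the least-squares line always passes through the point $(\bar{x}, \bar{y})$.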

  4. EX 6: FIND THE LEAST-SQUARES REGRESSION LINE FOR THE DATA ON SANDWICH SALES IN OCTOBER.

     Hours in a shift:   5    6    7    7    6.5   2
     Sandwiches sold:    52   60   71   70   67    21

     Step 1: Enter the data into L1 and L2.
     Step 2: Run linear regression.

  5. Step 3: Write out the equation, define variables, and show work.

     $\hat{y} = 1.46 + 9.92x$ (TI-84: LinReg(a+bx))
     x = number of hours in a shift
     $\hat{y}$ = predicted number of sandwiches sold

     Use the least-squares regression line to predict the number of sandwiches sold in a 10-hour shift.

     $\hat{y} = 1.46 + 9.92(10) = 100.66$ sandwiches
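For readers without a TI-84, the same fit can be reproduced in a few lines of Python. This is a minimal sketch, not the calculator's actual routine: the function name `fit_lsrl` is made up for illustration, and it uses the data-based slope formula (sum of cross-deviations divided by sum of squared x-deviations), which is equivalent to b = r(s_y / s_x).

```python
# Sandwich data from the example: hours in a shift (x) and sandwiches sold (y).
hours = [5, 6, 7, 7, 6.5, 2]
sold = [52, 60, 71, 70, 67, 21]


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the line passes through (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


a, b = fit_lsrl(hours, sold)
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"Predicted sales for a 10-hour shift: {a + b * 10:.2f}")
```

Because the code keeps the unrounded coefficients, its 10-hour prediction comes out slightly lower than the value obtained by plugging into the rounded equation.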

  6. USING YOUR LSRL TO MAKE PREDICTIONS

     Interpolation: When you use the LSRL to predict a y-value that corresponds to an x-value within the range of x-values in your data (the domain).

     Extrapolation: When you use the LSRL to make a prediction for an x-value outside that range.

     Don't make predictions using values of x that are much larger or much smaller than those that actually appear in your data. The model may not hold.

     [Figure: scatterplot of # of Sandwiches Sold versus Hours in shift]
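One way to make the interpolation/extrapolation distinction concrete is to label each prediction by whether its x-value falls inside the observed range. The sketch below uses the sandwich data and the rounded LSRL from the earlier example; the helper name `predict` and the constants `A` and `B` are illustrative choices, not part of the slides.

```python
# Observed shift lengths from the sandwich example; they define the x-range.
hours = [5, 6, 7, 7, 6.5, 2]

# Rounded LSRL coefficients reported in the example: y-hat = 1.46 + 9.92x.
A, B = 1.46, 9.92


def predict(x, xs=hours, a=A, b=B):
    """Return (prediction, kind); kind is 'interpolation' or 'extrapolation'."""
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return a + b * x, kind


for x in (4, 10):
    y_hat, kind = predict(x)
    print(f"x = {x}: y-hat = {y_hat:.2f} ({kind})")
```

A 4-hour shift lies between the shortest (2) and longest (7) observed shifts, so it is an interpolation; a 10-hour shift lies outside that range, so its prediction should be treated with caution.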

  7. EX 7: ALLISON'S KILLS IN VOLLEYBALL. FIND AND INTERPRET THE COEFFICIENT OF DETERMINATION.

     Hrs practice/week:   7    2    4    5    6    0
     Kills per game:      17   4    13   8    15   3

     Enter the data into lists 1 and 2. Run the LSRL (STAT, then CALC, then #8).

     $r^2 \approx 0.81$: 81% of the variation in Allison's kills is accounted for by the linear model relating Allison's kills to the number of hours spent practicing.

     MAGIC sentence: ___% of the variation in [Response Variable] is accounted for by the linear model relating [Response Variable] to [Explanatory Variable].
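The phrase "variation accounted for" can be checked numerically: r² equals 1 minus the ratio of the leftover squared error around the line (SSE) to the total squared variation of y around its mean (SST). The sketch below applies that identity to Allison's data; the function name `r_squared` and the abbreviations SSE and SST are introduced here for illustration.

```python
# Allison's data: hours of practice per week (x) and kills per game (y).
hours_practiced = [7, 2, 4, 5, 6, 0]
kills = [17, 4, 13, 8, 15, 3]


def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # SSE: variation left over after fitting the line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # SST: total variation of y about its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


r2 = r_squared(hours_practiced, kills)
print(f"r^2 = {r2:.2f}")
```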

  8. RESIDUALS

     A residual is the difference between an observed value of y and the value predicted by the regression line. That is,

     residual = actual y - predicted y
     residual = $y - \hat{y}$

     The standard deviation of the residuals (s) gives the approximate size of a typical prediction error (residual). It is the typical distance between the actual values and the predicted values: how far off we typically are when using the LSRL to make predictions.

  9. EX 8: PLOT THE DATA, FIND THE LSRL, AND FIND THE RESIDUALS FOR ALLISON'S VOLLEYBALL KILLS. THEN CALCULATE THE STANDARD DEVIATION OF THE RESIDUALS.

     Step 1: Enter the data into lists and run the regression.

     $\hat{y} = 1.88 + 2.03x$ (TI-84: LinReg(a+bx))
     x = number of hours practiced
     $\hat{y}$ = predicted number of kills per game

     Step 2: Find the predicted y-values. Write the LSRL in L3, using L1 as your x. These are your predicted y-values.

  10. Step 3: Find the residuals.

      In L4: actual y - predicted y (L2 - L3), that is, $y - \hat{y}$.

      RESIDUALS: how far the data points are from the LSRL.

  11. CALCULATE THE STANDARD DEVIATION OF THE RESIDUALS.

      $s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\dfrac{\text{sum}(L_4^2)}{n - 2}} = \sqrt{\dfrac{31.97}{6 - 2}} = 2.83$

      Interpretation: When using the LSRL to make predictions, we are typically off by 2.83 kills per game.
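The calculator list steps (L3 for predicted values, L4 for residuals, then the sum of squared residuals divided by n - 2) translate directly into Python. This is a sketch of the same arithmetic, with variable names chosen here to mirror the lists; it refits the line from Allison's data rather than using the rounded coefficients.

```python
import math

# Allison's data: L1 = hours practiced, L2 = kills per game.
hours_practiced = [7, 2, 4, 5, 6, 0]
kills = [17, 4, 13, 8, 15, 3]

n = len(hours_practiced)
x_bar = sum(hours_practiced) / n
y_bar = sum(kills) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours_practiced, kills))
     / sum((x - x_bar) ** 2 for x in hours_practiced))
a = y_bar - b * x_bar

# L3: predicted values; L4: residuals (actual minus predicted).
predicted = [a + b * x for x in hours_practiced]
residuals = [y - y_hat for y, y_hat in zip(kills, predicted)]

# Standard deviation of the residuals uses n - 2 in the denominator.
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))

for x, y, y_hat, e in zip(hours_practiced, kills, predicted, residuals):
    print(f"x = {x}: y = {y}, y-hat = {y_hat:.2f}, residual = {e:.2f}")
print(f"sum of squared residuals = {sse:.2f}, s = {s:.2f}")
```

The residuals of a least-squares line sum to zero (up to floating-point error), which is a quick sanity check on the L4 column.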

  12. RESIDUAL PLOT

      A residual plot is a scatterplot of each x-value and its residual value. The residual plot is used to determine whether a linear equation is a good model for a set of data, as follows:

      • If the residual plot exhibits randomness, then a line is a good model for the data. This is what we want.
      • If the residual plot exhibits a pattern, then a line is NOT a good model for the data.

      [Figure: two residual plots (residuals versus x-values), random scatter on the left and a clear pattern on the right]

      Outliers and Influential Points: A point that lies outside the overall pattern of the other observations is considered an outlier. If the removal of such a point has a large effect on the correlation and/or regression, that point is considered an influential point.

  13. EX 9: MAKE A RESIDUAL PLOT FOR THE RESIDUALS YOU FOUND IN EX 8.

      Make a scatterplot using L1 as the x-values and L4 as the y-values. There is no leftover pattern, so the line is a good model.

  14. EX 10

      Many people believe that students learn better if they sit closer to the front of the classroom. To investigate, an AP Statistics teacher randomly assigned students to seat locations in his classroom for a particular chapter. At the end of the chapter, he recorded the row number (Row 1 is closest to the front) and test score for each student. Least-squares regression was performed on the data, and the computer output (this is called Minitab output) is shown below:

      Predictor    Coef       SE Coef    T        P
      Constant     85.706     4.239      20.22    0.000
      Row          -1.1771    0.9472     -1.18    0.248

      S = 10.0673    R-Sq = 4.7%    R-Sq(adj) = 1.3%

      Reading the output: the Constant coefficient is the intercept (a), the Row coefficient is the slope (b), S is the standard deviation of the residuals, and R-Sq is $r^2$.

  15. EX 10 A-B (using the Minitab output from slide 14)

      a) What is the equation of the least-squares regression line that describes the relationship between row number and test score?

         $\hat{y} = 85.706 - 1.1771x$
         x = row number
         $\hat{y}$ = predicted test score

      b) Interpret the slope of the regression line in context.

         For every row you move back, the predicted test score drops by 1.1771 points.

  16. EX 10 C-E (using the Minitab output from slide 14)

      c) Find the correlation.

         $r^2 = 0.047$, so $r = -\sqrt{0.047} = -0.217$ (negative because the slope is negative).

      d) Interpret the value of s in this setting.

         When using the LSRL to make predictions, we will typically be off by 10.0673 points.

      e) What percent of the variation in test scores is accounted for by the straight-line relationship with which row students sat in for the chapter?

         4.7% of the variation in test scores is accounted for by the linear model relating test scores to row assignment.
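Recovering r from computer output takes two pieces of information: the magnitude comes from the square root of R-Sq, and the sign comes from the slope. A small sketch of that step, using the two values read from the output (the variable names are illustrative):

```python
import math

r_sq = 0.047      # R-Sq = 4.7% from the computer output
slope = -1.1771   # coefficient for Row from the computer output

# r has the magnitude sqrt(r^2) and the same sign as the slope.
r = math.copysign(math.sqrt(r_sq), slope)
print(f"r = {r:.3f}")
```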

  17. EX 11

      In the previous example, we investigated the relationship between test scores and seat location. The mean and standard deviation of the row numbers are $\bar{x} = 4.033$ and $s_x = 1.974$. The mean and standard deviation of the test scores are $\bar{y} = 81.2$ and $s_y = 10.135$. The correlation between row number and test score is r = -0.218. (Note that this value is slightly different from the previous example because of rounding in the computer output.) Find the equation of the least-squares regression line for predicting test score from row number. Show your work.

      Find b first: $b = r \dfrac{s_y}{s_x} = -0.218 \cdot \dfrac{10.135}{1.974} = -1.119$

      Find a: $a = \bar{y} - b\bar{x} = 81.2 - (-1.119)(4.033) = 85.71$

      $\hat{y} = 85.71 - 1.119x$
      x = row number
      $\hat{y}$ = predicted test score
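When only summary statistics are available, the two formulas b = r(s_y / s_x) and a = y-bar - b(x-bar) are all that is needed. The sketch below wraps them in a small function (the name `lsrl_from_summary` is made up for illustration) and applies it to the numbers given in this example.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Summary statistics from the example: row number (x) and test score (y).
a, b = lsrl_from_summary(x_bar=4.033, s_x=1.974, y_bar=81.2, s_y=10.135, r=-0.218)
print(f"y-hat = {a:.2f} + ({b:.3f})x")
```

The slope must be found first because the intercept formula depends on it.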

  18. Read through the rest of the notes on your own
