Model Building Ideas and Testing Procedures
Explore the concepts of model building in statistical analysis, starting with a complete 2nd-order model and building down to the most useful form. Learn about independent variables, quadratic and interaction terms, and adjustments to data sets, then work through testing methods such as the Global-F and Partial-F tests to evaluate the usefulness of the model components.
Presentation Transcript
Model Building Ideas Already Discussed: QN (quantitative) independent variables, QL (qualitative) independent variables, quadratic terms, and interaction terms.
Model Building: Begin with a complete 2nd-order model and build down to the most useful (parsimonious) model. Three steps to building the complete 2nd-order model: 1. Add all QN terms (linear and quadratic) to the model. 2. Add all QL terms to the model. 3. Interact all the terms from step 1 with all the terms from step 2.
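The three steps can be sketched in Python as a small term generator. This is an illustration only, not part of the original slides: the function name `second_order_terms` and the string notation for terms (`x1^2`, `x1*x2`) are assumptions made for the example, and the sketch covers only the single-QN case used in these slides (with more than one QN variable, a complete 2nd-order model would also need QN-by-QN cross products).

```python
def second_order_terms(qn, ql_dummies):
    """List the terms of a complete 2nd-order model (intercept excluded).

    qn: name of the single quantitative variable, e.g. "x1"
    ql_dummies: names of the 0/1 dummy variables, e.g. ["x2"]
    """
    # Step 1: linear and quadratic terms for the QN variable
    step1 = [qn, f"{qn}^2"]
    # Step 2: one term per dummy variable
    step2 = list(ql_dummies)
    # Step 3: every step-1 term multiplied by every step-2 term
    step3 = [f"{a}*{b}" for b in step2 for a in step1]
    return step1 + step2 + step3


# One QN, one QL with two levels (one dummy): 5 terms
print(second_order_terms("x1", ["x2"]))
# One QN, one QL with three levels (two dummies): 8 terms
print(second_order_terms("x1", ["x2", "x3"]))
```

The term counts (5 and 8) match the number of β coefficients, excluding β0, in the two complete 2nd-order models written out in this transcript.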
Changes to Data Set for HW5: Delete the Year variable (2015 or earlier/2016 or after), or just don't use it. Delete ONE of your vehicles; you will then have only two. Dummy code your vehicles (0, 1). Add the quadratic term for the QN variable (mileage). Add the interactions (mileage * model; mileage sq * model).
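A minimal sketch of these data-set changes in plain Python follows. The rows, the vehicle names, and the units are hypothetical, since the homework data set is not shown in the slides; the column names `x1sq`, `x1x2`, and `x1sqx2` follow the labels that appear in the Statistix printouts later in this transcript.

```python
# Hypothetical rows standing in for the homework data (values are made up).
rows = [
    {"model": "Camaro", "mileage": 40.0, "price": 18.0},
    {"model": "Mustang", "mileage": 25.0, "price": 22.0},
]


def prepare(rows, level_coded_one):
    """Dummy code the vehicle and add the quadratic and interaction columns."""
    out = []
    for r in rows:
        x1 = r["mileage"]                                # QN variable
        x2 = 1 if r["model"] == level_coded_one else 0   # dummy code (0, 1)
        out.append({
            "y": r["price"],
            "x1": x1,
            "x1sq": x1 ** 2,           # quadratic term
            "x2": x2,
            "x1x2": x1 * x2,           # mileage * model
            "x1sqx2": (x1 ** 2) * x2,  # mileage sq * model
        })
    return out


prepared = prepare(rows, "Camaro")
for row in prepared:
    print(row)
```

Which vehicle is coded 1 is arbitrary; it only changes the sign and interpretation of the coefficients that involve `x2`.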
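Complete 2nd-Order Model: Homework data, one QN and one QL with two levels. $x_1$ is QN; $x_2$ is QL with 2 levels ($x_2 = 1$ for one vehicle, $0$ for the other). Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$. Step 1 contributes $\beta_1 x_1 + \beta_2 x_1^2$, Step 2 contributes $\beta_3 x_2$, and Step 3 contributes $\beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$.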
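Complete 2nd-Order Model: Example with one QN and one QL with three levels (e.g., Camaro, Mustang, Charger). $x_1$ is QN; $x_2$ and $x_3$ code the QL with 3 levels ($x_2 = 1$ if Camaro, $0$ if not; $x_3 = 1$ if Mustang, $0$ if not). Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 x_1 x_2 + \beta_6 x_1^2 x_2 + \beta_7 x_1 x_3 + \beta_8 x_1^2 x_3$. Step 1 contributes $\beta_1 x_1 + \beta_2 x_1^2$, Step 2 contributes $\beta_3 x_2 + \beta_4 x_3$, and Step 3 contributes the four interaction terms.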
Testing the Model: Global-F Test. Tests the entire model at one time. Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$. Test: Ho: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$. Ha: At least one $\beta_i \neq 0$. Test Statistic/P-value: (from printout). Conclusion: If we reject Ho, something works in the model. If we fail to reject (FTR) Ho, stop! (Except for Homework 5: if FTR, keep going anyway!)
Testing the Model: Partial-F Test. Tests a portion of the model. Full Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$. Reduced Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2 + \beta_4 x_1 x_2$. Example: test the quadratic component in the model. Test: Ho: $\beta_2 = \beta_5 = 0$. Ha: At least one $\beta_i \neq 0$. Test Statistic/P-value: (from printout). Conclusion: If we reject Ho, the quadratic terms work; keep them. If we FTR Ho, drop the quadratic terms.
Testing the Model: The Partial-F test compares two models (the one that has the tested terms in it and the one that drops them) to determine which model is better at predicting y. The Statistix program refers to the Partial-F test as the Best Subset Regressions test. The model terms are separated into two classifications. Non-Forced: the terms we wish to test. Forced: the terms that appear in both models.
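For reference, this comparison is usually summarized by the standard nested-model F statistic. The symbols are not defined in the slides and are assumptions of this sketch: $SSE_R$ and $SSE_F$ are the residual sums of squares of the reduced and full models, $k$ and $g$ are the numbers of $\beta$ terms (excluding $\beta_0$) in the full and reduced models, and $n$ is the sample size.

```latex
F = \frac{(SSE_R - SSE_F)/(k - g)}{SSE_F/\bigl(n - (k + 1)\bigr)}
```

The numerator degrees of freedom, $k - g$, equal the number of non-forced (tested) terms; the denominator is the MSE of the full model.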
Testing the Model: T-Test. Tests a single term in the model. Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2 + \beta_4 x_1 x_2$. Reduced Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$. Example: test the interaction component in the model. Test: Ho: $\beta_4 = 0$. Ha: $\beta_4 \neq 0$. Test Statistic/P-value: (from printout; may need adjusting, as when the test is one-tailed). Conclusion: If we reject Ho, the interaction term works; keep it. If we FTR Ho, drop the interaction term.
Model Testing in Statistix. Global F-Test: Statistics > Linear Models > Linear Regression; fit the full model and use the F-test p-value. Partial F-Test: Statistics > Linear Models > Best Subset Regressions; the terms to be tested are the non-forced variables, and the terms in the reduced model are the forced variables. T-Test: Statistics > Linear Models > Linear Regression; fit the full model and use the appropriate t-test p-value.
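Model Building Game Plan (flowchart, written out as a sequence of decisions)
Global F-Test (Model 1): FTR Ho, stop. Reject Ho, go to the Quadratics Test.
Quadratics Test (Model 1 vs. 2): Reject Ho, test Interactions with Model 1 vs. 3. FTR Ho, test Interactions with Model 2 vs. 4.
Interactions (Model 1 vs. 3): Reject Ho, use Model 1. FTR Ho, go to the QL Test with Model 3 vs. 5.
Interactions (Model 2 vs. 4): Reject Ho, use Model 2. FTR Ho, go to the QL Test with Model 4 vs. 6.
QL Test, if needed (Model 3 vs. 5): Reject Ho, use Model 3. FTR Ho, use Model 5.
QL Test, if needed (Model 4 vs. 6): Reject Ho, go to the QN Test. FTR Ho, use Model 6.
QN Test, if needed (Model 4 vs. 7): Reject Ho, use Model 4. FTR Ho, use Model 7.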
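The game plan can be read as a decision function that maps test outcomes to a final model number. The sketch below is one reading of the flowchart, not code from the slides; the function name `choose_model` and its boolean arguments are assumptions made for illustration, with `True` meaning "reject Ho" and `False` meaning "FTR Ho".

```python
def choose_model(reject_global, reject_quad, reject_inter,
                 reject_ql=None, reject_qn=None):
    """Return the final model number (1-7), or None if the Global F-Test fails."""
    if not reject_global:
        return None                       # FTR on the Global F-Test: stop
    if reject_quad:                       # quadratics kept (Model 1 vs. 2)
        if reject_inter:                  # Model 1 vs. 3
            return 1
        return 3 if reject_ql else 5      # QL test: Model 3 vs. 5
    if reject_inter:                      # quadratics dropped, Model 2 vs. 4
        return 2
    if not reject_ql:                     # QL test: Model 4 vs. 6
        return 6
    return 4 if reject_qn else 7          # QN test: Model 4 vs. 7


# Outcomes of the apartment example worked through in this transcript:
# reject global, FTR quadratics, FTR interaction, reject QL, reject QN.
print(choose_model(True, False, False, True, True))
```

The QL and QN arguments are optional because those tests are only run "if needed", that is, when the interaction test fails to reject.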
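Model Building - Models
Model 1: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$
Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2 + \beta_4 x_1 x_2$
Model 3: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2$
Model 4: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$
Model 5: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$
Model 6: $E(y) = \beta_0 + \beta_1 x_1$
Model 7: $E(y) = \beta_0 + \beta_3 x_2$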
Model Building: Example. Use the apartment data set: y = rental price of an apartment, $x_1$ = size of the apartment, $x_2 = 1$ if located in Brandon, $0$ if in St. Pete. Model 1: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$. Global F-Test. Test: Ho: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$. Ha: At least one $\beta_i \neq 0$. Need the Statistix printout!
Model Building: Global F-Test (Statistix printout)
Least Squares Linear Regression of Rent

Predictor Variables   Coefficient   Std Error       T        P       VIF
Constant                  1430.37     1064.33    1.34   0.1859       0.0
Location                 -1087.93     1333.19   -0.82   0.4189    1307.1
Size                     -1.60803     2.00282   -0.80   0.4264     250.0
x1sq                      0.00102   9.362E-04    1.09   0.2833     268.7
x1sqx2                 -9.082E-04     0.00113   -0.80   0.4272    1389.6
x1x2                      1.91086     2.46958    0.77   0.4432    5035.9

R-Squared            0.3709    Resid. Mean Square (MSE)   16998.0
Adjusted R-Squared   0.2994    Standard Deviation         130.376
AICc                 497.32
PRESS              1.82E+06

Source         DF        SS        MS      F        P
Regression      5    440997   88199.5   5.19   0.0008
Residual       44    747911   16998.0
Total          49   1188908
Lack of Fit    35    632537   18072.5   1.41   0.3036
Pure Error      9    115375   12819.4

Cases Included 50    Missing Cases 0
Model Building: Global F-Test. Model 1: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$. Test: Ho: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$. Ha: At least one $\beta_i \neq 0$. F = 5.19, p = .0008. Conclusion: At $\alpha = .05$, we reject Ho. There is sufficient evidence to indicate that at least one of the variables is useful for predicting rent. For HW 5: regardless of whether you reject Ho, you will continue. Next Step: Test Quadratics!
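As a check on where the printout numbers come from, the global F statistic, R-squared, and adjusted R-squared can be recomputed from the ANOVA sums of squares. This is a sketch using the standard formulas with the values reported for Model 1; the variable names are assumptions of the example, and the p-value is not recomputed because that would need an F-distribution routine outside the standard library.

```python
# Values copied from the Model 1 ANOVA table (n = 50 cases, k = 5 terms).
n, k = 50, 5
ss_regression = 440997
ss_residual = 747911
ss_total = 1188908

ms_regression = ss_regression / k        # mean square for regression
mse = ss_residual / (n - k - 1)          # residual mean square (MSE)
f_stat = ms_regression / mse             # global F statistic

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(f_stat, 2), round(r_squared, 4), round(adj_r_squared, 4))
```

Rounded, these agree with the printout values F = 5.19, R-Squared = 0.3709, and Adjusted R-Squared = 0.2994.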
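Model Building: Quadratics Test. Model 1: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$. Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2 + \beta_4 x_1 x_2$. Test: Ho: $\beta_2 = \beta_5 = 0$. Ha: At least one $\beta_i \neq 0$. Need the Statistix printout!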
Model Building: Quadratics Test (Statistix printout)
Best Subset Regression Models for Rent
Forced Independent Variables: (A)Location (B)Size (C)x1x2
Unforced Independent Variables: (D)x1sqx2 (E)x1sq

P    Cp    Adjusted R Square   AICc - Min AICc   Resid SS      F     P(F)   Model Variables
4   3.2               0.3115              0.00     768461                   A B C
5   4.6               0.3050              1.96     758827   0.57   0.4537   A B C E
5   5.2               0.2966              2.56     767969   0.03   0.8659   A B C D
6   6.0               0.2994              3.95     747911   0.60   0.5508   A B C D E

Cases Included 50    Missing Cases 0

Test: Ho: $\beta_2 = \beta_5 = 0$. Ha: At least one $\beta_i \neq 0$. F = 0.60, p = .5508. Conclusion: At $\alpha = .05$, we fail to reject Ho. There is insufficient evidence to indicate that at least one of the quadratic terms is useful for predicting rent. Model 2 is better than Model 1. Next Step: Use Model 2 to test the remaining interaction term!
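The F value in the last row of the best-subsets table can be reproduced from the residual sums of squares. This is a sketch using the standard nested-model F formula, not output from Statistix; the function name `partial_f` and its argument names are assumptions made for illustration.

```python
def partial_f(sse_reduced, sse_full, n_tested, df_resid_full):
    """Nested-model F: extra SS explained per tested term, divided by full-model MSE."""
    numerator = (sse_reduced - sse_full) / n_tested
    denominator = sse_full / df_resid_full
    return numerator / denominator


# Reduced model: A B C (Resid SS 768461). Full model: A B C D E (Resid SS 747911).
# Two terms are tested; the full model has 50 - 6 = 44 residual degrees of freedom.
f_quadratics = partial_f(768461, 747911, n_tested=2, df_resid_full=44)
print(round(f_quadratics, 2))

# The single-term rows compare A B C with A B C E and with A B C D (45 residual df).
f_x1sq = partial_f(768461, 758827, n_tested=1, df_resid_full=45)
f_x1sqx2 = partial_f(768461, 767969, n_tested=1, df_resid_full=45)
print(round(f_x1sq, 2), round(f_x1sqx2, 2))
```

Rounded to two decimals, the three results match the F column of the printout (0.60, 0.57, and 0.03).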
Model Building: Interaction Test. Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2 + \beta_4 x_1 x_2$. Model 4: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$. Test: Ho: $\beta_4 = 0$. Ha: $\beta_4 \neq 0$. Need the Statistix printout (t-test)!
Model Building: Interaction Test (Statistix printout)
Least Squares Linear Regression of Rent

Predictor Variables   Coefficient   Std Error       T        P    VIF
Constant                  297.331     209.944    1.42   0.1634    0.0
Location                 -88.4936     268.289   -0.33   0.7430   53.9
Size                      0.55603     0.20428    2.72   0.0091    2.6
x1x2                     -0.00882     0.25918   -0.03   0.9730   56.4

R-Squared            0.3536    Resid. Mean Square (MSE)   16705.7
Adjusted R-Squared   0.3115    Standard Deviation         129.250
AICc                 493.37
PRESS                911416

Source         DF        SS        MS      F        P
Regression      3    420448    140149   8.39   0.0001
Residual       46    768461     16706
Total          49   1188908
Lack of Fit    37    653086   17651.0   1.38   0.3180
Pure Error      9    115375   12819.4

Cases Included 50    Missing Cases 0
Model Building: Interaction Test. Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2 + \beta_4 x_1 x_2$. Model 4: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$. Test: Ho: $\beta_4 = 0$. Ha: $\beta_4 \neq 0$. t = -0.03, p = .9730. Conclusion: At $\alpha = .05$, we fail to reject Ho. There is insufficient evidence to indicate that the interaction term is useful for predicting rent. Model 4 is better than Model 2. Next Step: Test the QL variable in Model 4!
Model Building: QL Variable Test. Model 4: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$. Model 6: $E(y) = \beta_0 + \beta_1 x_1$. Test: Ho: $\beta_3 = 0$. Ha: $\beta_3 \neq 0$. Need the Statistix printout (t-test)!
Model Building: QL Variable Test (Statistix printout)
Least Squares Linear Regression of Rent

Predictor Variables   Coefficient   Std Error       T        P   VIF
Constant                  302.920     129.412    2.34   0.0235   0.0
Location                 -97.5401     36.2128   -2.69   0.0098   1.0
Size                      0.55055     0.12438    4.43   0.0001   1.0

R-Squared            0.3536    Resid. Mean Square (MSE)   16350.6
Adjusted R-Squared   0.3261    Standard Deviation         127.870
AICc                 490.90
PRESS                865546

Source         DF        SS        MS       F        P
Regression      2    420428    210214   12.86   0.0000
Residual       47    768480     16351
Total          49   1188909
Lack of Fit    38    653106   17187.0    1.34   0.3349
Pure Error      9    115375   12819.4

Cases Included 50    Missing Cases 0
Model Building: QL Variable Test. Model 4: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$. Model 6: $E(y) = \beta_0 + \beta_1 x_1$. Test: Ho: $\beta_3 = 0$. Ha: $\beta_3 \neq 0$. t = -2.69, p = .0098. Conclusion: At $\alpha = .05$, we reject Ho. There is sufficient evidence to indicate that the qualitative term is useful for predicting rent. Model 4 is better than Model 6. Next Step: Test the QN variable in Model 4!
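The T column of the printout is each coefficient divided by its standard error, which can be verified directly. This is a small sketch using the Model 4 values; the dictionary layout and variable names are assumptions of the example.

```python
# (coefficient, standard error) pairs copied from the Model 4 printout
model4 = {
    "Constant": (302.920, 129.412),
    "Location": (-97.5401, 36.2128),
    "Size": (0.55055, 0.12438),
}

# t statistic for each term: estimate divided by its standard error
t_stats = {name: coef / se for name, (coef, se) in model4.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

The rounded values reproduce the printout's T column (2.34, -2.69, and 4.43).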
Model Building: QN Variable Test. Model 4: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$. Model 7: $E(y) = \beta_0 + \beta_3 x_2$. Test: Ho: $\beta_1 = 0$. Ha: $\beta_1 > 0$. Need the Statistix printout (t-test)!
Model Building: QN Variable Test (Statistix printout)
Least Squares Linear Regression of Rent

Predictor Variables   Coefficient   Std Error       T        P   VIF
Constant                  302.920     129.412    2.34   0.0235   0.0
Location                 -97.5401     36.2128   -2.69   0.0098   1.0
Size                      0.55055     0.12438    4.43   0.0001   1.0

R-Squared            0.3536    Resid. Mean Square (MSE)   16350.6
Adjusted R-Squared   0.3261    Standard Deviation         127.870
AICc                 490.90
PRESS                865546

Source         DF        SS        MS       F        P
Regression      2    420428    210214   12.86   0.0000
Residual       47    768480     16351
Total          49   1188909
Lack of Fit    38    653106   17187.0    1.34   0.3349
Pure Error      9    115375   12819.4

Cases Included 50    Missing Cases 0
Model Building: QN Variable Test. Model 4: $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2$. Model 7: $E(y) = \beta_0 + \beta_3 x_2$. Test: Ho: $\beta_1 = 0$. Ha: $\beta_1 > 0$. t = 4.43, p = .0001/2 = .00005 (the printout p-value is two-tailed, so it is halved for this one-tailed test). Conclusion: At $\alpha = .05$, we reject Ho. There is sufficient evidence to indicate that the quantitative term is useful for predicting rent. Model 4 is better than Model 7. Next Step: Use Model 4 as our Best Model!
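The halving of the printout p-value can be written as a small helper. This is a sketch of the usual conversion from a two-sided to a one-sided t-test p-value, not a Statistix feature; the function name `one_tailed_p` and the `direction` argument are assumptions made for illustration.

```python
def one_tailed_p(t_stat, two_sided_p, direction):
    """Convert a two-sided t-test p-value to a one-sided one.

    direction: ">" for Ha: beta > 0, "<" for Ha: beta < 0.
    """
    if direction not in (">", "<"):
        raise ValueError("direction must be '>' or '<'")
    # Does the sign of t point the same way as the alternative hypothesis?
    agrees = t_stat > 0 if direction == ">" else t_stat < 0
    # If so, halve the two-sided p-value; otherwise take the other tail.
    return two_sided_p / 2 if agrees else 1 - two_sided_p / 2


# Size term in Model 4: t = 4.43, two-sided p = .0001, Ha: beta_1 > 0
p_size = one_tailed_p(4.43, 0.0001, ">")
print(p_size)
```

Halving is only valid when the sign of the t statistic matches the direction of Ha, which is why the helper checks the sign before dividing.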
Best Model: Model 4 (Statistix printout)
Least Squares Linear Regression of Rent

Predictor Variables   Coefficient   Std Error       T        P   VIF
Constant                  302.920     129.412    2.34   0.0235   0.0
Location                 -97.5401     36.2128   -2.69   0.0098   1.0
Size                      0.55055     0.12438    4.43   0.0001   1.0

R-Squared            0.3536    Resid. Mean Square (MSE)   16350.6
Adjusted R-Squared   0.3261    Standard Deviation         127.870
AICc                 490.90
PRESS                865546

R-Squared = .3536: We can explain 35.36% of the variation in the sampled rents around their mean using the model with size and location.
Standard Deviation = 127.870: We expect most of the sampled prices to fall within $255.74 (two standard deviations) of their least squares predicted values.
Prediction equation: $\hat{y} = 302.92 + 0.551 x_1 - 97.54 x_2$
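The prediction equation and the two-standard-deviation margin can be evaluated directly. This sketch uses the unrounded coefficients from the Model 4 printout; the function name `predict_rent` is an assumption of the example, and the inputs (a 904-square-foot apartment in Brandon) are the predictor values used in the interval printout for this model.

```python
def predict_rent(size, location):
    """Least squares prediction from Model 4 (location: 1 = Brandon, 0 = St. Pete)."""
    return 302.920 + 0.55055 * size - 97.5401 * location


std_dev = 127.870
margin = 2 * std_dev                 # "most" predictions fall within 2 standard deviations

rent_brandon_904 = predict_rent(904, 1)
print(round(rent_brandon_904, 2), round(margin, 2))
```

The prediction rounds to 703.08 and the margin to 255.74, matching the values quoted in the slides.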
PI/CI for Best Model (Statistix printout)
Predicted/Fitted Values of Rent

Lower Predicted Bound    438.72    Lower Fitted Bound    642.14
Predicted Value          703.08    Fitted Value          703.08
Upper Predicted Bound    967.44    Upper Fitted Bound    764.02
SE (Predicted Value)     131.41    SE (Fitted Value)     30.294

Unusualness (Leverage)   0.0561
Percent Coverage         95
Corresponding T          2.01

Case number 15 was used to estimate the regression
Predictor Values: Size=904.00, Location=1.0000

We are 95% confident that the rent of a single 904-square-foot apartment in Brandon will fall between $438.72 and $967.44.
We are 95% confident that the average rent of all 904-square-foot apartments in Brandon will fall between $642.14 and $764.02.
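The two intervals differ only in their standard errors, and the relationship between them can be checked from the printout. This is a sketch using the standard result that the prediction standard error combines the model MSE with the standard error of the fitted value; the variable names are assumptions of the example, and the T value is the rounded 2.01 shown in the printout, so the bounds agree with Statistix only to within a few tenths.

```python
import math

mse = 16350.6          # Resid. Mean Square of Model 4
fitted = 703.08        # fitted (= predicted) value for Size=904, Location=1
se_fitted = 30.294     # SE (Fitted Value)
t_value = 2.01         # Corresponding T as printed (rounded)

# SE for predicting a single new apartment: model noise plus uncertainty in the fit
se_predicted = math.sqrt(mse + se_fitted ** 2)

# 95% confidence interval for the mean rent, and prediction interval for one apartment
ci = (fitted - t_value * se_fitted, fitted + t_value * se_fitted)
pi = (fitted - t_value * se_predicted, fitted + t_value * se_predicted)

print(round(se_predicted, 2))
print([round(b, 2) for b in ci], [round(b, 2) for b in pi])
```

The computed prediction standard error rounds to the printout's 131.41, which shows why the prediction interval is so much wider than the confidence interval: most of its width comes from the MSE, not from uncertainty in the fitted line.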
Using the Model in Practice: P-value low? Yes. R-Squared high? Meh! Standard deviation low? Not great! PI/CI narrow? Not great! We would not use this regression model in practice, as it does not accurately predict the rent of an apartment. Look for additional predictor variables.