Advanced Modeling Techniques for Regression Analysis

Explore the intricacies of multiple regression analysis, including defining partial regression coefficients, handling collinearity, and understanding the impact of binary variables. Dive into examples and learn how to interpret results effectively.

  • Regression Analysis
  • Modeling Techniques
  • Independent Variables
  • Collinearity
  • Binary Variables


Presentation Transcript


  1. ADVANCED MODELING: Multiple Independent Variables, Binary Variables, and Collinearity

  2. MULTIPLE REGRESSION ANALYSIS: TWO PREDICTOR VARIABLES Model: Yi = β0 + β1Xi1 + β2Xi2 + εi Response Surface: E(Yi) = β0 + β1Xi1 + β2Xi2

  3. MULTIPLE REGRESSION ANALYSIS: DEFINITIONS 1. β1 and β2 are partial regression coefficients. 2. β1 indicates the change in E(Y) given a one-unit increase in X1 when X2 is held fixed; β2 is defined similarly. 3. Contrast this with the simple regression coefficient, in which the other variables are ignored. Ignoring them can be a major problem when X1 and X2 are correlated, or when the effect of X1 depends on the level of X2.
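The meaning of "partial" can be sketched outside SPSS. Below is a minimal NumPy illustration on hypothetical data generated with known coefficients (β1 = 3, β2 = 5): fitting both predictors together recovers each coefficient as the effect of its own variable with the other held fixed.

```python
# Minimal sketch of partial regression coefficients (NumPy, not SPSS).
import numpy as np

# Hypothetical data: Y depends on both predictors with b1 = 3, b2 = 5.
x1 = np.array([0., 1., 2., 3., 4.])
x2 = np.array([1., 0., 2., 1., 3.])
y = 2.0 + 3.0 * x1 + 5.0 * x2          # exact, no error term

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# b1 is the change in E(Y) for a one-unit increase in X1 with X2 held fixed;
# the fit recovers b0 = 2, b1 = 3, b2 = 5.
```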

  4. EXAMPLE REGRESSION (DATA FILE IS LAND)

     Landval    Improval   Salepric (Y)
     5,600      19,410     44,000
     5,600      21,040     45,000
     8,600      35,560     69,900
     5,600      22,020     44,000
     6,200      31,640     70,300
     5,500      22,360     48,200
     7,020      40,790     80,000
     5,000      56,520     97,500
     2,000      7,340      25,000
     2,000      9,130      27,600
     2,500      13,300     39,000
     4,000      13,630     40,500
     4,300      22,070     50,000
     3,600      18,060     39,500
     3,600      19,220     41,500
     3,600      14,970     32,500
     3,630      19,200     44,500
     3,490      14,680     32,500
     3,450      21,630     43,000
     3,490      19,100     38,000
     2,500      10,680     28,500
     2,500      10,010     26,000
     2,500      10,520     26,000
     3,000      13,640     30,500

  5. PARTIAL REGRESSION COEFFICIENTS (DATA FILE IS LAND)

  6. SIMPLE REGRESSION COEFFICIENTS: LANDVAL = X The coefficient for Landval is eight times larger than it should be. Moreover, you cannot get the correct value for the coefficient by increasing the sample size; you would just become more and more confident about the wrong answer.

  7. SIMPLE REGRESSION COEFFICIENTS: IMPROVAL = X The coefficient for Improval is larger than it should be. It is picking up the effect of both land value and improvement value.
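The "picking up the effect of both variables" phenomenon can be reproduced with a tiny stdlib-only sketch on hypothetical data. When the omitted predictor is correlated with the included one, the simple regression slope is inflated above the true partial coefficient.

```python
# Sketch of omitted-variable bias (pure Python; data are hypothetical).
x1 = [0.0, 1.0, 2.0, 3.0]
x2 = [0.0, 2.0, 2.0, 4.0]            # correlated with x1
y = [a + b for a, b in zip(x1, x2)]  # true model: Y = 1*X1 + 1*X2

mx = sum(x1) / len(x1)
my = sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x1, y))
var = sum((a - mx) ** 2 for a in x1)
slope = cov / var                    # simple regression of Y on X1 alone

print(slope)   # 2.2 -- inflated above the true partial coefficient of 1
```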

  8. GENERAL MODEL Model: Yi = β0 + β1Xi1 + β2Xi2 + ... + βkXik + εi Response Surface: E(Yi) = β0 + β1Xi1 + β2Xi2 + ... + βkXik Point Estimators: b0 and bj for j = 1, ..., k (* referred to as B on SPSS output) Standard Errors: Sbj (* referred to as Std Error on SPSS output)

  9. GENERAL MODEL: TESTS OF INTEREST 1. All regression coefficients (β1, ..., βk) are equal to zero; that is, Y is not related to any of the X's. Null hypothesis (Ho): β1 = β2 = ... = βk = 0. Alternate hypothesis (Ha): at least one βj ≠ 0. F = Mean Square Regression (MSR) / Mean Square Error (MSE). * Referred to as F on SPSS output; the associated p-value is labeled Sig. * If the p-value for the overall F-test is less than your significance level (e.g., p-value < 0.05 at the 0.05 level), you can reject the null hypothesis. In other words, if the overall F-test is significant, your regression model predicts the response variable better than the mean of the response.
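The F ratio above can be computed by hand from the sums-of-squares decomposition. A NumPy sketch on hypothetical data (not the deck's SPSS output):

```python
# Sketch of the overall F statistic, F = MSR / MSE (hypothetical data).
import numpy as np

x1 = np.array([0., 1., 2., 3., 4., 5.])
x2 = np.array([1., 0., 1., 0., 1., 0.])
y = np.array([2.1, 3.0, 3.9, 5.2, 6.0, 6.9])   # roughly linear in x1

X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b
n, k = len(y), 2                         # k predictors

ssr = np.sum((fitted - y.mean()) ** 2)   # SS(Model)
sse = np.sum((y - fitted) ** 2)          # SS(Error)
sst = np.sum((y - y.mean()) ** 2)        # SS(Total) = SSR + SSE

F = (ssr / k) / (sse / (n - k - 1))      # compare with an F(k, n-k-1) table
```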

  10. GENERAL MODEL: TESTS OF INTEREST (cont.) 2. An individual βj is equal to zero; that is, an individual X is not related to Y given the other X variables in the model. Ho: βj = 0 (X is not related to Y, given the other predictors). Ha: βj ≠ 0. t is the ratio of the estimated coefficient (bj) to the standard error of the coefficient (Sbj). * Referred to as t on SPSS output. * Alternatively, a 100(1 - α)% confidence interval on βj is bj ± t(n-(k+1), α/2) Sbj. * The larger the absolute value of the t-value, the smaller the p-value, and the greater the evidence against the null hypothesis.
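The t ratio reported by SPSS can likewise be reproduced from the fit: the standard errors come from MSE times the diagonal of (X'X)^-1. A NumPy sketch on hypothetical, nearly linear data:

```python
# Sketch of the t ratio t = bj / S_bj (hypothetical data).
import numpy as np

x1 = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])   # roughly y = 1 + 2*x1

X = np.column_stack([np.ones_like(x1), x1])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
n, k = len(y), 1
mse = resid @ resid / (n - (k + 1))                  # error mean square
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))  # standard errors S_bj

t = b / se   # compare |t| with t_{n-(k+1), alpha/2} from a t table
```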

  11. GENERAL MODEL: COEFFICIENT OF MULTIPLE DETERMINATION R² = SS(Model) / SS(Total) = 1 - (Σe² / Σ(Yi - Ȳ)²). R² = 1 indicates a perfect relationship; R² = 0 indicates no relationship. R² measures the proportionate reduction in the total variation of Y that is associated with the use of all the predictor variables in the model.
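To make the "perfect relationship" endpoint concrete, here is a NumPy sketch on hypothetical data that satisfy the model exactly, so SSE = 0 and R² = 1:

```python
# Sketch of R^2 = SS(Model)/SS(Total) = 1 - SSE/SST (hypothetical data).
import numpy as np

x1 = np.array([1., 2., 3., 4.])
x2 = np.array([0., 1., 1., 3.])
y = 4.0 + 2.0 * x1 - 1.0 * x2     # exact relationship -> R^2 = 1

X = np.column_stack([np.ones_like(x1), x1, x2])
fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - fitted) ** 2)   # residual sum of squares
sst = np.sum((y - y.mean()) ** 2) # total sum of squares
r2 = 1.0 - sse / sst
```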

  12. BINARY VARIABLES AND INTERACTIONS

  13. INDICATOR VARIABLES Indicator or dummy variables are used to code qualitative or categorical variables such as gender, ethnic group, industry classification, or season. The easiest approach is the Transform tab on the SPSS menu, which includes a Create Dummy Variables option. Quarterly and monthly seasonal dummy variables have already been created in your database.

  14. ADDITIVE SEASONAL REGRESSION MODELS Example: 1. Suppose we want to model EXHST as a function of the yearly quarter. 2. Allocated code: X = 1 if January-March; = 2 if April-June; = 3 if July-September; = 4 if October-December.

  15. ADDITIVE SEASONAL REGRESSION MODELS (CONT.) Model: E(Y) = β0 + β1X. This model implies: E(Y) = β0 + β1 in Quarter 1 (January-March); E(Y) = β0 + 2β1 in Quarter 2 (April-June); E(Y) = β0 + 3β1 in Quarter 3 (July-September); E(Y) = β0 + 4β1 in Quarter 4 (October-December). Note that the model forces differences in EXHST between the quarters to be in increments of β1. In addition, the allocated code forces an implicit ordering among the quarters with regard to EXHST. Does this make sense?
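The limitation can be seen numerically: the allocated-code model forces the four quarterly means onto a straight line, so it cannot reproduce a non-monotone seasonal pattern. A stdlib-only sketch with hypothetical quarterly means:

```python
# Sketch of why the allocated code X = 1..4 fails for non-linear patterns
# (pure Python; the quarterly means are hypothetical).
quarter = [1, 2, 3, 4]
mean_y = [10.0, 30.0, 15.0, 20.0]   # a non-monotone seasonal pattern

mx = sum(quarter) / 4
my = sum(mean_y) / 4
b1 = sum((x - y_bar) * 0 for x, y_bar in []) or \
     sum((x - mx) * (y - my) for x, y in zip(quarter, mean_y)) / \
     sum((x - mx) ** 2 for x in quarter)
b0 = my - b1 * mx

fitted = [b0 + b1 * x for x in quarter]   # equally spaced: steps of b1
resid = [y - f for y, f in zip(mean_y, fitted)]
# The fitted line cannot reproduce the pattern; the residuals are large.
```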

  16. Plot of Mean Exhaust by Quarter Not a linear relationship

  17. ADDITIVE SEASONAL REGRESSION MODELS (CONT.) 3. Indicator Variables For a qualitative variable with c categories of classification, define c-1 indicator variables of the form Xi = 1 if in category i, 0 otherwise, for i = 1, ..., c-1. In our example, c = 4 and we define Q1 = 1 if quarter 1, 0 otherwise; Q2 = 1 if quarter 2, 0 otherwise; Q3 = 1 if quarter 3, 0 otherwise. Quarter 4 is not represented by a variable and is referred to as the base category.
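The c-1 coding can be built by hand in a few lines; SPSS's Create Dummy Variables option produces an equivalent coding. A stdlib-only sketch with hypothetical quarter labels:

```python
# Sketch of building c-1 = 3 quarterly indicators (pure Python).
quarters = [1, 2, 3, 4, 1, 2, 3, 4]      # hypothetical quarter labels

Q1 = [1 if q == 1 else 0 for q in quarters]
Q2 = [1 if q == 2 else 0 for q in quarters]
Q3 = [1 if q == 3 else 0 for q in quarters]
# Quarter 4 gets no indicator: it is the base category,
# identified by Q1 = Q2 = Q3 = 0.
```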

  18. ADDITIVE SEASONAL REGRESSION MODELS (CONT.) Model: E(Y) = β0 + β1Q1 + β2Q2 + β3Q3. This model implies: E(Y) = β0 + β1 in Quarter 1; E(Y) = β0 + β2 in Quarter 2; E(Y) = β0 + β3 in Quarter 3; E(Y) = β0 in Quarter 4. β1 represents the difference in average EXHST between quarter 1 and quarter 4; β2 represents the difference in average EXHST between quarter 2 and quarter 4; β3 represents the difference in average EXHST between quarter 3 and quarter 4. β1 = β2 = β3 = 0 implies the average EXHST is equal for all quarters; that is, there is no quarterly effect. What does β1 - β2 represent?
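The interpretation of the dummy coefficients as mean differences from the base quarter can be checked directly. A NumPy sketch with one (hypothetical) mean per quarter, so the fit is exact:

```python
# Sketch: dummy-model coefficients equal quarterly-mean differences
# from the base quarter (NumPy; the means are hypothetical).
import numpy as np

mean_y = np.array([10.0, 30.0, 15.0, 20.0])     # quarters 1..4
Q1 = np.array([1., 0., 0., 0.])
Q2 = np.array([0., 1., 0., 0.])
Q3 = np.array([0., 0., 1., 0.])

X = np.column_stack([np.ones(4), Q1, Q2, Q3])
b0, b1, b2, b3 = np.linalg.lstsq(X, mean_y, rcond=None)[0]
# b0 = 20 (quarter-4 mean); b1 = -10, b2 = 10, b3 = -5 are the
# differences between each quarter's mean and the quarter-4 mean.
```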

  19. β1 - β2 is the difference in the exhaustion rates between quarter 1 and quarter 2.

  20. Attempt to Run with Four Quarterly Dummies
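The reason the run with four quarterly dummies fails is the "dummy variable trap": Q1 + Q2 + Q3 + Q4 equals the intercept column, so the design matrix is perfectly collinear. A NumPy sketch with two hypothetical years of quarters:

```python
# Sketch of the dummy variable trap: intercept plus all four quarterly
# dummies gives a rank-deficient design matrix (hypothetical data).
import numpy as np

quarters = [1, 2, 3, 4, 1, 2, 3, 4]
dummies = np.array([[1 if q == j else 0 for j in (1, 2, 3, 4)]
                    for q in quarters], dtype=float)
X = np.column_stack([np.ones(8), dummies])      # 5 columns

rank = np.linalg.matrix_rank(X)
print(rank)   # 4, not 5 -- the coefficients are not uniquely estimable
```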

  21. MODELS WITH BOTH QUANTITATIVE AND QUALITATIVE PREDICTORS 1. No-interaction example: Y = EXHST; Q1, Q2, Q3 = dummy variables to represent quarter; X4 = UNEMPLOYMENT_RATE.

  22. MODELS WITH BOTH QUANTITATIVE AND QUALITATIVE PREDICTORS Model: E(Y) = β0 + β1Q1 + β2Q2 + β3Q3 + β4X4. This model implies: E(Y) = (β0 + β1) + β4X4 in Quarter 1; E(Y) = (β0 + β2) + β4X4 in Quarter 2; E(Y) = (β0 + β3) + β4X4 in Quarter 3; E(Y) = β0 + β4X4 in Quarter 4. [Plot of EXHST versus UNEMPLOYMENT RATE] *Note that the slope is the same in each quarter in this model.
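The "same slope in each quarter" claim can be verified on data built to satisfy the additive model exactly: the fit recovers a single common slope for X4, with the quarters shifting only the intercept. A NumPy sketch (all numbers hypothetical):

```python
# Sketch of the additive model with dummies plus one quantitative
# predictor: the four quarterly lines are parallel (hypothetical data).
import numpy as np

quarters = np.array([1, 2, 3, 4, 1, 2, 3, 4])
x4 = np.array([5., 6., 7., 8., 4., 3., 2., 1.])   # "unemployment rate"
Q1 = (quarters == 1) * 1.0
Q2 = (quarters == 2) * 1.0
Q3 = (quarters == 3) * 1.0

y = 5.0 + 2.0 * Q1 + 3.0 * Q2 + 1.0 * Q3 + 0.5 * x4   # exact additive data

X = np.column_stack([np.ones(8), Q1, Q2, Q3, x4])
b = np.linalg.lstsq(X, y, rcond=None)[0]
# b[4] is the single common slope; quarters change only the intercept
# (b[0], b[0]+b[1], b[0]+b[2], b[0]+b[3]).
```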

  23. Regression Output *Note that there is only one coefficient for the Unemployment Rate
