Endogenous Variables in Microeconometrics

This presentation examines endogeneity in microeconometrics, discussing sources such as omitted variables, unobserved heterogeneity, and measurement error in linear regression models. It covers instrumental variable estimation, using examples such as the London cholera epidemic and the Cornwell and Rupert data on returns to schooling.

  • Microeconometrics
  • Endogeneity
  • Linear Regression
  • Instrumental Variables
  • Causal Inference

Uploaded on Mar 10, 2025


Presentation Transcript


  1. Topics in Microeconometrics William Greene Department of Economics Stern School of Business [Topic 2-Endogeneity] 1/33

  2. Part 2: Endogenous Variables in Linear Regression

  3. Endogeneity: $y = X\beta + \varepsilon$
     Definition: $E[\varepsilon \mid x] \neq 0$. Why not?
     • Omitted variables
     • Unobserved heterogeneity (equivalent to omitted variables)
     • Measurement error on the RHS (equivalent to omitted variables)
     • Structural aspects of the model
     • Endogenous sampling and attrition
     • Simultaneity (?)

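  4. Instrumental Variable Estimation
     One problem variable, the last one:
     $y_{it} = \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + \varepsilon_{it}$, with $E[\varepsilon_{it} \mid x_{Kit}] \neq 0$ ($= 0$ for all the others).
     There exists a variable $z_{it}$ such that
     • $E[x_{Kit} \mid x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = g(x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it})$: in the presence of the other variables, $z_{it}$ explains $x_{Kit}$;
     • $E[\varepsilon_{it} \mid x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = 0$: in the presence of the other variables, $z_{it}$ and $\varepsilon_{it}$ are uncorrelated.
     A projection interpretation: in the projection $x_{Kit} = \theta_1 x_{1it} + \theta_2 x_{2it} + \dots + \theta_{K-1} x_{K-1,it} + \theta_K z_{it}$, $\theta_K \neq 0$.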
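A minimal numerical sketch of this setup, assuming simulated data (the coefficients, the seed, and the helper name `iv_estimator` are hypothetical, not from the slides). The instrument matrix Z equals X with the problem column $x_K$ replaced by $z$, and the just-identified IV estimator is $b = (Z'X)^{-1}Z'y$. Because the unobservable `common` enters both $x_K$ and $\varepsilon$, OLS is inconsistent for $\beta_K$ while IV is not.

```python
import numpy as np

def iv_estimator(y, X, Z):
    """Just-identified IV estimator b = (Z'X)^{-1} Z'y.

    Z has the same columns as X except that the problem variable
    (the last column of X) is replaced by the instrument.
    """
    return np.linalg.solve(Z.T @ X, Z.T @ y)

# Simulated data; all coefficients below are hypothetical.
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)                            # exogenous regressor
z = rng.normal(size=n)                             # instrument: shifts xK, unrelated to eps
common = rng.normal(size=n)                        # unobservable shared by xK and eps
eps = common + rng.normal(size=n)
xK = 0.5 * x1 + z + common + rng.normal(size=n)    # endogenous: E[eps | xK] != 0

beta = np.array([1.0, 2.0, -1.0])                  # constant, x1, xK
X = np.column_stack([np.ones(n), x1, xK])
y = X @ beta + eps
Z = np.column_stack([np.ones(n), x1, z])           # xK replaced by its instrument z

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_iv = iv_estimator(y, X, Z)
print("OLS:", b_ols.round(3))
print("IV: ", b_iv.round(3))
```

In this design, Cov(x_K, ε) = 1 and the variance of x_K net of x1 is 3, so the OLS coefficient on x_K tends to −1 + 1/3 ≈ −0.67, while the IV coefficient tends to the true −1.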

  5. The First IV Study: Natural Experiment (Snow, J., On the Mode of Communication of Cholera, 1855) http://www.ph.ucla.edu/epi/snow/snowbook3.html
     London cholera epidemic, ca. 1853-4. $\text{Cholera} = f(\text{Water Purity}, u) + \varepsilon$. Causal effect of water purity on cholera? Purity = f(cholera-prone environment: poor, garbage in streets, rodents, etc.). Regression does not work.
     [Figure: map of the two London water companies, Lambeth and Southwark, with the main sewage discharge and the River Thames]
     Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects, http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf

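  6. IV Estimation
     $\text{Cholera} = f(\text{Purity}, u) + \varepsilon$; $Z$ = water company.
     $Z$ is randomly mixed in the population (two full sets of pipes) and uncorrelated with the behavioral unobservables, $u$.
     $\text{Cholera} = \alpha + \delta\,\text{Purity} + u + \varepsilon$
     $\text{Purity} = \text{Mean} + \text{random variation} + \lambda u$
     $\text{Cov}(\text{Cholera}, Z) = \delta\,\text{Cov}(\text{Purity}, Z)$, so $\delta = \text{Cov}(\text{Cholera}, Z) / \text{Cov}(\text{Purity}, Z)$.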
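The covariance identity on this slide can be checked numerically. The sketch below assumes a hypothetical linear design (δ = −2, λ = −0.5, and a 0/1 company indicator that raises purity by 0.8); none of these numbers come from Snow's data. It compares the OLS slope with the IV (Wald) ratio Cov(Cholera, Z)/Cov(Purity, Z).

```python
import numpy as np

# Hypothetical simulation of the cholera example.
rng = np.random.default_rng(1)
n = 200_000
u = rng.normal(size=n)                      # unobserved cholera-prone environment
Z = rng.integers(0, 2, size=n)              # water company, randomly mixed across households
lam, delta = -0.5, -2.0                     # hypothetical lambda and causal effect delta
purity = 1.0 + 0.8 * Z + rng.normal(size=n) + lam * u   # mean + company effect + random variation + lambda*u
cholera = 3.0 + delta * purity + u + rng.normal(size=n)

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]

delta_ols = cov(cholera, purity) / np.var(purity, ddof=1)   # contaminated by u
delta_iv = cov(cholera, Z) / cov(purity, Z)                 # uses only variation driven by Z
print(f"OLS slope: {delta_ols:.3f}   IV ratio: {delta_iv:.3f}")
```

Here Cov(Purity, u) = λ = −0.5 and Var(Purity) = 1.41, so the OLS slope drifts toward about −2.35, while the ratio stays near δ because Z is unrelated to u.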

  7. Cornwell and Rupert Data
     Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years. Variables in the file are:
       EXP = work experience
       WKS = weeks worked
       OCC = occupation, 1 if blue collar
       IND = 1 if manufacturing industry
       SOUTH = 1 if resides in south
       SMSA = 1 if resides in a city (SMSA)
       MS = 1 if married
       FEM = 1 if female
       UNION = 1 if wage set by union contract
       ED = years of education
       LWAGE = log of wage = dependent variable in regressions
     These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122, for further analysis. The data were downloaded from the website for Baltagi's text.

  8. Specification: Quadratic Effect of Experience

  9. The Effect of Education on LWAGE
     $LWAGE = \beta_1 + \beta_2 EDUC + \beta_3 EXP + \beta_4 EXP^2 + \dots + \varepsilon$
     What is $\varepsilon$? $\varepsilon$ = Ability, Motivation, ... + everything else.
     $EDUC = f(GENDER, SMSA, SOUTH, Ability, Motivation, \dots)$

  10. What Influences LWAGE?
      $LWAGE = \beta_1 + \beta_2 EDUC(X, Ability, Motivation, \dots) + \beta_3 EXP + \beta_4 EXP^2 + \dots + \varepsilon(Ability, Motivation, \dots)$
      Increased $EDUC(X, Ability, Motivation, \dots)$ is associated with increases in $\varepsilon(Ability, Motivation, \dots)$. What looks like an effect due to an increase in $EDUC$ may be an increase in Ability. The estimate of $\beta_2$ picks up the effect of $EDUC$ and the hidden effect of Ability.
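A small simulation of this ability story, assuming hypothetical numbers: EDUC rises 1.5 years per standard deviation of ability, ability raises LWAGE by 0.10 directly, and the true $\beta_2$ is 0.06. The "long" regression that controls for ability is infeasible with real data and is included only for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
ability = rng.normal(size=n)                                  # unobserved in practice
educ = 12 + 1.5 * ability + rng.normal(scale=2.0, size=n)     # EDUC depends on ability
lwage = 1.0 + 0.06 * educ + 0.10 * ability + rng.normal(scale=0.3, size=n)

const = np.ones(n)
# Short regression: ability is left in the disturbance.
b_short = np.linalg.lstsq(np.column_stack([const, educ]), lwage, rcond=None)[0]
# Long (infeasible) regression: ability is observed and controlled for.
b_long = np.linalg.lstsq(np.column_stack([const, educ, ability]), lwage, rcond=None)[0]
print("short:", b_short[1].round(4), " long:", b_long[1].round(4))
```

In large samples the short-regression slope tends to 0.06 + 0.10 × 1.5/6.25 = 0.084, because Cov(EDUC, Ability) = 1.5 and Var(EDUC) = 1.5² + 2² = 6.25: the estimate of $\beta_2$ absorbs part of the ability effect.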

  11. An Exogenous Influence
      $LWAGE = \beta_1 + \beta_2 EDUC(Z, X, Ability, Motivation, \dots) + \beta_3 EXP + \beta_4 EXP^2 + \dots + \varepsilon(Ability, Motivation, \dots)$
      Increased $Z$ is associated with increases in $EDUC(Z, X, \dots)$ and not with $\varepsilon(Ability, Motivation, \dots)$. An effect due to an increase in $Z$ will only be an increase in $EDUC$, so the estimate of $\beta_2$ picks up the effect of $EDUC$ only. $Z$ is an Instrumental Variable.

  12. Instrumental Variables
      Structure: LWAGE(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION), with ED(MS, FEM)
      Reduced form: LWAGE[ED(MS, FEM), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION]

  13. Two Stage Least Squares Strategy
      Reduced form: LWAGE[ED(MS, FEM, X), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION]
      Strategy:
      (1) Purge ED of the influence of everything but MS and FEM (and the other variables): predict ED using all the exogenous information in the sample (X and Z).
      (2) Regress LWAGE on this prediction of ED and everything else.
      Standard errors must be adjusted for the predicted ED.
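The two steps of this strategy can be sketched with NumPy. This is a minimal illustration on simulated stand-ins for LWAGE, ED, EXP, MS and FEM (the data-generating numbers and the function name `tsls` are hypothetical), not the software that produced the slide results. The detail to notice is the standard-error adjustment: the residual variance is computed from $y - Xb$ with the actual ED, not from the second-stage residuals $y - \hat{X}b$.

```python
import numpy as np

def tsls(y, X, Z):
    """Two stage least squares.

    X: regressors (constant, ED, other exogenous variables).
    Z: all exogenous information (constant, instruments, other exogenous variables).
    """
    # Stage 1: predict every column of X from Z. Exogenous columns of X are
    # themselves in Z, so they are reproduced (up to rounding).
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: OLS of y on the predictions.
    b = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    n, k = X.shape
    # Adjusted standard errors: residuals use the actual X, not X_hat.
    e = y - X @ b
    sigma2 = e @ e / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))
    # Residual variance a naive second-stage OLS run would report instead.
    e_naive = y - X_hat @ b
    sigma2_naive = e_naive @ e_naive / (n - k)
    return b, se, sigma2, sigma2_naive

# Simulated stand-ins (hypothetical numbers, not the Cornwell and Rupert data).
rng = np.random.default_rng(3)
n = 50_000
ability = rng.normal(size=n)                      # unobserved, drives the endogeneity
ms = (rng.random(n) < 0.5).astype(float)
fem = (rng.random(n) < 0.3).astype(float)
exper = rng.normal(20.0, 5.0, size=n)
ed = 12 + 1.0 * ms - 1.5 * fem + 0.8 * ability + rng.normal(size=n)
lwage = 5 + 0.07 * ed + 0.01 * exper + 0.2 * ability + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), ed, exper])
Z = np.column_stack([np.ones(n), ms, fem, exper])
b, se, sigma2, sigma2_naive = tsls(lwage, X, Z)
print("b:", b.round(4), " se:", se.round(4))
```

In this design the correct residual variance is close to 0.2² + 0.3² = 0.13, while the naive second-stage value is noticeably larger because it mixes in part of the first-stage error.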

  14. OLS

  15. The weird result for the coefficient on ED occurred because the instruments, MS and FEM, are dummy variables: there is not enough variation in these variables.

  16. Source of Endogeneity
      LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + $\varepsilon$
      ED = f(MS, FEM, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + $u$
      ED is endogenous because $\varepsilon$ and $u$ are correlated.

  17. Remove the Endogeneity
      LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + $\theta u + \varepsilon$
      Strategy: estimate $u$ and add it to the equation. ED is uncorrelated with $\varepsilon$ when $u$ is in the equation.
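A sketch of the control function idea on simulated stand-ins for the Cornwell and Rupert variables (all numbers hypothetical). The residual from the auxiliary regression of ED on all exogenous variables is added to the LWAGE equation. In the linear model the coefficients on the original regressors then coincide exactly with 2SLS, which the last lines compute for comparison.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
ability = rng.normal(size=n)                      # unobserved; links the two disturbances
ms = (rng.random(n) < 0.5).astype(float)
fem = (rng.random(n) < 0.3).astype(float)
exper = rng.normal(20.0, 5.0, size=n)
ed = 12 + 1.0 * ms - 1.5 * fem + 0.8 * ability + rng.normal(size=n)
lwage = 5 + 0.07 * ed + 0.01 * exper + 0.2 * ability + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), ed, exper])          # constant, ED, EXP
Z = np.column_stack([np.ones(n), ms, fem, exper])     # all exogenous variables

# Auxiliary regression: ED on all exogenous variables; keep the residual u_hat.
u_hat = ed - Z @ np.linalg.lstsq(Z, ed, rcond=None)[0]
# Control function: OLS of LWAGE on the original regressors plus u_hat.
b_cf = np.linalg.lstsq(np.column_stack([X, u_hat]), lwage, rcond=None)[0]

# 2SLS for comparison.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_2sls = np.linalg.lstsq(X_hat, lwage, rcond=None)[0]
print("control function:", b_cf[:3].round(5))
print("2SLS:            ", b_2sls.round(5))
```

The point estimates match because u_hat is orthogonal to every exogenous variable. The OLS standard errors from the control-function regression, however, ignore that u_hat is itself estimated.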

  18. Auxiliary Regression for ED to Obtain Residuals

  19. OLS with Residual (Control Function) Added, and 2SLS

  20. A Warning About Control Functions
      The sum of squares is not computed correctly because the estimated $u$ is in the regression. This is a general result: control function estimators usually require a fix to the estimated covariance matrix of the estimator.

  21. Endogenous Dummy Variable
      $Y = x'\beta + \delta T + \varepsilon$ ($\varepsilon$ = unobservable factors)
      T = a dummy variable (treatment); T = 0/1 depending on x, z, and the same unobservable factors.
      T is endogenous, the same as ED.

  22. Application: Health Care Panel Data
      German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods. Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set: there are altogether 27,326 observations. The number of observations per person ranges from 1 to 7 (frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note: the variable NUMOBS tells how many observations there are for each person; it is repeated in each row of the data for the person. Variables in the file are:
        DOCTOR = 1(Number of doctor visits > 0)
        HOSPITAL = 1(Number of hospital visits > 0)
        HSAT = health satisfaction, coded 0 (low) - 10 (high)
        DOCVIS = number of doctor visits in last three months
        HOSPVIS = number of hospital visits in last calendar year
        PUBLIC = insured in public health insurance = 1; otherwise = 0
        ADDON = insured by add-on insurance = 1; otherwise = 0
        HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
        HHKIDS = children under age 16 in the household = 1; otherwise = 0
        EDUC = years of schooling
        AGE = age in years
        MARRIED = marital status

  23. A Study of Moral Hazard
      Riphahn, Wambach, Million: "Incentive Effects in the Demand for Health Care," Journal of Applied Econometrics, 2003.
      Did the presence of ADDON insurance influence the demand for health care (doctor visits and hospital visits)?
      For a simple example, we examine PUBLIC insurance (89%) instead of ADDON insurance (2%).

  24. Evidence of Moral Hazard?

  25. Regression Study

  26. Endogenous Dummy Variable
      Doctor Visits = f(Age, Educ, Health, Presence of Insurance, Other unobservables)
      Insurance = f(Expected Doctor Visits, Other unobservables)

  27. Approaches
      • (Parametric) Control Function: build a structural model for the two variables (Heckman).
      • (Semiparametric) Instrumental Variable: create an instrumental variable for the dummy variable (Barnow/Cain/Goldberger, Angrist, current generation of researchers).
      • (?) Propensity Score Matching (Heckman et al., Becker/Ichino, many recent researchers).

  28. Heckman's Control Function Approach
      $Y = x'\beta + \delta T + E[\varepsilon \mid T] + \{\varepsilon - E[\varepsilon \mid T]\}$
      $\lambda = E[\varepsilon \mid T]$, computed from a model for whether T = 0 or 1.
      Magnitude = 11.1200 is nonsensical in this context.

  29. Instrumental Variable Approach
      Construct a prediction for T using only the exogenous information, and use 2SLS with this instrumental variable.
      Magnitude = 23.9012 is also nonsensical in this context.

  30. Propensity Score Matching
      Create a model for T that produces probabilities for T = 1: propensity scores.
      Find people with the same propensity score, some with T = 1 and some with T = 0.
      Compare the number of doctor visits of those with T = 1 to those with T = 0.
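These three steps can be sketched as follows, under stated assumptions: simulated data with one observed covariate, a logit propensity model fitted by Newton-Raphson, and one-to-one nearest-neighbor matching with replacement to estimate the average effect on the treated. The helper names `fit_logit` and `att_nearest_neighbor` and all numbers are hypothetical. Matching of this kind assumes treatment depends only on observed variables, so it does not address selection on unobservables.

```python
import numpy as np

def fit_logit(X, t, iters=25):
    """Logit coefficients by Newton-Raphson; X includes a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (t - p)                          # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X     # X'WX (negative Hessian)
        b += np.linalg.solve(hess, grad)
    return b

def att_nearest_neighbor(score, t, y):
    """Average effect on the treated: each treated unit is compared with the
    control unit whose propensity score is closest (with replacement)."""
    order = np.argsort(score[t == 0])
    s_c, y_c = score[t == 0][order], y[t == 0][order]
    s_t, y_t = score[t == 1], y[t == 1]
    idx = np.searchsorted(s_c, s_t)                   # insertion points in sorted controls
    lo = np.clip(idx - 1, 0, len(s_c) - 1)
    hi = np.clip(idx, 0, len(s_c) - 1)
    match = np.where(np.abs(s_c[hi] - s_t) < np.abs(s_c[lo] - s_t), hi, lo)
    return np.mean(y_t - y_c[match])

# Hypothetical data: treatment is more likely for larger x, and x also raises y.
rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
t = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + x)))).astype(float)
y = 1 + 2 * x + 1.5 * t + rng.normal(size=n)          # true effect 1.5

Xc = np.column_stack([np.ones(n), x])
score = 1 / (1 + np.exp(-Xc @ fit_logit(Xc, t)))       # propensity scores
att = att_nearest_neighbor(score, t, y)
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive difference: {naive:.3f}   matched ATT: {att:.3f}")
```

The naive comparison mixes the treatment effect with the higher average x among the treated; matching on the score compares treated and control units with similar x.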

  31. Difference in Differences
      With two periods, consider a "treatment," $D_i$, that takes place between time 1 and time 2 for some of the individuals; $D_i$ is the "treatment dummy."
      $\Delta y_i = y_{i2} - y_{i1} = \beta_0 + \beta_1 D_i + \beta_x'(x_{i2} - x_{i1}) + u_i$
      This is a linear regression model. If there are no regressors,
      $b_1 = \overline{\Delta y}\,|\,\text{treatment} - \overline{\Delta y}\,|\,\text{control}$ = the "difference in differences" estimator,
      and $\beta_0 + \beta_1$ = average change in $y$ for the "treated."
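With no regressors, the regression of $\Delta y_i$ on a constant and $D_i$ reproduces the difference in mean changes exactly. A minimal sketch on simulated two-period panel data, with hypothetical numbers: the individual effect `alpha` is deliberately larger for the treated group, so a comparison of levels would be biased, while first differencing removes it.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
D = (rng.random(n) < 0.4).astype(float)           # 1 = treated between period 1 and period 2
alpha = rng.normal(size=n) + 2.0 * D              # individual effect, larger for the treated
y1 = alpha + rng.normal(size=n)
y2 = alpha + 0.3 + 1.0 * D + rng.normal(size=n)   # common change 0.3, treatment effect 1.0

dy = y2 - y1                                      # alpha cancels in the difference
b = np.linalg.lstsq(np.column_stack([np.ones(n), D]), dy, rcond=None)[0]
did = dy[D == 1].mean() - dy[D == 0].mean()       # difference in differences
print("b1:", round(b[1], 4), " DiD:", round(did, 4))
```

The intercept equals the mean change for the controls, and $b_0 + b_1$ equals the mean change for the treated.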

  32. Difference-in-Differences Model
      With two periods and strict exogeneity of D and T,
      $y_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 T_t + \beta_3 T_t D_{it} + \varepsilon_{it}$
      where $D_{it}$ = dummy variable for a treatment that takes place between time 1 and time 2 for some of the individuals, and $T_t$ = a time period dummy variable, 0 in period 1 and 1 in period 2.
      This is a linear regression model. If there are no regressors, using least squares,
      $b_3 = (\bar y_2^{D=1} - \bar y_1^{D=1}) - (\bar y_2^{D=0} - \bar y_1^{D=0})$
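The least squares identity on this slide can be checked numerically. The sketch below, assuming simulated data with hypothetical coefficients, fits $y$ on a constant, $D$, $T$ and $T \cdot D$ and compares $b_3$ with the difference of the four cell means. The two agree exactly because the regression is saturated in the two dummies.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4_000
D = (rng.random(n) < 0.5).astype(float)      # 1 = group that receives the treatment
T = (rng.random(n) < 0.5).astype(float)      # 0 = period 1, 1 = period 2
y = 2.0 + 0.5 * D + 0.3 * T + 1.0 * T * D + rng.normal(size=n)

X = np.column_stack([np.ones(n), D, T, T * D])
b = np.linalg.lstsq(X, y, rcond=None)[0]

def cell_mean(d, t):
    """Mean of y for group D == d in period T == t."""
    return y[(D == d) & (T == t)].mean()

# (y2|D=1 - y1|D=1) - (y2|D=0 - y1|D=0)
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print("b3:", round(b[3], 4), " cell-mean DiD:", round(did, 4))
```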

  33. Difference in Differences
      $y_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 T_t + \beta_3 T_t D_{it} + \beta' x_{it} + \varepsilon_{it}$, $t = 1, 2$
      The measured effect of the treatment is $\beta_3 + \beta'(\bar x \mid D_{it} = 1 - \bar x \mid D_{it} = 0)$.
      If the same individual is observed in both states, the second term is zero. If the effect is estimated by averaging individuals with D = 1 and different individuals with D = 0, then part of the 'effect' is explained by the change in the covariates, not the treatment.
