
Econometrics in Practice: Problem of Endogenous Regressors Monte Carlo Simulation
Explore a Monte Carlo simulation of endogenous regressors in practice, focusing on Paweł and Gaweł's study habits and on the econometrician's perspective when estimating a linear regression equation based on their data generating process.
Presentation Transcript
Econometrics in Practice. Problem of Endogenous Regressors: Monte Carlo simulation. Andrzej Torój
Paweł & Gaweł problem (1)
Paweł and Gaweł lived in one house. Or, in fact, they shared a common student apartment, in which it is sometimes difficult to focus. Once upon a time, Paweł failed an exam in the first session, and this made him reflect. Conclusions? His level of academic activity in the preceding week was low. Then he asked himself a question: what determines the weekly amount of time that I devote to learning? www.sgh.waw.pl
Paweł & Gaweł problem (2)
After a diligent analysis, Paweł further concluded that his own study time in week $t$ (denote it as $P_t$) depends on two factors:
- Gaweł's study time ($G_t$). Paweł noticed that, when Gaweł is studying, he starts studying more frequently as well. And conversely: when Gaweł is partying, watching films or procrastinating (by prioritizing cleaning or cooking), Paweł instinctively joins him in that more often.
- The time of Paweł's girlfriend's visits ($DP_t$). She is a brilliant student and when she is learning, he does the same under her highly motivational supervision.
Econometrician's perspective
Paweł reached for handbooks of econometrics and, based on his reflections, specified a linear regression equation:
$$P_t = \alpha_0 + \alpha_1 G_t + \alpha_2 DP_t + \varepsilon_{P,t}$$
All that is left is to collect the data and call a friend who studies econometrics, asking him to use this data to estimate $\alpha_0$, $\alpha_1$ and $\alpha_2$.
The econometrician thinks this might be trivial. With a linear regression model, it is sufficient to use the ordinary least squares estimator to obtain parameter estimates. The student of econometrics heard in a lecture that, by the Gauss-Markov theorem, this guarantees unbiased, consistent and efficient estimation.
Data generating process (1)
The DGP usually remains unknown to the econometrician. Here, as an exception, we draw on the knowledge of the Guardian Angels of Paweł and Gaweł, who know more:
- If Gaweł studies one additional hour, Paweł will follow him in 7 out of 10 cases ($\alpha_1 = 0.7$).
- When Paweł's girlfriend visits him, they devote half of the time to studying ($\alpha_2 = 0.5$).
- Paweł's study time sometimes deviates from this pattern. There are exceptions due to other, less impactful circumstances that are more difficult to measure, but the standard deviation is relatively low and amounts to 1 hour per week ($\sigma_P = 1$).
- These deviations ($\varepsilon_{P,t}$) have identical normal distributions with expected value 0 and are independent of one another (across weeks): $\varepsilon_{P,t} \sim N(0, \sigma_P^2)$ i.i.d.
Data generating process (2)
BUT: what if Gaweł told HIS version of this story?
- He also happens to be influenced by Paweł, encouraged or discouraged from learning.
- Gaweł also has a girlfriend (her visit time: $DG_t$), who, in contrast to Paweł's girlfriend, loves parties and distracts Gaweł from learning.
- This description of Gaweł's study time is not perfect either, since deviations of identical size and other characteristics ($\varepsilon_{G,t}$) occur. We additionally assume these deviations to be independent of those observed for Paweł.
THE WHOLE TRUTH is as follows:
$$P_t = \alpha_0 + \alpha_1 G_t + \alpha_2 DP_t + \varepsilon_{P,t}, \qquad \varepsilon_{P,t} \sim N(0, \sigma_P^2) \text{ i.i.d.}$$
$$G_t = \beta_0 + \beta_1 P_t + \beta_2 DG_t + \varepsilon_{G,t}, \qquad \varepsilon_{G,t} \sim N(0, \sigma_G^2) \text{ i.i.d.}$$
$$E(\varepsilon_{P,t}\,\varepsilon_{G,t}) = 0$$
where, in addition: $\beta_1 = 0.5$, $\beta_2 = 0.3$ and $\sigma_G = 1$.
Data generating process (3)
To complete this story, there are some additional, previously unknown facts from the Guardian Angels of both girls.
- Each of them ($DP_t$, $DG_t$) visits Paweł / Gaweł for 10 hours a week, on average.
- Both gentlemen study 20 hours a week, which leads to $\alpha_0 = 1$ and $\beta_0 = 7$.
- $DP_t$, $DG_t$ are independent, Poisson-distributed variables.
- The girls come whenever they want and stay as long as they want. In these decisions they do not take into account any circumstances that would depend on Paweł's or Gaweł's study time. The econometrician usually assumes this when including an explanatory variable in an equation: he considers this variable exogenous.
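The data generating process described on the last three slides can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the author's simulation code. The names `A0`..`B2` (for Paweł's and Gaweł's coefficients), `poisson` and `simulate` are hypothetical, and the Greek-letter roles are labels assumed here. The two structural equations are solved jointly for each week, because each student's study time appears in the other's equation.

```python
import math
import random

# Parameter values quoted on the slides; the coefficient labels are assumptions.
A0, A1, A2 = 1.0, 0.7, 0.5   # Pawel: P = A0 + A1*G + A2*DP + eP
B0, B1, B2 = 7.0, 0.5, 0.3   # Gawel: G = B0 + B1*P + B2*DG + eG
SIGMA_P = SIGMA_G = 1.0      # standard deviations of the two error terms
MEAN_VISIT = 10.0            # average weekly visit time of each girlfriend


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, seed=0):
    """Return n weekly observations as tuples (P, G, DP, DG)."""
    rng = random.Random(seed)
    det = 1.0 - A1 * B1
    rows = []
    for _ in range(n):
        dp, dg = poisson(rng, MEAN_VISIT), poisson(rng, MEAN_VISIT)
        e_p, e_g = rng.gauss(0.0, SIGMA_P), rng.gauss(0.0, SIGMA_G)
        # Parts of each structural equation that do not involve the other student
        u_p = A0 + A2 * dp + e_p
        u_g = B0 + B2 * dg + e_g
        # Solve P = u_p + A1*G and G = u_g + B1*P simultaneously
        p = (u_p + A1 * u_g) / det
        g = (B1 * u_p + u_g) / det
        rows.append((p, g, dp, dg))
    return rows


sample = simulate(5000)
mean_p = sum(r[0] for r in sample) / len(sample)
mean_g = sum(r[1] for r in sample) / len(sample)
print("average study time of Pawel and Gawel:", round(mean_p, 2), round(mean_g, 2))
```

Both averages should land close to the 20 hours a week stated on the slide, which is a quick sanity check on the intercepts.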
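Data generating process versus econometrician's perspective
After writing both equations in matrix form and solving for the vector $(P_t, G_t)^T$ (reduced form):
$$\begin{bmatrix} P_t \\ G_t \end{bmatrix} = \begin{bmatrix} 1 & -\alpha_1 \\ -\beta_1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \alpha_0 \\ \beta_0 \end{bmatrix} + \begin{bmatrix} 1 & -\alpha_1 \\ -\beta_1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \alpha_2 & 0 \\ 0 & \beta_2 \end{bmatrix} \begin{bmatrix} DP_t \\ DG_t \end{bmatrix} + \underbrace{\begin{bmatrix} 1 & -\alpha_1 \\ -\beta_1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \varepsilon_{P,t} \\ \varepsilon_{G,t} \end{bmatrix}}_{\text{reduced form error term}}$$
The econometrician, however, took only part of this story into account:
$$P_t = \alpha_0 + \alpha_1 G_t + \alpha_2 DP_t + \varepsilon_{P,t}$$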
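As a worked check, the reduced form can be evaluated with the parameter values given on the earlier slides. The block below assumes the labels $\alpha$ for the coefficients of Paweł's equation and $\beta$ for those of Gaweł's equation, so that $1 - \alpha_1\beta_1 = 1 - 0.7 \cdot 0.5 = 0.65$. Coefficients are rounded to three decimals.

```latex
\begin{aligned}
P_t &= \frac{\alpha_0 + \alpha_1\beta_0}{1-\alpha_1\beta_1}
     + \frac{\alpha_2}{1-\alpha_1\beta_1}\,DP_t
     + \frac{\alpha_1\beta_2}{1-\alpha_1\beta_1}\,DG_t
     + \frac{\varepsilon_{P,t} + \alpha_1\varepsilon_{G,t}}{1-\alpha_1\beta_1} \\
    &\approx 9.077 + 0.769\,DP_t + 0.323\,DG_t
     + \frac{\varepsilon_{P,t} + 0.7\,\varepsilon_{G,t}}{0.65}, \\[4pt]
G_t &= \frac{\beta_0 + \beta_1\alpha_0}{1-\alpha_1\beta_1}
     + \frac{\beta_1\alpha_2}{1-\alpha_1\beta_1}\,DP_t
     + \frac{\beta_2}{1-\alpha_1\beta_1}\,DG_t
     + \frac{\beta_1\varepsilon_{P,t} + \varepsilon_{G,t}}{1-\alpha_1\beta_1} \\
    &\approx 11.538 + 0.385\,DP_t + 0.462\,DG_t
     + \frac{0.5\,\varepsilon_{P,t} + \varepsilon_{G,t}}{0.65}.
\end{aligned}
```

With both visit times averaging 10 hours, the expected values are $9.077 + 7.69 + 3.23 \approx 20$ and $11.538 + 3.85 + 4.62 \approx 20$, in line with the 20 hours of weekly study stated for both students.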
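Back to Gauss-Markov theorem assumptions
Estimator $\hat{\theta}^{OLS} = (X^T X)^{-1} X^T y$ is unbiased when $E(\hat{\theta}^{OLS}) = \theta$.
$$E(\hat{\theta}^{OLS}) = E\left[(X^T X)^{-1} X^T y\right] = E\left[(X^T X)^{-1} X^T (X\theta + \varepsilon)\right] = E\left[(X^T X)^{-1} X^T X \theta + (X^T X)^{-1} X^T \varepsilon\right] = \theta + E\left[(X^T X)^{-1} X^T \varepsilon\right] = \theta$$
The last equality requires the regressors to be exogenous. In particular, the assumption $E(X^T \varepsilon) = 0$ would mean that the error term in the data generating process is uncorrelated with any of the regressors.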
Why is the econometrician wrong?
The student of econometrics who works on this case heard in the lecture that, by the Gauss-Markov theorem, OLS guarantees that the estimation is unbiased, consistent and efficient.
Assumptions? One of them is that the error term is uncorrelated with any of the explanatory variables. That is, $\varepsilon_{P,t}$ is uncorrelated with both $G_t$ and $DP_t$:
$$E(G_t\,\varepsilon_{P,t}) = 0$$
Reality? Let us substitute the first equation ($P_t = \dots$) into the second one and solve for $G_t$:
$$G_t = \beta_0 + \beta_1\left(\alpha_0 + \alpha_1 G_t + \alpha_2 DP_t + \varepsilon_{P,t}\right) + \beta_2 DG_t + \varepsilon_{G,t} = \beta_0 + \beta_1\alpha_0 + \beta_1\alpha_1 G_t + \beta_1\alpha_2 DP_t + \beta_2 DG_t + \beta_1 \varepsilon_{P,t} + \varepsilon_{G,t}$$
$$G_t = \frac{\beta_0 + \beta_1\alpha_0}{1 - \alpha_1\beta_1} + \frac{\beta_1\alpha_2}{1 - \alpha_1\beta_1} DP_t + \frac{\beta_2}{1 - \alpha_1\beta_1} DG_t + \frac{\beta_1}{1 - \alpha_1\beta_1} \varepsilon_{P,t} + \frac{1}{1 - \alpha_1\beta_1} \varepsilon_{G,t}$$
$$E(G_t\,\varepsilon_{P,t}) = \frac{\beta_1}{1 - \alpha_1\beta_1}\,\sigma_P^2 \neq 0$$
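The covariance between Gaweł's study time and the error term of Paweł's equation can be checked numerically. The sketch below is an illustration under the parameter values from the slides, with hypothetical names (`A0`..`B2`, `poisson`, `cov`) and assumed coefficient labels ($\alpha$ for Paweł, $\beta$ for Gaweł). It draws many weeks from the reduced form for $G_t$ and compares the sample covariance with $\beta_1 \sigma_P^2 / (1 - \alpha_1\beta_1)$.

```python
import math
import random

# Slide parameter values; coefficient labels are assumptions made for this sketch.
A0, A1, A2 = 1.0, 0.7, 0.5   # Pawel's equation
B0, B1, B2 = 7.0, 0.5, 0.3   # Gawel's equation
SIGMA_P = SIGMA_G = 1.0


def poisson(rng, lam):
    """Knuth's multiplication method for a Poisson draw."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def cov(x, y):
    """Sample covariance (divisor n)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


rng = random.Random(1)
det = 1.0 - A1 * B1
g_vals, dp_vals, eps_p = [], [], []
for _ in range(20000):
    dp, dg = poisson(rng, 10.0), poisson(rng, 10.0)
    e_p, e_g = rng.gauss(0.0, SIGMA_P), rng.gauss(0.0, SIGMA_G)
    # Reduced form for G: Pawel's shock e_p enters through the feedback B1
    g = (B0 + B1 * A0 + B1 * A2 * dp + B2 * dg + B1 * e_p + e_g) / det
    g_vals.append(g)
    dp_vals.append(dp)
    eps_p.append(e_p)

theory = B1 * SIGMA_P ** 2 / det   # 0.5 / 0.65
cov_g = cov(g_vals, eps_p)         # endogenous regressor vs. error
cov_dp = cov(dp_vals, eps_p)       # exogenous regressor vs. error
print("cov(G, eP):", round(cov_g, 3), "theory:", round(theory, 3))
print("cov(DP, eP):", round(cov_dp, 3))
```

The first covariance should sit near the theoretical value of about 0.77, while the covariance for the girlfriend's visit time should be close to zero, which is what makes $DP_t$ exogenous and $G_t$ endogenous in Paweł's equation.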
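How to fix this error?
In an ideal case, we would have $E(G_t\,\varepsilon_{P,t}) = 0$ if the term in $\varepsilon_{P,t}$ were removed from
$$G_t = \frac{\beta_0 + \beta_1\alpha_0}{1 - \alpha_1\beta_1} + \frac{\beta_1\alpha_2}{1 - \alpha_1\beta_1} DP_t + \frac{\beta_2}{1 - \alpha_1\beta_1} DG_t + \frac{\beta_1}{1 - \alpha_1\beta_1} \varepsilon_{P,t} + \frac{1}{1 - \alpha_1\beta_1} \varepsilon_{G,t}$$
But we don't know $\varepsilon_{P,t}$, $\alpha_1$, $\beta_1$.
The realistic case: remove the entire reduced-form error, that is, both the term in $\varepsilon_{P,t}$ and the term in $\varepsilon_{G,t}$. This can be achieved by estimating the second equation of the model's reduced form using OLS and taking its fitted values:
$$\hat{G}_t = \widehat{\left(\frac{\beta_0 + \beta_1\alpha_0}{1 - \alpha_1\beta_1}\right)} + \widehat{\left(\frac{\beta_1\alpha_2}{1 - \alpha_1\beta_1}\right)} DP_t + \widehat{\left(\frac{\beta_2}{1 - \alpha_1\beta_1}\right)} DG_t$$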
Two-stage least squares (2SLS)
Step 1: estimate the reduced form equation, in which the endogenous regressor in our structural equation of interest is put in the role of the dependent variable (in our case: $G_t$). From this equation, we obtain the theoretical (fitted) value $\hat{G}_t$ (see the previous slide).
Step 2: estimate the structural equation of interest, but modified:
$$P_t = \alpha_0 + \alpha_1 \hat{G}_t + \alpha_2 DP_t + \varepsilon_{P,t}$$
When we focus on a single equation, rather than the full multi-equation specification, we refer to this method as the Instrumental Variable (IV) method. Otherwise, 2SLS can be viewed as a special case of IV, with $DG_t$ being an instrument suggested by the full specification.
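The two steps can be sketched on one simulated sample. This is a minimal standard-library illustration, not the author's implementation. The helper names (`poisson`, `simulate`, `ols`), the seed and the sample size are hypothetical, and the coefficient labels ($\alpha$ for Paweł, $\beta$ for Gaweł) are assumptions. The small `ols` helper solves the normal equations by Gauss-Jordan elimination and returns point estimates only.

```python
import math
import random

# Slide parameter values; coefficient labels are assumptions made for this sketch.
A0, A1, A2 = 1.0, 0.7, 0.5   # Pawel: P = A0 + A1*G + A2*DP + eP
B0, B1, B2 = 7.0, 0.5, 0.3   # Gawel: G = B0 + B1*P + B2*DG + eG


def poisson(rng, lam):
    """Knuth's multiplication method for a Poisson draw."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, seed):
    """Draw n weeks from the two-equation DGP; returns lists P, G, DP, DG."""
    rng, det = random.Random(seed), 1.0 - A1 * B1
    p, g, dp, dg = [], [], [], []
    for _ in range(n):
        x, z = poisson(rng, 10.0), poisson(rng, 10.0)
        u_p = A0 + A2 * x + rng.gauss(0.0, 1.0)
        u_g = B0 + B2 * z + rng.gauss(0.0, 1.0)
        p.append((u_p + A1 * u_g) / det)
        g.append((B1 * u_p + u_g) / det)
        dp.append(x)
        dg.append(z)
    return p, g, dp, dg


def ols(columns, y):
    """OLS coefficients of y on an intercept plus the given regressor columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


p, g, dp, dg = simulate(5000, seed=42)

alpha_ols = ols([g, dp], p)        # naive OLS on the structural equation

pi_hat = ols([dp, dg], g)          # step 1: reduced form for G
g_hat = [pi_hat[0] + pi_hat[1] * x + pi_hat[2] * z for x, z in zip(dp, dg)]
alpha_2sls = ols([g_hat, dp], p)   # step 2: structural equation with fitted G

print("OLS  (a0, a1, a2):", [round(v, 3) for v in alpha_ols])
print("2SLS (a0, a1, a2):", [round(v, 3) for v in alpha_2sls])
```

On a sample of this size the 2SLS estimate of $\alpha_1$ should be close to 0.7, while the OLS estimate should come out noticeably higher. Standard errors taken directly from the step 2 regression would not be valid without a correction, so the sketch reports coefficients only.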
Monte Carlo simulation
Data generating process: equations and parameter values as in the considered example.
Let's look at its S=200 realizations. (As if an econometrician had 200 samples.)
Sample sizes: N=100, N=1000, N=10000, N=100000.
For each sample, compute the OLS and the 2SLS estimates of the parameters.
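A scaled-down version of this experiment can be sketched as follows. To keep the pure-Python run short, the sketch assumes S=100 replications and only N=100 and N=1000, which are smaller than the slide's settings. It tracks the coefficient on $G_t$ and uses the fact that, in this just-identified case, 2SLS coincides with the IV estimator $(Z^T X)^{-1} Z^T y$ with $DG_t$ as the instrument for $G_t$. All function names and the coefficient labels are hypothetical.

```python
import math
import random
import statistics

# Slide parameter values; coefficient labels are assumptions made for this sketch.
A0, A1, A2 = 1.0, 0.7, 0.5
B0, B1, B2 = 7.0, 0.5, 0.3


def poisson(rng, lam):
    """Knuth's multiplication method for a Poisson draw."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample(rng, n):
    """Return regressor rows X=(1, G, DP), instrument rows Z=(1, DG, DP), and y=P."""
    det = 1.0 - A1 * B1
    X, Z, y = [], [], []
    for _ in range(n):
        dp, dg = poisson(rng, 10.0), poisson(rng, 10.0)
        u_p = A0 + A2 * dp + rng.gauss(0.0, 1.0)
        u_g = B0 + B2 * dg + rng.gauss(0.0, 1.0)
        p, g = (u_p + A1 * u_g) / det, (B1 * u_p + u_g) / det
        X.append((1.0, g, dp))
        Z.append((1.0, dg, dp))
        y.append(p)
    return X, Z, y


def solve_moments(Z, X, y):
    """Solve (Z'X) b = Z'y. Z = X gives OLS; Z with DG replacing G gives IV."""
    k = len(X[0])
    A = [[sum(z[a] * x[b] for z, x in zip(Z, X)) for b in range(k)]
         + [sum(z[a] * yi for z, yi in zip(Z, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(2024)
S = 100                      # replications (the slide uses 200)
results = {}
for n in (100, 1000):        # sample sizes (the slide goes up to 100000)
    draws = {"OLS": [], "2SLS": []}
    for _ in range(S):
        X, Z, y = sample(rng, n)
        draws["OLS"].append(solve_moments(X, X, y)[1])
        draws["2SLS"].append(solve_moments(Z, X, y)[1])
    for name, vals in draws.items():
        results[(name, n)] = (statistics.mean(vals), statistics.stdev(vals))
        print(f"N={n:5d} {name:4s} mean={results[(name, n)][0]:.3f} "
              f"sd={results[(name, n)][1]:.3f}")
```

The spread of both estimators should shrink as N grows, but only the 2SLS estimates should centre on the true value 0.7. The OLS estimates should concentrate around a value that is too high.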
Distribution of estimators (1)
[Figure: distributions of the OLS and 2SLS estimates of each parameter, with the true value marked]
ESTIMATOR VARIANCE? In both cases it declines as N grows.
CONSISTENCY? Only for 2SLS: the distribution shrinks around the true parameter value as N grows.
Distribution of estimators (2)
[Figure: distributions of the OLS and 2SLS estimates of each parameter, with the true value marked]
UNBIASEDNESS vs CONSISTENCY
| Estimator... | Unbiased | Biased in finite samples, but asymptotically unbiased | Biased (also asymptotically) |
|---|---|---|---|
| Decreasing variance with growing N | Consistent (and unbiased), e.g. OLS as an estimator of parameters $\theta$ with the Gauss-Markov assumptions fulfilled | Consistent (and biased, but asymptotically unbiased), e.g. the estimator of variance $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$ | Inconsistent (and biased), e.g. OLS as an estimator of parameters $\theta$ under an endogenous regressor |
| Not decreasing variance with growing N | Inconsistent (but unbiased), e.g. the first sampled element $x_1$ (or the last one, $x_N$) as an estimator of the mean | Inconsistent (and biased, but asymptotically unbiased), e.g. $x_1 + \frac{1}{N}$ or $x_N + \frac{1}{N}$ as an estimator of the mean | Inconsistent (and biased), e.g. $x_1 + c$ or $x_N + c$ (for $c = const > 0$) as an estimator of the mean |
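Two cells of this table can be illustrated by simulation. The sketch below uses assumed values (a normal population with mean 5 and standard deviation 2, 1000 replications) and hypothetical names. It contrasts the first sampled element as an estimator of the mean (unbiased, but its spread does not shrink with N) with the variance estimator that divides by N (biased in small samples, but consistent).

```python
import random
import statistics

MU, SIGMA = 5.0, 2.0   # assumed population mean and standard deviation
S = 1000               # replications per sample size
rng = random.Random(7)


def biased_variance(xs):
    """Variance estimator with divisor N instead of N-1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


summary = {}
for n in (5, 200):
    first, var_hat = [], []
    for _ in range(S):
        xs = [rng.gauss(MU, SIGMA) for _ in range(n)]
        first.append(xs[0])                  # "first sampled element" estimator
        var_hat.append(biased_variance(xs))  # divisor-N variance estimator
    summary[n] = {
        "first_mean": statistics.mean(first),
        "first_sd": statistics.stdev(first),
        "var_mean": statistics.mean(var_hat),
        "var_sd": statistics.stdev(var_hat),
    }
    print(n, {key: round(val, 3) for key, val in summary[n].items()})
```

The first-element estimator should average about 5 at both sample sizes, with a spread of about 2 that does not fall as N grows. The variance estimator should average about $\sigma^2 (N-1)/N$, that is roughly 3.2 at N=5 and nearly 4 at N=200, with a spread that shrinks as N grows.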
2SLS vs IV
$$P_t = \alpha_0 + \alpha_1 G_t + \alpha_2 DP_t + \varepsilon_{P,t}$$
$$G_t = \beta_0 + \beta_1 P_t + \beta_2 DG_t + \varepsilon_{G,t}$$
Using 2SLS for the first equation in this model is equivalent to using IV with DG as the additional instrument for G.
When the number of endogenous regressors in equation 1 is equal to the number of exogenous variables outside equation 1, we say that equation 1 is just identified. When the latter group is bigger, it is overidentified.
Tests (and implementations in R ivreg)
- Weak instruments. Are the instruments really linked to the endogenous explanatory variable? H0: no. Wald restriction test comparing model 1 (G depending on DP) with a more general model 2 (G depending on DP and DG).
- Endogeneity (Wu-Hausman). Is the suspicion of regressor endogeneity justified? H0: no. Wald restriction test comparing model 1, $P_t = \alpha_0 + \alpha_1 G_t + \alpha_2 DP_t + \varepsilon_{P,t}$, with model 2, in which the step 1 OLS residuals are included as an additional regressor.
- Orthogonality of overidentifying instruments (Sargan). Does overidentification lead to (significantly) mutually exclusive results if one thinks of multiple just-identifying subsets of instruments? H0: no. Estimate equation 1 with 2SLS, compute the residuals $\hat{\varepsilon}_{P,t}$ (using the actual values of $G_t$, not the step 1 fitted values, when computing the theoretical value of $P_t$), and then take $R^2$ from the regression of these residuals on all exogenous and instrumental variables. Statistic $S = N \cdot R^2 \sim \chi^2(k)$ (where k = number of overidentifying restrictions, N = number of observations). For overidentified specifications only (which is not our case).
Concluding remarks
In practice, eliminating the endogeneity problem is the art of being intelligently (and reasonably) suspicious towards the regressors. Some speak of "endogeneity police"...
Frequently, the dispute cannot be resolved on statistical grounds and remains in the theoretical domain of a given social phenomenon (when the DGP is uncertain).
In our case, further questions could be asked: would it be plausible in practice to assume that Paweł's girlfriend really does not take his studying workload into account when planning her visits?