Fixed and Random Effects Models for Sociologists
An introduction to fixed and random effects models for panel data, aimed at sociologists: how the models are applied, why they matter, and how to implement and interpret them in sociological research.
SC968: Panel data methods for sociologists
Lecture 3: Fixed and random effects models (continued)
Overview
Review
- Between- and within-individual variation
- Types of variables: time-invariant, time-varying and trend
- Individual heterogeneity
- Within and between estimators
The implementation of fixed and random effects models in STATA
Statistical properties of fixed and random effects models
Choosing between fixed and random effects: the Hausman test
Estimating coefficients on time-invariant variables in FE
Thinking about specification
Between- and within-individual variation
If you have a sample with repeated observations on the same individuals, there are two sources of variance within the sample:
- The fact that individuals are systematically different from one another (between-individual variation). Example: Joe lives in Colchester, Jane lives in Wivenhoe.
- The fact that individuals' behaviour varies between observations (within-individual variation). Example: Joe moves from Colchester to Wivenhoe.
How to think about two sources of variation in panel data
Between variation: $B = \sum_i (\bar{x}_i - \bar{x})^2$. How does an individual vary, on average, from the sample mean?
Within variation: $W = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2$. How does an individual vary at any particular time point from his individual mean?
Here $i$ indexes individuals and $j$ indexes time points.

Person   W1   W2   W3   W4   W5   Mean
Jane     20   20   20   20   20     20
Joe      15    5    6   20    4     10

Average income for sample: 10 per year
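The Jane and Joe table can be worked through by hand. The following is a minimal sketch in Python (used here only to make the arithmetic explicit; it is not part of the original slides). The function names are illustrative, and the sample mean of 10 is taken as given from the slide rather than computed from the two rows shown.

```python
# Incomes over five waves, copied from the table on the slide.
incomes = {
    "Jane": [20, 20, 20, 20, 20],
    "Joe": [15, 5, 6, 20, 4],
}
SAMPLE_MEAN = 10  # stated on the slide; assumed to cover the whole sample


def person_mean(values):
    """Average of one person's observations across waves."""
    return sum(values) / len(values)


def between_deviation(values, sample_mean):
    """How far the person's own mean lies from the sample mean."""
    return person_mean(values) - sample_mean


def within_deviations(values):
    """How far each observation lies from the person's own mean."""
    own_mean = person_mean(values)
    return [v - own_mean for v in values]


for name, values in incomes.items():
    print(name, between_deviation(values, SAMPLE_MEAN), within_deviations(values))
```

Under these assumptions Jane contributes only between variation (her mean is 10 above the sample mean and she never moves), while Joe contributes only within variation (his mean equals the sample mean, but he moves a lot around it).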
xtsum in STATA
Similar to the ordinary sum command.

. xtset pid wave
       panel variable:  pid (unbalanced)
        time variable:  wave, 1 to 15, but with gaps
                delta:  1 unit

. xtsum female partner age ue_sick LIKERT wave if nwaves == 15

Variable         |      Mean   Std. Dev.        Min        Max |  Observations
-----------------+---------------------------------------------+-----------------
female   overall |  .5397574   .4984321          0          1 |      N = 16324
         between |             .4989059          0          1 |      n = 1237
         within  |                    0   .5397574   .5397574 |  T-bar = 13.1964
partner  overall |  .6892954   .4627963          0          1 |      N = 16292
         between |             .4217842          0          1 |      n = 1234
         within  |              .243531   -.244038   1.622629 |  T-bar = 13.2026
age      overall |  40.03349   19.74332          0         98 |      N = 19410
         between |             19.27238        6.4   90.93333 |      n = 1294
         within  |              4.31763   31.30015   54.30015 |      T = 15
ue_sick  overall |  .0672924   .2505353          0          1 |      N = 16302
         between |             .1738938          0          1 |      n = 1237
         within  |             .1852756   -.866041   1.000626 |  T-bar = 13.1787
LIKERT   overall |  11.26167   5.344825          0         36 |      N = 15661
         between |             3.609665          0   29.69231 |      n = 1225
         within  |             4.030974  -6.738331   35.12834 |  T-bar = 12.7845
wave     overall |         8   4.320605          1         15 |      N = 19410
         between |                    0          8          8 |      n = 1294
         within  |             4.320605          1         15 |      T = 15
xtsum in STATA: what the output shows
Annotations on the xtsum output above:
- The condition if nwaves == 15 means we have chosen a balanced sample, even though xtset reports the full panel as unbalanced.
- female: all variation is between (the within Std. Dev. is 0).
- partner: most variation is between (between Std. Dev. .4217842, within .243531), because it's fairly rare to switch between having and not having a partner.
- wave: all variation is within (the between Std. Dev. is 0), because this is a balanced sample.
More on xtsum
How to read the columns of the xtsum output above:
- N: observations with a non-missing value of the variable.
- n: number of individuals.
- T-bar: average number of time-points per individual.
- between rows: Min and Max refer to the individual means, $\bar{x}_i$.
- within rows: Min and Max refer to individuals' deviations from their own averages, with the global average added back in.
Types of variable
Those which vary between individuals but hardly ever over time:
- Sex
- Ethnicity
- Parents' social class when you were 14
- The type of primary school you attended (once you've become an adult)
Those which vary over time, but not between individuals:
- The retail price index
- National unemployment rates
- Age, in a cohort study
Those which vary both over time and between individuals:
- Income
- Health
- Psychological wellbeing
- Number of children you have
- Marital status
Trend variables, which vary between individuals and over time, but in highly predictable ways:
- Age
- Year
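Within and between estimators
The basic model:
$$y_{it} = \alpha + \beta x_{it} + u_i + \varepsilon_{it}$$
Here $u_i$ is individual-specific and fixed over time, while $\varepsilon_{it}$ varies over time and the usual assumptions apply to it (mean zero, homoscedastic, uncorrelated with $x$, with $u$, and with itself).
Taking the mean of all observations for person $i$ gives the between estimator:
$$\bar{y}_i = \alpha + \beta \bar{x}_i + u_i + \bar{\varepsilon}_i$$
Subtracting the second equation from the first gives the within estimator (fixed effects):
$$(y_{it} - \bar{y}_i) = \beta (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
And finally, the random effects estimator is a weighted average of the within and between estimators:
$$(y_{it} - \theta \bar{y}_i) = (1 - \theta)\alpha + \beta (x_{it} - \theta \bar{x}_i) + \{(1 - \theta) u_i + (\varepsilon_{it} - \theta \bar{\varepsilon}_i)\}$$
$\theta$ sets the balance between within- and between-group variation ($\theta = 0$ reduces to pooled OLS, $\theta = 1$ to the within estimator), and is derived from the variances of $u_i$ and $\varepsilon_{it}$.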
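To see how the between and within estimators can disagree, here is a small Python sketch on a made-up two-person panel. The data are hypothetical and chosen so that the person with higher x has a much lower individual effect; the helper names are illustrative, and a single regressor is assumed so that each estimator reduces to a simple regression slope.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical panel: person -> (x over time, y over time).
# Within each person y rises by 2 per unit of x, but person B starts far lower.
panel = {
    "A": ([1, 2, 3], [2, 4, 6]),
    "B": ([4, 5, 6], [-4, -2, 0]),
}

# Pooled OLS: stack all observations and ignore who they belong to.
pooled = ols_slope(
    [v for x, _ in panel.values() for v in x],
    [v for _, y in panel.values() for v in y],
)

# Between estimator: regress person means on person means.
between = ols_slope(
    [mean(x) for x, _ in panel.values()],
    [mean(y) for _, y in panel.values()],
)

# Within (fixed effects) estimator: demean each person's data first.
within = ols_slope(
    [v - mean(x) for x, _ in panel.values() for v in x],
    [v - mean(y) for _, y in panel.values() for v in y],
)

print("pooled:", pooled, "between:", between, "within:", within)
```

In this toy example the within slope is 2, the between slope is -2, and pooled OLS lands in between. The demeaning step removes each person's fixed level, which is why only the within estimator recovers the person-level relationship here.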
Individual heterogeneity: one reason to use fixed effects
A very simple concept: people are different!
In social science, when we talk about heterogeneity, we are really talking about unobservable (or unobserved) heterogeneity.
- Observed heterogeneity: differences in education levels, or parental background, or anything else that we can measure and control for in regressions.
- Unobserved heterogeneity: anything which is fundamentally unmeasurable, or which is rather poorly measured, or which does not happen to be measured in the particular data set we are using.
Time-invariant heterogeneity:
- Height (among adults)
- Innate intelligence
- Antenatal care of mother
Time-variant kinds of heterogeneity:
- Social network size
- Beauty
- Weight
Unobserved heterogeneity
$$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + u_i + \varepsilon_i$$
Extend the OLS equation we used in Week 1, breaking the error term down into two components: one ($u_i$) representing the time-invariant, unobservable characteristics of the person, and the other ($\varepsilon_i$) representing genuine "error".
In cross-sectional analysis, there is no way of distinguishing between the two.
But in panel data analysis, we have repeated observations, and this allows us to distinguish between them.
Fixed effects (within estimator)
$$y_{it} = \alpha + \beta x_{it} + u_i + \varepsilon_{it}$$
$$(y_{it} - \bar{y}_i) = \beta (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
- Allows us to net out time-invariant unobserved characteristics.
- Ignores between-group variation, so it's an inefficient estimator.
- However, few assumptions are required, so FE is generally consistent and unbiased.
- Disadvantage: can't estimate the effects of any time-invariant variables.
- Also called the least squares dummy variable (LSDV) model, or the analysis of covariance (CV) model.
Between estimator
$$y_{it} = \alpha + \beta x_{it} + u_i + \varepsilon_{it}$$
$$\bar{y}_i = \alpha + \beta \bar{x}_i + u_i + \bar{\varepsilon}_i$$
- Not much used, except to calculate the $\theta$ parameter for random effects, but STATA does this, not you!
- It's inefficient compared to random effects: it doesn't use as much information as is available in the data (it only uses means).
- Assumption required: that $u_i$ is uncorrelated with $x_i$. Easy to see why: if they were correlated, how could one decide how much of the variation in y to attribute to the x's (via the betas) as opposed to the correlation?
- Can't estimate effects of variables whose mean is invariant over individuals: age in a cohort study, macro-level variables.
Random effects estimator
$$y_{it} = \alpha + \beta x_{it} + u_i + \varepsilon_{it}$$
$$(y_{it} - \theta \bar{y}_i) = (1 - \theta)\alpha + \beta (x_{it} - \theta \bar{x}_i) + \{(1 - \theta) u_i + (\varepsilon_{it} - \theta \bar{\varepsilon}_i)\}$$
- Weighted average of the within and between models.
- Assumption required: that $u_i$ is uncorrelated with $x_i$. This is a rather heroic assumption: think of examples. We will see a test for this later.
- Uses both within- and between-group variation, so makes best use of the data and is efficient.
- But unless the assumption holds that $u_i$ is uncorrelated with $x_i$, it is inconsistent.
- AKA one-way error components model, variance components model, GLS estimator (STATA also allows ML random effects).
Consistency versus efficiency
Random effects clearly does worse here...
[Figure. Labels: "True value of betas"; "Inconsistent but efficient"; "Consistent but inefficient".]
...But arguably, random effects does a better job of getting close to the true coefficient here.
[Figure. Labels: "True value of betas"; "Random effects"; "Fixed effects".]
Testing between FE and RE: the Hausman test
Hypothesis H0: $u_i$ is uncorrelated with $x_i$
Hypothesis H1: $u_i$ is correlated with $x_i$
Fixed effects is consistent under both H0 and H1.
Random effects is efficient, and consistent under H0 (but inconsistent under H1).
Example from last week:

. quietly xtreg LIKERT female ue_sick partner age age2 badh, fe
. estimates store fixed
. quietly xtreg LIKERT female ue_sick partner age age2 badh, re
. hausman fixed .

                 ---- Coefficients ----
             |      (b)          (B)          (b-B)      sqrt(diag(V_b-V_B))
             |     fixed          .         Difference          S.E.
-------------+----------------------------------------------------------------
     ue_sick |   1.951485     2.045302      -.0938175        .0572845
     partner |   -.298668    -.1947691      -.1038989        .0677693
         age |   .1141748     .1058038        .008371        .0157531
        age2 |  -.0011833    -.0011062      -.0000771        .0001624
   badhealth |   1.230831     1.433115      -.2022848        .0187202

b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic
      chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 123.96
      Prob>chi2 = 0.0000

Notes:
- Sex does not appear in the table, because FE cannot estimate its coefficient.
- Random effects is rejected (inconsistent) in favour of fixed effects (consistent but inefficient).
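As a rough way to see which coefficient drives the rejection, each difference (b-B) can be divided by its reported standard error. The Python sketch below does this with the numbers copied from the hausman output above. It is only an informal diagnostic under the assumption that the coefficients can be looked at one at a time: it ignores the covariances between them and is not the joint chi2(5) statistic that hausman reports.

```python
# (b - B) and sqrt(diag(V_b - V_B)), copied from the hausman output above.
hausman_rows = {
    "ue_sick": (-0.0938175, 0.0572845),
    "partner": (-0.1038989, 0.0677693),
    "age": (0.008371, 0.0157531),
    "age2": (-0.0000771, 0.0001624),
    "badhealth": (-0.2022848, 0.0187202),
}


def standardised_differences(rows):
    """Divide each FE-minus-RE difference by its standard error."""
    return {name: diff / se for name, (diff, se) in rows.items()}


ratios = standardised_differences(hausman_rows)
largest = max(ratios, key=lambda name: abs(ratios[name]))

for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:8.2f}")
print("largest standardised difference:", largest)
```

On these figures the badhealth coefficient stands out by a wide margin, while the other differences are small relative to their standard errors.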
HOWEVER
There is a big disciplinary divide:
- Economists swear by the Hausman test and rarely report random effects.
- Other disciplines (e.g. psychology) consider other factors, such as explanatory power.
Estimating FE in STATA

. xtreg LIKERT female ue_sick partner age age2 badh, fe

Fixed-effects (within) regression               Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0501                         Obs per group: min =         1
       between = 0.1906                                        avg =       7.3
       overall = 0.1285                                        max =        14

                                                F(5,20882)         =    220.44
corr(u_i, Xb)  = 0.1561                         Prob > F           =    0.0000

      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  (dropped)
     ue_sick |   1.951485   .1394164    14.00   0.000    1.678218    2.224752
     partner |   -.298668    .118635    -2.52   0.012   -.5312018   -.0661342
         age |   .1141748   .0214403     5.33   0.000    .0721501    .1561994
        age2 |  -.0011833   .0002209    -5.36   0.000   -.0016163   -.0007503
   badhealth |   1.230831   .0428556    28.72   0.000     1.14683    1.314831
       _cons |   6.252975   .4932977    12.68   0.000    5.286073    7.219877
-------------+----------------------------------------------------------------
     sigma_u |  3.9934565
     sigma_e |  4.0525618
         rho |  .49265449   (fraction of variance due to u_i)

F test that all u_i=0:     F(3316, 20882) =     4.56       Prob > F = 0.0000

Notes:
- R-sq is an "R-square-like" statistic.
- The age and age2 coefficients imply that the LIKERT score peaks at age 48.
- u and e are the two parts of the error term.
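Two of the numbers in this output can be checked by hand: the age at which the quadratic age profile peaks, and rho. The Python sketch below uses the coefficients and sigmas copied from the fixed-effects output above; the function names are illustrative. It assumes the usual turning-point formula for a quadratic, -b1 / (2 * b2), and that rho is the share of sigma_u squared in the total error variance, as the output's own label suggests.

```python
# Coefficients and error components copied from the fixed-effects output above.
B_AGE = 0.1141748
B_AGE2 = -0.0011833
SIGMA_U = 3.9934565
SIGMA_E = 4.0525618


def turning_point(b_linear, b_squared):
    """Age at which b_linear*age + b_squared*age^2 reaches its maximum or minimum."""
    return -b_linear / (2 * b_squared)


def rho(sigma_u, sigma_e):
    """Fraction of the total error variance that is due to u_i."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2)


print("turning point (age):", turning_point(B_AGE, B_AGE2))
print("rho:", rho(SIGMA_U, SIGMA_E))
```

Because the coefficient on age2 is negative, the turning point is a maximum, which matches the "peaks at age 48" annotation.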
Between regression
Not much used, but useful to compare coefficients with fixed effects.

. xtreg LIKERT female ue_sick partner age age2 badh, be

Between regression (regression on group means)  Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0480                         Obs per group: min =         1
       between = 0.2322                                        avg =       7.3
       overall = 0.1482                                        max =        14

                                                F(6,3310)          =    166.80
sd(u_i + avg(e_i.))=  3.833357                  Prob > F           =    0.0000

      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.476659   .1350226    10.94   0.000    1.211923    1.741395
     ue_sick |   2.038192    .312191     6.53   0.000    1.426085    2.650299
     partner |  -.0101941   .1777423    -0.06   0.954    -.358669    .3383019
         age |   .0827335   .0219026     3.78   0.000    .0397895    .1256775
        age2 |  -.0009489   .0002263    -4.19   0.000   -.0013927   -.0005052
   badhealth |   2.275832   .0926521    24.56   0.000    2.094171    2.457493
       _cons |   3.953941   .4430909     8.92   0.000    3.085181    4.822701

Notes:
- The coefficient on partner was negative and significant in the FE model; here it is close to zero and insignificant.
- In FE, the partner coefficient really measures the events of gaining or losing a partner.
Random effects regression
The option theta gives a summary of the weights.

. xtreg LIKERT female ue_sick partner age age2 badh, re theta

Random-effects GLS regression                   Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0500                         Obs per group: min =         1
       between = 0.2239                                        avg =       7.3
       overall = 0.1471                                        max =        14

Random effects u_i ~ Gaussian                   Wald chi2(6)       =   2013.32
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------- theta --------------------
  min      5%       median        95%      max
0.1986   0.1986     0.5482     0.6629   0.6629

      LIKERT |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.493431   .1259931    11.85   0.000    1.246489    1.740373
     ue_sick |   2.045302   .1271039    16.09   0.000    1.796183    2.294422
     partner |  -.1947691   .0973734    -2.00   0.045   -.3856175   -.0039207
         age |   .1058038    .014544     7.27   0.000    .0772981    .1343094
        age2 |  -.0011062   .0001498    -7.39   0.000   -.0013998   -.0008126
   badhealth |   1.433115   .0385506    37.17   0.000    1.357558    1.508673
       _cons |   5.181864   .3137662    16.52   0.000    4.566894    5.796835
-------------+----------------------------------------------------------------
     sigma_u |  3.0248563
     sigma_e |  4.0525618
         rho |   .3577895   (fraction of variance due to u_i)
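The theta summary can be connected back to sigma_u and sigma_e. The Python sketch below assumes the standard GLS random-effects weight, theta = 1 - sqrt(sigma_e^2 / (T * sigma_u^2 + sigma_e^2)), where T is the number of observations for a given person; the slides do not state this formula, so treat it as an assumption. The sigmas are copied from the output above and the function name is illustrative.

```python
import math

# Error components copied from the random-effects output above.
SIGMA_U = 3.0248563
SIGMA_E = 4.0525618


def theta(sigma_u, sigma_e, n_obs):
    """GLS weight for a person observed n_obs times (assumed standard formula)."""
    return 1 - math.sqrt(sigma_e**2 / (n_obs * sigma_u**2 + sigma_e**2))


# Obs per group run from 1 to 14 in this sample, so theta varies by person.
for n_obs in (1, 7, 14):
    print(n_obs, round(theta(SIGMA_U, SIGMA_E, n_obs), 4))
```

With one observation per person the formula gives about 0.1986, and with fourteen about 0.6629, which agree with the minimum and maximum in the theta summary. People observed more often get a theta closer to 1, so their data are treated more like fixed effects.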
And what about OLS?
OLS simply treats within- and between-group variation as the same, and pools data across waves.

. reg LIKERT female ue_sick partner age age2 badh

      Source |       SS          df       MS            Number of obs =   24204
-------------+--------------------------------         F(6, 24197)   =  706.54
       Model |  103583.505        6   17263.9175        Prob > F      =  0.0000
    Residual |  591239.694    24197   24.4344214        R-squared     =  0.1491
-------------+--------------------------------         Adj R-squared =  0.1489
       Total |  694823.199    24203   28.7081436        Root MSE      =  4.9431

      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.409466   .0640651    22.00   0.000    1.283895    1.535038
     ue_sick |   2.031815   .1240757    16.38   0.000    1.788619    2.275011
     partner |  -.0751296   .0769271    -0.98   0.329   -.2259116    .0756524
         age |   .0983746   .0103316     9.52   0.000     .078124    .1186252
        age2 |  -.0010613   .0001049   -10.12   0.000    -.001267   -.0008557
   badhealth |   1.841796   .0357165    51.57   0.000    1.771789    1.911802
       _cons |   4.450393   .2212733    20.11   0.000    4.016684    4.884102
Comparing models
- Compare coefficients between models: they are reasonably similar, with differences in the partner and badhealth coefficients.
- R-squareds are similar.
- Within and between estimators maximise the within and between R-squared respectively.

              FE           RE           BE           OLS
Female        -            1.49 ***     1.47 ***     1.41 ***
Ue_sick       1.95 ***     2.04 ***     2.03 ***     2.03 ***
Partner       -0.30 **     -0.19 **     -0.01        -0.08
Age           0.11 ***     0.11 ***     0.08 ***     0.10 ***
Age-2         -0.00 ***    -0.00 ***    -0.00 ***    -0.00 ***
Badhealth     1.23 ***     1.43 ***     2.28 ***     1.84 ***
Cons          6.25 ***     5.18 ***     3.95 ***     4.45 ***
Within R2     0.050        0.050        0.048        -
Between R2    0.191        0.224        0.232        -
Overall R2    0.129        0.147        0.148        0.149
Test whether pooling data is valid
$$y_{it} = \alpha + \beta x_{it} + u_i + \varepsilon_{it}$$
If the $u_i$ do not vary between individuals, they can be treated as part of $\alpha$ and OLS is fine.
Breusch-Pagan Lagrange multiplier test:
- H0: variance of $u_i$ = 0
- H1: variance of $u_i$ not equal to zero
If H0 is not rejected, you can pool the data and use OLS.
This is a post-estimation test, run after random effects:

. quietly xtreg LIKERT female ue_sick partner age age2 badh, re
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        LIKERT[pid,t] = Xb + u[pid] + e[pid,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  LIKERT |   28.70814       5.357998
                       e |   16.42326       4.052562
                       u |   9.149756       3.024856

        Test:   Var(u) = 0
                              chi2(1) = 10816.48
                          Prob > chi2 =   0.0000

Here H0 is clearly rejected (Prob > chi2 = 0.0000), so pooling the data and using OLS is not appropriate.
Thinking about the within and between estimators
$$\bar{y}_i = \alpha + \beta \bar{x}_i + u_i + \bar{\varepsilon}_i$$
$$(y_{it} - \bar{y}_i) = \beta (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
Both the between and FE models are written with the same coefficient vector $\beta$, but there is no reason why they should be the same.
- Between: $\beta_j$ measures the difference in y associated with a one-unit difference in the average value of variable $x_j$ between individuals. This is essentially a cross-sectional concept.
- Within: $\beta_j$ measures the difference in y associated with a one-unit increase in variable $x_j$ at the individual level. This is essentially a longitudinal concept.
Random effects, as a weighted average of the two, constrains both $\beta$s to be the same.
Excellent article at http://www.stata.com/support/faqs/stat/xt.html
And lots more at http://www.stata.com/support/faqs/stat/#models
Examples
Example 1: Consider estimating a wage equation, and including a set of regional dummies, with S-E the omitted group. Wages in (e.g.) the N-W are lower, so the estimated between coefficient on N-W will be negative. However, in the within regression, we observe the effects of people moving to the N-W. Presumably they wouldn't move without a reasonable incentive. So the estimated within coefficient may even be positive, or at least it's likely to be a lot less negative.
Example 2: Estimate the relationship between family income and children's educational outcomes. The between-group estimates measure how well the children of richer families do, relative to the children of poorer families; we know this estimate is likely to be large and significant. The within-group estimates measure how children's outcomes change as their own family's income changes. This coefficient may well be much smaller.
FE and time-invariant variables
Reformulating the regression equation to distinguish between time-varying and time-invariant variables:
$$y_{it} = \alpha + \beta x_{it} + \gamma z_i + u_i + \varepsilon_{it}$$
- $x_{it}$: time-varying variables, e.g. income, health
- $z_i$: time-invariant variables, e.g. sex, race
- $u_i$: individual-specific fixed effect
- $\varepsilon_{it}$: residual
Inconveniently, fixed effects washes out the z's, so does not produce estimates of $\gamma$.
But there is a way! It requires the z variable to be uncorrelated with the u's.
Coefficients on time-invariant variables
1. Run FE in the normal way.
2. Use the estimates to predict the residuals.
3. Use the between estimator to regress the residuals on the time-invariant variables.
4. Done!
Only use this if RE is rejected: otherwise, RE provides the best estimates of all coefficients.
Going back to the previous example:

. quietly xtreg LIKERT female ue_sick partner age age2 badh, fe
. predict FE_RESID, ue
(13352 missing values generated)
. xtreg FE_RESID female, be

Between regression (regression on group means)  Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0000                         Obs per group: min =         1
       between = 0.0400                                        avg =       7.3
       overall = 0.0212                                        max =        14

                                                F(1,3315)          =    138.24
sd(u_i + avg(e_i.))=  3.913298                  Prob > F           =    0.0000

    FE_RESID |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.599518   .1360426    11.76   0.000    1.332782    1.866254
       _cons |  -.7288892   .0984186    -7.41   0.000   -.9218564   -.5359219
From previous slide
In the comparison of models, the coefficient on female was 1.49 *** (RE), 1.47 *** (BE) and 1.41 *** (OLS); FE could not estimate it.
Our estimate of 1.60 for the coefficient on female is slightly higher than, but definitely in the same ball-park as, those produced by the other methods.
Improving specification
Recall our problem with the partner coefficient:

              FE           RE           BE           OLS
Partner       -0.30 **     -0.19 **     -0.01        -0.08

- OLS and between estimates show no significant relationship between partnership status and LIKERT scores.
- FE and RE show a significant negative relationship.
- FE estimates the coefficient on the deviation from the individual's mean, which is likely to reflect moving in together (which makes you temporarily happy) and splitting up (which makes you temporarily sad).
Investigate this by including variables to capture these events.
Generate variables reflecting changes

. sort pid wave
. gen get_pnr = (partner == 1 & partner[_n-1] == 0) if pid == pid[_n-1] & wave == wave[_n-1] + 1
(5078 missing values generated)
. gen lose_pnr = (partner == 0 & partner[_n-1] == 1) if pid == pid[_n-1] & wave == wave[_n-1] + 1
(5078 missing values generated)

Note: we will lose some observations.
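The logic of the two gen commands above can be traced on a tiny made-up panel. The Python sketch below is an illustration only: the rows are hypothetical, the function name is invented, and None stands in for Stata's missing value. It shows why observations are lost: the first wave for each person, and any wave that follows a gap, has no immediately preceding wave to compare with.

```python
def partnership_changes(rows):
    """Add get_pnr / lose_pnr flags, comparing each row with the same
    person's previous wave. Flags are None when there is no such row."""
    ordered = sorted(rows, key=lambda r: (r["pid"], r["wave"]))
    result = []
    previous = None
    for row in ordered:
        consecutive = (
            previous is not None
            and previous["pid"] == row["pid"]
            and row["wave"] == previous["wave"] + 1
        )
        if consecutive:
            get_pnr = int(row["partner"] == 1 and previous["partner"] == 0)
            lose_pnr = int(row["partner"] == 0 and previous["partner"] == 1)
        else:
            get_pnr = lose_pnr = None  # first wave, or a gap in the panel
        result.append({**row, "get_pnr": get_pnr, "lose_pnr": lose_pnr})
        previous = row
    return result


# Hypothetical data: person 1 misses wave 4; person 2 is seen twice.
toy_rows = [
    {"pid": 1, "wave": 1, "partner": 0},
    {"pid": 1, "wave": 2, "partner": 1},
    {"pid": 1, "wave": 3, "partner": 1},
    {"pid": 1, "wave": 5, "partner": 0},
    {"pid": 2, "wave": 1, "partner": 1},
    {"pid": 2, "wave": 2, "partner": 0},
]

result = partnership_changes(toy_rows)
for row in result:
    print(row)
```

Note that person 1's separation between waves 3 and 5 is not flagged, because the waves are not consecutive; the same would hold for the Stata condition wave == wave[_n-1] + 1.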
Fixed effects

. xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, fe

Fixed-effects (within) regression               Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

R-sq:  within  = 0.0574                         Obs per group: min =         1
       between = 0.1839                                        avg =       7.7
       overall = 0.1333                                        max =        13

                                                F(7,18493)         =    160.80
corr(u_i, Xb)  = 0.1460                         Prob > F           =    0.0000

      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .3186429    .143112     2.23   0.026    .0381301    .5991557
     get_pnr |  -.0793952   .2116739    -0.38   0.708   -.4942956    .3355053
    lose_pnr |    2.64016   .2371252    11.13   0.000    2.175372    3.104947
      female |  (dropped)
     ue_sick |   1.894659   .1530311    12.38   0.000    1.594704    2.194614
         age |   .0734274   .0240822     3.05   0.002    .0262241    .1206308
        age2 |  -.0008799   .0002464    -3.57   0.000   -.0013629   -.0003969
   badhealth |   1.284593    .045967    27.95   0.000    1.194494    1.374693
       _cons |   6.796602   .5570247    12.20   0.000    5.704782    7.888422
-------------+----------------------------------------------------------------
     sigma_u |  3.7857335
     sigma_e |   4.030519
         rho |  .46871319   (fraction of variance due to u_i)

F test that all u_i=0:     F(2763, 18493) =     4.83       Prob > F = 0.0000

Note: the coefficient on having a partner is now slightly positive; getting a partner is insignificant; losing a partner is now large and positive.
Random effects

. xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, re

Random-effects GLS regression                   Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

R-sq:  within  = 0.0571                         Obs per group: min =         1
       between = 0.2213                                        avg =       7.7
       overall = 0.1545                                        max =        13

Random effects u_i ~ Gaussian                   Wald chi2(8)       =   1922.41
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

      LIKERT |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |    .281375    .113251     2.48   0.013    .0594072    .5033428
     get_pnr |  -.0897335    .204547    -0.44   0.661   -.4906382    .3111713
    lose_pnr |    2.76626   .2284331    12.11   0.000    2.318539     3.21398
      female |   1.450748   .1324675    10.95   0.000    1.191116    1.710379
     ue_sick |   1.892352   .1388821    13.63   0.000    1.620148    2.164556
         age |   .0719139   .0159222     4.52   0.000    .0407069    .1031209
        age2 |  -.0007748   .0001621    -4.78   0.000   -.0010926    -.000457
   badhealth |   1.470353   .0414036    35.51   0.000    1.389203    1.551502
       _cons |   5.457217   .3436851    15.88   0.000    4.783606    6.130827
-------------+----------------------------------------------------------------
     sigma_u |  2.9604042
     sigma_e |   4.030519
         rho |  .35043255   (fraction of variance due to u_i)

Notes:
- The partner, get_pnr and lose_pnr coefficients are similar to those from fixed effects.
- rho is the proportion of total residual variance attributable to the u's (c.f. random slopes models later).
Collating the coefficients

          New specification                              Original specification
          Partner      Get partner    Lose partner       Partner
FE        0.32 **      -0.08          2.64 ***           -0.30 **
RE        0.28 **      -0.09          2.77 ***           -0.19 **
BE        0.29         -2.85 **       7.17 ***           -0.01
OLS       0.17 **      -0.10          3.19 ***           -0.08
Hausman test again
Have we cleaned up the specification sufficiently that the Hausman test will now fail to reject random effects?

. quietly xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, fe
. estimates store fixed
. quietly xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, re
. hausman fixed .

                 ---- Coefficients ----
             |      (b)          (B)          (b-B)      sqrt(diag(V_b-V_B))
             |     fixed          .         Difference          S.E.
-------------+----------------------------------------------------------------
     partner |   .3186429      .281375       .0372679        .0874944
     get_pnr |  -.0793952    -.0897335       .0103383        .0544645
    lose_pnr |    2.64016      2.76626      -.1260999        .0636136
     ue_sick |   1.894659     1.892352       .0023072        .0642673
         age |   .0734274     .0719139       .0015135        .0180675
        age2 |  -.0008799    -.0007748      -.0001051        .0001855
   badhealth |   1.284593     1.470353      -.1857594        .0199676

b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic
      chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 116.04
      Prob>chi2 = 0.0000

No! The chi-squared statistic is smaller now (116.04) than previously (123.96), but random effects is still rejected.
Thinking about time
Under FE, including wave or year as a continuous variable is not very useful, since it is treated as the deviation from the individual's mean.
We may not want to treat time as a linear trend (for example, if we are looking for a cut point related to social policy).
Also, wave is very much correlated with individuals' ages.
Can do FE or RE including time periods as dummies. May be referred to as two-way fixed effects.
Generate each dummy variable separately, or:
. local i = 1
. while `i' <= 15 {
      gen byte W`i' = (wave == `i')
      local i = `i' + 1
  }
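The while loop can be written more compactly with forvalues. The sketch below builds a toy dataset (one row per wave, purely for illustration and not the lecture's data) so that it runs on its own, and then checks that exactly one dummy is switched on per row:

```stata
* Toy data for illustration only: 15 rows, wave = 1..15
clear
set obs 15
generate byte wave = _n

* Same dummies as the while loop, written with forvalues
forvalues i = 1/15 {
    generate byte W`i' = (wave == `i')
}

* Sanity check: exactly one wave dummy equals 1 in every row
egen n_on = rowtotal(W*)
assert n_on == 1
```

An alternative is tabulate wave, generate(W), but that numbers the dummies by the rank of each category rather than by the wave value, so the names only line up with wave numbers if no wave is missing from the data. In Stata 11 and later, factor-variable notation (i.wave) avoids creating the dummies at all.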
Time variables insignificant here (as we would expect)
. xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh W*, fe

Fixed-effects (within) regression               Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

R-sq:  within  = 0.0580                         Obs per group: min =         1
       between = 0.1811                                        avg =       7.7
       overall = 0.1323                                        max =        13

                                                F(19,18481)        =     59.92
corr(u_i, Xb)  = 0.1423                         Prob > F           =    0.0000

      LIKERT |      Coef.  Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     partner |   .3193454   .1431496     2.23    0.026      .038759    .5999317
     get_pnr |   -.072553   .2117186    -0.34    0.732     -.487541    .3424349
    lose_pnr |   2.648729   .2372293    11.17    0.000     2.183737     3.11372
      female |  (dropped)
     ue_sick |   1.894834   .1531005    12.38    0.000     1.594743    2.194925
         age |    .071427   .1200867     0.59    0.552    -.1639541    .3068081
        age2 |  -.0008821   .0002464    -3.58    0.000    -.0013651   -.0003991
   badhealth |   1.282999   .0460178    27.88    0.000       1.1928    1.373199
          W2 |  -.0140737   1.540443    -0.01    0.993    -3.033485    3.005338
          W3 |  -.0554759   1.422781    -0.04    0.969    -2.844257    2.733306
          W4 |   .1273198   1.303812     0.10    0.922    -2.428272    2.682911
          W5 |  -.0761569   1.185396    -0.06    0.949    -2.399643    2.247329
          W6 |   .0865111    1.07344     0.08    0.936     -2.01753    2.190553
          W7 |  -.0104289   .9562925    -0.01    0.991    -1.884851    1.863993
          W8 |  -.1120629   .8400402    -0.13    0.894    -1.758619    1.534493
          W9 |  (dropped)
         W10 |   .2739767   .6086295     0.45    0.653    -.9189933    1.466947
         W11 |   .0881723   .4963143     0.18    0.859    -.8846495    1.060994
         W12 |  -.0358824    .385874    -0.09    0.926    -.7922312    .7204663
         W13 |  -.0671728    .279283    -0.24    0.810    -.6145932    .4802477
         W14 |   .0610156   .1898793     0.32    0.748    -.3111654    .4331966
         W15 |  (dropped)
       _cons |   6.873039   6.064719     1.13    0.257     -5.01437    18.76045
-------------+-----------------------------------------------------------------
     sigma_u |  3.7904487
     sigma_e |  4.0304244
         rho |  .46934486   (fraction of variance due to u_i)

F test that all u_i=0:  F(2763, 18481) = 4.83          Prob > F = 0.0000
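The individual t-tests on the wave dummies do not say whether the dummies are jointly zero. One way to check that, sketched here on the assumption that the lecture's data are in memory, declared as a panel, and that the W dummies have already been created, is testparm after the fixed-effects regression:

```stata
* Assumes: lecture dataset loaded, panel declared (e.g. xtset pid), W* dummies exist
quietly xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh W*, fe

* Joint test that the coefficients on all retained wave dummies are zero
testparm W*
```

Dummies that Stata has dropped for collinearity take no part in the test, so the degrees of freedom reflect only the wave dummies that were actually estimated.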
Extending panel data models to discrete dependent variables
Panel data extensions to logit and probit models.
Recap from Week 1: these models cover discrete (categorical) outcomes, e.g. psychological morbidity; whether one has a job. Think of other examples.
Outcome variable is always 0 or 1. Estimate:
$\Pr(Y = 1) = F(X, \beta)$
$\Pr(Y = 0) = 1 - F(X, \beta)$
OLS (linear probability model) would set $F(X, \beta) = X\beta + \varepsilon$.
Inappropriate because:
Heteroscedasticity: the outcome variable is always 0 or 1, so $\varepsilon$ only takes the value $-X\beta$ or $1 - X\beta$.
More seriously, one cannot constrain estimated probabilities to lie between 0 and 1.
Extension of logit and probit to panel data:
We won't do the maths! But essentially, STATA maximises a likelihood function derived from the panel data specification.
Both random effects and fixed effects.
First, generate the categorical variable indicating psychological morbidity:
. gen byte PM = (hlghq2 > 2) if hlghq2 >= 0 & hlghq2 != .
Fixed effects estimates xtlogit (clogit)
. xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh, fe
note: multiple positive outcomes within groups encountered.
note: 1221 groups (6462 obs) dropped because of all positive or all negative outcomes.
note: female omitted because of no within-group variance.

Iteration 0:   log likelihood = -5844.5165
Iteration 1:   log likelihood = -5829.2179
Iteration 2:   log likelihood = -5829.2122
Iteration 3:   log likelihood = -5829.2122

Conditional fixed-effects logistic regression   Number of obs      =     14802
Group variable: pid                             Number of groups   =      1543

                                                Obs per group: min =         2
                                                               avg =       9.6
                                                               max =        13

                                                LR chi2(7)         =    517.04
Log likelihood  = -5829.2122                    Prob > chi2        =    0.0000

          PM |      Coef.  Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     partner |   .0960128   .0917139     1.05    0.295    -.0837432    .2757688
     get_pnr |   .0368568     .13587     0.27    0.786    -.2294436    .3031572
    lose_pnr |   1.231475   .1469964     8.38    0.000     .9433672    1.519583
     ue_sick |   .7533968   .0970111     7.77    0.000     .5632586    .9435351
         age |    -.03383   .0162808    -2.08    0.038    -.0657398   -.0019203
        age2 |   .0000894   .0001715     0.52    0.602    -.0002468    .0004256
   badhealth |   .5386858   .0298361    18.05    0.000     .4802081    .5971636

Is losing a partner necessarily causing the psychological morbidity?
Losing a partner, being unemployed or sick, and being in bad health are associated with psychological morbidity.
The age effect is negative throughout the human life span.
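Logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to talk about. A small sketch using the lose_pnr coefficient and its confidence limits as reported in the fixed-effects output:

```stata
* Odds ratio implied by the lose_pnr coefficient (roughly 3.4)
display exp(1.231475)

* The same transformation applied to the 95% confidence limits
display exp(.9433672) "  " exp(1.519583)
```

Adding the or option to xtlogit reports odds ratios directly instead of coefficients. Either way, these are within-individual comparisons and describe association, not necessarily causation.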
Adding some more variables:
We know that women sometimes suffer from post-natal depression. Try total number of children, and children aged 0-2.
Total number of children is insignificant, but children 0-2 is significant.
. xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, fe
note: multiple positive outcomes within groups encountered.
note: 1221 groups (6462 obs) dropped because of all positive or all negative outcomes.
note: female omitted because of no within-group variance.

Iteration 0:   log likelihood = -5839.5118
Iteration 1:   log likelihood = -5824.2036
Iteration 2:   log likelihood = -5824.1975
Iteration 3:   log likelihood = -5824.1975

Conditional fixed-effects logistic regression   Number of obs      =     14802
Group variable: pid                             Number of groups   =      1543

                                                Obs per group: min =         2
                                                               avg =       9.6
                                                               max =        13

                                                LR chi2(8)         =    527.07
Log likelihood  = -5824.1975                    Prob > chi2        =    0.0000

          PM |      Coef.  Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     partner |   .0470255   .0931317     0.50    0.614    -.1355092    .2295603
     get_pnr |   .0679186   .1363361     0.50    0.618    -.1992952    .3351324
    lose_pnr |   1.217756   .1472094     8.27    0.000     .9292311    1.506282
     ue_sick |    .749727   .0970536     7.72    0.000     .5595054    .9399487
         age |  -.0295734   .0163456    -1.81    0.070    -.0616102    .0024635
        age2 |   .0000582   .0001719     0.34    0.735    -.0002787    .0003951
   badhealth |    .537545   .0298374    18.02    0.000     .4790647    .5960253
       nch02 |    .249448   .0785737     3.17    0.001     .0954464    .4034497

Next step???
Yes, we should separate men and women
. sort female
. by female: xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, fe

Men
          PM |      Coef.  Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     partner |  -.0262595    .151735    -0.17    0.863    -.3236547    .2711357
     get_pnr |   .2042066   .2165868     0.94    0.346    -.2202957    .6287089
    lose_pnr |   1.335693   .2314295     5.77    0.000     .8820997    1.789287
     ue_sick |   .9009421   .1397474     6.45    0.000     .6270421    1.174842
         age |   .0141781   .0265837     0.53    0.594     -.037925    .0662812
        age2 |  -.0004864   .0002804    -1.73    0.083    -.0010359    .0000632
   badhealth |   .5628403    .047939    11.74    0.000     .4688817     .656799
       nch02 |   .0458965   .1268808     0.36    0.718    -.2027854    .2945784

Women
          PM |      Coef.  Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     partner |   .0930161   .1181743     0.79    0.431    -.1386013    .3246336
     get_pnr |  -.0122303   .1751243    -0.07    0.944    -.3554676    .3310069
    lose_pnr |    1.13012   .1901842     5.94    0.000     .7573657    1.502874
     ue_sick |   .6032882   .1357316     4.44    0.000     .3372591    .8693174
         age |  -.0570441   .0208069    -2.74    0.006    -.0978248   -.0162633
        age2 |   .0004039   .0002185     1.85    0.065    -.0000245    .0008322
   badhealth |   .5222259   .0382135    13.67    0.000     .4473288     .597123
       nch02 |   .3840788   .1011092     3.80    0.000     .1859084    .5822493

Relationship between PM and young children is confined to women.
Any other gender differences?
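One rough way to approach the question of gender differences is a z-test on the gap between the two subsample coefficients. This is a back-of-the-envelope sketch rather than anything shown in the lecture, and it assumes the men's and women's estimates are independent, which is only approximately true if partners from the same household appear in both samples. The scalar names are illustrative, and the figures are the nch02 coefficients and standard errors from the two tables:

```stata
* nch02: women .3840788 (s.e. .1011092), men .0458965 (s.e. .1268808)
scalar b_diff  = .3840788 - .0458965
scalar se_diff = sqrt(.1011092^2 + .1268808^2)
scalar z_diff  = b_diff / se_diff

* z statistic and two-sided p-value under the independence assumption
display "z = " z_diff "   two-sided p = " 2*(1 - normal(abs(z_diff)))
```

The z statistic comes out at roughly 2.1. The same calculation can be repeated for any other coefficient where the two tables look different, such as age or ue_sick.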
Back to random effects

Random-effects logistic regression              Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       7.7
                                                               max =        13

                                                Wald chi2(9)       =    959.52
Log likelihood  = -10377.058                    Prob > chi2        =    0.0000

          PM |      Coef.  Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     partner |   .0565392   .0695474     0.81    0.416    -.0797712    .1928496
     get_pnr |    .032454   .1320281     0.25    0.806    -.2263163    .2912244
    lose_pnr |   1.309734   .1389371     9.43    0.000     1.037422    1.582046
      female |    .686486   .0712769     9.63    0.000     .5467859    .8261862
     ue_sick |   .7131287   .0839162     8.50    0.000     .5486559    .8776015
         age |   -.013065   .0094552    -1.38    0.167    -.0315968    .0054667
        age2 |   .0000337   .0000961     0.35    0.726    -.0001546    .0002221
   badhealth |   .6613526   .0261188    25.32    0.000     .6101607    .7125446
       nch02 |   .2653162   .0743185     3.57    0.000     .1196546    .4109779
       _cons |  -2.871645   .2033651   -14.12    0.000    -3.270233   -2.473057
-------------+-----------------------------------------------------------------
    /lnsig2u |   .6496376   .0571876                        .537552    .7617232
-------------+-----------------------------------------------------------------
     sigma_u |    1.38378   .0395675                       1.308362    1.463545
         rho |   .3679062    .013299                       .3422473    .3943355

Likelihood-ratio test of rho=0: chibar2(01) = 2038.50    Prob >= chibar2 = 0.000

Estimates are VERY similar to FE
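The last three rows of the random-effects logit output are linked: sigma_u is derived from /lnsig2u, and rho from sigma_u. In the logit case there is no estimated sigma_e; the idiosyncratic error is standard logistic, with variance pi squared over three. A short sketch of both calculations, using the /lnsig2u value from the output:

```stata
* sigma_u = exp(lnsig2u / 2)
display exp(.6496376 / 2)

* rho = sigma_u^2 / (sigma_u^2 + pi^2/3)
display exp(.6496376) / (exp(.6496376) + _pi^2/3)
```

These should reproduce the reported sigma_u of 1.38378 and rho of .3679062 up to rounding.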
Testing between FE and RE
. quietly xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, fe
. estimates store fixed
. quietly xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, re
. hausman fixed .

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed          .          Difference          S.E.
-------------+----------------------------------------------------------------
     partner |    .0470255     .0565392       -.0095137        .0619409
     get_pnr |    .0679186      .032454        .0354646        .0340015
    lose_pnr |    1.217756     1.309734       -.0919776        .0486529
     ue_sick |     .749727     .7131287        .0365983        .0487594
         age |   -.0295734     -.013065       -.0165083        .0133334
        age2 |    .0000582     .0000337        .0000245        .0001425
   badhealth |     .537545     .6613526       -.1238076         .014425
       nch02 |     .249448     .2653162       -.0158682        .0255066

b = consistent under Ho and Ha; obtained from xtlogit
B = inconsistent under Ha, efficient under Ho; obtained from xtlogit

Test: Ho: difference in coefficients not systematic
chi2(8) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 149.76
Prob>chi2 = 0.0000

Random effects is rejected again.
Random effects probit
No fixed effects command available, as there does not exist a sufficient statistic allowing the fixed effects to be conditioned out of the likelihood.

Random-effects probit regression                Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       7.7
                                                               max =        13

                                                Wald chi2(9)       =    995.53
Log likelihood  = -10370.501                    Prob > chi2        =    0.0000

          PM |      Coef.  Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     partner |   .0334017   .0399311     0.84    0.403    -.0448618    .1116651
     get_pnr |   .0183513   .0757428     0.24    0.809    -.1301019    .1668045
    lose_pnr |   .7646656   .0800772     9.55    0.000     .6077173     .921614
      female |   .3924276   .0407552     9.63    0.000     .3125488    .4723063
     ue_sick |   .4189777    .048681     8.61    0.000     .3235648    .5143906
         age |  -.0077306   .0054309    -1.42    0.155     -.018375    .0029138
        age2 |   .0000201   .0000552     0.36    0.715     -.000088    .0001283
   badhealth |   .3825895   .0149317    25.62    0.000     .3533239    .4118551
       nch02 |   .1530239   .0431233     3.55    0.000     .0685039     .237544
       _cons |  -1.657895   .1165019   -14.23    0.000    -1.886235   -1.429556
-------------+-----------------------------------------------------------------
    /lnsig2u |  -.4475525   .0552927                      -.5559243   -.3391807
-------------+-----------------------------------------------------------------
     sigma_u |    .799494   .0221031                       .7573255    .8440105
         rho |   .3899428   .0131534                        .364491    .4160085

Likelihood-ratio test of rho=0: chibar2(01) = 2056.20    Prob >= chibar2 = 0.000
Why aren't the sets of coefficients more similar?
                 Logit          Probit
Partner          0.057          0.033 ***
Get partner      0.032          0.018 ***
Lose partner     1.310 ***      0.765 ***
Female           0.686 ***      0.392 **
UE/sick          0.713 ***      0.419 ***
Age             -0.013         -0.007
Age-squared      0.000         -0.000
Bad health       0.661 ***      0.383 **
Kids 0-2         0.265 ***      0.153 ***
Cons            -2.871 ***     -1.658 ***
Remember the conversion scale from Week 1
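Logit and probit coefficients are on different scales because the logistic and standard normal error distributions have different variances, so the two columns should differ by a roughly constant factor rather than match. The Week 1 conversion factor is not reproduced in this transcript, but the implied ratio can be checked directly. A sketch using the full-precision coefficients from the random-effects logit and probit outputs:

```stata
* Ratio of random-effects logit to probit coefficients
display "lose_pnr   " 1.309734 / .7646656
display "female     " .686486  / .3924276
display "ue_sick    " .7131287 / .4189777
display "badhealth  " .6613526 / .3825895
display "nch02      " .2653162 / .1530239
```

For the significant coefficients the ratios all come out at roughly 1.7, which is the kind of near-constant scaling one would expect if the two models are telling the same story.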