Maximum Likelihood Estimation for Random Effects Models in Econometric Analysis


Learn about the Maximum Likelihood Estimation (MLE) process for Random Effects Models in Econometric Analysis of Panel Data, as explained by William Greene from the University of South Florida. Topics covered include the Random Effects Model, Error Components Model, Notation for Generalized Regression Model, and more. Understand the algebra and calculations involved in MLE for panel data analysis.

  • Econometrics
  • Panel Data
  • Random Effects Models
  • Maximum Likelihood Estimation
  • William Greene


Presentation Transcript


  1. Part 6: MLE for RE Models [ 1/34]
Econometric Analysis of Panel Data
William Greene
Department of Economics, University of South Florida

  2. Part 6: MLE for RE Models [ 2/34] The Random Effects Model

The random effects model:
y_it = x_it'β + c_i + ε_it, observation for person i at time t
y_i = X_i β + c_i i + ε_i, T_i observations in group i
y = Xβ + c + ε, Σ_i T_i observations in the sample, c = (c_1 i', c_2 i', ..., c_N i')', a Σ_{i=1}^N T_i by 1 vector
c_i is uncorrelated with x_it for all t; E[c_i | X_i] = 0; E[ε_it | X_i, c_i] = 0

  3. Part 6: MLE for RE Models [ 3/34] Error Components Model

Generalized regression model:
y_it = x_it'β + ε_it + u_i
E[ε_it | X_i] = 0, E[ε_it² | X_i] = σ_ε²
E[u_i | X_i] = 0, E[u_i² | X_i] = σ_u²
y_i = X_i β + ε_i + u_i i for T_i observations

Var[ε_i + u_i i] is the T_i × T_i matrix with σ_ε² + σ_u² on the diagonal and σ_u² everywhere off the diagonal.

  4. Part 6: MLE for RE Models [ 4/34] Notation for GR Model

Ω_i = Var[ε_i + u_i i | X_i] = σ_ε² I_{T_i} + σ_u² ii'

Var[w | X] = block-diagonal(Ω_1, Ω_2, ..., Ω_N)
(Note: the blocks differ only in their dimension T_i.)

  5. Part 6: MLE for RE Models [ 5/34] Maximum Likelihood of the REM

Assuming normality of ε_it and u_i, treat the T_i joint observations on [(ε_i1, ε_i2, ..., ε_iT_i), u_i] as one T_i-variate observation. The mean vector of ε_i + u_i i is zero and the covariance matrix is Ω_i = σ_ε² I + σ_u² ii'. The joint density for (ε_i + u_i i) = (y_i − X_i β) is

f(ε_i + u_i i) = (2π)^(−T_i/2) |Ω_i|^(−1/2) exp[−½ (y_i − X_i β)' Ω_i⁻¹ (y_i − X_i β)]

lnL = Σ_{i=1}^N lnL_i, where

lnL_i(β, σ_ε², σ_u²) = −½ [T_i ln 2π + ln|Ω_i| + (y_i − X_i β)' Ω_i⁻¹ (y_i − X_i β)]
                     = −½ [T_i ln 2π + ln|Ω_i| + (ε_i + u_i i)' Ω_i⁻¹ (ε_i + u_i i)]

  6. Part 6: MLE for RE Models [ 6/34] MLE Panel Data Algebra: Second Term in lnL_i

Ω_i = σ_ε² I + σ_u² ii' = σ_ε² [I + γ ii'] = σ_ε² A_i, where γ = σ_u²/σ_ε².
Then |Ω_i| = (σ_ε²)^{T_i} |A_i|, so we need the characteristic roots of A_i = I + γ ii'.
The roots λ (real, since A_i is symmetric) are solutions to A_i c = λc, or c + γ i(i'c) = λc.
Any vector whose elements sum to zero (i'c = 0) is a characteristic vector corresponding to root λ = 1. There are T_i − 1 such vectors, so T_i − 1 of the roots equal 1.
Suppose i'c ≠ 0. Premultiply by i' to obtain i'c + γ T_i (i'c) = λ (i'c); since i'c ≠ 0, divide by i'c to obtain the remaining root, λ = 1 + T_i γ.
Therefore, |Ω_i| = (σ_ε²)^{T_i} (1 + T_i γ), and ln|Ω_i| = T_i ln σ_ε² + ln(1 + T_i γ).
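A quick numerical check of this determinant result (a sketch using numpy; the σ values are arbitrary, not from the slides):

```python
import numpy as np

# Verify |Omega| = (s_e2)^T * (1 + T*gamma) for Omega = s_e2*I + s_u2*ii'
T, s_e2, s_u2 = 5, 2.0, 0.5
gamma = s_u2 / s_e2
i = np.ones((T, 1))
Omega = s_e2 * np.eye(T) + s_u2 * (i @ i.T)

print(np.linalg.det(Omega))       # brute-force determinant
print(s_e2**T * (1 + T * gamma))  # closed form from the slide: same value

# T-1 roots of A = I + gamma*ii' equal 1; the remaining one equals 1 + T*gamma
print(np.sort(np.linalg.eigvalsh(np.eye(T) + gamma * (i @ i.T))))
```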

  7. Part 6: MLE for RE Models [ 7/34] MLE Panel Data Algebra: Third Term in lnL_i

Ω_i⁻¹ = (1/σ_ε²) [I − (σ_u²/(σ_ε² + T_i σ_u²)) ii']

So, with ε_i = y_i − X_i β and ε̄_i the group mean of the ε_it,

(y_i − X_i β)' Ω_i⁻¹ (y_i − X_i β) = (1/σ_ε²) [ε_i'ε_i − (σ_u²/(σ_ε² + T_i σ_u²)) (T_i ε̄_i)²]
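A matching spot check of the closed-form inverse (again numpy, arbitrary values):

```python
import numpy as np

# Verify Omega^{-1} = (1/s_e2) * [I - s_u2/(s_e2 + T*s_u2) * ii']
T, s_e2, s_u2 = 5, 2.0, 0.5
i = np.ones((T, 1))
Omega = s_e2 * np.eye(T) + s_u2 * (i @ i.T)
Omega_inv = (1.0 / s_e2) * (np.eye(T) - s_u2 / (s_e2 + T * s_u2) * (i @ i.T))
print(np.allclose(Omega @ Omega_inv, np.eye(T)))  # True
```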

  8. Part 6: MLE for RE Models [ 8/34] MLE Panel Data Algebra: Combine Terms in lnL_i

lnL_i = −½ [T_i ln 2π + ln|Ω_i| + ε_i' Ω_i⁻¹ ε_i]
      = −½ [T_i ln 2π + T_i ln σ_ε² + ln(1 + T_i γ) + (1/σ_ε²)(ε_i'ε_i − (σ_u²/(σ_ε² + T_i σ_u²))(T_i ε̄_i)²)]

lnL = Σ_{i=1}^N lnL_i

Since σ_u²/(σ_ε² + T_i σ_u²) = γ/(1 + T_i γ), where γ = σ_u²/σ_ε² (divide numerator and denominator by σ_ε²),

lnL_i = −½ [T_i (ln 2π + ln σ_ε²) + ln(1 + T_i γ) + (1/σ_ε²)(ε_i'ε_i − (T_i ε̄_i)² γ/(1 + T_i γ))]
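The combined expression can be verified against a direct evaluation of the N(X_i β, Ω_i) log density; a minimal sketch assuming scipy is available (all numbers arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
T, s_e2, s_u2 = 6, 1.5, 0.8
gamma = s_u2 / s_e2
X = rng.normal(size=(T, 3))
b = np.array([1.0, -0.5, 0.25])
y = X @ b + rng.normal(size=T)   # any data; the identity is algebraic
e = y - X @ b
ebar = e.mean()

# Slide's combined expression for lnL_i
lnL_i = -0.5 * (T * (np.log(2 * np.pi) + np.log(s_e2)) + np.log(1 + T * gamma)
                + (e @ e - (T * ebar)**2 * gamma / (1 + T * gamma)) / s_e2)

Omega = s_e2 * np.eye(T) + s_u2 * np.ones((T, T))
print(lnL_i, multivariate_normal(mean=X @ b, cov=Omega).logpdf(y))  # equal
```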

  9. Part 6: MLE for RE Models [ 9/34] Maximizing the Log Likelihood by Iterated FGLS

Difficult: brute force + some elegant theoretical results. See Baltagi, pp. 22-23. (Back and forth from GLS to σ_ε² and σ_u².)
Somewhat less difficult and more practical: at any iteration, given estimates of σ_ε² and σ_u², the estimator of β is GLS (of course), so we iterate back and forth between these. See Hsiao, pp. 39-40.

0. Begin iterations with, say, FGLS estimates of β, σ_ε², σ_u².
1. Given σ²_{ε,r} and σ²_{u,r}, compute β_{r+1} by FGLS.
2. Given β_{r+1}, compute e_{it,r+1} = y_it − x_it'β_{r+1}, then

σ²_{ε,r+1} = Σ_{i=1}^N Σ_{t=1}^{T_i} (e_{it,r+1} − ē_{i,r+1})² / Σ_{i=1}^N (T_i − 1)
σ²_{u,r+1} = (1/N) Σ_{i=1}^N (ē²_{i,r+1} − σ²_{ε,r+1}/T_i)

Return to step 1 and repeat until β_{r+1} − β_r = 0.
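A minimal sketch of this back-and-forth in Python (numpy only; the function name and data layout are choices made here, the variance updates follow the reconstruction above, and GLS is computed via the usual θ partial-demeaning transformation; the slides start from FGLS rather than OLS):

```python
import numpy as np

def iterated_fgls(y, X, ids, tol=1e-8, max_iter=200):
    """Back-and-forth FGLS for the RE model.
    y: (n,) stacked; X: (n, K); ids: (n,) group labels."""
    groups = [np.flatnonzero(ids == g) for g in np.unique(ids)]
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # simple OLS start
    s_e2, s_u2 = 1.0, 1.0
    for _ in range(max_iter):
        # Step 1: GLS given (s_e2, s_u2), via partial demeaning with theta_i
        Xs, ys = [], []
        for idx in groups:
            T = len(idx)
            theta = 1 - np.sqrt(s_e2 / (s_e2 + T * s_u2))
            Xs.append(X[idx] - theta * X[idx].mean(0))
            ys.append(y[idx] - theta * y[idx].mean())
        b_new = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)[0]
        # Step 2: update the variance components from the new residuals
        e = y - X @ b_new
        within = sum(((e[idx] - e[idx].mean())**2).sum() for idx in groups)
        s_e2 = within / sum(len(idx) - 1 for idx in groups)
        s_u2 = max(np.mean([e[idx].mean()**2 - s_e2 / len(idx)
                            for idx in groups]), 1e-12)  # guard against < 0
        if np.max(np.abs(b_new - b)) < tol:
            break
        b = b_new
    return b_new, s_e2, s_u2
```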

  10. Part 6: MLE for RE Models [ 10/34] Direct Maximization of lnL by Nonlinear Optimization

Simpler: take advantage of the invariance of maximum likelihood estimators to transformations of the parameters. Let θ = 1/σ_ε, δ = β/σ_ε, γ = σ_u²/σ_ε², R_i = 1 + T_i γ, Q_i = γ/R_i. Then

lnL = −½ Σ_{i=1}^N [ Σ_{t=1}^{T_i} (θ y_it − x_it'δ)² − Q_i (Σ_{t=1}^{T_i} (θ y_it − x_it'δ))² + ln R_i − T_i ln θ² + T_i ln 2π ]

This can be maximized using ordinary optimization methods (not Newton, as suggested by Hsiao). Treat it as a standard nonlinear optimization problem. Solve with iterative, gradient methods. Estimate θ, δ, and γ, then compute σ_ε² = 1/θ², β = δ/θ, σ_u² = γ/θ², and use the delta method to obtain standard errors.
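A sketch of the reparameterized criterion in a form a gradient-based optimizer can consume (scipy.optimize; the function name, the `groups` index-list layout, and the log-parameterization keeping θ and γ positive are choices made here, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

def neg_lnL(params, y, X, groups):
    """params = (delta (K,), log theta, log gamma);
    theta = 1/s_e, delta = beta/s_e, gamma = s_u2/s_e2."""
    K = X.shape[1]
    delta = params[:K]
    theta, gamma = np.exp(params[K]), np.exp(params[K + 1])
    total = 0.0
    for idx in groups:
        T = len(idx)
        w = theta * y[idx] - X[idx] @ delta     # scaled residuals
        R = 1 + T * gamma
        total += -0.5 * (T * np.log(2 * np.pi) - T * np.log(theta**2)
                         + np.log(R) + w @ w - (gamma / R) * w.sum()**2)
    return -total

# usage sketch: res = minimize(neg_lnL, x0, args=(y, X, groups), method="BFGS")
# then recover s_e2 = 1/theta**2, beta = delta/theta, s_u2 = gamma/theta**2
```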

  11. Part 6: MLE for RE Models [ 11/34] (Estimation output shown as an image on the original slide.)

  12. Part 6: MLE for RE Models [ 12/34] 2 Step FGLS vs. MLE. The estimates of σ_u² are quite different. The large difference in the estimates of σ_u² explains the difference between the MLE and 2-step FGLS.

  13. Part 6: MLE for RE Models [ 13/34] Maximum Simulated Likelihood

Assume ε_it and u_i are normally distributed. Write u_i = σ_u v_i where v_i ~ N[0,1]. Then y_it = x_it'β + ε_it + σ_u v_i. If v_i were observed data, all observations would be independent, and

ln f(y_it | x_it, v_i) = −½ [ln 2π + ln σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε²]

We would estimate (β, σ_u) by linear regression of y_it on (x_it, v_i), and σ_ε² by the average squared residual. The log of the joint density for the T_i observations with common v_i, the conditional log likelihood for group i, is

lnL_i(β, σ_ε², σ_u | v_i) = Σ_{t=1}^{T_i} −½ [ln 2π + ln σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε²]

and the conditional log likelihood for the sample is then

lnL(β, σ_ε², σ_u | v) = Σ_{i=1}^N Σ_{t=1}^{T_i} −½ [ln 2π + ln σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε²]

  14. Part 6: MLE for RE Models [ 14/34] Likelihood Function for Individual i

The conditional log likelihood for the sample is given on the previous slide. The unconditional log likelihood is obtained by integrating v_i out of L_i(β, σ_ε², σ_u | v_i):

lnL_i(β, σ_ε², σ_u) = ln ∫_{v_i} Π_{t=1}^{T_i} [exp(−(y_it − x_it'β − σ_u v_i)²/(2σ_ε²)) / (σ_ε √(2π))] φ(v_i) dv_i
                    = ln E_{v_i}[L_i(β, σ_ε², σ_u | v_i)]

  15. Part 6: MLE for RE Models [ 15/34] Log Likelihood Function

The full log likelihood function that needs to be maximized is

lnL = Σ_{i=1}^N lnL_i(β, σ_ε², σ_u)
    = Σ_{i=1}^N ln ∫_{v_i} Π_{t=1}^{T_i} [exp(−(y_it − x_it'β − σ_u v_i)²/(2σ_ε²)) / (σ_ε √(2π))] φ(v_i) dv_i
    = Σ_{i=1}^N ln E_{v_i}[L_i(β, σ_ε², σ_u | v_i)]

This is the function to be maximized to obtain the MLE of [β, σ_ε², σ_u].

  16. Part 6: MLE for RE Models [ 16/34] Computing the Expected LogL

How to compute the integral: first note that φ(v_i) = exp(−v_i²/2)/√(2π), so

E_{v_i}[L_i(β, σ_ε², σ_u | v_i)] = ∫ Π_{t=1}^{T_i} [exp(−(y_it − x_it'β − σ_u v_i)²/(2σ_ε²)) / (σ_ε √(2π))] · exp(−v_i²/2)/√(2π) dv_i

(1) Numerical (Gauss-Hermite) quadrature for integrals of this form is remarkably accurate:

∫ g(v) e^(−v²) dv ≈ Σ_{h=1}^H w_h g(a_h)

Example: Hermite quadrature nodes and weights, H = 5
Nodes: -2.02018, -0.95857, 0.00000, 0.95857, 2.02018
Weights: 0.01995, 0.39362, 0.94531, 0.39362, 0.01995

Applications usually use many more points, up to 96, and much more accurate (more digits) representations.
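numpy ships Gauss-Hermite rules in exactly this ∫ g(v)e^(−v²)dv form; a quick check of the H = 5 example above:

```python
import numpy as np

# Nodes and weights for integrals of the form  ∫ g(v) e^{-v^2} dv ≈ Σ w_h g(a_h)
nodes, weights = np.polynomial.hermite.hermgauss(5)
print(nodes)    # [-2.02018 -0.95857  0.       0.95857  2.02018]
print(weights)  # [ 0.01995  0.39362  0.94531  0.39362  0.01995]

# sanity check: ∫ e^{-v^2} dv = sqrt(pi), so the weights sum to sqrt(pi)
print(weights.sum(), np.sqrt(np.pi))
```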

  17. Part 6: MLE for RE Models [ 17/34] 32 Point Hermite Quadrature

Nodes z_h (use z_h and −z_h):
0.194840741569399326708741289532, 0.584978765435932448466957544011, 0.976500463589682838484704871982, 1.37037641095287183816170564864, 1.76765410946320160462767325853, 2.16949918360611217330570559502, 2.57724953773231745403092930114, 2.99249082500237420628549407606, 3.41716749281857073587392729564, 3.85375548547144464388787292109, 4.30554795335119844526348653193, 4.77716450350259639303579405689, 5.27555098651588012781906048140, 5.81222594951591383276596615366, 6.40949814926966041217376374153, 7.12581390983072757279520760342

Weights w_h (the same weight applies to z_h and −z_h; D denotes a power of 10):
3.75238352592802392866818389D-1, 2.77458142302529898137698919D-1, 1.51269734076642482575147115D-1, 6.04581309559126141865857608D-2, 1.75534288315734303034378446D-2, 3.65489032665442807912565712D-3, 5.36268365527972045970238102D-4, 5.41658406181998255800193939D-5, 3.65058512956237605737032419D-6, 1.57416779254559402926869258D-7, 4.09883216477089661823504101D-9, 5.93329146339663861451156822D-11, 4.21501021132644757296944521D-13, 1.19734401709284866582868190D-15, 9.23173653651829223349442007D-19, 7.31067642738416239327427846D-23

  18. Part 6: MLE for RE Models [ 18/34] Quadrature

A change of variable is needed to get the integral into the right form:

L_i = ∫ Π_{t=1}^{T_i} [exp(−(y_it − x_it'β − σ_u v_i)²/(2σ_ε²)) / (σ_ε √(2π))] · exp(−v_i²/2)/√(2π) dv_i

With v_i = √2 z (the change of variable is worked out in the appendix), this becomes

L_{i,Q} = (1/√π) Σ_{h=1}^H w_h Π_{t=1}^{T_i} [exp(−(y_it − x_it'β − σ_u √2 z_h)²/(2σ_ε²)) / (σ_ε √(2π))]

and the problem is solved by maximizing, with respect to β, σ_ε², σ_u,

lnL_Q = Σ_{i=1}^N lnL_{i,Q} = Σ_{i=1}^N ln [(1/√π) Σ_{h=1}^H w_h Π_{t=1}^{T_i} exp(−(y_it − x_it'β − σ_u √2 z_h)²/(2σ_ε²)) / (σ_ε √(2π))]

(Maximization will be continued later in the semester.)
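A direct transcription of lnL_Q (a sketch; the function name and `groups` layout are assumed, and the product over t can underflow for long panels, which a production version would accumulate in logs):

```python
import numpy as np

def lnL_quadrature(beta, s_e2, s_u, y, X, groups, H=32):
    """Gauss-Hermite approximation of the RE log likelihood after v = sqrt(2)*z."""
    z, w = np.polynomial.hermite.hermgauss(H)
    total = 0.0
    for idx in groups:
        # (T_i, H) matrix of residuals evaluated at each quadrature node
        r = (y[idx] - X[idx] @ beta)[:, None] - s_u * np.sqrt(2.0) * z[None, :]
        dens = np.exp(-r**2 / (2 * s_e2)) / np.sqrt(2 * np.pi * s_e2)
        total += np.log((w * dens.prod(axis=0)).sum() / np.sqrt(np.pi))
    return total
```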

  19. Part 6: MLE for RE Models [ 19/34] Compute the Integral by Simulation

The unconditional log likelihood is an expected value:

lnL_i(β, σ_ε², σ_u) = ln ∫ Π_{t=1}^{T_i} [exp(−(y_it − x_it'β − σ_u v_i)²/(2σ_ε²)) / (σ_ε √(2π))] φ(v_i) dv_i
                    = ln E_{v_i}[L_i(β, σ_ε², σ_u | v_i)] = ln E_{v_i}[g(v_i)]

An expected value can be 'estimated' by sampling observations on v_i and averaging the functions of them:

E[g(v_i)] ≈ (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} exp(−(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²)) / (σ_ε √(2π))

The unconditional (simulated) log likelihood function is then

lnL_S = Σ_{i=1}^N ln (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} exp(−(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²)) / (σ_ε √(2π))

This is a function of (β, σ_ε², σ_u | y_i, X_i, v_i1, ..., v_iR), i = 1, ..., N. The random draws on v_ir become part of the data, and the function is maximized with respect to the unknown parameters.
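A matching sketch of the simulated likelihood (names assumed as before); note the draws are generated once and held fixed across optimizer iterations, as the slide's "part of the data" remark requires:

```python
import numpy as np

def lnL_simulated(beta, s_e2, s_u, y, X, groups, v_draws):
    """Simulated log likelihood; v_draws[i] is an (R,) vector of N(0,1)
    draws for group i, drawn once and reused at every parameter vector."""
    total = 0.0
    for i, idx in enumerate(groups):
        r = (y[idx] - X[idx] @ beta)[:, None] - s_u * v_draws[i][None, :]
        dens = np.exp(-r**2 / (2 * s_e2)) / np.sqrt(2 * np.pi * s_e2)
        total += np.log(dens.prod(axis=0).mean())   # (1/R) sum over draws
    return total

# rng = np.random.default_rng(1); v_draws = [rng.normal(size=500) for _ in groups]
```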

  20. Part 6: MLE for RE Models [ 20/34] Convergence Results for MSL

The target is the expected log likelihood: ln E_{v_i}[L(β, σ_ε², σ_u | v_i)].
The simulation estimator is based on R random draws from the population of v_i:

lnL_S(β, σ_ε², σ_u) = Σ_{i=1}^N ln (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} exp(−(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²)) / (σ_ε √(2π))

The essential result is plim(R→∞) lnL_S(β, σ_ε², σ_u) = ln E_{v_i}[L(β, σ_ε², σ_u | v_i)].
Conditions: (1) general regularity and smoothness of the log likelihood; (2) R increases faster than √N.
Result: the maximizer of lnL_S(β, σ_ε², σ_u) converges to the maximizer of ln E_{v_i}[L(β, σ_ε², σ_u | v_i)].

  21. Part 6: MLE for RE Models [ 21/34] MSL vs. ML

        FGLS      MLE       MSL
σ_ε²    .023119   .023534   .023779
σ_u²    .102531   .708869   .576658

  22. Part 6: MLE for RE Models [ 22/34] Two Level Panel Data

Nested by construction.
Unbalanced panels: no real obstacle to estimation, some inconvenient algebra.
In 2-step FGLS of the RE model, we need a value of 1/T to solve for an estimate of σ_u². What to use?

Q = 1/T̄  (1 / mean of T_i)
Q = (1/N) Σ_{i=1}^N (1/T_i)  (arithmetic mean of 1/T_i)
Q_H = [Π_{i=1}^N (1/T_i)]^(1/N)  (Stata, harmonic mean of 1/T)
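The candidates generally give different values; a small numerical comparison (numpy; the group sizes are invented, and Q_H follows the product form reconstructed above):

```python
import numpy as np

T_i = np.array([4, 7, 7, 10, 12])        # hypothetical group sizes
Q1 = 1 / T_i.mean()                      # 1 / (mean of T_i)
Q2 = (1 / T_i).mean()                    # arithmetic mean of 1/T_i
QH = np.exp(np.log(1 / T_i).mean())      # [prod of 1/T_i]^(1/N)
print(Q1, Q2, QH)                        # three different plug-in values for 1/T
```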

  23. Part 6: MLE for RE Models [ 23/34] Balanced Nested Panel Data

z_ijkt = test score for student t, teacher k, school j, district i
L = 2 school districts, i = 1,...,L
M_i = 3 schools in each district, j = 1,...,M_i
N_ij = 4 teachers in each school, k = 1,...,N_ij
T_ijk = 20 students in each class, t = 1,...,T_ijk

Antweiler, W., "Nested Random Effects Estimation in Unbalanced Panel Data," Journal of Econometrics, 101, 2001, pp. 295-313.

  24. Part 6: MLE for RE Models [ 24/34] Nested Effects Model

y_ijkt = x_ijkt'β + u_ijk + v_ij + w_i + ε_ijkt

Strict exogeneity; all parts uncorrelated. (A normality assumption is added later.)

Var[u_ijk + v_ij + w_i + ε_ijkt] = σ_u² + σ_v² + σ_w² + σ_ε²

The overall covariance matrix is block diagonal over i, each diagonal block is block diagonal over j, each of these, in turn, is block diagonal over k, and each lowest-level block has the form we saw earlier:

Ω_ijk = σ_ε² I + σ_u² ii'

  25. Part 6: MLE for RE Models [ 25/34] GLS with Nested Effects

For the balanced case (T, N, M constant), define

σ₁² = σ_ε² + T σ_u²
σ₂² = σ_ε² + T σ_u² + NT σ_v²
σ₃² = σ_ε² + T σ_u² + NT σ_v² + MNT σ_w²

θ₁ = 1 − σ_ε/σ₁,  θ₂ = σ_ε/σ₁ − σ_ε/σ₂,  θ₃ = σ_ε/σ₂ − σ_ε/σ₃

GLS is equivalent to OLS regression of

ỹ_ijkt = y_ijkt − θ₁ ȳ_ijk. − θ₂ ȳ_ij.. − θ₃ ȳ_i...

on the same transformation of x_ijkt. FGLS estimates are obtained by "three group-wise between estimators and the within estimator for the innermost group."

  26. Part 6: MLE for RE Models [ 26/34] Unbalanced Nested Data

With unbalanced panels, all the preceding results fall apart. GLS, FGLS, and even fixed effects become analytically intractable (unless you just compute all the dummy variables). The log likelihood, however, is very tractable. Note a collision of practicality with nonrobustness: normality must be assumed.

  27. Part 6: MLE for RE Models [ 27/34] Log Likelihood (1)

Define: θ_u = σ_u²/σ_ε², θ_v = σ_v²/σ_ε², θ_w = σ_w²/σ_ε².

Construct:
φ_ijk = 1 + T_ijk θ_u,  T̃_ij = Σ_{k=1}^{N_ij} T_ijk/φ_ijk
φ_ij = 1 + θ_v T̃_ij,  T̃_i = Σ_{j=1}^{M_i} T̃_ij/φ_ij
φ_i = 1 + θ_w T̃_i

Sums of squares:
A_ijk = Σ_{t=1}^{T_ijk} e_ijkt²,  where e_ijkt = y_ijkt − x_ijkt'β
B_ijk = Σ_{t=1}^{T_ijk} e_ijkt,  B_ij = Σ_{k=1}^{N_ij} B_ijk/φ_ijk,  B_i = Σ_{j=1}^{M_i} B_ij/φ_ij

  28. Part 6: MLE for RE Models [ 28/34] Log Likelihood (2)

H = total number of observations.

lnL = −½ [ H ln(2πσ_ε²)
  + Σ_{i=1}^L { ln φ_i + Σ_{j=1}^{M_i} { ln φ_ij + Σ_{k=1}^{N_ij} { ln φ_ijk + A_ijk/σ_ε² − θ_u B_ijk²/(σ_ε² φ_ijk) } − θ_v B_ij²/(σ_ε² φ_ij) } − θ_w B_i²/(σ_ε² φ_i) } ]

(For 3 levels instead of 4, set L = 1 and θ_w = 0.)
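A sketch of this recursion in Python (the function name and the nested-list data layout are assumptions; the body mirrors the φ and B constructions from the previous slide):

```python
import numpy as np

def nested_lnL(s_e2, th_u, th_v, th_w, A, B_ijk, T_ijk):
    """Nested RE log likelihood. A[i][j][k], B_ijk[i][j][k], T_ijk[i][j][k]
    hold the within-class sum of squared residuals, the residual sums,
    and the class sizes, for given beta."""
    H = sum(T_ijk[i][j][k]
            for i in range(len(T_ijk))
            for j in range(len(T_ijk[i]))
            for k in range(len(T_ijk[i][j])))
    total = H * np.log(2 * np.pi * s_e2)
    for i in range(len(T_ijk)):
        Bi, Ti = 0.0, 0.0
        for j in range(len(T_ijk[i])):
            Bij, Tij = 0.0, 0.0
            for k in range(len(T_ijk[i][j])):
                phi_ijk = 1 + T_ijk[i][j][k] * th_u
                total += (np.log(phi_ijk) + A[i][j][k] / s_e2
                          - th_u * B_ijk[i][j][k]**2 / (s_e2 * phi_ijk))
                Bij += B_ijk[i][j][k] / phi_ijk     # accumulate B_ij
                Tij += T_ijk[i][j][k] / phi_ijk     # accumulate T~_ij
            phi_ij = 1 + th_v * Tij
            total += np.log(phi_ij) - th_v * Bij**2 / (s_e2 * phi_ij)
            Bi += Bij / phi_ij
            Ti += Tij / phi_ij
        phi_i = 1 + th_w * Ti
        total += np.log(phi_i) - th_w * Bi**2 / (s_e2 * phi_i)
    return -0.5 * total
```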

  29. Part 6: MLE for RE Models [ 29/34] Maximizing Log L

Antweiler provides analytic first derivatives for gradient methods of optimization. Ugly to program. Numerical derivatives: let θ be the full vector of the K + 4 parameters. Let h_r be a perturbation vector with ε_r = max(ε₀, ε₁|θ_r|) in the rth position and zero in the other K + 3 positions. Then

∂lnL/∂θ_r ≈ [lnL(θ + h_r) − lnL(θ − h_r)] / (2 ε_r)
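The central-difference scheme in code (a sketch; eps0 and eps1 stand in for the slide's two tolerance constants, and their values are illustrative):

```python
import numpy as np

def num_gradient(lnL, theta, eps0=1e-5, eps1=1e-5):
    """Central-difference gradient: perturb position r by
    eps_r = max(eps0, eps1*|theta_r|), holding the rest fixed."""
    g = np.empty_like(theta, dtype=float)
    for r in range(len(theta)):
        eps_r = max(eps0, eps1 * abs(theta[r]))
        h = np.zeros_like(theta, dtype=float)
        h[r] = eps_r
        g[r] = (lnL(theta + h) - lnL(theta - h)) / (2 * eps_r)
    return g
```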

  30. Part 6: MLE for RE Models [ 30/34] Asymptotic Covariance Matrix

"Even with an analytic gradient, however, the Hessian matrix is typically obtained through numeric approximation methods." Read: "the second derivatives are too complicated to derive, much less program." Also, since lnL is not a sum of individual terms, the BHHH estimator is not usable. Numerical second derivatives were used.

  31. Part 6: MLE for RE Models [ 31/34] An Appropriate Asymptotic Covariance Matrix

The expected Hessian is block diagonal, so we can isolate the block for β:

−∂²lnL/∂β∂β' = (1/σ_ε²) [ Σ_{i=1}^L Σ_{j=1}^{M_i} Σ_{k=1}^{N_ij} Σ_{t=1}^{T_ijk} x_ijkt x_ijkt'
  − θ_u Σ_{i=1}^L Σ_{j=1}^{M_i} Σ_{k=1}^{N_ij} (1/φ_ijk) (Σ_{t=1}^{T_ijk} x_ijkt)(Σ_{t=1}^{T_ijk} x_ijkt)'
  − θ_v Σ_{i=1}^L Σ_{j=1}^{M_i} (1/φ_ij) (Σ_{k=1}^{N_ij} (1/φ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)(Σ_{k=1}^{N_ij} (1/φ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)'
  − θ_w Σ_{i=1}^L (1/φ_i) (Σ_{j=1}^{M_i} (1/φ_ij) Σ_{k=1}^{N_ij} (1/φ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)(Σ_{j=1}^{M_i} (1/φ_ij) Σ_{k=1}^{N_ij} (1/φ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)' ]

The inverse of this, evaluated at the MLEs, provides the appropriate estimated asymptotic covariance matrix for β̂. Standard errors for the variance estimators are not needed.

  32. Part 6: MLE for RE Models [ 32/34] Some Observations

Assuming the wrong (e.g., nonnested) error structure: the estimator is still consistent, but it is GLS with the wrong weights, and the standard errors are (apparently) biased downward (Moulton bias). Adding time effects or other nonnested effects is very challenging. Perhaps do it with fixed effects (dummy variables).

  33. Part 6: MLE for RE Models [ 33/34] An Application

y_jkt = ln of atmospheric sulfur dioxide concentration at observation station k, at time t, in country j. H = 2621 observations; 293 stations; 44 countries; various numbers of observations, not equally spaced. Three levels here, not 4 as in the article. x_jkt = 1, ln(GDP/km²), ln(K/L), ln(Income), Suburban, Rural, Communist, ln(Oil price), average temperature, time trend.

  34. Part 6: MLE for RE Models [ 34/34] Estimates

Variable  Dimension   Random Effects     Nested Effects
x1        . . .       -10.787 (12.03)    -7.103 (5.613)
x2        C S T       0.445 (7.921)      0.202 (2.531)
x3        C . T       0.255 (1.999)      0.371 (2.345)
x4        C . T       -0.714 (5.005)     -0.477 (2.620)
x5        C S T       -0.627 (3.685)     -0.720 (4.531)
x6        C S T       -0.834 (2.181)     -1.061 (3.439)
x7        C . .       0.471 (2.241)      0.613 (1.443)
x8        . . T       -0.831 (2.267)     -0.089 (2.410)
x9        C S T       -0.045 (4.299)     -0.044 (3.719)
x10       . . T       -0.043 (1.666)     -0.046 (10.927)
σ_ε                   0.330              0.329
σ_u                   1.347              1.807
σ_v                                      1.017
lnL                   -2645.4            -2606.0

(t ratios in parentheses)

  35. Part 6: MLE for RE Models [ 35/34] APPENDIX

  36. Part 6: MLE for RE Models [ 36/34] Gauss-Hermite Quadrature Change of Variable

L_i = ∫ Π_{t=1}^{T_i} [exp(−(y_it − x_it'β − σ_u v_i)²/(2σ_ε²)) / (σ_ε √(2π))] φ(v_i) dv_i,  where φ(v_i) = exp(−v_i²/2)/√(2π)

Make a change of variable to a_i = v_i/√2, so v_i = √2 a_i and dv_i = √2 da_i:

L_i = ∫ [√2/√(2π)] exp(−a_i²) Π_{t=1}^{T_i} exp(−(y_it − x_it'β − σ_u √2 a_i)²/(2σ_ε²)) / (σ_ε √(2π)) da_i
    = (1/√π) ∫ exp(−a_i²) Π_{t=1}^{T_i} exp(−(y_it − x_it'β − σ_u √2 a_i)²/(2σ_ε²)) / (σ_ε √(2π)) da_i
    = (1/√π) ∫ exp(−a_i²) g(a_i) da_i ≈ (1/√π) Σ_{h=1}^H w_h g(a_h)
