Bayesian Econometric Analysis of Panel Data: A Comprehensive Overview

This material delves into Bayesian econometric analysis of panel data, exploring Bayesian econometric models, relevant sources, software tools, philosophical underpinnings, objectivity vs. subjectivity, and paradigms in classical and Bayesian approaches. It discusses the use of new information to update existing beliefs about probabilities of events, the interplay between objectivity and subjectivity in econometrics, and the contrasting methodologies of classical and Bayesian paradigms in theory formulation and evidence assessment.

  • Bayesian econometrics
  • Panel data analysis
  • Objectivity vs. subjectivity
  • Philosophical underpinnings
  • Bayesian inference


Presentation Transcript


  1. Econometric Analysis of Panel Data. William Greene, Department of Economics, University of South Florida

  2. Econometric Analysis of Panel Data 25. Bayesian Econometric Models for Panel Data

  3. Sources. Lancaster, T., An Introduction to Modern Bayesian Econometrics, Blackwell, 2004; Koop, G., Bayesian Econometrics, Wiley, 2003; Bayesian Methods, Bayesian Data Analysis (many books in statistics); papers in marketing: Allenby, Ginter, Lenk, Kamakura; papers in statistics: Sid Chib; books and papers in econometrics: Arnold Zellner, Gary Koop, Mark Steel, Dale Poirier, John Geweke.

  4. Software. Stata, Limdep, SAS, etc.; R, Matlab, Gauss; WinBUGS (Bayesian inference Using Gibbs Sampling).

  5. WinBUGS: http://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-winbugs/

  6. A Philosophical Underpinning. A method of using new information to update existing beliefs about probabilities of events: Bayes' theorem for events (conceived for updating beliefs about games of chance). Pr(A|B) = Pr(A,B)/Pr(B) = Pr(B|A)Pr(A)/Pr(B).
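
A minimal numerical illustration of the updating rule; the prior and conditional probabilities below are hypothetical values chosen for the sketch, not numbers from the slides:

    # Bayes' rule: Pr(A|B) = Pr(B|A) Pr(A) / Pr(B), with hypothetical numbers.
    p_A = 0.01                     # prior Pr(A)
    p_B_given_A = 0.95             # Pr(B|A): evidence B is likely when A is true
    p_B_given_notA = 0.10          # Pr(B|not A)
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # total probability of B
    p_A_given_B = p_B_given_A * p_A / p_B                  # posterior Pr(A|B)
    print(p_A_given_B)             # roughly 0.09: the evidence revises the prior 0.01 upward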

  7. On Objectivity and Subjectivity. Objectivity and frequentist methods in econometrics: the data speak. Subjectivity and beliefs: priors, evidence, posteriors. Science and the scientific method.

  8. Paradigms. Classical: formulate the theory; gather evidence. Evidence consistent with the theory? The theory stands and waits for more evidence to be gathered. Evidence conflicts with the theory? The theory falls. Bayesian: formulate the theory; assemble existing evidence on the theory; form beliefs based on the existing evidence; gather evidence; combine beliefs with the new evidence; revise beliefs regarding the theory.

  9. Applications of the Paradigm. Classical econometricians doggedly cling to their theories even when the evidence conflicts with them; that is what specification searches are all about. Bayesian econometricians NEVER incorporate prior evidence in their estimators; priors are always studiously noninformative. (Informative priors taint the analysis.) As practiced, Bayesian analysis is not Bayesian.

  10. Likelihoods. (Frequentist) The likelihood is the density of the observed data conditioned on the parameters; inference based on the likelihood is usually maximum likelihood. (Bayesian) A function of the parameters and the data that forms the basis for inference, not a probability distribution; the likelihood embodies the current information about the parameters and the data.

  11. The Likelihood Principle. The likelihood embodies ALL the current information about the parameters and the data. Proportional likelihoods should lead to the same inferences.

  12. Application. (1) 20 Bernoulli trials, 7 successes (binomial): L(θ; N=20, s=7) = C(20,7) θ^7 (1-θ)^13. (2) N Bernoulli trials until the 7th success (negative binomial): L(θ; N=20, s=7) = C(19,6) θ^7 (1-θ)^13.
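
The two likelihoods differ only by a constant factor, so by the likelihood principle they carry the same information about θ. A minimal numerical check (illustrative Python, not part of the slides):

    # The binomial and negative binomial likelihoods for N=20, s=7 are proportional in theta.
    import numpy as np
    from scipy.special import comb

    theta = np.linspace(0.05, 0.95, 10)
    L_binomial = comb(20, 7) * theta**7 * (1 - theta)**13
    L_negbin = comb(19, 6) * theta**7 * (1 - theta)**13
    print(L_binomial / L_negbin)   # constant ratio (20/7) for every value of theta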

  13. Inference. Classical: (1) The MLE is θ̂ = 7/20. (2) There is no estimator; we have a sample of 1 from the distribution of N. What can be said about θ? Apparently nothing. Bayesian: the posterior for both scenarios is P(θ|N=20,s=7) = L(θ;N=20,s=7)P(θ) / ∫₀¹ L(θ;N=20,s=7)P(θ)dθ = θ^7(1-θ)^13 P(θ) / ∫₀¹ θ^7(1-θ)^13 P(θ)dθ. Inference about θ, whatever it is, is the same. A. Bayesian analysis adheres to the likelihood principle. B. Data and parameters are treated the same.

  14. The Bayesian Estimator. The posterior distribution embodies all that is believed about the model. Posterior = f(model|data) = Likelihood(θ, data) * prior(θ) / P(data). Estimation amounts to examining the characteristics of the posterior distribution(s): mean, variance; the distribution itself; intervals containing specified probabilities.

  15. Priors and Posteriors. The Achilles heel of Bayesian econometrics. Noninformative and informative priors for estimation of parameters. Noninformative (diffuse) priors: how to incorporate the total lack of prior belief in the Bayesian estimator; the estimator becomes solely a function of the likelihood. Informative prior: some prior information enters the estimator; the estimator mixes the information in the likelihood with the prior information. Improper and proper priors: P(θ) is uniform over the allowable range of θ and cannot integrate to 1.0 if the range is infinite. Salvation: improper but noninformative priors will fall out of the posterior.

  16. Diffuse (Flat) Priors. E.g., the binomial example: L(θ; N, s) = C(N,s) θ^s (1-θ)^(N-s). Uninformative prior (?): uniform (flat), P(θ) = 1 for 0 ≤ θ ≤ 1. Posterior: P(θ|N,s) = θ^s(1-θ)^(N-s) / ∫₀¹ θ^s(1-θ)^(N-s) dθ = [Γ(N+2)/(Γ(s+1)Γ(N-s+1))] θ^s(1-θ)^(N-s), a Beta distribution. Posterior mean = (s+1)/(N+2) = (s+1)/[(N-s+1)+(s+1)]. For the example, N=20, s=7: MLE = 7/20 = .35; posterior mean = 8/22 = .3636 > MLE. Why? The prior was informative. (Prior mean = .5.)

  17. Conjugate Prior. A mathematical device to produce a tractable posterior; this is a typical application. L(θ; N, s) = [Γ(N+1)/(Γ(s+1)Γ(N-s+1))] θ^s (1-θ)^(N-s). Use a conjugate Beta prior, p(θ) = [Γ(a+b)/(Γ(a)Γ(b))] θ^(a-1)(1-θ)^(b-1). Posterior = L(θ;N,s)p(θ) / ∫₀¹ L(θ;N,s)p(θ)dθ = [Γ(N+a+b)/(Γ(s+a)Γ(N-s+b))] θ^(s+a-1)(1-θ)^(N-s+b-1), a Beta distribution. Posterior mean = (s+a)/(N+a+b) (we used a = b = 1 before).
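
A minimal check of the conjugate update for the running example (illustrative Python; scipy is assumed to be available):

    # Beta-binomial conjugate update for N=20, s=7.
    from scipy import stats

    N, s = 20, 7
    a, b = 1.0, 1.0                           # flat Beta(1,1) prior, as on the previous slide
    posterior = stats.beta(s + a, N - s + b)  # posterior is Beta(s+a, N-s+b)
    print(posterior.mean())                   # (s+a)/(N+a+b) = 8/22 = 0.3636..., vs. MLE 7/20 = 0.35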

  18. THE Question. Where does the prior come from?

  19. Large Sample Properties of Posteriors. Under a uniform prior, the posterior is proportional to the likelihood function. The Bayesian estimator is the mean of the posterior; the MLE equals the mode of the likelihood. In large samples, the likelihood becomes approximately normal, so the mean equals the mode. Thus, in large samples, the posterior mean will be approximately equal to the MLE.

  20. Reconciliation: A Theorem (Bernstein-von Mises). The posterior distribution converges to normal with covariance matrix equal to 1/N times the inverse of the information matrix (the same as the classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.) The posterior mean (empirical) converges to the mode of the likelihood function, the same as the MLE. A proper prior disappears asymptotically. The asymptotic sampling distribution of the posterior mean is the same as that of the MLE.

  21. Mixed Model Estimation. MLwiN: multilevel modeling for Windows, http://www.bristol.ac.uk/cmm/software/mlwin/. Uses mostly Bayesian, MCMC methods. "Markov Chain Monte Carlo (MCMC) methods allow Bayesian models to be fitted, where prior distributions for the model parameters are specified. By default MLwiN sets diffuse priors which can be used to approximate maximum likelihood estimation." (From their website.)


  23. Bayesian Estimators. First generation: do the integration (math): E(θ|data) = ∫ θ [f(data|θ)p(θ) / f(data)] dθ. Contemporary, simulation: (1) deduce the posterior; (2) draw random samples from the posterior and compute the sample means and variances of those samples. (Relies on the law of large numbers.)

  24. The Linear Regression Model. Likelihood: L(β, σ²|y,X) = [2πσ²]^(-n/2) exp[-(1/(2σ²))(y-Xβ)'(y-Xβ)]. Transformation using d = n-K and s² = (1/d)(y-Xb)'(y-Xb): (1/σ²)(y-Xβ)'(y-Xβ) = (1/σ²)[ds² + (β-b)'X'X(β-b)]. Diffuse uniform prior for β, conjugate gamma prior for σ². Joint posterior: f(β, σ²|y,X) = {[ds²/2]^(d/2)/Γ(d/2)} (1/σ²)^(d/2+1) e^(-ds²/(2σ²)) × [2π]^(-K/2) |σ²(X'X)^(-1)|^(-1/2) exp{-(1/2)(β-b)'[σ²(X'X)^(-1)]^(-1)(β-b)}, i.e., an inverted gamma density for σ² times a normal density for β given σ².

  25. Marginal Posterior for β. After integrating σ² out of the joint posterior, f(β|y,X) ∝ [ds² + (β-b)'X'X(β-b)]^(-(d+K)/2): a multivariate t with mean b and variance matrix [(n-K)/(n-K-2)] s² (X'X)^(-1). The Bayesian 'estimator' equals the MLE. Of course; the prior was noninformative. The only information available is in the likelihood.

  26. Nonlinear Models and Simulation. Bayesian inference over parameters in a nonlinear model: 1. Parameterize the model. 2. Form the likelihood conditioned on the parameters. 3. Develop the priors, a joint prior for all model parameters. 4. The posterior is proportional to the likelihood times the prior. (Usually requires conjugate priors to be tractable.) 5. Draw observations from the posterior to study its characteristics.

  27. Simulation Based Inference. Form the likelihood L(θ, data). Form the prior p(θ). Form the posterior K·p(θ)L(θ, data), where K is a constant that makes the whole thing integrate to 1. Posterior mean = ∫ θ K p(θ)L(θ, data) dθ. Estimate the posterior mean by Ê(θ|data) = (1/R) Σ_{r=1}^R θ_r, simulating the draws θ_r from the posterior.
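
A minimal sketch of this idea using the earlier binomial example, where the posterior is a known Beta distribution (illustrative Python, assuming the flat prior from slide 16):

    # Estimate the posterior mean by averaging R draws from the posterior.
    import numpy as np

    rng = np.random.default_rng(12345)
    R = 10_000
    theta_r = rng.beta(8, 14, size=R)   # posterior from the binomial example: Beta(s+1, N-s+1)
    print(theta_r.mean())               # close to the exact posterior mean 8/22 = 0.3636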

  28. A Practical Problem. Sampling from the joint posterior may be impossible. E.g., linear regression: f(β, σ²|y,X) = {[vs²/2]^(v/2)/Γ(v/2)} (1/σ²)^(v/2+1) e^(-vs²/(2σ²)) × [2π]^(-K/2) |σ²(X'X)^(-1)|^(-1/2) exp{-(1/2)(β-b)'[σ²(X'X)^(-1)]^(-1)(β-b)}, with v = n-K. What is this??? To do 'simulation based estimation' here, we need joint observations on (β, σ²).

  29. A Solution to the Sampling Problem. The joint posterior, p(β, σ²|data), is intractable. But: for inference about β, a sample from the marginal posterior p(β|data) would suffice; for inference about σ², a sample from the marginal posterior of σ², p(σ²|data), would suffice. Can we deduce these? For this problem, we do have the conditionals: p(β|σ², data) = N[b, σ²(X'X)^(-1)]; p(σ²|β, data) = K × a gamma distribution based on Σ_i (y_i - β'x_i)². Can we use this information to sample from p(β|data) and p(σ²|data)?

  30. The Gibbs Sampler. Target: sample from the marginals of f(x1, x2), a joint distribution. The joint distribution is unknown, or it is not possible to sample from it directly. Assumed: f(x1|x2) and f(x2|x1) are both known and samples can be drawn from both. Gibbs sampling: obtain draws from (x1, x2) by many cycles between x1|x2 and x2|x1. Start x1,0 anywhere in the right range; draw x2,0 from x2|x1,0; return to x1,1 from x1|x2,0; and so on. Several thousand cycles produce the draws. Discard the first several thousand to avoid the influence of the initial conditions (the burn in). Average the remaining draws to estimate the marginal means.

  31. Bivariate Normal Sampling. Draw a random sample from the bivariate normal with means (0, 0), unit variances, and correlation ρ. (1) Direct approach: (v1, v2)' = Γ(u1, u2)', where u1 and u2 are two independent standard normal draws (easy) and Γ is such that ΓΓ' = Σ, e.g., Γ = [[1, 0], [ρ, sqrt(1-ρ²)]]. (2) Gibbs sampler: v1|v2 ~ N[ρ·v2, 1-ρ²] and v2|v1 ~ N[ρ·v1, 1-ρ²].
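
A minimal sketch comparing the two approaches (illustrative Python; ρ = 0.5 and the number of cycles are arbitrary choices):

    # Draw from a bivariate normal with zero means, unit variances, correlation rho,
    # (1) directly via Gamma with Gamma*Gamma' = Sigma, and (2) by Gibbs cycling.
    import numpy as np

    rng = np.random.default_rng(1)
    rho, R = 0.5, 20_000

    Gamma = np.array([[1.0, 0.0], [rho, np.sqrt(1 - rho**2)]])
    u = rng.standard_normal((R, 2))
    v_direct = u @ Gamma.T                      # direct approach

    v1, v2 = 0.0, 0.0                           # Gibbs sampler, started at (0, 0)
    gibbs = np.empty((R, 2))
    for r in range(R):
        v1 = rng.normal(rho * v2, np.sqrt(1 - rho**2))
        v2 = rng.normal(rho * v1, np.sqrt(1 - rho**2))
        gibbs[r] = v1, v2

    print(np.corrcoef(v_direct.T)[0, 1], np.corrcoef(gibbs[5000:].T)[0, 1])  # both near rho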

  32. Gibbs Sampling for the Linear Regression Model. The joint posterior, p(β, σ²|data), is intractable, but for inference about β a sample from the marginal posterior p(β|data) would suffice, and for inference about σ² a sample from the marginal posterior of σ², p(σ²|data), would suffice. Can we deduce these? For this problem, we do have the conditionals: (1) p(β|σ², data) = N[b, σ²(X'X)^(-1)]; (2) p(σ²|β, data) = K × a gamma distribution based on Σ_i (y_i - β'x_i)². Gibbs sampler: (a) draw β_0 from (1) using σ_0² = s²; (b) draw σ_1² from (2) using β_0; (c) draw β_1 from (1) using σ_1²; (d) draw σ_2² from (2) using β_1 ... and so on. Use several thousand draws.
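
A minimal sketch of steps (a)-(d) on simulated data, assuming the diffuse prior described above so that σ² given β and the data can be drawn as the sum of squared residuals divided by a chi-squared(n) variate (illustrative Python, not the slides' code):

    # Two-block Gibbs sampler for the linear regression model with diffuse priors.
    import numpy as np

    rng = np.random.default_rng(0)
    n, K = 200, 3
    X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
    y = X @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n)

    XXi = np.linalg.inv(X.T @ X)
    b = XXi @ X.T @ y                              # least squares / MLE
    sigma2 = np.sum((y - X @ b)**2) / (n - K)      # start at s^2

    beta_draws = []
    for r in range(6000):
        beta = rng.multivariate_normal(b, sigma2 * XXi)   # (1) beta | sigma^2, data
        e = y - X @ beta
        sigma2 = (e @ e) / rng.chisquare(n)               # (2) sigma^2 | beta, data (inverse gamma)
        beta_draws.append(beta)

    beta_draws = np.array(beta_draws[1000:])       # discard the burn in
    print(beta_draws.mean(axis=0), b)              # posterior means are close to least squares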

  33. Application: the Probit Model. (a) y_i* = β'x_i + ε_i, ε_i ~ N[0,1]; (b) y_i = 1 if y_i* > 0, 0 otherwise. Consider estimation of β and y_i* (data augmentation). (1) If y* were observed, this would be a linear regression (y would not be useful, since it is just sgn(y*)); we saw in the linear model before how to sample p(β|y*, y). (2) If (only) β were observed, y_i* would be a draw from the normal distribution with mean β'x_i and variance 1. But y_i gives the sign of y_i*: y_i*|β, y_i is a draw from the truncated normal (truncated from above at 0 if y_i = 0, from below if y_i = 1).

  34. Gibbs Sampling for the Probit Model. (1) Choose an initial value for β (maybe the MLE). (2) Generate y_i* by sampling N observations from the truncated normal with mean β'x_i and variance 1, truncated from above at 0 if y_i = 0, from below if y_i = 1. (3) Generate β by drawing a random normal vector with mean vector (X'X)^(-1)X'y* and variance matrix (X'X)^(-1). (4) Return to (2); repeat 10,000 times, retaining the last 5,000 draws; the first 5,000 are the 'burn in.' (5) Estimate the posterior mean of β by averaging the last 5,000 draws. (This corresponds to a uniform prior over β.)

  35. Generating Random Draws from f(x). The inverse probability method of sampling random draws: if F(x) is the CDF of the random variable x, then a random draw on x may be obtained as F^(-1)(u), where u is a draw from the standard uniform (0,1). Examples: Exponential: f(x) = λ exp(-λx), F(x) = 1 - exp(-λx), so x = -(1/λ)log(1-u). Normal: F(x) = Φ(x), so x = Φ^(-1)(u). Truncated normal (for the probit step, with μ_i = β'x_i): x = μ_i + Φ^(-1)[1 - (1-u)Φ(μ_i)] for y = 1; x = μ_i + Φ^(-1)[u Φ(-μ_i)] for y = 0.
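
A minimal illustration of these inverse-CDF draws (Python with scipy's normal CDF and quantile function; λ = 2 and μ_i = 0.4 are arbitrary values for the sketch):

    # Inverse-probability-transform draws: exponential, normal, and the truncated
    # normals used in the probit data augmentation step (mu_i = beta'x_i).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(7)
    u = rng.uniform(size=5)

    lam = 2.0
    x_exponential = -(1.0 / lam) * np.log(1 - u)             # F(x) = 1 - exp(-lam*x)
    x_normal = norm.ppf(u)                                    # F(x) = Phi(x)

    mu = 0.4                                                  # illustrative mu_i
    x_trunc_y1 = mu + norm.ppf(1 - (1 - u) * norm.cdf(mu))    # truncated from below at 0 (y=1)
    x_trunc_y0 = mu + norm.ppf(u * norm.cdf(-mu))             # truncated from above at 0 (y=0)
    print(x_trunc_y1 > 0, x_trunc_y0 < 0)                     # signs match the observed y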

  36. Example: Simulated Probit
      ? Generate raw data
      Sample  ; 1 - 1000 $
      Create  ; x1=rnn(0,1) ; x2 = rnn(0,1) $
      Create  ; ys = .2 + .5*x1 - .5*x2 + rnn(0,1) ; y = ys > 0 $
      Namelist; x=one,x1,x2 $
      Matrix  ; xx=x'x ; xxi = <xx> $
      Calc    ; Rep = 200 ; Ri = 1/Rep $
      Probit  ; lhs=y ; rhs=x $
      ? Gibbs sampler
      Matrix  ; beta=[0/0/0] ; bbar=init(3,1,0) ; bv=init(3,3,0) $
      Proc = gibbs $
      Do for  ; simulate ; r = 1,Rep $
      Create  ; mui = x'beta ; f = rnu(0,1)
              ; if(y=1) ysg = mui + inp(1-(1-f)*phi( mui));
                (else)  ysg = mui + inp( f *phi(-mui)) $
      Matrix  ; mb = xxi*x'ysg ; beta = rndm(mb,xxi)
              ; bbar=bbar+beta ; bv=bv+beta*beta' $
      Enddo   ; simulate $
      Endproc $
      Execute ; Proc = Gibbs $   (Note: did not discard the burn-in.)
      Matrix  ; bbar=ri*bbar ; bv=ri*bv-bbar*bbar' $
      Matrix  ; Stat(bbar,bv) ; Stat(b,varb) $

  37. Example: Probit MLE vs. Gibbs
      --> Matrix ; Stat(bbar,bv); Stat(b,varb) $
      +---------------------------------------------------+
      |Number of observations in current sample =    1000 |
      |Number of parameters computed here       =       3 |
      |Number of degrees of freedom             =     997 |
      +---------------------------------------------------+
      +---------+--------------+----------------+--------+---------+
      |Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
      +---------+--------------+----------------+--------+---------+
       BBAR_1        .21483281       .05076663     4.232    .0000
       BBAR_2        .40815611       .04779292     8.540    .0000
       BBAR_3       -.49692480       .04508507   -11.022    .0000
      +---------+--------------+----------------+--------+---------+
      |Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
      +---------+--------------+----------------+--------+---------+
       B_1           .22696546       .04276520     5.307    .0000
       B_2           .40038880       .04671773     8.570    .0000
       B_3          -.50012787       .04705345   -10.629    .0000


  39. A Random Parameters Approach to Modeling Heterogeneity. Allenby and Rossi, "Marketing Models of Consumer Heterogeneity," Journal of Econometrics, 89, 1999. Discrete choice model; brand choice; hierarchical Bayes; multinomial probit. Panel data: purchases of 4 brands of ketchup.

  40. Structure. Conditional data generation mechanism: U*_{it,j} = β_i'x_{it,j} + ε_{it,j}, ε_{it,j} ~ N[0, σ_j²] = utility for consumer i, choice situation t, brand j = 1,...,J. Y_{it,j} = 1[U*_{it,j} = maximum utility among the J choices]. x_{it,j} = (constant, log price, "availability," "featured"). Implies a J outcome multinomial probit model.

  41. Bayesian Priors. Prior densities: β_i ~ N[β̄, V_β], which implies β_i = β̄ + w_i, w_i ~ N[0, V_β]; σ_j² ~ Inverse Gamma[v, s_j²] (looks like chi-squared), v = 3, s_j = 1. Priors over the structural model parameters: β̄ ~ N[β_0, aV_β], with β_0 = 0; V_β ~ Wishart[v_0, V_0], v_0 = 8, V_0 = 8I.

  42. Bayesian Estimator. Joint posterior mean = E[β_1,...,β_N, β̄, V_β, σ_1²,...,σ_J² | data]. The integral does not exist in closed form; estimate it by random samples from the joint posterior. The full joint posterior is not known, so it is not possible to sample from the joint posterior directly.

  43. Gibbs Cycles for the MNP Model. Samples from the marginal posteriors. Marginal posterior for the individual parameters (known and can be sampled): β_i | β̄, V_β, σ, data has a known normal distribution. Marginal posteriors for the common parameters (each known and each can be sampled): β̄ | {β_i}, V_β, σ, data; V_β | {β_i}, β̄, σ, data; σ | {β_i}, β̄, V_β, data.

  44. Bayesian Fixed Effects. Application: Koop et al., "Hospital Cost Efficiency," Journal of Econometrics, 1997, 76, pp. 77-106. Treat the individual constants as first level parameters: Model = f(α_1,...,α_N, β, σ, data). Formal Bayesian treatment of the K+N+1 parameters in the model. Stochastic frontier as in the latent variable application. Bayesian counterparts to fixed effects and random effects models??? Incidental parameters? (Almost surely, or something like it.) How do you deal with it? Irrelevant? There are no asymptotic properties. Must be relevant? Estimates are numerically unstable.

  45. Comparison of Maximum Simulated Likelihood and Hierarchical Bayes. Ken Train, "A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit." Mixed logit: U(i,t,j) = β_i'x(i,t,j) + ε(i,t,j), i = 1,...,N individuals, t = 1,...,T_i choice situations, j = 1,...,J alternatives (may also vary).

  46. Stochastic Structure: Conditional Likelihood. Prob(i,j,t) = exp(β_i'x_{i,j,t}) / Σ_{j=1}^J exp(β_i'x_{i,j,t}). Likelihood_i = Π_{t=1}^{T_i} [ exp(β_i'x_{i,j*,t}) / Σ_{j=1}^J exp(β_i'x_{i,j,t}) ], where j* indicates the specific choice made by i at time t. Note the individual specific parameter vector, β_i.

  47. Classical Approach. β_i ~ N[b, Σ]; i.e., β_i = b + w_i = b + Γv_i, where Γ = Σ^(1/2) and Σ = diag(σ_1,...,σ_K), so the random parameters are uncorrelated. Log likelihood = Σ_{i=1}^N log ∫_{w_i} Π_{t=1}^{T_i} [ exp((b+w_i)'x_{i,j*,t}) / Σ_{j=1}^J exp((b+w_i)'x_{i,j,t}) ] f(w_i) dw_i. Maximize over b, Σ (random parameters model) using maximum simulated likelihood.
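
A minimal sketch of the simulated log likelihood: average the conditional choice probabilities over draws of β_i = b + Γv_i, then sum the logs over individuals. The data, choices, b, and Γ below are stand-ins; maximum simulated likelihood would wrap this computation in an optimizer over b and Σ:

    # Simulated log likelihood for a mixed logit with diagonal Sigma (illustrative sketch).
    import numpy as np

    rng = np.random.default_rng(5)
    N, T, J, K, R = 50, 4, 3, 2, 200
    X = rng.standard_normal((N, T, J, K))          # attributes x(i,t,j)
    choice = rng.integers(0, J, size=(N, T))       # stand-in observed choices j*
    b = np.array([0.5, -0.5])                      # trial value of b
    Gamma = np.diag([0.8, 0.3])                    # Sigma^(1/2), diagonal -> uncorrelated

    loglik = 0.0
    for i in range(N):
        draws = b + rng.standard_normal((R, K)) @ Gamma.T     # beta_i = b + Gamma v_i
        v = X[i] @ draws.T                                     # utilities, shape (T, J, R)
        p = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)   # logit probabilities per draw
        p_chosen = p[np.arange(T), choice[i], :]               # prob of the chosen alternative
        loglik += np.log(p_chosen.prod(axis=0).mean())         # average over draws, then log
    print(loglik)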

  48. Bayesian Approach: Gibbs Sampling and Metropolis-Hastings. Posterior = Π_{i=1}^N L(data_i | β_i) × priors. Priors: N(β_1,...,β_N | b, Ω) (normal), Ω = diagonal(σ_1,...,σ_K); IG(σ_1,...,σ_K | parameters) (inverse gamma with 1 d.f. parameter); g(b | assumed parameters) (uniform (flat) with a very large range).

  49. Gibbs Sampling from Posteriors: b. p(b | β_1,...,β_N, Ω) = Normal[β̄, (1/N)Ω], where β̄ = (1/N)Σ_{i=1}^N β_i. Easy to sample from a normal with known mean and variance by transforming a set of draws from the standard normal.

  50. Gibbs Sampling from Posteriors: Ω. p(σ_k | b, β_1,...,β_N) ~ Inverse Gamma[1+N, 1+N·V_k], where V_k = (1/N)Σ_{i=1}^N (β_{i,k} - b_k)² for each k = 1,...,K. Draw from the inverse gamma for each k: draw R = 1+N draws h_{r,k} from N[0,1]; then the draw is (1 + N·V_k) / Σ_{r=1}^R h_{r,k}².
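
A minimal sketch of these two conditional draws, one Gibbs cycle for the common parameters (illustrative Python; the β_i values and the current Ω are stand-ins for quantities produced by the rest of the sampler):

    # One Gibbs cycle for the common parameters in the hierarchical model:
    # draw b given the beta_i, then draw each sigma_k given b and the beta_i.
    import numpy as np

    rng = np.random.default_rng(3)
    N, K = 500, 4
    beta_i = rng.normal(loc=[0.5, -1.0, 0.2, 0.0], scale=1.0, size=(N, K))  # stand-in draws

    # Slide 49: b | beta_1..beta_N, Omega ~ Normal[beta_bar, (1/N) Omega]
    beta_bar = beta_i.mean(axis=0)
    Omega = np.diag(np.ones(K))            # current values of the sigma_k (stand-in)
    b = rng.multivariate_normal(beta_bar, Omega / N)

    # Slide 50: sigma_k | b, beta_1..beta_N ~ Inverse Gamma[1+N, 1+N*V_k],
    # sampled as (1 + N*V_k) divided by the sum of (1+N) squared standard normals.
    V = ((beta_i - b)**2).mean(axis=0)
    h = rng.standard_normal((1 + N, K))
    sigma_k = (1.0 + N * V) / (h**2).sum(axis=0)
    print(b, sigma_k)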
