Understanding Bayesian Analysis in Econometrics

Explore the concepts of Bayesian analysis in econometrics, including Bayesian estimation, inference, paradigms, and the interplay between objectivity and subjectivity. Discover how Bayesian methods update beliefs with new evidence, contrasting with classical inference approaches.

  • Bayesian Analysis
  • Econometrics
  • Statistical Inference
  • Data Science
  • Bayesian Estimation


Presentation Transcript


  1. Discrete Choice Modeling. William Greene, Stern School of Business, New York University. [Topic 5: Bayesian Analysis]

  2. Bayesian Econometrics

  3. Bayesian Estimation. Philosophical underpinnings: the meaning of statistical information, and how to combine information contained in the sample with prior information.

  4. Classical Inference. Population (characteristics, behavior patterns, choices) → Measurement → Econometrics. Imprecise inference about the entire population, based on sampling theory and asymptotics.

  5. Bayesian Inference. Population (characteristics, behavior patterns, choices) → Measurement → Econometrics. Sharp, exact inference, but about only the sample: the posterior density.

  6. Paradigms.
     Classical: Formulate the theory. Gather evidence. If the evidence is consistent with the theory, the theory stands and waits for more evidence to be gathered; if the evidence conflicts with the theory, the theory falls.
     Bayesian: Formulate the theory. Assemble existing evidence on the theory. Form beliefs based on the existing evidence (*). Gather new evidence. Combine beliefs with the new evidence. Revise beliefs regarding the theory. Return to (*).

  7. On Objectivity and Subjectivity. Objectivity and frequentist methods in econometrics: the data speak. Subjectivity and beliefs: priors, evidence, posteriors. Science and the scientific method.

  8. Foundational Result. A method of using new information to update existing beliefs about probabilities of events: Bayes' theorem for events (conceived for updating beliefs about games of chance).
     $\Pr(A|B) = \frac{\Pr(A,B)}{\Pr(B)} = \frac{\Pr(B|A)\Pr(A)}{\Pr(B)}$
     $\Pr(\text{Nature}|\text{Evidence}) = \frac{\Pr(\text{Evidence}|\text{Nature})\Pr(\text{Nature})}{\Pr(\text{Evidence})}$
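
As a quick illustration of the updating rule, here is a minimal Python sketch that applies Bayes' theorem to one event pair; all of the probabilities are made-up numbers for illustration, not from the slides.

    # Bayes' theorem for events: Pr(A|B) = Pr(B|A) Pr(A) / Pr(B).
    # Hypothetical numbers for illustration only.
    p_nature = 0.01                    # Pr(Nature): prior belief
    p_evidence_given_nature = 0.95     # Pr(Evidence|Nature)
    p_evidence_given_not = 0.10        # Pr(Evidence|not Nature)

    # Pr(Evidence) by the law of total probability
    p_evidence = (p_evidence_given_nature * p_nature
                  + p_evidence_given_not * (1.0 - p_nature))

    # Updated belief: Pr(Nature|Evidence)
    p_nature_given_evidence = p_evidence_given_nature * p_nature / p_evidence
    print(p_nature_given_evidence)     # ~= 0.0876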

  9. Likelihoods. (Frequentist) The likelihood is the density of the observed data, conditioned on the parameters; inference based on the likelihood is usually maximum likelihood. (Bayesian) A function of the parameters and the data that forms the basis for inference; it is not a probability distribution. The likelihood embodies the current information about the parameters and the data.

  10. The Likelihood Principle. The likelihood embodies ALL of the current information about the parameters and the data. Proportional likelihoods should lead to the same inferences, even given different interpretations.

  11. Estimation. Assembling information: prior information = out-of-sample information, literally prior or outside information; sample information is embodied in the likelihood. Result of the analysis: posterior belief = a blend of the prior and the likelihood.

  12. Bayesian Investigation. There are no fixed parameters; $\theta$ is a random variable. Data are realizations of random variables; there is a marginal distribution p(data). Parameters are part of the random state of nature: $p(\theta)$ = the distribution of $\theta$ independently of (prior to) the data, as understood by the analyst. (Two analysts could legitimately bring different priors to the study.) The investigation combines sample information with prior information. The outcome is a revision of the prior based on the observed information (the data).

  13. The Bayesian Estimator. The posterior distribution embodies all that is believed about the model: posterior = f(model|data) = Likelihood($\theta$, data) × prior($\theta$) / p(data). Estimation amounts to examining the characteristics of the posterior distribution(s): mean and variance, the distribution itself, and intervals containing specified probabilities.

  14. Priors and Posteriors. The Achilles heel of Bayesian econometrics. Noninformative and informative priors for the estimation of parameters. Noninformative (diffuse) priors: how to incorporate a total lack of prior belief into the Bayesian estimator; the estimator becomes solely a function of the likelihood. Informative prior: some prior information enters the estimator; the estimator mixes the information in the likelihood with the prior information. Improper and proper priors: if $p(\theta)$ is uniform over the allowable range of $\theta$, it cannot integrate to 1.0 when the range is infinite. Salvation: improper but noninformative priors will fall out of the posterior.

  15. Symmetrical Treatment of Data and Parameters. The likelihood is $p(\text{data}|\theta)$. The prior summarizes nonsample information about $\theta$ in $p(\theta)$. The joint distribution is $p(\text{data},\theta) = p(\text{data}|\theta)p(\theta)$. Use Bayes' theorem to get $p(\theta|\text{data})$ = the posterior distribution.

  16. The Posterior Distribution. Sample information: $L(\text{data}|\theta)$. Prior information: $p(\theta)$. Joint density for the data and $\theta$: $p(\text{data},\theta) = L(\text{data}|\theta)p(\theta)$. Conditional density for $\theta$ given the data:
     $p(\theta|\text{data}) = \frac{p(\text{data},\theta)}{p(\text{data})} = \frac{L(\text{data}|\theta)p(\theta)}{\int_\theta L(\text{data}|\theta)p(\theta)\,d\theta}$ = the posterior density.
     Information obtained from the investigation: $E[\theta|\text{data}]$ = the posterior mean = the Bayesian "estimate"; $\mathrm{Var}[\theta|\text{data}]$ = the posterior variance, used to form interval estimates; quantiles of $\theta|\text{data}$, such as the median, or the 2.5th and 97.5th percentiles.

  17. Priors. Where do they come from? What does the prior contain? Informative priors: real prior information. Noninformative priors: mathematical complications; diffuse; uniform; normal with huge variance. Improper priors. Conjugate priors, chosen so that the posterior
     $p(\theta|\text{data}) = \frac{L(\text{data}|\theta)p(\theta)}{\int_\theta L(\text{data}|\theta)p(\theta)\,d\theta}$
     has the same functional form as the prior.

  18. Application. Estimate $\theta$, the probability that a production process will produce a defective product. Sampling design: choose N = 25 items from the production line; D = the number of defectives. Result of our experiment: D = 8. The likelihood for the sample of data is
     $L(\theta|\text{data}) = \theta^D (1-\theta)^{25-D}, \quad 0 < \theta < 1.$
     The maximum likelihood estimator of $\theta$ is q = D/25 = 0.32, and the asymptotic variance of the MLE is estimated by q(1-q)/25 = 0.008704.
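
The slide's point estimates are easy to reproduce; this short Python sketch just recomputes the MLE and its estimated asymptotic variance from N and D.

    # Reproduces the slide's numbers: N = 25 draws, D = 8 defectives.
    N, D = 25, 8
    q = D / N                  # maximum likelihood estimator of theta
    avar = q * (1 - q) / N     # estimated asymptotic variance of the MLE
    print(q, avar)             # 0.32, 0.008704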

  19. Application: Posterior Density. Posterior density:
     $p(\theta|\text{data}) = \frac{p(\text{data}|\theta)p(\theta)}{\int_\theta p(\text{data}|\theta)p(\theta)\,d\theta} = \frac{\theta^D(1-\theta)^{N-D}\,p(\theta)}{\int_\theta \theta^D(1-\theta)^{N-D}\,p(\theta)\,d\theta}$
     Noninformative prior: all allowable values of $\theta$ are equally likely, a uniform distribution over (0,1): $p(\theta) = 1,\ 0 \le \theta \le 1$. Prior mean = 1/2. Prior variance = 1/12. Posterior density:
     $p(\theta|\text{data}) = \frac{\theta^D(1-\theta)^{N-D}}{\int_0^1 \theta^D(1-\theta)^{N-D}\,d\theta} = \frac{\Gamma(N+2)}{\Gamma(D+1)\Gamma(N-D+1)}\,\theta^D(1-\theta)^{N-D}$
     Note: $\int_0^1 \theta^D(1-\theta)^{N-D}\,d\theta$ is a beta integral with a = D+1 and b = N-D+1.

  20. Posterior Moments. Posterior density with the uniform noninformative prior:
     $p(\theta|N,D) = \frac{\Gamma(N+2)}{\Gamma(D+1)\Gamma(N-D+1)}\,\theta^D(1-\theta)^{N-D}$
     Posterior mean:
     $E[\theta|\text{data}] = \int_0^1 \theta\,p(\theta|N,D)\,d\theta$
     This is a beta integral: the posterior is a beta density with $\alpha = D+1$, $\beta = N-D+1$. The mean of a beta variable is $\alpha/(\alpha+\beta)$, so the posterior mean is $(D+1)/(N+2) = 9/27 = .3333$. Prior mean = .5000. MLE = 8/25 = .3200. The posterior variance is $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{(D+1)(N-D+1)}{(N+2)^2(N+3)} = 0.007936$. Prior variance = 1/12 = .08333; variance of the MLE = .008704.
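
Since the posterior is Beta(D+1, N-D+1) = Beta(9, 18), its moments can be checked directly, e.g. with scipy.stats; a minimal sketch:

    from scipy import stats

    N, D = 25, 8
    post = stats.beta(D + 1, N - D + 1)   # posterior under a uniform prior
    print(post.mean())                    # 9/27 = 0.3333...
    print(post.var())                     # 0.007936...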

  21. Informative Prior. The beta distribution is a common conjugate prior for a proportion or probability:
     $p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}$, with prior mean $E[\theta] = \frac{\alpha}{\alpha+\beta}$.
     The posterior is
     $p(\theta|N,D) = \frac{\theta^{D+\alpha-1}(1-\theta)^{N-D+\beta-1}}{\int_0^1 \theta^{D+\alpha-1}(1-\theta)^{N-D+\beta-1}\,d\theta}$
     This is a beta density with parameters $(D+\alpha,\ N-D+\beta)$; $\alpha = \beta = 1$ in the earlier example. The posterior mean is $E[\theta|N,D] = \frac{D+\alpha}{N+\alpha+\beta}$.

  22. Mixing Prior and Sample Information. A typical result (exact for sampling from the normal distribution with known variance):
     Posterior mean = w × Prior mean + (1-w) × MLE = w × (Prior mean - MLE) + MLE
     so that
     $w = \frac{\text{Posterior mean} - \text{MLE}}{\text{Prior mean} - \text{MLE}} = \frac{.3333 - .32}{.5 - .32} = .073889$
     Approximate result: the posterior mean is a precision-weighted average, Posterior mean ≈ w × Prior mean + (1-w) × MLE, with
     $w = \frac{1/\text{Prior variance}}{1/\text{Prior variance} + 1/\text{Asymptotic variance}} = \frac{1/(1/12)}{1/(1/12) + 1/(.008704)} \approx .0946$
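
A small sketch of the approximate precision-weighted calculation, using the slide's prior variance (1/12) and asymptotic variance (.008704):

    # Precision-weighted blend of prior mean and MLE (normal approximation).
    prior_mean, prior_var = 0.5, 1.0 / 12.0
    mle, asy_var = 0.32, 0.008704

    w = (1.0 / prior_var) / (1.0 / prior_var + 1.0 / asy_var)
    approx_post_mean = w * prior_mean + (1.0 - w) * mle
    print(w, approx_post_mean)     # ~0.0946, ~0.3370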

  23. Modern Bayesian Analysis.
     Posterior mean $= \int_\theta \theta\,p(\theta|\text{data})\,d\theta$
     The integral is often complicated, or does not exist in closed form. Alternative strategy: draw a random sample from the posterior distribution and examine its moments, quantiles, etc. Example: our posterior is Beta(9,18). Based on a random sample of 5,000 draws from this population:
     Observations = 5000; sample mean = .334017 (the posterior mean was .333333); sample variance = .007454 (the posterior variance was .007936); standard deviation = .086336; skewness = .248077; excess kurtosis = -.161478; minimum = .066214; maximum = .653625; .025 percentile = .177090; .975 percentile = .510028.
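
The simulation is straightforward to replicate; a minimal numpy sketch (the seed is arbitrary, so the sample statistics will differ slightly from the slide's):

    import numpy as np

    rng = np.random.default_rng(seed=12345)   # seed is arbitrary
    draws = rng.beta(9, 18, size=5000)        # sample from the Beta(9,18) posterior

    print(draws.mean(), draws.var())          # close to .3333 and .007936
    print(np.percentile(draws, [2.5, 97.5]))  # ~ .177 and .510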

  24. Bayesian Estimator. First generation: do the integration (math):
     $E(\theta|\text{data}) = \int_\theta \theta\,\frac{f(\text{data}|\theta)p(\theta)}{f(\text{data})}\,d\theta$

  25. The Linear Regression Model. Likelihood:
     $L(\beta,\sigma^2|y,X) = [2\pi\sigma^2]^{-n/2}\,e^{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)}$
     Transformation using $d = N-K$ and $s^2 = \frac{1}{d}(y-Xb)'(y-Xb)$:
     $\frac{1}{\sigma^2}(y-X\beta)'(y-X\beta) = \frac{1}{\sigma^2}\left[ds^2 + (\beta-b)'X'X(\beta-b)\right]$
     With a diffuse uniform prior for $\beta$ and a conjugate gamma prior for $\sigma^2$, the joint posterior is
     $f(\beta,\sigma^2|y,X) \propto \frac{[ds^2]^{d/2+1}}{\Gamma(d/2+1)}\,e^{-ds^2\frac{1}{2\sigma^2}}\left(\frac{1}{\sigma^2}\right)^{d/2+1}[2\pi]^{-K/2}\,|\sigma^2(X'X)^{-1}|^{-1/2}\exp\left\{-\tfrac{1}{2}(\beta-b)'[\sigma^2(X'X)^{-1}]^{-1}(\beta-b)\right\}$

  26. Marginal Posterior for β. After integrating $\sigma^2$ out of the joint posterior:
     $f(\beta|y,X) = \frac{\Gamma\left(\frac{d+K}{2}\right)}{\Gamma\left(\frac{d}{2}\right)\pi^{K/2}}\,[ds^2]^{d/2}\,|X'X|^{1/2}\left[ds^2 + (\beta-b)'X'X(\beta-b)\right]^{-(d+K)/2}$
     This is a multivariate t with mean $b$ and variance matrix $\frac{n-K}{n-K-2}\left[s^2(X'X)^{-1}\right]$. The Bayesian 'estimator' equals the MLE. Of course; the prior was noninformative. The only information available is in the likelihood.

  27. Modern Bayesian Analysis. Multiple parameter settings: derivation of the exact form of expectations and variances for $p(\theta_1,\theta_2,\ldots,\theta_K|\text{data})$ is hopelessly complicated even if the density is tractable. Strategy: sample joint observations $(\theta_1,\theta_2,\ldots,\theta_K)$ from the posterior population and use marginal means, variances, quantiles, etc. How to sample the joint observations??? (Still hopelessly complicated.)

  28. A Practical Problem. Sampling from the joint posterior may be impossible. E.g., linear regression (with v = N-K):
     $f(\beta,\sigma^2|y,X) \propto \frac{[vs^2]^{v/2+1}}{\Gamma(v/2+1)}\,e^{-vs^2\frac{1}{2\sigma^2}}\left(\frac{1}{\sigma^2}\right)^{v/2+1}[2\pi]^{-K/2}\,|\sigma^2(X'X)^{-1}|^{-1/2}\exp\left\{-\tfrac{1}{2}(\beta-b)'[\sigma^2(X'X)^{-1}]^{-1}(\beta-b)\right\}$
     What is this??? To do 'simulation based estimation' here, we need joint observations on $(\beta,\sigma^2)$.

  29. A Solution to the Sampling Problem. The joint posterior $p(\beta,\sigma^2|\text{data})$ is intractable. But for inference about $\beta$, a sample from the marginal posterior $p(\beta|\text{data})$ would suffice; for inference about $\sigma^2$, a sample from the marginal posterior $p(\sigma^2|\text{data})$ would suffice. Can we deduce these? For this problem, we do have the conditionals:
     $p(\beta|\sigma^2,\text{data}) = N[b,\ \sigma^2(X'X)^{-1}]$
     $p(\sigma^2|\beta,\text{data})$, based on $\sum_i (y_i - x_i'\beta)^2$, is a gamma distribution.
     Can we use this information to sample from $p(\beta|\text{data})$ and $p(\sigma^2|\text{data})$?

  30. Magic Tool: The Gibbs Sampler. Problem: how to sample observations from a population $p(\theta_1,\theta_2,\ldots,\theta_K|\text{data})$. Solution: the Gibbs sampler. Target: sample from the joint distribution f(x1, x2), when the joint distribution is unknown or it is not possible to sample from it directly. Assumed: the conditional distributions f(x1|x2) and f(x2|x1) are both known, and samples can be drawn from both. Gibbs sampling: obtain one draw from (x1, x2) by cycling many times between x1|x2 and x2|x1. Start x1,0 anywhere in the right range; draw x2,0 from x2|x1,0; return to x1,1 from x1|x2,0; and so on. Several thousand cycles produces one draw; repeat several thousand times to produce a sample. Average the draws to estimate the marginal means.

  31. Bivariate Normal Sampling. Draw a random sample from the bivariate normal with mean $\binom{0}{0}$ and covariance $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.
     (1) Direct approach: $\binom{v_1}{v_2} = L\binom{u_1}{u_2}$, where $u_1, u_2$ are two independent standard normal draws (easy) and
     $L = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}$, such that $LL' = \Sigma$.

  32. Application: Bivariate Normal. Obtain a bivariate normal sample (x,y) from Normal[(0,0),(1,1,ρ)], N = 5000. Conditionals: $x|y$ is $N[\rho y,\ (1-\rho^2)]$; $y|x$ is $N[\rho x,\ (1-\rho^2)]$. Gibbs sampler: set $y_0 = 0$; $x_1 = \rho y_0 + \sqrt{1-\rho^2}\,v$ where v is a N(0,1) draw; $y_1 = \rho x_1 + \sqrt{1-\rho^2}\,w$ where w is a N(0,1) draw. Repeat the cycle 60,000 times. Drop the first 10,000 and retain every 10th observation of the remainder.
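
A direct implementation of this scheme in Python; ρ = 0.5 is an assumption, since the slide leaves ρ unspecified:

    import numpy as np

    rng = np.random.default_rng(1)
    rho = 0.5                      # illustrative value; the slide does not fix rho
    sd = np.sqrt(1.0 - rho**2)     # conditional std. dev. of x|y and y|x

    n_cycles, burn_in, thin = 60_000, 10_000, 10
    y = 0.0                        # start y0 = 0
    sample = []
    for r in range(n_cycles):
        x = rho * y + sd * rng.standard_normal()   # draw x | y
        y = rho * x + sd * rng.standard_normal()   # draw y | x
        if r >= burn_in and (r - burn_in) % thin == 0:
            sample.append((x, y))

    sample = np.array(sample)                      # 5000 retained pairs
    print(sample.mean(axis=0))                     # ~ (0, 0)
    print(np.corrcoef(sample.T)[0, 1])             # ~ rho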

  33. Gibbs Sampling for the Linear Regression Model.
     $p(\beta|\sigma^2,\text{data}) = N[b,\ \sigma^2(X'X)^{-1}]$
     $p(\sigma^2|\beta,\text{data})$, based on $\sum_{i=1}^N (y_i - x_i'\beta)^2$, is a gamma distribution.
     Iterate back and forth between these two distributions.
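
A sketch of this two-block sampler, assuming diffuse priors so that β|σ² is centered at OLS and SSR(β)/σ² is distributed chi-squared(n); the function name gibbs_lm is mine, not the slides':

    import numpy as np

    def gibbs_lm(y, X, n_draws=5000, burn_in=1000, seed=0):
        """Gibbs sampler for the normal linear model, diffuse priors (a sketch)."""
        rng = np.random.default_rng(seed)
        n, K = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y              # OLS = mean of beta | sigma2, data
        L = np.linalg.cholesky(XtX_inv)    # L L' = (X'X)^{-1}
        sigma2 = 1.0                       # arbitrary starting value
        draws = []
        for r in range(n_draws + burn_in):
            # beta | sigma2, data ~ N[b, sigma2 (X'X)^{-1}]
            beta = b + np.sqrt(sigma2) * (L @ rng.standard_normal(K))
            # sigma2 | beta, data: SSR(beta)/sigma2 ~ chi-squared(n)
            ssr = np.sum((y - X @ beta) ** 2)
            sigma2 = ssr / rng.chisquare(n)
            if r >= burn_in:
                draws.append(np.append(beta, sigma2))
        return np.array(draws)             # columns: beta_1..beta_K, sigma2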

  34. More General Gibbs Sampler. Objective: sample joint observations on $\theta_1,\theta_2,\ldots,\theta_K$ from $p(\theta_1,\theta_2,\ldots,\theta_K|\text{data})$. (Let K = 3.) Derive $p(\theta_1|\theta_2,\theta_3,\text{data})$, $p(\theta_2|\theta_1,\theta_3,\text{data})$, and $p(\theta_3|\theta_1,\theta_2,\text{data})$. Gibbs cycles produce joint observations:
     0. Start $\theta_1,\theta_2,\theta_3$ at some reasonable values.
     1. Sample a draw from $p(\theta_1|\theta_2,\theta_3,\text{data})$ using the draws of $\theta_2,\theta_3$ in hand.
     2. Sample a draw from $p(\theta_2|\theta_1,\theta_3,\text{data})$ using the draw at step 1 for $\theta_1$.
     3. Sample a draw from $p(\theta_3|\theta_1,\theta_2,\text{data})$ using the draws at steps 1 and 2.
     4. Return to step 1.
     After a burn-in period (a few thousand cycles), start collecting the draws. The set of draws ultimately gives a sample from the joint distribution. Order within the chain does not matter.

  35. Using the Gibbs Sampler to Estimate a Probit Model. Probit model: $y^* = x'\beta + \varepsilon$; $y = 1[y^* > 0]$; $\varepsilon \sim N[0,1]$. Implication: $\Pr[y=1|x,\beta] = \Phi(x'\beta)$ and $\Pr[y=0|x,\beta] = 1 - \Phi(x'\beta)$.
     Likelihood function: $L(\beta|y,X) = \prod_{i=1}^N [1-\Phi(x_i'\beta)]^{1-y_i}[\Phi(x_i'\beta)]^{y_i}$
     Uninformative prior: $p(\beta) \propto 1$.
     Posterior density: $p(\beta|y,X) = \frac{\prod_{i=1}^N [1-\Phi(x_i'\beta)]^{1-y_i}[\Phi(x_i'\beta)]^{y_i}}{\int_\beta \prod_{i=1}^N [1-\Phi(x_i'\beta)]^{1-y_i}[\Phi(x_i'\beta)]^{y_i}\,d\beta}$
     Posterior mean: $E[\beta|y,X] = \frac{\int_\beta \beta\,\prod_{i=1}^N [1-\Phi(x_i'\beta)]^{1-y_i}[\Phi(x_i'\beta)]^{y_i}\,d\beta}{\int_\beta \prod_{i=1}^N [1-\Phi(x_i'\beta)]^{1-y_i}[\Phi(x_i'\beta)]^{y_i}\,d\beta}$

  36. Strategy: Data Augmentation. Treat the $y_i^*$ as unknown parameters to be estimated along with $\beta$: $\theta = (\beta, y_1^*,\ldots,y_N^*) = (\beta, y^*)$. Draw a sample of R observations from the joint population $(\beta, y^*)$. Use the marginal observations on $\beta$ to estimate the characteristics (e.g., the mean) of the distribution of $\beta|y,X$.

  37. Gibbs Sampler Strategy. Consider $p(\beta|y^*,(y,X))$: if $y^*$ is known, then y is known, so $p(\beta|y^*,(y,X)) = p(\beta|y^*,X)$. $p(\beta|y^*,X)$ defines a linear regression with N(0,1) normal disturbances. Known result for $\beta|y^*$:
     $p(\beta|y^*,(y,X),\ \varepsilon \sim N[0,I]) = N[b^*,\ (X'X)^{-1}]$, where $b^* = (X'X)^{-1}X'y^*$.
     Next, deduce a result for $y^*|\beta$.

  38. Gibbs Sampler, Continued. $y_i^*|\beta,x_i$ is Normal$[x_i'\beta,\,1]$, and $y_i$ is informative about $y_i^*$:
     If $y_i = 1$, then $y_i^* > 0$, and $p(y_i^*|\beta,x_i,y_i=1)$ is truncated normal, $\phi(y_i^* - x_i'\beta)/\Phi(x_i'\beta)$ on $y_i^* > 0$, denoted $N^+[x_i'\beta,1]$.
     If $y_i = 0$, then $y_i^* \le 0$, and $p(y_i^*|\beta,x_i,y_i=0)$ is truncated normal, $\phi(y_i^* - x_i'\beta)/\Phi(-x_i'\beta)$ on $y_i^* \le 0$, denoted $N^-[x_i'\beta,1]$.

  39. Generating Random Draws from f(x). The inverse probability method of sampling random draws: if F(x) is the CDF of the random variable x, then a random draw on x may be obtained as $F^{-1}(u)$, where u is a draw from the standard uniform (0,1). Examples:
     Exponential: $f(x) = \theta e^{-\theta x}$; $F(x) = 1 - e^{-\theta x}$; $x = -(1/\theta)\log(1-u)$.
     Normal: $F(x) = \Phi(x)$; $x = \Phi^{-1}(u)$.
     Truncated normal: $x = \mu_i + \Phi^{-1}\left[1-(1-u)\Phi(\mu_i)\right]$ for y=1; $x = \mu_i + \Phi^{-1}\left[u\,\Phi(-\mu_i)\right]$ for y=0.

  40. Sampling from the Truncated Normal. The usual inverse probability transform: begin with a draw U from U[0,1].
     To obtain a draw $y_r^*$ from $N^+[\mu_r,1]$: $y_r^* = \mu_r + \Phi^{-1}\left[1-(1-U)\Phi(\mu_r)\right]$
     To obtain a draw $y_r^*$ from $N^-[\mu_r,1]$: $y_r^* = \mu_r + \Phi^{-1}\left[U\,\Phi(-\mu_r)\right]$
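
These two transformations translate directly into code; a sketch using scipy's norm.cdf and norm.ppf (the helper draw_truncated is a name introduced here for illustration):

    import numpy as np
    from scipy.stats import norm

    def draw_truncated(mu, u, y):
        """Inverse-CDF draw from N[mu,1] truncated to y* > 0 (y=1) or y* <= 0 (y=0)."""
        if y == 1:   # N+[mu,1]: y* = mu + Phi^{-1}[1 - (1-u) Phi(mu)]
            return mu + norm.ppf(1.0 - (1.0 - u) * norm.cdf(mu))
        else:        # N-[mu,1]: y* = mu + Phi^{-1}[u Phi(-mu)]
            return mu + norm.ppf(u * norm.cdf(-mu))

    u = np.random.default_rng(0).uniform()
    print(draw_truncated(0.7, u, 1))   # positive draw
    print(draw_truncated(0.7, u, 0))   # nonpositive draw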

  41. Sampling from the Multivariate Normal. A multivariate version of the inverse probability transform. To sample from N[μ, Σ] (K dimensional): let L be the Cholesky matrix such that $LL' = \Sigma$, and let v be a column of K independent random normal(0,1) draws. Then $x = \mu + Lv$ is normally distributed with mean $\mu$ and variance $LIL' = LL' = \Sigma$, as needed.
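
In code, this is one Cholesky factorization and a matrix-vector product; the values of μ and Σ below are illustrative, not from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])                  # illustrative mean
    Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])  # illustrative covariance
    L = np.linalg.cholesky(Sigma)               # L L' = Sigma
    v = rng.standard_normal(2)                  # K independent N(0,1) draws
    x = mu + L @ v                              # one draw from N[mu, Sigma]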

  42. Gibbs Sampler. Preliminary: obtain X'X, then L such that $LL' = (X'X)^{-1}$. Preliminary: choose an initial value for $\beta$, such as $\beta^0 = 0$. Start with r = 1.
     (y* step) Sample N observations on $y^{*(r)}$ using $\beta^{(r-1)}$, $x_i$ and $y_i$ and the transformations for the truncated normal distribution.
     (β step) Compute $b^{*(r)} = (X'X)^{-1}X'y^{*(r)}$. Draw the observation on $\beta^{(r)}$ from the normal population with mean $b^{*(r)}$ and variance $(X'X)^{-1}$.
     Cycle between the two steps 50,000 times. Discard the first 10,000 and retain every 10th observation from the remaining 40,000.
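
Putting the preceding slides together, a sketch of the full data-augmentation sampler; gibbs_probit is an illustrative name, and the defaults mirror the slide's 50,000/10,000/10 cycle scheme:

    import numpy as np
    from scipy.stats import norm

    def gibbs_probit(y, X, n_cycles=50_000, burn_in=10_000, thin=10, seed=0):
        """Data-augmentation Gibbs sampler for the probit model (a sketch)."""
        rng = np.random.default_rng(seed)
        n, K = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        L = np.linalg.cholesky(XtX_inv)       # L L' = (X'X)^{-1}
        beta = np.zeros(K)                    # start at beta0 = 0
        keep = []
        for r in range(n_cycles):
            # y* step: truncated-normal draws given the current beta
            mu = X @ beta
            u = rng.uniform(size=n)
            ystar = np.where(
                y == 1,
                mu + norm.ppf(1.0 - (1.0 - u) * norm.cdf(mu)),   # y* > 0
                mu + norm.ppf(u * norm.cdf(-mu)),                # y* <= 0
            )
            # beta step: beta | y* ~ N[b*, (X'X)^{-1}], b* = (X'X)^{-1} X'y*
            b_star = XtX_inv @ (X.T @ ystar)
            beta = b_star + L @ rng.standard_normal(K)
            if r >= burn_in and (r - burn_in) % thin == 0:
                keep.append(beta.copy())
        return np.array(keep)                 # 4,000 retained draws on beta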

  43. Frequentist and Bayesian Results. [Side-by-side comparison of the frequentist and Bayesian estimates; computation times: 0.37 seconds vs. 2 minutes.]

  44. Appendix

  45. Bayesian Model Estimation. Specification of the conditional likelihood: f(data|parameters) = L(parameters|data). Specification of priors: g(parameters). Posterior density of the parameters:
     $f(\text{parameters}|\text{data}) = \frac{f(\text{data}|\text{parameters})\,g(\text{parameters})}{f(\text{data})}$
     Posterior mean = E[parameters|data].

  46. The Marginal Density for the Data is Irrelevant.
     $f(\theta|\text{data}) = \frac{f(\text{data}|\theta)p(\theta)}{f(\text{data})}$
     The joint density of the data and $\theta$ is $f(\text{data},\theta) = L(\text{data}|\theta)p(\theta)$. The marginal density of the data is
     $f(\text{data}) = \int_\theta f(\text{data},\theta)\,d\theta = \int_\theta L(\text{data}|\theta)p(\theta)\,d\theta$
     Thus,
     $f(\theta|\text{data}) = \frac{L(\text{data}|\theta)p(\theta)}{\int_\theta L(\text{data}|\theta)p(\theta)\,d\theta}$
     Posterior mean $= \int_\theta \theta\,p(\theta|\text{data})\,d\theta = \frac{\int_\theta \theta\,L(\text{data}|\theta)p(\theta)\,d\theta}{\int_\theta L(\text{data}|\theta)p(\theta)\,d\theta}$
     This requires specification of the likelihood and the prior, but not of f(data).

  47. Bayesian Estimators. Bayesian random parameters vs. classical randomly distributed parameters. Models of individual heterogeneity: sample proportion; linear regression; binary choice; random effects (consumer brand choice); fixed effects (hospital costs).

  48. A Random Effects Approach. Allenby and Rossi, "Marketing Models of Consumer Heterogeneity." A discrete choice model of brand choice: hierarchical Bayes, multinomial probit. Panel data: purchases of 4 brands of ketchup.

  49. Structure. Conditional data generation mechanism:
     $y^*_{it,j} = x_{it,j}'\beta_i + \varepsilon_{it,j}$ = utility for consumer i, choice situation t, brand j.
     $Y_{it,j} = 1[y^*_{it,j} = \text{maximum utility among the } J \text{ choices}]$
     $x_{it,j}$ = (constant, log price, "availability", "featured")
     This implies a J-outcome multinomial probit model, with $\varepsilon_{it,j} \sim N[0,\sigma_j^2]$, $\sigma_1 = 1$.

  50. Priors. Prior densities:
     $\beta_i \sim N[\bar\beta,\ V_\beta]$
     $\sigma_j \sim$ Inverse Gamma$[v, s_j]$ (looks like chi-squared), $v = 3$, $s_j = 1$.
     Priors over the model parameters:
     $\beta_i = \bar\beta + w_i$, $w_i \sim N[0, V_\beta]$
     $\bar\beta \sim N[\beta_0,\ aV_\beta]$, $\beta_0 = 0$
     $V_\beta^{-1} \sim$ Wishart$[v_0, V_0]$, $v_0 = 8$, $V_0 = 8I$.
