
Logistic Regression Analysis in Economics
Learn how logistic regression is applied in economic analysis through a worked example and its interpretation. See how household income relates to a binary outcome (whether a household has electricity) and how to read coefficient estimates and marginal effects in regression models.
Logistic Regression
Jonathan Haughton
jhaughton@Suffolk.edu
Ec 490: Senior Seminar
Suffolk University, Boston MA 02108
February 26, 2021

Standard case
Regression analysis: y = a + bX
Example: Bangladesh 1998
HH_inc = a + b (age of head)
See scatterplot
Outcome variable is continuous

Binary dependent variable
Regression again, but the outcome variable is binary (also known as a 1/0 or dummy variable).
Example: Bangladesh 1998
Has electricity = a + b HH_inc
See scatterplot
OLS line: linear model
Better: Logit (the stretched S)
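
The "stretched S" can be written out explicitly. The block below is a standard statement of the logit model using the slide's own symbols a, b and X; the symbol P for the probability is introduced here for illustration and does not appear in the slides.

```latex
% Logit model: probability that the binary outcome equals 1
P(y = 1 \mid X) = \frac{e^{a + bX}}{1 + e^{a + bX}} = \frac{1}{1 + e^{-(a + bX)}}

% Equivalently, the log odds are linear in X
\ln\!\left(\frac{P}{1 - P}\right) = a + bX
```

Because P stays strictly between 0 and 1 for any value of a + bX, the fitted curve flattens at both ends instead of running off the scale as a straight line can.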

OLS: Easy to interpret

. reg hhinc agehead

      Source |       SS           df       MS      Number of obs   =       514
-------------+----------------------------------   F(1, 512)       =     18.80
       Model |  47614.4898         1  47614.4898   Prob > F        =    0.0000
    Residual |   1296720.1       512  2532.65644   R-squared       =    0.0354
-------------+----------------------------------   Adj R-squared   =    0.0335
       Total |  1344334.59       513  2620.53526   Root MSE        =    50.326

------------------------------------------------------------------------------
       hhinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     agehead |   .7625424   .1758663     4.34   0.000     .4170341    1.108051
       _cons |   16.42401   8.587338     1.91   0.056    -.4467419    33.29477
------------------------------------------------------------------------------

Marginal effect: when the head of household is 1 year older, household income is 763 taka ($15) higher per year.

OLS applied to binary outcome

. reg hhelec hhinc

      Source |       SS           df       MS      Number of obs   =       514
-------------+----------------------------------   F(1, 512)       =     62.25
       Model |  8.73073418         1  8.73073418   Prob > F        =    0.0000
    Residual |  71.8140129       512  .140261744   R-squared       =    0.1084
-------------+----------------------------------   Adj R-squared   =    0.1067
       Total |  80.5447471       513  .157007304   Root MSE        =    .37452

------------------------------------------------------------------------------
      hhelec |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hhinc |   .0025484    .000323     7.89   0.000     .0019138     .003183
       _cons |    .061034   .0236491     2.58   0.010     .0145727    .1074953
------------------------------------------------------------------------------

An extra 1,000 taka ($20) in annual household income is associated with a 0.25 percentage point higher probability of having electricity. This is equivalent to the slope, marginal effect, or first derivative, ∂Y/∂X.
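
One way to see why a straight line is an awkward fit for a 1/0 outcome is to plug incomes into the fitted equation. The sketch below uses only the two coefficients reported in the OLS output; the function names are made up for illustration, and the income of 400 (thousand taka) is a hypothetical value chosen to show the problem, not one taken from the data.

```python
# Coefficients copied from the "reg hhelec hhinc" output.
# hhinc is measured in thousands of taka.
A_OLS = 0.061034   # _cons
B_OLS = 0.0025484  # hhinc


def linear_fit(hhinc):
    """Fitted 'probability' of having electricity from the linear model."""
    return A_OLS + B_OLS * hhinc


def income_where_fit_reaches_one():
    """Income (thousand taka) at which the fitted line crosses 1."""
    return (1.0 - A_OLS) / B_OLS


# 24, 36, 64 and 100 are the income values used later in the slides;
# 400 is a hypothetical high income added for illustration.
for income in (24, 36, 64, 100, 400):
    print(income, round(linear_fit(income), 4))

print("fitted value reaches 1 at hhinc =", round(income_where_fit_reaches_one(), 1))
```

Under these coefficients the fitted value passes 1 at an income of roughly 368 thousand taka. Whether any household in the sample is that rich cannot be told from the slides, but the line itself places no limit on the fitted "probability".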

Logit applied to binary outcome

. logit hhelec hhinc

Iteration 0:   log likelihood = -253.27723
Iteration 1:   log likelihood = -231.27887
Iteration 2:   log likelihood =  -230.2707
Iteration 3:   log likelihood = -230.26439
Iteration 4:   log likelihood = -230.26439

Logistic regression                             Number of obs     =        514
                                                LR chi2(1)        =      46.03
                                                Prob > chi2       =     0.0000
Log likelihood = -230.26439                     Pseudo R2         =     0.0909

------------------------------------------------------------------------------
      hhelec |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hhinc |   .0138263    .002289     6.04   0.000     .0093399    .0183126
       _cons |  -2.240012   .1832558   -12.22   0.000    -2.599186   -1.880837
------------------------------------------------------------------------------

When household income is 1,000 taka ($20) higher, the probability of having electricity is higher. But how much higher? The coefficient is measured in log odds, so it cannot be read directly as a change in probability.
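
The logit coefficients become easier to read once they are converted into predicted probabilities and an odds ratio. The following is a minimal sketch that uses only the two estimates printed in the logit output; `prob_electricity` and `odds_ratio` are illustrative names, not part of the original do-file or R script.

```python
import math

# Coefficients copied from the "logit hhelec hhinc" output.
# hhinc is measured in thousands of taka.
A_LOGIT = -2.240012  # _cons
B_LOGIT = 0.0138263  # hhinc


def prob_electricity(hhinc):
    """Predicted probability of having electricity at a given income."""
    return 1.0 / (1.0 + math.exp(-(A_LOGIT + B_LOGIT * hhinc)))


# Multiplicative change in the odds for each extra 1,000 taka of income.
odds_ratio = math.exp(B_LOGIT)

for income in (0, 24, 36, 64, 100):
    print(income, round(prob_electricity(income), 4))
print("odds ratio per 1,000 taka:", round(odds_ratio, 5))
```

The odds ratio of about 1.014 says the odds of having electricity rise by roughly 1.4 percent per extra 1,000 taka. That is a statement about odds, not about probability, which is why a marginal effect is still needed.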

Solution: compute marginal effects
Average marginal effect (the output averages the effect over all 514 observations rather than evaluating it at the mean)

. margins, dydx(*)

Average marginal effects                        Number of obs     =        514
Model VCE    : OIM

Expression   : Pr(hhelec), predict()
dy/dx w.r.t. : hhinc

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hhinc |   .0019272   .0002878     6.70   0.000     .0013632    .0024912
------------------------------------------------------------------------------

Compare 0.0019 with the marginal effect of 0.0025 from the linear model.
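
The quantity reported by margins can be derived in one step from the logit formula. The derivation below is the standard textbook result, written with a and b for the logit intercept and slope; the symbols P_i, X_i and n (the predicted probability and income of household i, and the number of households) are notation introduced here for illustration.

```latex
% Logit probability for household i
P_i = \frac{1}{1 + e^{-(a + bX_i)}}

% Marginal effect of X on the probability: it depends on where you are on the curve
\frac{\partial P_i}{\partial X_i} = b \, P_i \, (1 - P_i)

% Average marginal effect: the mean of the individual effects
\text{AME} = \frac{1}{n} \sum_{i=1}^{n} b \, P_i \, (1 - P_i)
```

In the linear model the corresponding derivative is just the constant slope, which is why a single number (0.0025) describes it, whereas the logit effect has to be averaged or evaluated at chosen values of X.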

But it is nonlinear
At higher income, an extra 1,000 taka is associated with a faster move to electricity.
Marginal effects at different values of HH_inc. Quartiles: 24, 36, 64. The 90th percentile is about 100.

. margins, dydx(*) at(hhinc=(24, 36, 64, 100))

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
hhinc        |
         _at |
          1  |   .0015554   .0002096     7.42   0.000     .0011446    .0019663
          2  |   .0017534   .0002631     6.66   0.000     .0012377    .0022691
          3  |   .0022536   .0004127     5.46   0.000     .0014448    .0030624
          4  |   .0028918   .0005987     4.83   0.000     .0017183    .0040652
------------------------------------------------------------------------------
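
The four marginal effects in this table can be checked by hand, since for a logit with a single regressor the marginal effect at a given income is b × p × (1 − p), where p is the predicted probability at that income. The sketch below assumes only the two coefficients from the earlier "logit hhelec hhinc" output; the names `marginal_effect`, `REPORTED` and `PEAK_INCOME` are illustrative.

```python
import math

# Coefficients copied from the "logit hhelec hhinc" output.
# hhinc is measured in thousands of taka.
A_LOGIT = -2.240012  # _cons
B_LOGIT = 0.0138263  # hhinc

# dy/dx values printed by: margins, dydx(*) at(hhinc=(24, 36, 64, 100))
REPORTED = {24: 0.0015554, 36: 0.0017534, 64: 0.0022536, 100: 0.0028918}


def marginal_effect(hhinc):
    """Change in Pr(electricity) per extra 1,000 taka, evaluated at hhinc."""
    p = 1.0 / (1.0 + math.exp(-(A_LOGIT + B_LOGIT * hhinc)))
    return B_LOGIT * p * (1.0 - p)


# The effect is largest where the predicted probability is 0.5, i.e. a + b*X = 0.
PEAK_INCOME = -A_LOGIT / B_LOGIT

for income, reported in REPORTED.items():
    print(income, round(marginal_effect(income), 7), "reported:", reported)
print("marginal effect peaks near hhinc =", round(PEAK_INCOME, 1))
```

With these coefficients the effect keeps rising up to an income of about 162 thousand taka, where the predicted probability reaches one half. All four values in the table lie below that point, which is consistent with the effect growing across the quartiles and the 90th percentile.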

Can add more variables

Linear model

. reg hhelec i.region i.sexhead agehead educhead famsize hhinc

      Source |       SS           df       MS      Number of obs   =       514
-------------+----------------------------------   F(8, 505)       =     16.33
       Model |  16.5556312         8   2.0694539   Prob > F        =    0.0000
    Residual |  63.9891159       505  .126711121   R-squared       =    0.2055
-------------+----------------------------------   Adj R-squared   =    0.1930
       Total |  80.5447471       513  .157007304   Root MSE        =    .35597

------------------------------------------------------------------------------
      hhelec |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
 chittagong  |   .0697895   .0563465     1.24   0.216    -.0409129    .1804918
     khulna  |  -.0421806    .041515    -1.02   0.310     -.123744    .0393828
   rajshahi  |  -.1165697    .040774    -2.86   0.004    -.1966773    -.036462
             |
   1.sexhead |  -.0602426   .0493949    -1.22   0.223    -.1572874    .0368023
     agehead |   .0010522   .0013383     0.79   0.432    -.0015772    .0036815
    educhead |   .0303548   .0047211     6.43   0.000     .0210794    .0396301
     famsize |  -.0134392   .0071766    -1.87   0.062    -.0275389    .0006605
       hhinc |   .0021181    .000377     5.62   0.000     .0013775    .0028588
       _cons |   .1345276   .0821795     1.64   0.102    -.0269282    .2959835
------------------------------------------------------------------------------

. margins, dydx(*)

Average marginal effects                        Number of obs     =        514
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 2.region 3.region 4.region 1.sexhead agehead educhead famsize hhinc

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
 chittagong  |   .0697895   .0563465     1.24   0.216    -.0409129    .1804918
     khulna  |  -.0421806    .041515    -1.02   0.310     -.123744    .0393828
   rajshahi  |  -.1165697    .040774    -2.86   0.004    -.1966773    -.036462
             |
   1.sexhead |  -.0602426   .0493949    -1.22   0.223    -.1572874    .0368023
     agehead |   .0010522   .0013383     0.79   0.432    -.0015772    .0036815
    educhead |   .0303548   .0047211     6.43   0.000     .0210794    .0396301
     famsize |  -.0134392   .0071766    -1.87   0.062    -.0275389    .0006605
       hhinc |   .0021181    .000377     5.62   0.000     .0013775    .0028588
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Logit model

. logit hhelec i.region i.sexhead agehead educhead famsize hhinc

Logistic regression                             Number of obs     =        514
                                                LR chi2(8)        =      98.42
                                                Prob > chi2       =     0.0000
Log likelihood = -204.06972                     Pseudo R2         =     0.1943

------------------------------------------------------------------------------
      hhelec |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
 chittagong  |   .4491554   .3957328     1.13   0.256    -.3264666    1.224777
     khulna  |  -.3105247   .3179168    -0.98   0.329    -.9336302    .3125808
   rajshahi  |  -1.097121   .3668809    -2.99   0.003    -1.816194   -.3780473
             |
   1.sexhead |  -.4556877   .3849678    -1.18   0.237    -1.210211    .2988353
     agehead |   .0103907    .010771     0.96   0.335      -.01072    .0315015
    educhead |   .1890894   .0335343     5.64   0.000     .1233634    .2548153
     famsize |  -.0909478   .0576292    -1.58   0.115    -.2038989    .0220033
       hhinc |    .012797   .0031084     4.12   0.000     .0067046    .0188895
       _cons |  -2.020609   .6515049    -3.10   0.002    -3.297535   -.7436825
------------------------------------------------------------------------------

. margins, dydx(*)

Average marginal effects                        Number of obs     =        514
Model VCE    : OIM

Expression   : Pr(hhelec), predict()
dy/dx w.r.t. : 2.region 3.region 4.region 1.sexhead agehead educhead famsize hhinc

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
 chittagong  |   .0719255   .0660346     1.09   0.276       -.0575    .2013509
     khulna  |  -.0421427   .0428747    -0.98   0.326    -.1261756    .0418902
   rajshahi  |  -.1226472   .0391581    -3.13   0.002    -.1993958   -.0458987
             |
   1.sexhead |  -.0602304   .0543129    -1.11   0.267    -.1666816    .0462208
     agehead |   .0012742   .0013194     0.97   0.334    -.0013118    .0038602
    educhead |   .0231872   .0037498     6.18   0.000     .0158378    .0305367
     famsize |  -.0111526   .0070157    -1.59   0.112    -.0249031     .002598
       hhinc |   .0015692   .0003612     4.34   0.000     .0008613    .0022772
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

R code

# Set the working directory to the folder that holds the Stata data file.
# You will need to change this to reflect the setup on your computer.
setwd("c:/JHteaching/Ec490/LogisticRegression")

library(foreign)                      # provides read.dta() for Stata .dta files
bd <- read.dta("hh98big7bs.dta")

# Inspect the data
head(bd)
View(bd)
table(bd$hhelec)
summary(bd$agehead)
summary(bd)
str(bd)

# Total household income, converted to thousands of taka
bd$hhinc <- bd$hhincwf + bd$hhincwnf + bd$hhincsf + bd$hhincsnf
bd$hhinc <- bd$hhinc/1000
summary(bd$hhinc)

# Age of head squared
bd$agesq <- (bd$agehead)^2
summary(bd$agesq)

# Linear regression
# Stata equivalent: reg hhinc i.region i.sexhead agehead agesq educhead famsize
reg1 <- lm(hhinc ~ region + sexhead + agehead + agesq + educhead + famsize, data=bd)
summary(reg1)
reg1a <- lm(hhinc ~ agehead, data=bd)
summary(reg1a)
reg2a <- lm(hhelec ~ hhinc, data=bd)
summary(reg2a)

# Logistic regression
logit2a <- glm(hhelec ~ hhinc, data=bd, family="binomial")
summary(logit2a)

library("margins")
margins(logit2a)   # Very recent command in R, mimicking Stata's command
# Marginal effects at chosen incomes. The original slide used 24:36, which R
# expands to every integer from 24 to 36; the values below match the Stata
# call at(hhinc=(24, 36, 64, 100)).
margins(logit2a, at = list(hhinc = c(24, 36, 64, 100)))
margins(logit2a, at = list(hhinc = 24))

Stata code (do file)

* Logistic Regression Example
use "C:\JHteaching\Ec490\LogisticRegression\hh98big7bs.dta"
des
sum
tab hhelec
tab sexhead
tab agehead
tab educhead

gen hhinc = hhincwf + hhincwnf + hhincsf + hhincsnf
replace hhinc = hhinc/1000
label var hhinc "Household income"
label var hhelec "Household has electricity"
gen agesq = agehead^2
label var agesq "Age of hh head squared"

reg hhinc i.region i.sexhead agehead agesq educhead famsize
margins, dydx(*)

twoway (scatter hhinc agehead)
twoway (scatter hhelec hhinc)

reg hhelec hhinc
predict yhatlin1
*twoway (scatter hhelec hhinc)(scatter yhatlin1 hhinc, connect(l) msymbol(i))
twoway (scatter hhelec hhinc)(scatter yhatlin1 hhinc, connect(l) msymbol(i))

logit hhelec hhinc
predict yhatlog1
sort hhinc
twoway (scatter hhelec hhinc)(scatter yhatlin1 hhinc, connect(l) msymbol(i))(scatter yhatlog1 hhinc, connect(l) msymbol(i) lpattern(dash))

reg hhinc agehead
reg hhelec hhinc
logit hhelec hhinc
margins
margins, dydx(*)
margins, dydx(*) at(hhinc=(20,50))
margins, dydx(*) at(hhinc=(24, 36, 64, 100))

logit hhelec i.region i.sexhead agehead educhead famsize hhinc
margins, dydx(*)
margins, dydx(*) at(hhinc=(24, 36, 64, 100))

reg hhelec i.region i.sexhead agehead educhead famsize hhinc
margins, dydx(*)