Regression Modelling for Sociologists: OLS, Logit, Probit & More


Exploring essential concepts like OLS, logit, and probit models for regression modelling in sociological research. Dive into interpreting results, model specification, and post-estimation commands. Check out the basics of OLS, its assumptions, and when it is appropriate to use in your analysis.

  • Regression Modelling
  • Sociologists
  • OLS
  • Logit
  • Probit


Presentation Transcript


  1. SC968 Panel data methods for sociologists. Lecture 1, part 1: A review of concepts for regression modelling (or, things you should know already)

  2. Overview
     • Models: OLS, logit and probit, mathematically and practically
     • Interpretation of results, measures of fit and regression diagnostics
     • Model specification
     • Post-estimation commands
     • STATA competence

  3. Ordinary Least Squares (OLS)
     $$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + \varepsilon_i$$
     • $y_i$: value of the dependent variable for individual i (LHS variable)
     • $\alpha$: intercept (constant)
     • $\beta_1$: coefficient on variable 1
     • $x_{i1}$: value of explanatory variable 1 for person i
     • $K$: total number of explanatory variables (RHS variables or regressors)
     • $\varepsilon_i$: residual (disturbance, error term)
     Examples
     • $y_i$ = mental health; x1 = sex, x2 = age, x3 = marital status, x4 = employment status, x5 = physical health
     • $y_i$ = hourly pay; x1 = sex, x2 = age, x3 = education, x4 = job tenure, x5 = industry, x6 = region

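  4. OLS
     $$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + \varepsilon_i$$
     In vector form, with $x_i$ the vector of explanatory variables and $\beta$ the vector of coefficients:
     $$y_i = x_i'\beta + \varepsilon_i$$
     In matrix form:
     $$y = X\beta + \varepsilon$$
     $$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1K} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2K} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3K} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & x_{N3} & \cdots & x_{NK} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_K \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_N \end{pmatrix}$$
     Note: you will often see $x'\beta$ written as $x\beta$.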

  5. OLS
     • Also called linear regression
     • Assumes the dependent variable is a linear combination of the explanatory variables, plus a disturbance
     • Least squares: the $\beta$s are estimated so as to minimise the sum of the squared residuals:
     $$\min_{\beta} \sum_i \varepsilon_i^2$$
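
The least-squares idea on this slide can be illustrated with a short Python sketch for a single explanatory variable. The data are made up for illustration (they are not BHPS values), and the function names are hypothetical; Stata's reg command does the equivalent for any number of regressors.

```python
# Minimal sketch: least squares with one explanatory variable.
# The data are invented for illustration; they are not BHPS values.

def ols_simple(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_sq_resid(x, y, intercept, slope):
    """Sum of squared residuals for a candidate line."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_simple(x, y)
best = sum_sq_resid(x, y, a, b)
print("intercept:", a, "slope:", b, "sum of squared residuals:", best)

# Any other line gives a larger sum of squared residuals.
print(sum_sq_resid(x, y, a + 0.1, b) > best, sum_sq_resid(x, y, a, b - 0.1) > best)
```

Moving the intercept or slope away from the fitted values always increases the sum of squared residuals, which is what "least squares" means.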

  6. Basic Assumptions
     • Residuals have zero mean, $E(\varepsilon_i) = 0$, also conditional on the regressors: $E(\varepsilon_i \mid X_i) = 0$
       - It follows that the $\varepsilon$s and $X$s are uncorrelated: $E(\varepsilon_i X_i) = 0$
       - Violated if a regressor is endogenous. Eg, number of children in female labour supply models
       - Cure by (eg) Instrumental Variables
     • Homoscedasticity: all $\varepsilon$s have the same variance, $\operatorname{Var}(\varepsilon_i) = \sigma^2$
       - Classic example: food consumption and income
       - Cure by using weighted least squares
     • Nonautocorrelation: the $\varepsilon$s are uncorrelated with each other, $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$
       - Violated in data sets where the same individual appears multiple times
       - Adjust standard errors: cluster option in STATA
     • Disturbances are iid (normally distributed, zero mean, constant variance)

  7. When is OLS appropriate?
     • When you have a continuous dependent variable. Eg, you would use it to estimate regressions for height, but not for whether a person has a university degree.
     • When the assumptions are not obviously violated
     • As a first step in research to get ball-park estimates. We will use them a lot for this purpose.
     Worked examples
     • Coefficients, P-values, t-statistics
     • Measures of fit (R-squared, adjusted R-squared)
     • Thinking about specification
     • Post-estimation commands
     • Regression diagnostics
     A note on the data
     • All examples (in lectures and practicals) are drawn from a 20% sample of the British Household Panel Survey (BHPS). More about the data later!

  8. Summarize monthly earned income

     . sum incm if age >= 17 & age <= 64, d

           Percentiles      Smallest
      1%           43              1
      5%          156           1.25
     10%     268.6667              2       Obs               16696
     25%     615.3333       2.416667       Sum of Wgt.       16696

     50%     1073.088                      Mean           1282.831
                             Largest       Std. Dev.      1008.308
     75%         1690       9207.083
     90%     2471.848       9333.333       Variance        1016685
     95%     3061.355          10000       Skewness        2.19295
     99%     5003.849          10000       Kurtosis       11.94321

  9. First worked example
     For illustrative purposes only. Not an example of good practice.
     Monthly labour income, for people whose labour income is >= 1.

     . reg incm female age age2 partner ed_sec ed_deg mth_int if age >= 17 & age <= 64

          Source |       SS         df        MS            Number of obs =   16458
     ------------+--------------------------------          F(7, 16450)   =  957.92
           Model |  4.8145e+09       7   687785597          Prob > F      =  0.0000
        Residual |  1.1811e+10   16450  718000.667          R-squared     =  0.2896
     ------------+--------------------------------          Adj R-squared =  0.2893
           Total |  1.6626e+10   16457   1010245.5          Root MSE      =  847.35

            incm |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
     ------------+------------------------------------------------------------------
          female |  -594.9641   13.26812   -44.84   0.000    -620.9711   -568.9571
             age |   101.0994   3.859657    26.19   0.000     93.53401    108.6647
            age2 |  -1.155281   .0479992   -24.07   0.000    -1.249364   -1.061197
         partner |   155.7992   16.62703     9.37   0.000     123.2085      188.39
          ed_sec |   380.5032   14.36582    26.49   0.000     352.3446    408.6618
          ed_deg |   1076.674   20.54526    52.40   0.000     1036.403    1116.945
         mth_int |  -5.059072   4.036446    -1.25   0.210    -12.97094      2.8528
           _cons |   -819.931   78.80064   -10.41   0.000    -974.3888   -665.4732

     Reading the output:
     • The top-left block is the analysis of variance (ANOVA) table; MS = SS/df
     • F tests whether all coefficients except the constant are jointly zero
     • R-squared = Model SS / Total SS
     • Root MSE = sqrt(residual MS)
     • t-statistic = coefficient / standard error
     • 95% confidence interval = coefficient plus or minus 1.96 standard errors
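
The annotations on this slide (MS = SS/df, R-squared = Model SS / Total SS, Root MSE, t-statistic, confidence interval) can be checked by hand. The following Python sketch recomputes them from the figures printed in the output. Because the sums of squares are only printed to five significant figures, the results agree with Stata's to roughly that precision rather than exactly; the adjusted R-squared formula used is the standard one, not something stated on the slide.

```python
import math

# Figures copied from the regression output on this slide.
model_ss, model_df = 4.8145e9, 7
resid_ss, resid_df = 1.1811e10, 16450
total_ss, total_df = 1.6626e10, 16457

# MS = SS / df
model_ms = model_ss / model_df
resid_ms = resid_ss / resid_df
total_ms = total_ss / total_df

f_stat = model_ms / resid_ms           # joint test of all slope coefficients
r_squared = model_ss / total_ss        # share of variation explained
adj_r_squared = 1 - resid_ms / total_ms
root_mse = math.sqrt(resid_ms)

# Coefficient and standard error for female.
coef, se = -594.9641, 13.26812
t_stat = coef / se
ci_low, ci_high = coef - 1.96 * se, coef + 1.96 * se

print(f"F = {f_stat:.2f}, R2 = {r_squared:.4f}, adj R2 = {adj_r_squared:.4f}")
print(f"Root MSE = {root_mse:.2f}, t = {t_stat:.2f}")
print(f"95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

Stata uses the exact critical value of the t distribution rather than 1.96, which is why its interval differs from this one in the third decimal place.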

  10. What do the results tell us?
     (Same regression output as on the previous slide.)
     • All coefficients except month of interview are significant
     • 29% of variation explained
     • Being female reduces income by nearly £600 per month
     • Income goes up with age and then down
     • 16458 observations... oops, this is from panel data, so there are repeated observations on individuals.

  11. Add ,cluster(pid) as an option

     . reg incm female age age2 partner ed_sec ed_deg mth_int if age >= 17 & age <= 64, cluster(pid)

     Linear regression                                      Number of obs =   16458
                                                            F(7, 2465)    =  135.26
                                                            Prob > F      =  0.0000
                                                            R-squared     =  0.2896
                                                            Root MSE      =  847.35

                                  (Std. Err. adjusted for 2466 clusters in pid)

                 |               Robust
            incm |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
     ------------+------------------------------------------------------------------
          female |  -594.9641   31.81172   -18.70   0.000    -657.3445   -532.5836
             age |   101.0994   7.323088    13.81   0.000     86.73932    115.4594
            age2 |  -1.155281   .0933813   -12.37   0.000    -1.338395   -.9721666
         partner |   155.7992   30.87227     5.05   0.000     95.26099    216.3375
          ed_sec |   380.5032   30.36746    12.53   0.000     320.9549    440.0516
          ed_deg |   1076.674   64.45131    16.71   0.000     950.2898    1203.058
         mth_int |  -5.059072   4.126102    -1.23   0.220    -13.15006    3.031912
           _cons |   -819.931   132.8455    -6.17   0.000    -1080.431   -559.4306

     • Coefficients, R-squared etc are unchanged from the previous specification
     • But standard errors are adjusted: standard errors are larger, t-statistics are lower

  12. Let's get rid of the month variable

     . reg incm female age age2 partner ed_sec ed_deg if age >= 17 & age <= 64, cluster(pid)

     Linear regression                                      Number of obs =   16460
                                                            F(6, 2466)    =  156.78
                                                            Prob > F      =  0.0000
                                                            R-squared     =  0.2895
                                                            Root MSE      =  847.33

                                  (Std. Err. adjusted for 2467 clusters in pid)

                 |               Robust
            incm |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
     ------------+------------------------------------------------------------------
          female |  -594.8596   31.80682   -18.70   0.000    -657.2304   -532.4887
             age |   100.9827   7.325995    13.78   0.000       86.617    115.3485
            age2 |  -1.153834   .0934155   -12.35   0.000    -1.337015   -.9706534
         partner |   155.5618   30.87778     5.04   0.000     95.01275    216.1109
          ed_sec |   381.0247   30.36183    12.55   0.000     321.4874     440.562
          ed_deg |   1076.837   64.44019    16.71   0.000     950.4745    1203.199
           _cons |  -866.2836   125.9787    -6.88   0.000    -1113.319   -619.2486

     Think about the female coefficient a bit more. Could it be to do with women working shorter hours?

  13. Control for weekly hours of work

     . reg incm female age age2 partner ed_sec ed_deg hrsm if age >= 17 & age <= 64, cluster(pid)

     Linear regression                                      Number of obs =   13998
                                                            F(7, 2262)    =  247.67
                                                            Prob > F      =  0.0000
                                                            R-squared     =  0.4580
                                                            Root MSE      =  690.95

                                  (Std. Err. adjusted for 2263 clusters in pid)

                 |               Robust
            incm |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
     ------------+------------------------------------------------------------------
          female |  -314.6874   34.32954    -9.17   0.000    -382.0081   -247.3667
             age |   79.55289   6.372918    12.48   0.000     67.05551    92.05027
            age2 |   -.873335   .0817518   -10.68   0.000    -1.033651   -.7130186
         partner |   148.0265   26.07885     5.68   0.000     96.88551    199.1675
          ed_sec |     340.68   26.67171    12.77   0.000     288.3764    392.9835
          ed_deg |   996.7434   59.88369    16.64   0.000     879.3107    1114.176
            hrsm |   5.654682   .2467777    22.91   0.000     5.170747    6.138616
           _cons |  -1495.805   111.8223   -13.38   0.000     -1715.09    -1276.52

     Is the coefficient on hours of work reasonable? £5.65 for every additional hour worked: certainly in the right ball park.

  14. Looking at 2 specifications together

                    Without hours                   With hours
                    Number of obs = 16460           Number of obs = 13998
                    F(6, 2466)    = 156.78          F(7, 2262)    = 247.67
                    Prob > F      = 0.0000          Prob > F      = 0.0000
                    R-squared     = 0.2895          R-squared     = 0.4580
                    Root MSE      = 847.33          Root MSE      = 690.95

            incm |      Coef.   Robust SE       t  |      Coef.   Robust SE       t
     ------------+---------------------------------+--------------------------------
          female |  -594.8596   31.80682   -18.70  |  -314.6874   34.32954    -9.17
             age |   100.9827   7.325995    13.78  |   79.55289   6.372918    12.48
            age2 |  -1.153834   .0934155   -12.35  |   -.873335   .0817518   -10.68
         partner |   155.5618   30.87778     5.04  |   148.0265   26.07885     5.68
          ed_sec |   381.0247   30.36183    12.55  |     340.68   26.67171    12.77
          ed_deg |   1076.837   64.44019    16.71  |   996.7434   59.88369    16.64
            hrsm |                                 |   5.654682   .2467777    22.91
           _cons |  -866.2836   125.9787    -6.88  |  -1495.805   111.8223   -13.38

     • R-squared jumps from 29% to 46%
     • Coefficient on female goes from -595 to -315
     • Almost half the effect of gender is explained by women's shorter hours of work
     • Age, partner and education coefficients are also reduced in magnitude, for similar reasons
     • Number of observations reduces from 16460 to 13998: missing data on hours
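
The "almost half" statement is simple arithmetic on the two female coefficients, sketched here in Python. One caveat, following from the slide's own last point: the two regressions are estimated on different samples (16460 versus 13998 observations), so the comparison is indicative rather than exact.

```python
# Female coefficients from the two specifications on this slide.
coef_without_hours = -594.8596
coef_with_hours = -314.6874

# Share of the original coefficient that disappears once hours are controlled for.
share_explained = 1 - coef_with_hours / coef_without_hours

print(f"Change in coefficient: {coef_with_hours - coef_without_hours:.2f}")
print(f"Share accounted for by hours: {share_explained:.1%}")
```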

  15. Interesting post-estimation activities
     (Based on the regression with hours of work: N = 13998, age coefficient 79.55289, age2 coefficient -.873335.)

     What age does income peak?
     $$\text{Income} = \alpha + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{age}^2$$
     $$\frac{d(\text{Income})}{d(\text{age})} = \beta_1 + 2\beta_2 \cdot \text{age}$$
     The derivative is zero when
     $$\text{age} = -\frac{\beta_1}{2\beta_2} = \frac{-79.552}{-0.873 \times 2} = 45.5$$

     Is the effect of university qualifications statistically different from the effect of secondary education?

     . test ed_sec = ed_deg

      ( 1)  ed_sec - ed_deg = 0

            F(  1,  2262) =  110.75
                 Prob > F =    0.0000
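
The peak-age calculation can be reproduced from the age and age2 coefficients in the regression with hours of work. This Python sketch simply applies the turning-point formula for a quadratic; the variable names are illustrative.

```python
# Coefficients on age and age2 from the regression that controls for hours.
b_age = 79.55289
b_age2 = -0.873335

# d(Income)/d(age) = b_age + 2 * b_age2 * age, which is zero at the turning point.
peak_age = -b_age / (2 * b_age2)

# The second derivative is 2 * b_age2; a negative value means a maximum.
is_maximum = 2 * b_age2 < 0

print(f"Income peaks at about age {peak_age:.1f} (maximum: {is_maximum})")
```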

  16. A closer look at partner coefficient

  17. . bysort female: reg incm age age2 partner ed_sec ed_deg hrsm if age >= 17 & age <= 64, cluster(pid)

     -> female = 0

     Linear regression                                      Number of obs =    6776
                                                            F(6, 1095)    =  115.53
                                                            Prob > F      =  0.0000
                                                            R-squared     =  0.3452
                                                            Root MSE      =  787.93

                                  (Std. Err. adjusted for 1096 clusters in pid)

                 |               Robust
            incm |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
     ------------+------------------------------------------------------------------
             age |   113.3119   10.56356    10.73   0.000      92.5848     134.039
            age2 |  -1.257366   .1316253    -9.55   0.000    -1.515633      -.9991
         partner |    213.351    46.9817     4.54   0.000     121.1667    305.5354
          ed_sec |   356.7472   41.91151     8.51   0.000     274.5113    438.9832
          ed_deg |   1082.255   89.21241    12.13   0.000     907.2087    1257.302
            hrsm |   3.930412   .3784925    10.38   0.000      3.18776    4.673065
           _cons |  -1907.107   175.5681   -10.86   0.000    -2251.595    -1562.62

     -> female = 1

     Linear regression                                      Number of obs =    7222
                                                            F(6, 1166)    =  125.30
                                                            Prob > F      =  0.0000
                                                            R-squared     =  0.4830
                                                            Root MSE      =  564.65

                                  (Std. Err. adjusted for 1167 clusters in pid)

                 |               Robust
            incm |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
     ------------+------------------------------------------------------------------
             age |   56.20989   7.327688     7.67   0.000     41.83296    70.58682
            age2 |  -.6229372   .0961605    -6.48   0.000    -.8116041   -.4342702
         partner |   84.15365   29.27677     2.87   0.004      26.7126    141.5947
          ed_sec |   277.2823   31.66175     8.76   0.000     215.1619    339.4026
          ed_deg |   819.3002   73.74637    11.11   0.000     674.6098    963.9906
            hrsm |   6.806946   .3051015    22.31   0.000     6.208337    7.405556
           _cons |  -1382.844   133.0607   -10.39   0.000    -1643.909   -1121.779

     • Men who are part of a couple earn much more than men who are not; women less so.
     • Other coefficients also differ between men and women, but with the current specification, we can't test whether the differences are significant.

  18. Logit and Probit
     • Developed for discrete (categorical) dependent variables. Eg, psychological morbidity, whether one has a job. Think of other examples.
     • Outcome variable is always 0 or 1.
     • Estimate:
     $$\Pr(Y = 1) = F(X, \beta)$$
     $$\Pr(Y = 0) = 1 - F(X, \beta)$$
     • OLS (linear probability model) would set $F(X, \beta) = X'\beta + \varepsilon$
     • Inappropriate because:
       - Heteroscedasticity: the outcome variable is always 0 or 1, so $\varepsilon$ only takes the value $-x'\beta$ or $1 - x'\beta$
       - More seriously, one cannot constrain estimated probabilities to lie between 0 and 1.

  19. Logit and Probit
     • Solution: we need a link function that will transform our dichotomous Y into a continuous form Y'
     • Looking for a function which lies between 0 and 1
     • Cumulative normal distribution: Probit model (Z scores assuming the cumulative normal distribution)
     $$\Pr(Y = 1) = \int_{-\infty}^{x'\beta} \phi(t)\,dt = \Phi(x'\beta)$$
     • Logistic distribution: Logit (logistic) model (logged odds of probability)
     $$\Pr(Y = 1) = \frac{e^{x'\beta}}{1 + e^{x'\beta}} = \Lambda(x'\beta)$$
     • They are very similar! Note how they lie between 0 and 1 (vertical axis).
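
To see how similar the two link functions are, they can be evaluated side by side. This is a small Python sketch using only the standard library; the function names are illustrative, and the grid of index values is arbitrary.

```python
import math


def probit_cdf(z):
    """Cumulative standard normal distribution (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Cumulative logistic distribution (the logit link)."""
    return 1.0 / (1.0 + math.exp(-z))


# Both functions map any value of the linear index into the interval (0, 1).
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"z = {z:+.1f}   probit = {probit_cdf(z):.4f}   logit = {logit_cdf(z):.4f}")
```

Both curves equal 0.5 at zero and approach 0 and 1 in the tails; the logistic curve has fatter tails, which is why logit coefficients come out larger in absolute size than probit ones.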

  20. Maximum likelihood estimation
     • Likelihood function: the product of
       - $\Pr(y = 1) = F(x'\beta)$ for all observations where y = 1
       - $\Pr(y = 0) = 1 - F(x'\beta)$ for all observations where y = 0
       (think of the probability of getting exactly four heads and two tails when flipping six coins)
     • Log likelihood, where $S$ is the set of observations with $y_j = 1$:
     $$\ln L = \sum_{j \in S} \ln F(x_j'\beta) + \sum_{j \notin S} \ln\left[1 - F(x_j'\beta)\right]$$
     • Estimated using an iterative procedure
       - STATA chooses starting values for the $\beta$s
       - Computes the slopes of the likelihood function at these values
       - Adjusts the $\beta$s accordingly
       - Stops when the slope of the likelihood function is (approximately) 0
       - Can take time!
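
The iterative procedure can be sketched for the simplest possible case: a logit with a constant only, fitted to the slide's example of four heads and two tails. This Python sketch uses a plain Newton update (slope divided by curvature); it is an illustration of the idea, not a description of Stata's actual optimiser, and the function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b, y):
    """ln L for a constant-only logit: Pr(y = 1) = logistic(b)."""
    p = logistic(b)
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)


def fit_constant_only(y, start=0.0, tol=1e-10, max_iter=25):
    """Newton iterations: adjust b until the slope of ln L is about zero."""
    b = start
    for _ in range(max_iter):
        p = logistic(b)
        slope = sum(yi - p for yi in y)        # first derivative of ln L
        curvature = -len(y) * p * (1.0 - p)    # second derivative of ln L
        if abs(slope) < tol:
            break
        b = b - slope / curvature
    return b


y = [1, 1, 1, 1, 0, 0]          # four heads, two tails
b_hat = fit_constant_only(y)
p_hat = logistic(b_hat)

print(f"b = {b_hat:.6f}, implied Pr(y = 1) = {p_hat:.6f}")
print(f"ln L at start = {log_likelihood(0.0, y):.4f}, at maximum = {log_likelihood(b_hat, y):.4f}")
```

The estimate converges to the log odds of the sample proportion, ln(4/2), so the implied probability is 4/6. With several regressors the same logic applies, but the slope becomes a vector and the curvature a matrix.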

  21. Let's look at whether a person works

     . tab jbstat, m

         current economic |
                 activity |      Freq.     Percent        Cum.
     ---------------------+-----------------------------------
          missing or wild |         13        0.03        0.03
                       -7 |         66        0.18        0.21
             not answered |          2        0.01        0.22
            self-employed |      2,204        5.87        6.08
                 employed |     14,702       39.15       45.24
               unemployed |      1,120        2.98       48.22
                  retired |      4,726       12.59       60.80
          maternity leave |        320        0.85       61.66
              family care |      1,964        5.23       66.89
         ft studt, school |      1,394        3.71       70.60
         lt sick, disabld |      1,057        2.81       73.41
          gvt trng scheme |         67        0.18       73.59
                    other |        124        0.33       73.92
                        . |      9,793       26.08      100.00
     ---------------------+-----------------------------------
                    Total |     37,552      100.00

     gen byte work = (jbstat == 1 | jbstat == 2) if jbstat >= 1 & jbstat != .

  22. Logit regression: whether have a job

     . logit work female age age2 badhealth partner ed_sec ed_deg nkids if age >= 22 & age <= 60, cluster(pid)

     Iteration 0:   log pseudolikelihood = -9174.0313
     Iteration 1:   log pseudolikelihood = -7909.9067
     Iteration 2:   log pseudolikelihood = -7838.4288
     Iteration 3:   log pseudolikelihood = -7838.2372
     Iteration 4:   log pseudolikelihood = -7838.2372

     Logistic regression                                    Number of obs =   17268
                                                            Wald chi2(8)  =  613.59
                                                            Prob > chi2   =  0.0000
     Log pseudolikelihood = -7838.2372                      Pseudo R2     =  0.1456

                                  (Std. Err. adjusted for 2430 clusters in pid)

                 |               Robust
            work |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
     ------------+------------------------------------------------------------------
          female |  -.8001156    .090802    -8.81   0.000    -.9780842    -.622147
             age |   .3578242   .0282831    12.65   0.000     .3023904    .4132581
            age2 |  -.0046546   .0003504   -13.28   0.000    -.0053414   -.0039679
       badhealth |  -.5213826   .0404068   -12.90   0.000    -.6005785   -.4421868
         partner |   .4681257   .0943383     4.96   0.000     .2832261    .6530253
          ed_sec |    .602653   .0900282     6.69   0.000     .4262009    .7791051
          ed_deg |   .8734892   .1468462     5.95   0.000      .585676    1.161302
           nkids |   -.477714   .0391116   -12.21   0.000    -.5543714   -.4010567
           _cons |  -3.666352   .5216913    -7.03   0.000    -4.688848   -2.643856

     Reading the output:
     • The iteration log shows all the iterations
     • chi2: without clustering, STATA reports the likelihood-ratio statistic 2 * (LL of this model - LL of null model); with cluster(pid), as here, it reports a Wald chi2 instead
     • Pseudo R2: a measure of the amount explained, but with a less intuitive interpretation
     From these coefficients, we can tell:
     • Whether estimated effects are positive or negative
     • Whether they're significant
     • Something about effect sizes, but it is difficult to draw inferences from the coefficients
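
The Pseudo R2 on this slide can be recovered from the iteration log, since for logit Stata's iteration 0 is the constant-only (null) model. The Python sketch below applies the usual McFadden formula, 1 - LL(model)/LL(null), and also evaluates the slide's 2 * (LL of this model - LL of null model) expression. Note that the latter is a likelihood-ratio quantity and is not the Wald chi2 of 613.59 shown in the output, which Stata reports because of the cluster(pid) option.

```python
# Log pseudolikelihoods copied from the iteration log on this slide.
ll_null = -9174.0313     # iteration 0: constant-only model
ll_model = -7838.2372    # final iteration

pseudo_r2 = 1 - ll_model / ll_null
lr_quantity = 2 * (ll_model - ll_null)

print(f"Pseudo R2 = {pseudo_r2:.4f}")
print(f"2 * (LL model - LL null) = {lr_quantity:.4f}")
```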

  23. Comparing logit and probit

                      Logit     Probit   Probit * 1.6
      female         -0.800     -0.455      -0.728
      age             0.358      0.206       0.330
      age2           -0.005     -0.003      -0.004
      badhealth      -0.521     -0.300      -0.479
      partner         0.468      0.284       0.455
      ed_sec          0.603      0.343       0.548
      ed_deg          0.873      0.476       0.762
      nkids          -0.478     -0.275      -0.441
      _cons          -3.666     -2.112      -3.380

      Scaling factor proposed by Amemiya (1981): multiply Probit coefficients by 1.6 to get an approximation to Logit.
      Other authors have suggested a factor of 1.8.
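The rescaling is plain multiplication, and the alternative factor of 1.8 has a simple rationale. A short worked note (the symbols are introduced here for illustration and do not appear on the slide):

```latex
% Amemiya's rule of thumb, checked against the coefficient on female
\hat\beta_{\text{logit}} \approx 1.6\,\hat\beta_{\text{probit}},
\qquad 1.6 \times (-0.455) = -0.728
\quad (\text{actual logit estimate: } -0.800)

% The standard logistic error has variance \pi^2/3; the probit error has variance 1
\sigma_{\text{logistic}} = \frac{\pi}{\sqrt{3}} \approx 1.81
```

Equating the two error standard deviations gives a factor of about 1.8, which is a plausible origin for the alternative suggestion. Either factor is only an approximation, as the gap between -0.728 and -0.800 shows.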

  24. Marginal effects
      After logit or probit estimation, use the margins command
      Calculates marginal effects of each of the RHS variables on the dependent variable
          Slope of the function, for continuous variables
          Effect of a change from 0 to 1, for a dummy variable (enter dummies with the i. prefix so that margins treats them as discrete)
      Can also provide predicted probabilities, linear combinations, plots, and much more!
      MEM: Marginal Effects at the Means
          margins, dydx(*) atmeans
      AME: Average Marginal Effects
          margins, dydx(*)
      MER: Marginal Effects at Representative Values
          margins, dydx(*) at(age=(20 30 40 50))
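A minimal sketch of the three variants, run on the auto dataset that ships with Stata rather than on the course data. The model (foreign on mpg and weight) and the representative values of mpg are assumptions chosen only so that the commands run:

```stata
* Example data shipped with Stata (not the course data)
sysuse auto, clear

* Binary outcome: whether the car is foreign
logit foreign mpg weight

* MEM: marginal effects evaluated at the sample means of the regressors
margins, dydx(*) atmeans

* AME: marginal effects computed for each observation, then averaged
margins, dydx(*)

* MER: marginal effects at chosen values of mpg (note the inner parentheses)
margins, dydx(*) at(mpg=(15 20 25 30))
```

Note that lower-case margins is required, since Stata commands are case-sensitive.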

  25. Marginal effects

                      Logit     Probit        OLS
      female*        -0.118     -0.122     -0.114
      age             0.053      0.056      0.057
      age2           -0.001     -0.001     -0.001
      badhea~h       -0.078     -0.081     -0.086
      partner*        0.076      0.082      0.075
      ed_sec*         0.087      0.090      0.094
      ed_deg*         0.106      0.109      0.118
      nkids          -0.071     -0.075     -0.077
      Constant                             -0.045

      (*) dummy variable: effect of a change from 0 to 1
      Logit and Probit mfx are very similar indeed
      OLS is actually not too bad
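A comparison of this kind can be assembled in one table. The sketch below is an illustration on the auto dataset shipped with Stata, not the course data; the stored names ame_logit, ame_probit and ols are hypothetical labels:

```stata
sysuse auto, clear

* Logit: average marginal effects, posted so they can be stored
quietly logit foreign mpg weight
quietly margins, dydx(*) post
estimates store ame_logit

* Probit: same treatment
quietly probit foreign mpg weight
quietly margins, dydx(*) post
estimates store ame_probit

* OLS on a 0/1 outcome (linear probability model): coefficients are already marginal effects
quietly regress foreign mpg weight
estimates store ols

* Side-by-side comparison
estimates table ame_logit ame_probit ols, b(%9.4f)
```

The post option replaces the estimation results with the marginal effects, which is what allows estimates store to pick them up.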

  26. Odds ratios
      Only an option with logit
      Type or after the comma, as an option
      Reports odds ratios: that is, how many times larger (or smaller) the odds of the outcome become
          if the variable is 1 rather than 0, in the case of a dichotomous variable
          for each unit increase of the variable, for a continuous variable
      Results >1 show increased odds (and so an increased probability), results <1 show a decrease

      . logit work female age age2 badhealth partner ed_sec ed_deg nkids if age >= 22 & age <= 60, cluster(pid) or

      Iteration 0:   log pseudolikelihood = -9174.0313
      Iteration 1:   log pseudolikelihood = -7909.9067
      Iteration 2:   log pseudolikelihood = -7838.4288
      Iteration 3:   log pseudolikelihood = -7838.2372
      Iteration 4:   log pseudolikelihood = -7838.2372

      Logistic regression                          Number of obs   =      17268
                                                   Wald chi2(8)    =     613.59
                                                   Prob > chi2     =     0.0000
      Log pseudolikelihood = -7838.2372            Pseudo R2       =     0.1456

                                  (Std. Err. adjusted for 2430 clusters in pid)
      -------------------------------------------------------------------------------
                   |               Robust
              work | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
      -------------+-----------------------------------------------------------------
            female |   .4492777    .0407952    -8.81   0.000    .3760308    .5367907
               age |   1.430214    .0404509    12.65   0.000    1.353089    1.511735
              age2 |   .9953562    .0003488   -13.28   0.000    .9946728      .99604
         badhealth |   .5936991    .0239895   -12.90   0.000    .5484942    .6426296
           partner |   1.596998     .150658     4.96   0.000    1.327405    1.921345
            ed_sec |   1.826959    .1644779     6.69   0.000    1.531428    2.179521
            ed_deg |   2.395254    .3517339     5.95   0.000    1.796205    3.194091
             nkids |   .6201995     .024257   -12.21   0.000    .5744333    .6696121
      -------------------------------------------------------------------------------
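Each odds ratio is simply the exponential of the corresponding logit coefficient, and it multiplies the odds rather than the probability. A small sketch of both points; the two coefficients are copied from the logit output, while the baseline probability of 0.80 is a made-up value for illustration:

```stata
* Coefficients copied from the logit output (log-odds scale)
scalar b_female = -.8001156
scalar b_ed_deg =  .8734892

* Odds ratio = exp(coefficient)
scalar or_female = exp(b_female)
scalar or_ed_deg = exp(b_ed_deg)
display "OR female = " %8.6f or_female
display "OR ed_deg = " %8.6f or_ed_deg

* Hypothetical baseline: a man with a 0.80 probability of working
scalar p0    = 0.80
scalar odds0 = p0/(1 - p0)

* The odds ratio rescales the odds, which are then converted back to a probability
scalar odds1 = odds0*or_female
scalar p1    = odds1/(1 + odds1)
display "Odds:        " %5.2f odds0 " -> " %5.2f odds1
display "Probability: " %5.2f p0 " -> " %5.2f p1
```

In this illustration the odds fall to about 45% of their baseline value, but the probability falls far less steeply, which is why an odds ratio should not be read as "times as likely".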

  27. Other post-estimation commands
      Likelihood ratio test: lrtest
      Adding an extra variable to the RHS never decreases the likelihood, and almost always increases it
      But does it add enough to the likelihood?
      The LR test starts from the likelihood ratio L0/L1 (Lrestricted/Lunrestricted) and calculates the statistic -2 ln(L0/L1) = 2 * (LL1 - LL0), which is chi-squared with d.f. equal to the number of variables you are dropping
      Null hypothesis: restricted specification
      Only works on nested models, ie, where the RHS variables in one model are a subset of the RHS variables in the other
      How to do it
          Run the full model
          Type estimates store NAME
          Run a smaller model
          Type estimates store ANOTHERNAME
          ... and so on for as many models as you like
          Type lrtest NAME ANOTHERNAME
      Be careful
          Sample sizes must be the same for both models
          Won't happen if the dropped variable is missing for some observations
          Solve the problem by running the biggest model first and using e(sample)
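The procedure can be tried end to end on the auto dataset that ships with Stata. This is a sketch under assumed variable choices (foreign, mpg, weight), with FULL and SMALL as hypothetical names for the stored estimates:

```stata
sysuse auto, clear

* Step 1: run the full (unrestricted) model and store it
quietly logit foreign mpg weight
scalar ll_full = e(ll)
estimates store FULL

* Step 2: run the smaller (restricted) model on the same estimation sample
quietly logit foreign mpg if e(sample)
scalar ll_small = e(ll)
estimates store SMALL

* Step 3: likelihood-ratio test, keeping a copy of Stata's results
lrtest FULL SMALL
scalar lr_stata = r(chi2)
scalar df_stata = r(df)

* The same statistic by hand: 2*(LL unrestricted - LL restricted)
scalar lr_byhand = 2*(ll_full - ll_small)
display "LR by hand = " %8.4f lr_byhand "   p = " %6.4f chi2tail(df_stata, lr_byhand)
```

The if e(sample) qualifier restricts the second model to the observations used in the first, which is the safeguard against differing sample sizes.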

  28. LR test - example
      Similar but not identical regression to previous examples
      Add regional variables, decide which ones to keep

      . do "C:\DOCUME~1\maria\LOCALS~1\Temp\STD03000000.tmp"

      . logit work age age2 badhealth partner ed_sec ed_deg nkids r_* if age >= 21 & age <= 60 & wave == 15

      Iteration 0:   log likelihood = -548.06325
      Iteration 1:   log likelihood = -480.90757
      Iteration 2:   log likelihood = -477.04783
      Iteration 3:   log likelihood = -477.02974
      Iteration 4:   log likelihood = -477.02974

      Logistic regression                          Number of obs   =       1066
                                                   LR chi2(14)     =     142.07
                                                   Prob > chi2     =     0.0000
      Log likelihood = -477.02974                  Pseudo R2       =     0.1296

      -------------------------------------------------------------------------------
              work |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
      -------------+-----------------------------------------------------------------
               age |   .3093037    .0599295     5.16   0.000    .1918441    .4267633
              age2 |  -.0039101     .000729    -5.36   0.000   -.0053389   -.0024814
         badhealth |  -.5337105    .0893498    -5.97   0.000    -.708833    -.358588
           partner |    .244436    .1984392     1.23   0.218   -.1444977    .6333698
            ed_sec |   .7737744    .1749884     4.42   0.000    .4308035    1.116745
            ed_deg |   1.356818    .2734631     4.96   0.000    .8208405    1.892796
             nkids |  -.3589658    .0813107    -4.41   0.000   -.5183318   -.1995998
             r_lon |  -.5363941    .3247594    -1.65   0.099   -1.172911    .1001226
             r_mid |  -.3796683    .2807851    -1.35   0.176    -.929997    .1706604
              r_sw |  -.7379424      .34484    -2.14   0.032   -1.413816   -.0620684
              r_nw |  -.6369382    .3140179    -2.03   0.043   -1.252402   -.0214744
             r_nth |  -.6270993    .2940544    -2.13   0.033   -1.203435   -.0507633
             r_wls |  -.4251579    .3862621    -1.10   0.271   -1.182218    .3319019
             r_sco |  -1.183256    .3413128    -3.47   0.001   -1.852216   -.5142949
             _cons |  -3.042685    1.133771    -2.68   0.007   -5.264836   -.8205342
      -------------------------------------------------------------------------------

      Looks as though Scotland might stay, also possibly SW, NW, N

      . estimates store ALL
      . quietly logit work age age2 badhealth partner ed_sec ed_deg nkids if e(sample)
      . estimates store DROPREG
      . quietly logit work age age2 badhealth partner ed_sec ed_deg nkids r_sco r_sw r_nw r_nth if e(sample)
      . estimates store KEEP4
      . quietly logit work age age2 badhealth partner ed_sec ed_deg nkids r_sco if e(sample)
      . estimates store KEEPSCOT

  29. LR test - example

      . lrtest ALL DROPREG
      Likelihood-ratio test                                  LR chi2(7)  =     14.19
      (Assumption: DROPREG nested in ALL)                    Prob > chi2 =    0.0479
          REJECT nested specification

      . lrtest ALL KEEP4
      Likelihood-ratio test                                  LR chi2(3)  =      3.34
      (Assumption: KEEP4 nested in ALL)                      Prob > chi2 =    0.3422
          DON'T REJECT nested spec

      . lrtest ALL KEEPSCOT
      Likelihood-ratio test                                  LR chi2(6)  =      7.60
      (Assumption: KEEPSCOT nested in ALL)                   Prob > chi2 =    0.2689

      . lrtest KEEP4 KEEPSCOT
      Likelihood-ratio test                                  LR chi2(3)  =      4.26
      (Assumption: KEEPSCOT nested in KEEP4)                 Prob > chi2 =    0.2347

      . lrtest KEEPSCOT DROPREG
      Likelihood-ratio test                                  LR chi2(1)  =      6.59
      (Assumption: DROPREG nested in KEEPSCOT)               Prob > chi2 =    0.0102

      Reading the five tests in order
          Reject dropping all regional variables against keeping full set
          Don't reject dropping all but 4, over keeping full set
          Don't reject dropping all but Scotland, over keeping full set
          Don't reject dropping all but Scotland, over dropping all but 4
          [and just to check: DO reject dropping all regional variables against dropping all but Scotland]
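The p-values reported by lrtest can be reproduced from the chi-squared statistic and its degrees of freedom using Stata's chi2tail() function. A short check with the figures copied from the five tests (the scalar names are illustrative):

```stata
* Upper-tail chi-squared probabilities: chi2tail(df, statistic)
scalar p_all_dropreg  = chi2tail(7, 14.19)
scalar p_all_keep4    = chi2tail(3, 3.34)
scalar p_all_keepscot = chi2tail(6, 7.60)
scalar p_keep4_scot   = chi2tail(3, 4.26)
scalar p_scot_dropreg = chi2tail(1, 6.59)

display "ALL vs DROPREG      p = " %6.4f p_all_dropreg
display "ALL vs KEEP4        p = " %6.4f p_all_keep4
display "ALL vs KEEPSCOT     p = " %6.4f p_all_keepscot
display "KEEP4 vs KEEPSCOT   p = " %6.4f p_keep4_scot
display "KEEPSCOT vs DROPREG p = " %6.4f p_scot_dropreg

* LR statistics for nested chains add up: (ALL vs KEEPSCOT) + (KEEPSCOT vs DROPREG)
display "7.60 + 6.59 = " %5.2f 7.60 + 6.59 "  (ALL vs DROPREG: 14.19)"
```

The degrees of freedom are the number of regional dummies dropped in each comparison: 7, 3, 6, 3 and 1. Small differences in the last decimal place are possible because the statistics above are rounded to two decimals.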

  30. Again, specification is illustrative only This is not an example of a finished labour supply model! How could one improve the model? Model specification Theoretical considerations, Empirical considerations Parsimony Stepwise regression techniques Regression diagnostics Interpreting results Spotting unreasonable results

  31. Other models
      Other models to be aware of, but not covered on this course:
      Extensions to logit and probit
          Ordered models (ologit, oprobit) for ordered outcomes
              Levels of education
              Number of children
              Excellent, good, fair or poor health
          Multinomial models (mlogit, mprobit) for multiple outcomes with no obvious ordering
              Working in public, private or voluntary sector
              Choice of nursery, childminder or playgroup for pre-school care
      Heckman selection model
          For modelling two-stage procedures
              Earnings, conditional on having a job at all
              Having a job is modelled as a probit, earnings are modelled as OLS
          Used particularly for women's earnings
      Tobit model for censored or truncated data
          Typically, for data where there are lots of zeros
              Expenditure on rarely-purchased items, eg cars
              Children's weights, in an experiment where the scales broke and gave a minimum reading of 10kg
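For orientation only, here is what two of these commands look like in practice. The sketch uses the auto dataset shipped with Stata; the outcome and regressors are assumptions picked so that the commands run, and carry no substantive meaning:

```stata
sysuse auto, clear

* Ordered logit: rep78 is a 1-5 repair rating, treated as an ordered outcome
ologit rep78 mpg weight

* Tobit: treat mpg as censored from below at 17
* (values of 17 or less are regarded as "17 or below", as with the broken scales)
tobit mpg weight, ll(17)
```

The ll() option sets the lower censoring limit; ul() does the same for an upper limit.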

  32. Competence in STATA
      Best results in this course if you already know how to use STATA competently.
      Check you know how to
          Get data into STATA (use and using commands)
          Manipulate data (merge, append, rename, drop, save)
          Describe your data (describe, tabulate, table)
          Create new variables (gen, egen)
          Work with subsets of data (if, in, by)
          Do basic regressions (regress, logit, probit)
          Run sessions interactively and in batch mode
          Organise your datasets and do-files so you can find them again
      If you can't do these, upgrade your knowledge ASAP!
          Could enroll in STATA net course 101
          Costs $110
          ESRC might pay
          Courses run regularly
          www.stata.com
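As a self-test, the following do-file touches most items on the checklist. It is a sketch on the auto dataset shipped with Stata; the new variable names (weight_kg, mean_mpg) and the temporary file mycopy are made up for the example:

```stata
* Get data in
sysuse auto, clear

* Describe the data
describe
tabulate foreign rep78

* Create new variables
generate weight_kg = weight*0.4536
egen mean_mpg = mean(mpg), by(foreign)

* Work with subsets: if, in, by
summarize price if foreign == 1
list make price in 1/5
by foreign, sort: summarize mpg

* Basic regressions
regress price mpg weight
logit foreign mpg weight

* Save a copy and read it back
tempfile mycopy
save `mycopy'
use `mycopy', clear
```

Saved as a do-file, this can be run in batch mode as well as interactively.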

  33. SC968 Panel data methods for sociologists Lecture 1, part 2 Introducing Longitudinal Data

  34. Overview Cross-sectional and longitudinal data Types of longitudinal data Types of analysis possible with panel data Data management: merging, appending, long and wide forms Simple models using longitudinal data

  35. Cross-sectional and longitudinal data First, draw the distinction between macro- and micro-level data Micro level: firms, individuals Macro level: local authorities, travel-to-work areas, countries, commodity prices Both may exist in cross-sectional or longitudinal forms We are interested in micro-level data But macro-level variables are often used in conjunction with micro-data Cross-sectional data Contains information collected at a given point in time (More strictly, during a given time window) European Social Survey (ESS) Programme for International Student Assessment (PISA) Many cross-sectional surveys are repeated, but on different individuals Longitudinal data Contains repeated observations on the same subjects

  36. Types of longitudinal data
      Time-series data
          Eg, commodity prices, exchange rates
      Repeated interviews at irregular intervals
          UK cohort studies: NCDS (1958), BCS (1970), MCS (2000)
      Repeated interviews at regular intervals
          Panel surveys
          Usually annual intervals, sometimes two-yearly
          BHPS, SLID, PSID, SOEP
      Some surveys have both cross-sectional and panel elements
          Panels more expensive to collect
          LFS, EU-SILC both have a rolling panel element
      Other sources of longitudinal data
          Retrospective data (eg work or relationship history)
          Linkage with external data (eg, tax or benefit records), particularly in Scandinavia
          May be present in either cross-sectional or longitudinal data sets

  37. Analysis with longitudinal data
      The snapshot versus the movie
      Essentially, longitudinal data allow us to observe how events evolve
      Study flows as well as stocks
      Example: unemployment
          Cross-sectional analysis shows steady 5% unemployment rate
          Does this mean that everyone is unemployed one year out of twenty?
          That 5% of people are unemployed all the time?
          Or something in between?
          Very different implications for equality, social policy, etc
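The unemployment example can be made concrete with simulated data. The sketch below builds two hypothetical panels of 20 people over 20 years (all names and numbers are invented): in one, unemployment rotates through everybody; in the other, the same person is always unemployed.

```stata
* 20 people observed for 20 years each
clear
set obs 20
generate id = _n
expand 20
bysort id: generate year = _n

* Scenario A: each person is unemployed in exactly one year out of twenty
generate byte unemp_rotating = (year == id)

* Scenario B: person 1 is unemployed every year, nobody else ever is
generate byte unemp_fixed = (id == 1)

* Stocks: the cross-sectional rate is 5% in every year under both scenarios
tabstat unemp_rotating unemp_fixed, by(year) statistics(mean)

* Flows: the year-to-year transition tables are completely different
xtset id year
xttrans unemp_rotating
xttrans unemp_fixed
```

A single cross-section cannot tell the two scenarios apart; only the repeated observations on the same people reveal the difference.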

  38. The BHPS Interviews about 10,000 adults in about 6,000 households Interviews repeated annually People followed when they move People join the sample if they move in with a sample member Household-level information collected from head of household Individual-level information collected from people aged 17+ Young people aged 11-16 fill in a youth questionnaire BHPS is now part of Understanding Society Much larger and wider-ranging survey 40,000 households Data set used for this course is a 20% sample of BHPS, with selected variables

  39. The BHPS
      All files prefixed with a letter indicating the year
      All variables within each file also prefixed with this letter
          1991: a
          1992: b
          ... and so on
      Several files each year, containing different information
          hhsamp: information on sample households
          hhresp: household-level information on households that actually responded
          indall: info on all individuals in responding households
          indresp: info on respondents to main questionnaire (adults)
          egoalt: file showing relationship of household members to one another
          income: incomes
      Extra files each year containing derived variables: work histories, net income files
      And others with occasional modules, eg life histories in wave 2: bjobhist, blifemst, bmarriag, bcohabit, bchildnt
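Because of the wave prefixes, building a panel file usually means combining files within a wave, stripping the prefix, adding a wave counter, and then stacking the waves. The sketch below imitates this with tiny made-up stand-ins for BHPS files held in temporary files; the file contents and tempfile names are assumptions, and only the variable-naming pattern follows the description above:

```stata
* Toy stand-ins for two wave-a files and one wave-b file
clear
input long pid ahgsex aage
10019057 2 59
10028005 1 30
end
tempfile grid_a
save `grid_a'

clear
input long pid ahlghq1
10019057 7
10028005 7
end
tempfile resp_a
save `resp_a'

clear
input long pid bhgsex bage bhlghq1
10019057 2 60 12
10028005 1 31 8
end
tempfile wave_b
save `wave_b'

* Adding variables: match the two wave-a files on the person identifier
use `grid_a', clear
merge 1:1 pid using `resp_a'
drop _merge

* Strip the wave prefix and record the wave number
rename a* *
generate wave = 1
tempfile wave1
save `wave1'

* Adding observations: stack wave b underneath wave a
use `wave_b', clear
rename b* *
generate wave = 2
append using `wave1'

sort pid wave
isid pid wave
list, clean
```

One caution with the wildcard rename: it strips the letter from every variable that starts with it, so for a wave whose letter is p it would also hit pid unless that variable is excluded.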

  40. Some BHPS files

      Wave a                         Wave b
           768.1k aindall.dta             635.3k bindsamp.dta
            10.7M aindresp.dta            978.2k bindall.dta
          1626.3k ahhresp.dta              11.0M bindresp.dta
           330.6k ahhsamp.dta            1499.7k bhhresp.dta
          1066.4k aincome.dta             257.1k bhhsamp.dta
           541.3k aegoalt.dta            1073.0k bincome.dta
           303.8k ajobhist.dta            546.5k begoalt.dta
                                          237.8k bjobhist.dta

      Wave c                         Wave d
           624.3k cindsamp.dta            616.7k dindsamp.dta
           975.6k cindall.dta             943.7k dindall.dta
            11.0M cindresp.dta             11.2M dindresp.dta
          1539.0k chhresp.dta            1508.9k dhhresp.dta
           287.4k chhsamp.dta             301.9k dhhsamp.dta
          1008.9k cincome.dta            1019.7k dincome.dta
           542.2k cegoalt.dta             531.8k degoalt.dta
           237.8k cjobhist.dta            245.0k djobhist.dta
          1675.0k clifejob.dta            129.0k dyouth.dta

      Extra modules in Wave 2
            23.5k bchildad.dta
           284.4k bchildnt.dta
            34.3k bcohabit.dta
           766.4k blifemst.dta
           272.4k bmarriag.dta

      Cross-wave identifiers
          4977.3k xwaveid.dta
          1027.7k xwlsten.dta

      Following sample members (the indsamp files appear from wave b onwards)
      Youth module introduced 1994 (dyouth.dta)

  41. Person and household identifiers
      BHPS (along with other panels such as ECHP and SOEP) is a household survey, so everyone living in sample households becomes a member
      Need identifiers to
          1. Associate the same individual with him- or herself in different waves: the PID identifier
          2. Link members of the same household with each other in the same wave: the HID identifier
      Note: no such thing as a longitudinal household!
          Household composition changes, household location changes...
          HID is a cross-sectional concept only!

  42. What it looks like: 4 waves of data, sorted by pid and wave
      Observations in rows, variables in columns. Blue stripes show where one individual ends & another begins

      . list pid wave hgsex age jbstat mastat in 1/30, clean

                  pid   wave    hgsex   age     jbstat     mastat
        1.   10019057      1   female    59    retired   never ma
        2.   10019057      2   female    60    retired   never ma
        3.   10019057      3   female    61    retired   never ma
        4.   10019057      4   female    62    retired   never ma
        5.   10028005      1     male    30   employed   never ma
        6.   10028005      2     male    31   employed   never ma
        7.   10028005      3     male    32   employed   never ma
        8.   10028005      4     male    33   employed   never ma
        9.   10042571      1     male    59   unemploy   never ma
       10.   10042571      3     male    60   lt sick,   never ma    <- not present at 2nd wave
       11.   10042571      4     male    62    retired   never ma
       12.   10051538      1   female    22   unemploy   never ma
       13.   10051538      2   female    23   family c   never ma
       14.   10051538      3   female    24   unemploy   never ma
       15.   10051538      4   female    25   family c   never ma
       16.   10051562      1   female     4          .          .    <- a child, so no data on job or marital status
       17.   10051562      2   female     5          .          .
       18.   10051562      3   female     6          .          .
       19.   10051562      4   female     7          .          .
       20.   10059377      1   female    46   employed   never ma
       21.   10059377      2   female    47   employed   never ma
       22.   10059377      3   female    48   employed   never ma
       23.   10059377      4   female    49   self-emp   never ma
       24.   10064966      1     male    70    retired    widowed
       25.   10064966      2     male    70    retired    widowed    <- surveyed twice in 70th year
       26.   10064966      3     male    71    retired    widowed
       27.   10064966      4     male    72    retired    widowed
       28.   10076166      1   female    77    retired    widowed
       29.   10076166      2   female    78    retired    widowed
       30.   10076166      3   female    79    retired    widowed
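Two routine checks on a file like this are that pid and wave together identify each row, and whether anyone has missed a wave. A minimal sketch, re-entering seven of the rows listed above by hand (the variable gap is a hypothetical name):

```stata
* A few rows copied from the listing: one complete person, one with a missing wave
clear
input long pid wave age
10019057 1 59
10019057 2 60
10019057 3 61
10019057 4 62
10042571 1 59
10042571 3 60
10042571 4 62
end

* Errors out if pid and wave do not uniquely identify observations
isid pid wave

* Declare the panel structure and summarise participation patterns
xtset pid wave
xtdescribe

* Flag rows that follow a skipped wave
bysort pid (wave): generate byte gap = (wave - wave[_n-1] > 1) if _n > 1
list, sepby(pid)
```

The flag is missing on each person's first row, because there is no earlier wave to compare with.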

  43. (Can also use ,nol option)

      . list pid wave hgsex age jbstat mastat in 1/30, clean nol

                  pid   wave   hgsex   age   jbstat   mastat
        1.   10019057      1       2    59        4        6
        2.   10019057      2       2    60        4        6
        3.   10019057      3       2    61        4        6
        4.   10019057      4       2    62        4        6
        5.   10028005      1       1    30        2        6
        6.   10028005      2       1    31        2        6
        7.   10028005      3       1    32        2        6
        8.   10028005      4       1    33        2        6
        9.   10042571      1       1    59        3        6
       10.   10042571      3       1    60        8        6
       11.   10042571      4       1    62        4        6
       12.   10051538      1       2    22        3        6
       13.   10051538      2       2    23        6        6
       14.   10051538      3       2    24        3        6
       15.   10051538      4       2    25        6        6
       16.   10051562      1       2     4        .        .
       17.   10051562      2       2     5        .        .
       18.   10051562      3       2     6        .        .
       19.   10051562      4       2     7        .        .
       20.   10059377      1       2    46        2        6
       21.   10059377      2       2    47        2        6
       22.   10059377      3       2    48        2        6
       23.   10059377      4       2    49        1        6
       24.   10064966      1       1    70        4        3
       25.   10064966      2       1    70        4        3
       26.   10064966      3       1    71        4        3
       27.   10064966      4       1    72        4        3
       28.   10076166      1       2    77        4        3
       29.   10076166      2       2    78        4        3
       30.   10076166      3       2    79        4        3
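The two listings show the same data: the words are value labels attached to numeric codes, and conditions must be written in terms of the codes. A small sketch with two hand-entered rows; the label name sexlbl is made up, while the codes 1 = male and 2 = female follow the listings:

```stata
clear
input long pid wave hgsex
10019057 1 2
10028005 1 1
end

* Attach labels to the numeric codes
label define sexlbl 1 "male" 2 "female"
label values hgsex sexlbl

* With labels, then with the underlying codes
list, clean
list, clean nolabel

* Conditions use the numeric code, not the label text
count if hgsex == 2
```

Writing hgsex == "female" would produce a type mismatch error, because the stored values are numbers.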

  44. Joining data sets together
      Adding extra variables: merge command
      Adding extra observations: append command

                  pid   wave    hgsex   age     jbstat     mastat     hlghq1     hlstat
        1.   10019057      1   female    59    retired   never ma          7   excellen
        2.   10019057      2   female    60    retired   never ma         12   excellen
        3.   10019057      3   female    61    retired   never ma         10   excellen
        4.   10019057      4   female    62    retired   never ma         11   excellen
        5.   10028005      1     male    30   employed   never ma          7   excellen
        6.   10028005      2     male    31   employed   never ma          8       fair
        7.   10028005      3     male    32   employed   never ma         12       fair
        8.   10028005      4     male    33   employed   never ma          7       good
        9.   10042571      1     male    59   unemploy   never ma         11       fair
       10.   10042571      3     male    60   lt sick,   never ma          7       good
       11.   10042571      4     male    62    retired   never ma          6       fair
       12.   10051538      1   female    22   unemploy   never ma         11       good
       13.   10051538      2   female    23   family c   never ma          6   excellen
       14.   10051538      3   female    24   unemploy   never ma          8   excellen
       15.   10051538      4   female    25   family c   never ma         10       good
       16.   10051562      1   female     4          .          .          .          .
       17.   10051562      2   female     5          .          .          .          .
       18.   10051562      3   female     6          .          .          .          .
       19.   10051562      4   female     7          .          .          .          .
       20.   10059377      1   female    46   employed   never ma         12       fair
       21.   10059377      2   female    47   employed   never ma         10       good
       22.   10059377      3   female    48   employed   never ma         14       fair
       23.   10059377      4   female    49   self-emp   never ma         17       poor
       24.   10064966      1     male    70    retired    widowed    missing       fair
       25.   10064966      2     male    70    retired    widowed    missing       good
       26.   10064966      3     male    71    retired    widowed         18       good
       27.   10064966      4     male    72    retired    widowed         17       poor
       28.   10076166      1   female    77    retired    widowed          6   excellen
       29.   10076166      2   female    78    retired    widowed          7   excellen
       30.   10076166      3   female    79    retired    widowed          7   excellen
       31.   10076166      4   female    79    retired    widowed   proxy re   excellen
       32.   10081763      1     male    71          .          .          .          .
       33.   10081763      2     male    72          .          .          .          .
       34.   10081763      3     male    73          .          .          .          .
       35.   10081763      4     male    74          .          .          .          .
       36.   10081798      1   female    72    retired    married          8       good
       37.   10081798      2   female    73    retired    married          5   excellen
       38.   10081798      3   female    74    retired    married          7   excellen
1 10 00 08 81 17 79 98 8 3 3 f fe em ma al le e 7 74 4 r re et ti ir re ed d m ma ar rr ri ie ed d 7 7 e ex xc ce el ll le en n 38. 1 10 00 08 81 17 79 98 8 3 3 f fe em ma al le e 7 74 4 r re et ti ir re ed d m ma ar rr ri ie ed d 7 7 e ex xc ce el ll le en n 39. 1 10 00 08 81 17 79 98 8 4 4 f fe em ma al le e 7 75 5 r re et ti ir re ed d m ma ar rr ri ie ed d m mi is ss si in ng g e ex xc ce el ll le en n 39. 1 10 00 08 81 17 79 98 8 4 4 f fe em ma al le e 7 75 5 r re et ti ir re ed d m ma ar rr ri ie ed d m mi is ss si in ng g e ex xc ce el ll le en n 39. 1 10 00 08 81 17 79 98 8 4 4 f fe em ma al le e 7 75 5 r re et ti ir re ed d m ma ar rr ri ie ed d m mi is ss si in ng g e ex xc ce el ll le en n 40. 1 10 00 09 91 18 83 31 1 1 1 m ma al le e 4 49 9 . . . . . . . . 40. 1 10 00 09 91 18 83 31 1 1 1 m ma al le e 4 49 9 . . . . . . . . 40. 1 10 00 09 91 18 83 31 1 1 1 m ma al le e 4 49 9 . . . . . . . . 41. 1 10 00 09 91 18 83 31 1 2 2 m ma al le e 5 50 0 . . . . . . . . 41. 1 10 00 09 91 18 83 31 1 2 2 m ma al le e 5 50 0 . . . . . . . . 41. 1 10 00 09 91 18 83 31 1 2 2 m ma al le e 5 50 0 . . . . . . . . 42. 1 10 00 09 91 18 83 31 1 3 3 m ma al le e 5 50 0 . . . . . . . . 42. 1 10 00 09 91 18 83 31 1 3 3 m ma al le e 5 50 0 . . . . . . . . 42. 1 10 00 09 91 18 83 31 1 3 3 m ma al le e 5 50 0 . . . . . . . . 43. 1 10 00 09 91 18 83 31 1 4 4 m ma al le e 5 51 1 . . . . . . . . 43. 1 10 00 09 91 18 83 31 1 4 4 m ma al le e 5 51 1 . . . . . . . . 43. 1 10 00 09 91 18 83 31 1 4 4 m ma al le e 5 51 1 . . . . . . . . 44. 1 10 00 09 91 18 86 66 6 1 1 f fe em ma al le e 4 48 8 m ma at te er rn ni it t m ma ar rr ri ie ed d 1 17 7 g go oo od d 44. 1 10 00 09 91 18 86 66 6 1 1 f fe em ma al le e 4 48 8 m ma at te er rn ni it t m ma ar rr ri ie ed d 1 17 7 g go oo od d 44. 1 10 00 09 91 18 86 66 6 1 1 f fe em ma al le e 4 48 8 m ma at te er rn ni it t m ma ar rr ri ie ed d 1 17 7 g go oo od d 45. 
1 10 00 09 91 18 86 66 6 2 2 f fe em ma al le e 4 48 8 e em mp pl lo oy ye ed d m ma ar rr ri ie ed d 1 11 1 g go oo od d 45. 1 10 00 09 91 18 86 66 6 2 2 f fe em ma al le e 4 48 8 e em mp pl lo oy ye ed d m ma ar rr ri ie ed d 1 11 1 g go oo od d 45. 1 10 00 09 91 18 86 66 6 2 2 f fe em ma al le e 4 48 8 e em mp pl lo oy ye ed d m ma ar rr ri ie ed d 1 11 1 g go oo od d 46. 1 10 00 09 91 18 86 66 6 3 3 f fe em ma al le e 4 49 9 e em mp pl lo oy ye ed d m ma ar rr ri ie ed d p pr ro ox xy y r re e g go oo od d 46. 1 10 00 09 91 18 86 66 6 3 3 f fe em ma al le e 4 49 9 e em mp pl lo oy ye ed d m ma ar rr ri ie ed d p pr ro ox xy y r re e g go oo od d 46. 1 10 00 09 91 18 86 66 6 3 3 f fe em ma al le e 4 49 9 e em mp pl lo oy ye ed d m ma ar rr ri ie ed d p pr ro ox xy y r re e g go oo od d 47. 1 10 00 09 91 18 86 66 6 4 4 f fe em ma al le e 5 50 0 f fa am mi il ly y c c m ma ar rr ri ie ed d m mi is ss si in ng g g go oo od d 47. 1 10 00 09 91 18 86 66 6 4 4 f fe em ma al le e 5 50 0 f fa am mi il ly y c c m ma ar rr ri ie ed d m mi is ss si in ng g g go oo od d 47. 1 10 00 09 91 18 86 66 6 4 4 f fe em ma al le e 5 50 0 f fa am mi il ly y c c m ma ar rr ri ie ed d m mi is ss si in ng g g go oo od d 48. 1 10 00 09 91 19 90 04 4 1 1 m ma al le e 1 11 1 . . . . . . . . 48. 1 10 00 09 91 19 90 04 4 1 1 m ma al le e 1 11 1 . . . . . . . . 48. 1 10 00 09 91 19 90 04 4 1 1 m ma al le e 1 11 1 . . . . . . . . 49. 1 10 00 09 91 19 90 04 4 2 2 m ma al le e 1 11 1 . . . . . . . . 49. 1 10 00 09 91 19 90 04 4 2 2 m ma al le e 1 11 1 . . . . . . . . 49. 1 10 00 09 91 19 90 04 4 2 2 m ma al le e 1 11 1 . . . . . . . . 50. 1 10 00 09 91 19 90 04 4 3 3 m ma al le e 1 12 2 . . . . . . . . 50. 1 10 00 09 91 19 90 04 4 3 3 m ma al le e 1 12 2 . . . . . . . . 50. 1 10 00 09 91 19 90 04 4 3 3 m ma al le e 1 12 2 . . . . . . . .

  45. Whether appending or merging
      The data set you are using at the time is called the master data
      The data set you want to merge it with is called the using data
      Make sure you can identify observations properly beforehand
      Make sure you can identify observations uniquely afterwards
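One way to check that observations are uniquely identified is sketched below. The toy data are hypothetical, and the choice of `isid` and `duplicates report` is an assumption about how one might do the check; the slide itself does not name a command.

```stata
* Hypothetical toy data in long form: one row per person (pid) and wave
clear
input long pid byte wave byte age
10019057 1 59
10019057 2 60
10028005 1 30
10028005 2 31
end

* isid exits with an error unless pid and wave jointly identify each row
isid pid wave

* duplicates report tabulates how many copies of each pid-wave pair exist
duplicates report pid wave
```

Running the same check again after an append or merge is a cheap way to confirm that the identifiers are still unique.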

  46. Appending
      Use this command to add more observations
      Relatively easy
      Check first that you are really adding observations you don't already have (or that, if you are adding duplicates, you really want to do this)
      Syntax: append using using_data
      STATA simply sticks the using data on the end of the master data
      STATA re-orders the variables if necessary
      If the using data contain variables not present in the master data, STATA sets these variables to missing for the observations that came from the master data (and vice versa if the master data contain variables not present in the using data)
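A minimal sketch of `append`, using made-up data and a temporary file (the name `extra` is hypothetical) in place of a saved using data set. Each data set has one variable the other lacks, to show where the missing values end up.

```stata
* Using data: two extra observations, with age but no sex variable
clear
input long pid byte wave byte age
42571 1 59
42571 3 60
end
tempfile extra
save `extra'

* Master data: two observations, with sex but no age variable
clear
input long pid byte wave str6 sex
19057 1 "female"
28005 1 "male"
end

* Stack the using data underneath the master data
append using `extra'

* age is missing (.) for the master rows; sex is empty ("") for the appended rows
list, sepby(pid)
```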

  47. Merging is more complicated
      Use merge to add more variables to a data set

      Master data: age.dta
        pid    wave  age
        28005  1     30
        19057  1     59
        28005  2     31
        19057  3     61
        19057  4     62
        28005  4     33

      Using data: sex.dta
        pid    wave  sex
        19057  1     female
        19057  3     female
        28005  1     male
        28005  2     male
        28005  4     male
        42571  1     male
        42571  3     male

      Notice that the two data sets do not contain the same observations

      merge 1:1 pid wave using sex
        pid    wave  age  sex     _merge
        19057  1     59   female  3
        19057  3     61   female  3
        19057  4     62   .       1
        28005  1     30   male    3
        28005  2     31   male    3
        28005  4     33   male    3
        42571  1     .    male    2
        42571  3     .    male    2
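The example on this slide can be reproduced with the sketch below. It assumes the two data sets are typed in with `input` and that the using data are held in a temporary file rather than in a saved sex.dta.

```stata
* Using data (sex.dta on the slide), held in a temporary file here
clear
input long pid byte wave str6 sex
19057 1 "female"
19057 3 "female"
28005 1 "male"
28005 2 "male"
28005 4 "male"
42571 1 "male"
42571 3 "male"
end
tempfile sex
save `sex'

* Master data (age.dta on the slide)
clear
input long pid byte wave byte age
28005 1 30
19057 1 59
28005 2 31
19057 3 61
19057 4 62
28005 4 33
end

* One-to-one merge on the pid-wave key; STATA adds the _merge variable
merge 1:1 pid wave using `sex'
sort pid wave
list pid wave age sex _merge, sepby(pid)
```

Because sex is a string variable in this sketch, the unmatched master row shows an empty string rather than the `.` printed on the slide.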

  48. Merging
      STATA creates a variable called _merge after merging
        1: observation in master but not using data
        2: observation in using but not master data
        3: observation in both data sets
      Options are available for discarding some observations; see the manual
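As one illustration of discarding observations at merge time, the sketch below uses the `keep()` option of `merge` on a small made-up example. Which rows to keep is an analytical decision, so treat the choice shown here as an assumption, not a recommendation.

```stata
* Using data: one row matches the master, one does not
clear
input long pid byte wave str6 sex
19057 1 "female"
42571 1 "male"
end
tempfile sex
save `sex'

* Master data: one row matches the using data, one does not
clear
input long pid byte wave byte age
19057 1 59
19057 4 62
end

* keep(master match) drops rows found only in the using data (_merge == 2)
merge 1:1 pid wave using `sex', keep(master match)

* Only codes 1 and 3 should remain
tabulate _merge
```

An equivalent two-step approach is to run the merge without options and then `drop if _merge == 2`.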

  49. More on merging
      Previous example showed one-to-one merging
        Not every observation was in both data sets, but every observation in the master data was matched with at most one observation in the using data, and vice versa

      Many-to-one merging
        Household-level data sets contain only one observation per household (usually <1 per person)
        Regional data (e.g. regional unemployment data), usually one observation per region
        Sample syntax: merge m:1 hid wave using hhinc_data

      Person-level data
        hid   pid    age
        1604  19057  59
        2341  28005  30
        3569  42571  59
        4301  51538  22
        4301  51562  4
        4956  59377  46
        5421  64966  70
        6363  76166  77
        6827  81763  71
        6827  81798  72

      Household-level data
        hid   h/h income
        1604  780
        2341  1501
        3569  268
        4301  394
        4956  1601
        5421  225
        6363  411
        6827  743

      Merged data
        hid   pid    age  h/h income
        1604  19057  59   780
        2341  28005  30   1501
        3569  42571  59   268
        4301  51538  22   394
        4301  51562  4    394
        4956  59377  46   1601
        5421  64966  70   225
        6363  76166  77   411
        6827  81763  71   743
        6827  81798  72   743

      One-to-many merging
        Job and relationship files contain one observation per episode (potentially >1 per person)
        Income files contain one observation per source of income (potentially >1 per person)
        Sample syntax: merge 1:m pid wave using births_data
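A many-to-one merge can be sketched with a subset of the households from this slide. The variable name `hhincome` and the temporary file `hhinc` are hypothetical, and the key is `hid` alone because this toy example has no wave variable.

```stata
* Using data: one row per household (hid)
clear
input int hid int hhincome
4301 394
6827 743
1604 780
end
tempfile hhinc
save `hhinc'

* Master data: one row per person; two households have two members each
clear
input int hid long pid byte age
1604 19057 59
4301 51538 22
4301 51562 4
6827 81763 71
6827 81798 72
end

* m:1 means many master rows may share one using row with the same hid
merge m:1 hid using `hhinc'
list hid pid age hhincome, sepby(hid)
```

Each household's income is copied onto every member's row, which is the pattern shown in the merged table on the slide.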

  50. Long and wide forms
      The data we have here is in long form
      One row for each person/wave combination
      From a few slides back:

            pid       wave  hgsex   age
        1.  10019057  1     female  59
        2.  10019057  2     female  60
        3.  10019057  3     female  61
        4.  10019057  4     female  62
        5.  10028005  1     male    30
        6.  10028005  2     male    31
        7.  10028005  3     male    32
        8.  10028005  4     male    33
        9.  10042571  1     male    59
       10.  10042571  3     male    60
       11.  10042571  4     male    62
       12.  10051538  1     female  22
       13.  10051538  2     female  23
       14.  10051538  3     female  24
       15.  10051538  4     female  25
       16.  10051562  1     female  4
       17.  10051562  2     female  5
       18.  10051562  3     female  6
       19.  10051562  4     female  7
       20.  10059377  1     female  46
       21.  10059377  2     female  47
       22.  10059377  3     female  48
       23.  10059377  4     female  49
       24.  10064966  1     male    70
       25.  10064966  2     male    70
       26.  10064966  3     male    71
       27.  10064966  4     male    72
       28.  10076166  1     female  77
       29.  10076166  2     female  78
       30.  10076166  3     female  79
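Moving between the two forms can be sketched with `reshape`. This slide does not show the command, so its use here is an assumption; the data are the first two waves for two people from the listing above.

```stata
* Long form: one row per person/wave combination
clear
input long pid byte wave str6 hgsex byte age
10019057 1 "female" 59
10019057 2 "female" 60
10028005 1 "male" 30
10028005 2 "male" 31
end

* Long -> wide: one row per pid, with age1 and age2 as separate columns
* hgsex is constant within pid, so it is carried across unchanged
reshape wide age, i(pid) j(wave)
list

* Wide -> long restores one row per pid-wave combination
reshape long age, i(pid) j(wave)
list, sepby(pid)
```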
