
Logistic Regression: A Comprehensive Overview
Explore the fundamental concepts of logistic regression, including dichotomous response variables, the logit transformation, logistic regression model, effect measures, and more. Gain insights into how this statistical analysis technique is used to predict probabilities and estimate regression coefficients.
Logistic Regression
Nina Gunnes, December 3, 2020
Lecture 7

Dichotomous response variable
Response variable $Y$ with two possible outcomes
- $Y = 1$: diseased
- $Y = 0$: healthy
Related to a binomial experiment with one single trial (i.e., $n = 1$)
- Probability of being diseased: $P(Y = 1) = p$
- Probability of being healthy: $P(Y = 0) = 1 - p$
Follows a Bernoulli distribution (binomial distribution with $n = 1$)
- Expected (or mean) value: $E(Y) = p$

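A minimal Python sketch of the expected value: it computes $E(Y)$ directly from the two outcome probabilities and compares it with the proportion of ones in a simulated sample. The value $p = 0.25$, the sample size and the random seed are arbitrary illustrative choices, not taken from the lecture.

```python
import random

def bernoulli_mean(p: float) -> float:
    """Exact expectation E(Y) = 0 * P(Y = 0) + 1 * P(Y = 1)."""
    return 0 * (1 - p) + 1 * p

def simulated_mean(p: float, n: int, seed: int = 1) -> float:
    """Proportion of ones (diseased) in n simulated Bernoulli(p) trials."""
    rng = random.Random(seed)
    draws = [1 if rng.random() < p else 0 for _ in range(n)]
    return sum(draws) / n

p = 0.25  # illustrative probability of being diseased
print(bernoulli_mean(p))           # exact value
print(simulated_mean(p, 100_000))  # close to p for a large sample
```

The sample proportion of diseased individuals is the natural estimate of $p$, which is why the mean of a 0/1 variable equals the risk.
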
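Logit transformation
Also known as the logit link
Natural logarithm of the odds: $\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$
- $p$: probability (or risk) of the outcome
- $\frac{p}{1-p}$: odds of the outcome (e.g., $p = 25\%$ gives odds $= \frac{0.25}{0.75} = \frac{1}{3}$)
Inverse logit returning the probability, ranging from 0 to 1
- $p = \frac{e^{x}}{1 + e^{x}}$, where $x = \operatorname{logit}(p)$
- When $\operatorname{logit}(p)$ approaches minus infinity: $\lim_{x \to -\infty} \frac{e^{x}}{1 + e^{x}} = 0$
- When $\operatorname{logit}(p)$ approaches infinity: $\lim_{x \to \infty} \frac{e^{x}}{1 + e^{x}} = 1$
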
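The logit and its inverse can be sketched in a few lines of Python, writing $x$ for the log odds. The inverse logit uses the algebraically equivalent form $1/(1 + e^{-x})$ for non-negative $x$ so that very large inputs do not overflow; the function names `logit` and `inv_logit` are our own.

```python
import math

def logit(p: float) -> float:
    """Log odds ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x: float) -> float:
    """Inverse logit e^x / (1 + e^x), arranged to avoid overflow for large |x|."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)

p = 0.25
odds = p / (1 - p)
print(odds)                           # 1/3 as a float
print(inv_logit(logit(p)))            # recovers p (up to rounding)
print(inv_logit(-50), inv_logit(50))  # close to 0 and 1 at the extremes
```
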
Logistic regression model
Relation between a dichotomous outcome and one or more risk factors
Using the logit transformation
- Prevents the model from predicting probabilities < 0 or > 1
General formula: $\operatorname{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
- $p$: probability (or risk) of disease
- $k$: number of independent variables (or predictors)
- $\beta_0$: intercept (i.e., the log odds when all $x_j$'s are equal to 0)
- $\beta_j$: effect of $x_j$ on the log odds, where $j = 1, 2, \dots, k$

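To illustrate why the logit link keeps predictions inside (0, 1), the sketch below evaluates the general formula for hypothetical coefficients (made up for illustration, not estimated from any data) and maps the log odds back to a probability. For the last profile the linear predictor is 5.5, which would be an impossible probability, yet the predicted risk stays below 1. With all $x_j = 0$ the prediction is simply the inverse logit of $\beta_0$.

```python
import math

def linear_predictor(beta0: float, betas: list[float], xs: list[float]) -> float:
    """Log odds logit(p) = beta0 + beta1*x1 + ... + betak*xk."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def predicted_probability(beta0: float, betas: list[float], xs: list[float]) -> float:
    """Map the log odds back to a probability with the inverse logit."""
    eta = linear_predictor(beta0, betas, xs)
    # 1 / (1 + e^(-eta)) is fine for moderate eta; very negative eta would overflow.
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients for k = 2 predictors (illustrative, not estimated).
beta0, betas = -4.0, [0.5, 0.03]
for xs in ([0, 0], [1, 60], [1, 300]):
    eta = linear_predictor(beta0, betas, xs)
    print(xs, round(eta, 2), round(predicted_probability(beta0, betas, xs), 4))
```
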
Logistic regression model, cont.
Obtaining an expression for the odds by rearranging the formula
$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}$
The statistical analysis aims to estimate the regression coefficients
- Intercept: $\beta_0$
- Effects of independent variables: $\beta_j$, where $j = 1, 2, \dots, k$
Possible to predict the probability for a given combination of risk factors
- E.g., risk of cardiovascular disease given male, smoker, and 50+ years of age

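Rearranged this way, the odds are $e$ raised to the linear predictor, and the risk follows as odds/(1 + odds). The sketch below predicts the risk for a male smoker aged 50+ using hypothetical coefficients for three 0/1 indicators; the coefficient values and the indicator names are illustrative assumptions, not results from the lecture.

```python
import math

# Hypothetical coefficients on the log-odds scale (illustrative only,
# not estimated from any data).
BETA0 = -3.0
BETAS = {"male": 0.7, "smoker": 0.6, "age50plus": 1.1}

def odds_and_risk(profile: dict[str, int]) -> tuple[float, float]:
    """Return (odds, risk) for a profile of 0/1 risk-factor indicators."""
    eta = BETA0 + sum(BETAS[name] * profile[name] for name in BETAS)
    odds = math.exp(eta)      # p / (1 - p) = e^(beta0 + beta1*x1 + ...)
    risk = odds / (1 + odds)  # solve odds = p / (1 - p) for p
    return odds, risk

odds, risk = odds_and_risk({"male": 1, "smoker": 1, "age50plus": 1})
print(f"odds = {odds:.3f}, risk = {risk:.3f}")
```
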
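Effect measure
Comparing log odds in two groups
- Group 1: $\operatorname{logit}(p_1) = \ln\left(\frac{p_1}{1-p_1}\right)$
- Group 2: $\operatorname{logit}(p_2) = \ln\left(\frac{p_2}{1-p_2}\right)$
- Difference: $\ln\left(\frac{p_1}{1-p_1}\right) - \ln\left(\frac{p_2}{1-p_2}\right) = \ln\left(\frac{p_1/(1-p_1)}{p_2/(1-p_2)}\right) = \ln(\mathrm{OR})$
Regression coefficients correspond to differences in log odds
Usually reported in the form of odds ratios
- Dichotomous variable $x_j$: $\mathrm{OR} = e^{\beta_j}$ (compared to the reference group)
- Continuous variable $x_j$, with a $c$-unit increase: $\mathrm{OR} = e^{c\beta_j} = \left(e^{\beta_j}\right)^{c}$
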
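A short numerical check of these statements, using illustrative risks and a hypothetical coefficient: the difference in log odds equals the log of the odds ratio, and for a continuous predictor the OR for a $c$-unit increase is $e^{c\beta_j}$, the same as the per-unit OR raised to the power $c$.

```python
import math

def odds(p: float) -> float:
    """Odds p / (1 - p)."""
    return p / (1 - p)

def log_odds_difference(p1: float, p2: float) -> float:
    """logit(p1) - logit(p2), which equals ln(OR)."""
    return math.log(odds(p1)) - math.log(odds(p2))

def odds_ratio_for_increase(beta: float, c: float = 1.0) -> float:
    """OR for a c-unit increase in a continuous predictor: e^(c * beta)."""
    return math.exp(c * beta)

# Difference in log odds vs. log of the odds ratio (illustrative risks).
p1, p2 = 0.30, 0.20
print(log_odds_difference(p1, p2), math.log(odds(p1) / odds(p2)))

# Hypothetical coefficient: OR per unit and per 10 units.
beta = 0.05
print(odds_ratio_for_increase(beta), odds_ratio_for_increase(beta, c=10))
print(odds_ratio_for_increase(beta) ** 10)  # same as e^(10 * beta)
```
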
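Effect measure, cont.
Comparing odds in two groups that differ in one single variable ($x_2$)
- Odds in group 1: $\frac{p_1}{1-p_1} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_{21} + \beta_3 x_3 + \beta_4 x_4}$
- Odds in group 2: $\frac{p_2}{1-p_2} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_{22} + \beta_3 x_3 + \beta_4 x_4}$
- Odds ratio: $\mathrm{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_{21} + \beta_3 x_3 + \beta_4 x_4}}{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_{22} + \beta_3 x_3 + \beta_4 x_4}} = e^{\beta_2 (x_{21} - x_{22})}$
The value of the OR gives the direction of the difference between the two groups
- $\mathrm{OR} < 1$: decreased odds in group 1 compared to group 2
- $\mathrm{OR} = 1$: equal odds in group 1 and group 2
- $\mathrm{OR} > 1$: increased odds in group 1 compared to group 2
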
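The cancellation of the shared terms can also be checked numerically. The sketch below uses hypothetical coefficients $\beta_0, \dots, \beta_4$ (illustrative values only) and compares two groups with $x_{21} = 150$ and $x_{22} = 130$; whatever values the other predictors take, the ratio of odds equals $e^{\beta_2 (x_{21} - x_{22})}$.

```python
import math

# Hypothetical coefficients beta0..beta4 (illustrative only, not estimated).
BETA = [-2.0, 0.5, 0.03, -0.01, 0.2]

def odds(x: list[float]) -> float:
    """Odds e^(beta0 + beta1*x1 + ... + beta4*x4) for x = [x1, x2, x3, x4]."""
    eta = BETA[0] + sum(b * xi for b, xi in zip(BETA[1:], x))
    return math.exp(eta)

def odds_ratio_x2(x21: float, x22: float, others: tuple[float, float, float]) -> float:
    """OR of group 1 (x2 = x21) vs group 2 (x2 = x22), with x1, x3, x4 held equal."""
    x1, x3, x4 = others
    return odds([x1, x21, x3, x4]) / odds([x1, x22, x3, x4])

closed_form = math.exp(BETA[2] * (150 - 130))  # e^(beta2 * (x21 - x22))
for others in [(0, 200, 40), (1, 250, 60)]:
    print(others, odds_ratio_x2(150, 130, others), closed_form)
```
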
Example: Coronary heart disease
Data from the Framingham study
- Association between systolic blood pressure and coronary heart disease
- Complete data set available at http://www.stat.tamu.edu/~carroll/data.php
Considering a random subset of data from the main study
- Men aged 31-65 years ($n = 500$)
- Described in Chapter 3 of the book by Laake et al. (2007)
- Data subset available at https://www.med.uio.no/imb/personer/vit/veierod/boker/epidemiologiske-og-kliniske-forskningsmetoder/datafiler/

Example: Coronary heart disease, cont.
One response variable
- firstchd: coronary heart disease (0 no, 1 yes)
Four explanatory variables
- smoke: smoking (0 no, 1 yes)
- meansbp: mean systolic blood pressure (mmHg)
- cholesterol: serum cholesterol (mg/dl)
- age: age (years)

Example: Coronary heart disease, cont.
Defining the risk of coronary heart disease in two groups
- Smokers: $p_1 = \frac{29}{377}$
- Non-smokers: $p_2 = \frac{8}{123}$
Calculating the odds ratio of coronary heart disease between the groups
$\mathrm{OR} = \frac{\mathrm{odds}_1}{\mathrm{odds}_2} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{29/348}{8/115} = \frac{29 \cdot 115}{8 \cdot 348} \approx 1.20$
Odds of coronary heart disease about 20% higher among smokers than non-smokers
Statistically significant?

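The same calculation in exact rational arithmetic with Python's `fractions` module (the helper name `odds_ratio_from_counts` is our own). Because $348 = 12 \cdot 29$, the odds ratio simplifies to exactly $115/96 \approx 1.198$, which corresponds to the 19.8% higher odds reported for the crude model.

```python
from fractions import Fraction

def odds_ratio_from_counts(cases1: int, n1: int, cases2: int, n2: int) -> Fraction:
    """Exact OR = [p1 / (1 - p1)] / [p2 / (1 - p2)] from case counts and group sizes."""
    odds1 = Fraction(cases1, n1 - cases1)  # odds in group 1: cases / non-cases
    odds2 = Fraction(cases2, n2 - cases2)  # odds in group 2: cases / non-cases
    return odds1 / odds2

or_smoke = odds_ratio_from_counts(29, 377, 8, 123)
print(or_smoke, float(or_smoke))  # 115/96 and its decimal value (about 1.198)
```
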
Logistic regression in Stata
Two available commands in Stata
- logit: reports coefficients by default (odds ratios also possible!)
- logistic: reports odds ratios by default (coefficients also possible!)
More information is given in Stata's documentation
- help logit
- help logistic
"Many users prefer the logistic command to logit. Results are the same regardless of which you use; both are the maximum-likelihood estimator."

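Whichever command is used, the estimates come from maximum likelihood. As a language-neutral illustration (our own sketch, not Stata's implementation), the pure-Python code below maximizes the logistic log-likelihood with Newton-Raphson for a single 0/1 predictor, using the smoking counts from the 2x2 example (29 of 377 smokers and 8 of 123 non-smokers with coronary heart disease) expanded into 500 individual records. With one binary predictor, the maximum-likelihood $\beta_1$ equals the log of the crude odds ratio, so $e^{\beta_1}$ reproduces the hand calculation of about 1.20.

```python
import math

def fit_logistic_1x(data: list[tuple[int, int]], iters: int = 25) -> tuple[float, float]:
    """Maximum-likelihood (beta0, beta1) for logit(p) = beta0 + beta1*x via Newton-Raphson.

    data holds (x, y) pairs with y coded 0/1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (gradient of the log-likelihood) and information matrix.
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in data:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            u0 += y - p
            u1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: beta_new = beta + I^(-1) U, with the 2x2 inverse written out.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1

# Individual records rebuilt from the 2x2 counts: x = smoke, y = firstchd.
records = ([(1, 1)] * 29 + [(1, 0)] * (377 - 29)
           + [(0, 1)] * 8 + [(0, 0)] * (123 - 8))
b0, b1 = fit_logistic_1x(records)
print(f"beta1 = {b1:.4f}, OR = {math.exp(b1):.4f}")
```
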
Example: Coronary heart disease, cont.
Crude model, smoke: $\beta_1 = 0.18$, $\mathrm{OR} = e^{\beta_1} = 1.20$ (non-sig.!)
- Odds 19.8% higher among smokers than non-smokers
Crude model, meansbp: $\beta_1 = 0.02$, $\mathrm{OR} = e^{\beta_1} = 1.02$
- Odds increased by 2.5% for each 1 mmHg increase in mean systolic blood pressure
Crude model, cholesterol: $\beta_1 = 0.01$, $\mathrm{OR} = e^{\beta_1} = 1.01$
- Odds increased by 0.8% for each 1 mg/dl increase in serum cholesterol
Crude model, age: $\beta_1 = 0.07$, $\mathrm{OR} = e^{\beta_1} = 1.08$
- Odds increased by 7.7% for each additional year of age

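The "(non-sig.!)" label for smoking in the crude model can be checked by hand from the 2x2 counts. The sketch below uses Woolf's standard error for a log odds ratio, $\sqrt{1/a + 1/b + 1/c + 1/d}$, which for a single binary predictor coincides with the Wald standard error from the logistic model; the cell labels a, b, c, d and the function name are our own notation. It also converts the OR into the percent difference in odds, $100(\mathrm{OR} - 1)$.

```python
import math

def crude_or_with_ci(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> tuple[float, float, tuple[float, float]]:
    """OR, SE of ln(OR) and Wald 95% CI from a 2x2 table.

    a, b: cases and non-cases among the exposed (smokers)
    c, d: cases and non-cases among the unexposed (non-smokers)
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error
    lower, upper = math.exp(log_or - z * se), math.exp(log_or + z * se)
    return math.exp(log_or), se, (lower, upper)

or_smoke, se, (lo, hi) = crude_or_with_ci(29, 377 - 29, 8, 123 - 8)
print(f"OR = {or_smoke:.3f} ({100 * (or_smoke - 1):.1f}% higher odds)")
print(f"95% CI: {lo:.2f} to {hi:.2f}")  # the interval contains 1
```

Because the interval includes 1, the data are compatible with no difference in odds between smokers and non-smokers, in line with the non-significant result.
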
Example: Coronary heart disease, cont.
Adjusted model, smoke: $\beta_1 = 0.41$, $\mathrm{OR} = e^{\beta_1} = 1.50$ (non-sig.!)
- Odds 50.4% higher among smokers than non-smokers
Adjusted model, meansbp: $\beta_2 = 0.02$, $\mathrm{OR} = e^{\beta_2} = 1.02$
- Odds increased by 1.7% for each 1 mmHg increase in mean systolic blood pressure
Adjusted model, cholesterol: $\beta_3 = 0.01$, $\mathrm{OR} = e^{\beta_3} = 1.01$
- Odds increased by 0.8% for each 1 mg/dl increase in serum cholesterol
Adjusted model, age: $\beta_4 = 0.07$, $\mathrm{OR} = e^{\beta_4} = 1.07$
- Odds increased by 6.9% for each additional year of age

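Per-unit odds ratios for continuous variables are often easier to interpret over a larger change, using $\mathrm{OR} = \left(e^{\beta_j}\right)^{c}$ for a $c$-unit increase. The sketch below rebuilds per-unit ORs from the rounded percentages reported for the adjusted model (1.7% per mmHg, 6.9% per year), so the 10-unit results are only approximate; the choice of 10-unit steps and the helper names are ours.

```python
def percent_change(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio: 100 * (OR - 1)."""
    return 100 * (odds_ratio - 1)

def or_for_c_units(or_per_unit: float, c: float) -> float:
    """OR for a c-unit increase: (e^beta)^c = e^(c * beta)."""
    return or_per_unit ** c

# Per-unit ORs rebuilt from the rounded percentages of the adjusted model.
adjusted_per_unit = {"meansbp (10 mmHg)": (1.017, 10), "age (10 years)": (1.069, 10)}
for label, (or_unit, c) in adjusted_per_unit.items():
    or_c = or_for_c_units(or_unit, c)
    print(f"{label}: OR about {or_c:.2f}, odds about {percent_change(or_c):.0f}% higher")
```
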
References
Aalen OO, Frigessi A, Moger TA, Scheel I, Skovlund E, Veierød MB. Statistiske metoder i medisin og helsefag. Oslo: Gyldendal akademisk; 2006.
Laake P, Hjartåker A, Thelle DS, Veierød MB. Epidemiologiske og kliniske forskningsmetoder. Oslo: Gyldendal akademisk; 2007. https://www.med.uio.no/imb/forskning/publikasjoner/boker/2007/epidemiolgiske-kliniske-forskningsmetoder.html